Weird seg fault problem - c++

Greetings,
I'm having a weird seg fault problem. My application dumps a core file at runtime. After digging into it I found it died in this block:
#include <lib1/c.h>
...
x::c obj;
obj.func1();
I defined class c in a library lib1:
namespace x
{
struct c
{
c();
~c();
void fun1();
vector<char *> _data;
};
}
x::c::c()
{
}
x::c::~c()
{
for ( int i = 0; i < _data.size(); ++i )
delete _data[i];
}
I could not figure it out for some time till I ran nm on the lib1.so file: there are more function definitions than I defined:
x::c::c()
x::c::c()
x::c::~c()
x::c::~c()
x::c::func1()
x::c::func2()
After searching in code base I found someone else defined a class with same name in same namespace, but in another library lib2 as follows:
namespace x
{
struct c
{
c();
~c();
void func2();
vector<string> strs_;
};
}
x::c::c()
{
}
x::c::~c()
{
}
My application links to lib2, which has dependency on lib1. This interesting behavior brings several questions:
Why would it even work? I would expect a "multiple definitions" error while linking against lib2 (which depends upon lib1) but never had such. The application seems to be doing what's defined in func1 except it dumps a core at runtime.
After attaching debugger, I found my application calls the ctor of class c in lib2, then calls func1 (defined in lib1). When going out of scope it calls dtor of class c in lib2, where the seg fault occurs. Can anybody teach me how this could even occur?
How can I prevent such problems from happening again? Is there any C++ syntax I can use?
Forgot to mention I'm using g++ 4.1 on RHEL4, thank you very much!

1.
Violations of the "one definition rule" don't have to be diagnosed by your compiler. In fact, they are often only going to be known at link time when you link multiple object files together.
At link time, the information about the original class definitions may not exist any more (they are not needed after the compiler step) so having multiple definitions of a class is typically not easy to flag to the user.
2.
Once you have two distinct definitions pretty much anything can happen, you are in the territory of undefined behaviour. Whatever happens, it's a possible outcome.
3.
The most sensible thing to do is to communicate with the other members of your team. Agree who's going to use which namespaces and you won't get these problems. Otherwise, you point a documentation tool or static analysis tool over your entire project. Many such tools will be able to diagnose multiple inconsistent definitions of classes.

Just a guess but I don't see any using namespace x; so perhaps it used one namespace instead of the other?

With the advent of templates it became necessary to allow multiple definitions of a body of code with the same name; there was no way for the compiler to know if the same template code had already been generated in another compilation unit i.e. source file. When the linker finds these duplicates, it assumes they are identical. The burden is on you to make sure that they are - this is called the One Definition Rule.

On the linker level this is library interpositioning. The effective symbol bound unfortunately depends on the order of object files on linker command line (this is, sigh, historical).
From what you describe it looks that lib1 comes first in linker argument list and lib2 comes second and interposes on symbols from lib1. This explains the calls to constructors and destructors from the lib2 but calls to func1 from lib1 (since there's no func1-derived symbol in lib2, so there's no "hiding", the call is bound to lib1.)
The solution to this particular problem is to reverse the order of libraries on the linker invocation command.

There's lots of answers about the one definition rule. However, to me, this looks a lot more like a missing copy constructor.
To elaborate:
If the copy constructor is called on your object, then you will get a memory leak. This is because delete will be called on the same set of pointers twice.
namespace x
{
struct c
{
c() {
}
~c() {
for ( int i = 0; i < _data.size(); ++i )
delete _data[i];
}
c(const c & rhs) {
for (int i=0; i< rhs.size(); ++i) {
int len = strlen(rhs[i]);
char *mem = malloc(len + 1);
strncpy(mem, rhs[i], len + 1);
_data.push_back(mem);
}
void fun1();
vector<char *> _data;
};
}

Related

For a function that takes a const struct, does the compiler not optimize the function body?

I have the following piece of code:
#include <stdio.h>
typedef struct {
bool some_var;
} model_t;
const model_t model = {
true
};
void bla(const model_t *m) {
if (m->some_var) {
printf("Some var is true!\n");
}
else {
printf("Some var is false!\n");
}
}
int main() {
bla(&model);
}
I'd imagine that the compiler has all the information required to eliminate the else clause in the bla() function. The only code path that calls the function comes from main, and it takes in const model_t, so it should be able to figure out that that code path is not being used. However:
With GCC 12.2 we see that the second part is linked in.
If I inline the function this goes away though:
What am I missing here? And is there some way I can make the compiler do some smarter work? This happens in both C and C++ with -O3 and -Os.
The compiler does eliminate the else path in the inlined function in main. You're confusing the global function that is not called anyway and will be discarded by the linker eventually.
If you use the -fwhole-program flag to let the compiler know that no other file is going to be linked, that unused segment is discarded:
[See online]
Additionally, you use static or inline keywords to achieve something similar.
The compiler cannot optimize the else path away as the object file might be linked against any other code. This would be different if the function would be static or you use whole program optimization.
The only code path that calls the function comes from main
GCC can't know that unless you tell it so with -fwhole-program or maybe -flto (link-time optimization). Otherwise it has to assume that some static constructor in another compilation unit could call it. (Including possibly in a shared library, but another .cpp that you link with could do it.) e.g.
// another .cpp
typedef struct { bool some_var; } model_t;
void bla(const model_t *m); // declare the things from the other .cpp
int foo() {
model_t model = {false};
bla(&model);
return 1;
}
int some_global = foo(); // C++ only: non-constant static initializer.
Example on Godbolt with these lines in the same compilation unit as main, showing that it outputs both Some var is false! and then Some var is true!, without having changed the code for main.
ISO C doesn't have easy ways to get init code executed, but GNU C (and GCC specifically) have ways to get code run at startup, not called by main. This works even for shared libraries.
With -fwhole-program, the appropriate optimization would be simply not emitting a definition for it at all, as it's already inlined into the call-site in main. Like with inline (In C++, a promise that any other caller in another compilation unit can see its own definition of the function) or static (private to this compilation unit).
Inside main, it has optimized away the branch after constant propagation. If you ran the program, no branch would actually execute; nothing calls the stand-alone definition of the function.
The stand-alone definition of the function doesn't know that the only possible value for m is &model. If you put that inside the function, then it could optimize like you're expecting.
Only -fPIC would force the compiler to consider the possibility of symbol-interposition so the definition of const model_t model isn't the one that is in effect after (dynamic) linking. But you're compiling code for an executable not a library. (You can disable symbol-interposition for a global variable by giving it "hidden" visibility, __attribute__((visibility("hidden"))), or use -fvisibility=hidden to make that the default).

different behavior when linking with static library vs using object files in C++

I'm working with some legacy C++ code that is behaving in a way I don't understand. I'm using the Microsoft compiler but I've tried it with g++ (on Linux) as well—same behavior.
I have 4 files listed below. In essence, it's a registry that's keeping track of a list of members. If I compile all files and link the object files into one program, it shows the correct behavior: registry.memberRegistered is true:
>cl shell.cpp registry.cpp member.cpp
>shell.exe
1
So somehow the code in member.cpp gets executed (which I don't really understand, but OK).
However, what I want is to build a static library from registry.cpp and member.cpp, and link that against the executable built from shell.cpp. But when I do this, the code in member.cpp does not get executed and registry.memberRegistered is false:
>cl registry.cpp member.cpp /c
>lib registry.obj member.obj -OUT:registry.lib
>cl shell.cpp registry.lib
>shell.exe
0
My questions: how come it works the first way and not the second and is there a way (e.g. compiler/linker options) to make it work with the second way?
registry.h:
class Registry {
public:
static Registry& get_registry();
bool memberRegistered;
private:
Registry() {
memberRegistered = false;
}
};
registry.cpp:
#include "registry.h"
Registry& Registry::get_registry() {
static Registry registry;
return registry;
}
member.cpp:
#include "registry.h"
int dummy() {
Registry::get_registry().memberRegistered = true;
return 0;
}
int x = dummy();
shell.cpp:
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
void main() {
shell *cf = new shell;
cf->init();
}
You have been hit by what is popularly known as static initialization order fiasco.
The basics is that the order of initialization of static objects across translation units is unspecified. See this
The call here Registry::get_registry().memberRegistered; in "shell.cpp" may happen before the call here int x = dummy(); in "member.cpp"
EDIT:
Well, x isn't ODR-used. Therefore, the compiler is permitted not to evaluate int x = dummy(); before or after entering main(), or even at all.
Just a quote about it from CppReference (emphasis mine)
It is implementation-defined whether dynamic initialization
happens-before the first statement of the main function (for statics)
or the initial function of the thread (for thread-locals), or deferred
to happen after.
If the initialization is deferred to happen after the first statement
of main/thread function, it happens before the first odr-use of any
variable with static/thread storage duration defined in the same
translation unit as the variable to be initialized. If no variable or function is odr-used from a given translation unit, the non-local variables defined in that translation unit may never be initialized (this models the behavior of an on-demand dynamic library)...
The only way to get your program working as you want is to make sure x is ODR-used
shell.cpp
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
extern int x; //or extern int dummy();
int main() {
shell *cf = new shell;
cf->init();
int k = x; //or dummy();
}
^ Now, your program should work as expected. :-)
This is a result of the way linkers treat libraries: they pick and choose the objects that define symbols left undefined by other objects processed so far. This helps keep executable sizes smaller, but when a static initialization has side effects, it leads to the fishy behavior you've discovered: member.obj / member.o doesn't get linked in to the program at all, although its very existence would do something.
Using g++, you can use:
g++ shell.cpp -Wl,-whole-archive registry.a -Wl,-no-whole-archive -o shell
to force the linker to put all of your library in the program. There may be a similar option for MSVC.
Thanks a lot for all the replies. Very helpful.
So both the solution proposed WhiZTiM (making x ODR-used) and aschepler (forcing linker to include the whole library) work for me. The latter has my preference since it doesn't require any changes to the code. However, there seems to be no MSVC equivalent for --whole-archive.
In Visual Studio I managed to solve the problem as follows (I have a project for the registry static library, and one for the shell executable):
In the shell project add a reference to the registry project;
In the linker properties of the shell project under General set
"Link Library Dependencies" and "Use Library Dependent Inputs" to
"Yes".
If these options are set registry.memberRegistered is properly initialized. However, after studying the compiler/linker commands I concluded that setting these options results in VS simply passing the registry.obj and member.obj files to the linker, i.e.:
>cl /c member.cpp registry.cpp shell.cpp
>lib /OUT:registry.lib member.obj registry.obj
>link /OUT:shell.exe "registry.lib" shell.obj member.obj registry.obj
>shell.exe
1
To my mind, this is essentially the first approach to my original question. If you leave out registry.lib in the linker command it works fine as well.
Anyway, for now, it's good enough for me.
I'm working with CMake so now I need to figure out how to adjust CMake settings to make sure that the object files get passed to the linker? Any thoughts?

Is there a way to detect inline function ODR violations?

So I have this code in 2 separate translation units:
// a.cpp
#include <stdio.h>
inline int func() { return 5; }
int proxy();
int main() { printf("%d", func() + proxy()); }
// b.cpp
inline int func() { return 6; }
int proxy() { return func(); }
When compiled normally the result is 10. When compiled with -O3 (inlining on) I get 11.
I have clearly done an ODR violation for func().
It showed up when I started merging sources of different dll's into fewer dll's.
I have tried:
GCC 5.1 -Wodr (which requires -flto)
gold linker with -detect-odr-violations
setting ASAN_OPTIONS=detect_odr_violation=1 before running an instrumented binary with the address sanitizer.
Asan can supposedly catch other ODR violations (global vars with different types or something like that...)
This is a really nasty C++ issue and I am amazed there isn't reliable tooling for detecting it.
Pherhaps I have misused one of the tools I tried? Or is there a different tool for this?
EDIT:
The problem remains unnoticed even when I make the 2 implementations of func() drastically different so they don't get compiled to the same amount of instructions.
This also affects class methods defined inside the class body - they are implicitly inline.
// a.cpp
struct A { int data; A() : data(5){} };
// b.cpp
struct A { int data; A() : data(6){} };
Legacy code with lots of copy/paste + minor modifications after that is a joy.
The tools are imperfect.
I think Gold's check will only notice when the symbols have different types or different sizes, which isn't true here (both functions will compile to the same number of instructions, just using a different immediate value).
I'm not sure why -Wodr doesn't work here, but I think it only works for types, not functions, i.e. it will detect two conflicting definitions of a class type T but not your func().
I don't know anything about ASan's ODR checking.
The simplest way to detect such concerns is to copy all the functions into a single compilation unit (create one temporarily if needed). Any C++ compiler will then be able to detect and report duplicate definitions when compiling that file.

undefined reference to OOLUA::Proxy_class<T>::class_name

I am using OOLUA 2.0.0 and am receiving the error undefined reference to OOLUA::Proxy_class<TestClass>::class_name.
The code is:
class TestClass
{
int test_member;
public:
void setTestMember(int x) { test_member = x; }
int getTestMember() { return test_member; }
};
OOLUA_PROXY(TestClass)
OOLUA_MEM_FUNC(void, setTestMember, int)
OOLUA_MEM_FUNC(int, getTestMember)
OOLUA_PROXY_END
int main()
{
OOLUA::Script script;
script.register_class<TestClass>();
OOLUA::run_chunk(script, "local n = TestClass.new() \n n:setTestMember(42) \n print(\"test_member is: \" .. n:getTestMember()");
return 0;
}
The documentation here does not appear to say anything about this error. I'm not sure what class_name even is. Any help is appreciated.
By the way, I'm using GCC 4.9.2 to compile it.
So this is late but hopefully it will help if anyone else runs across a similar issue. Basically your example is missing an important but subtle part that will help explain why you got the link errors you got.
TestClass.hpp
class TestClass
{
int test_member;
public:
void setTestMember(int x) { test_member = x; }
int getTestMember() const { return test_member; }
static void aStaticMember() { }
};
OOLUA_PROXY(TestClass)
OOLUA_MEM_FUNC(void, setTestMember, int)
OOLUA_MEM_FUNC_CONST(int, getTestMember)
OOLUA_SFUNC(aStaticMember)
OOLUA_PROXY_END
TestClass.cpp
OOLUA_EXPORT_FUNCTIONS(TestClass
,setTestMember
)
OOLUA_EXPORT_FUNCTIONS_CONST(TestClass
,getTestMember
)
You must always put a OOLUA_EXPORT_FUNCTIONS block in the associated .cpp file so that the declarations from the OOLUA_PROXY block are defined. Even if you only have const member functions, you must still place an empty block e.g. OOLUA_EXPORT_FUNCTIONS(TestClass) in the .cpp file. In the event that your class only has static functions, you would be required to use a slightly different setup.
In summary:
OOLUA_PROXY declares a proxy class
OOLUA_EXPORT_FUNCTIONS blocks define the members of that class
Also, if your class has only static member functions exposed, you will need to include a OOLUA_EXPORT_NO_FUNCTIONS(TestClass) in the .cpp file. For every static member you must then use special syntax for registering the static functions with the Script.
using namespace OOLUA; //NOLINT(build/namespaces)
Script vm;
vm.register_class_static<TestClass>("aStaticMember",
&OOLUA::Proxy_class<TestClass>::aStaticMember);
The documentation is not very helpful in this matter, but if you reveiw the associated test source files, they in combination with the documentation are enough to get past most issues.
It would be better posting questions about the library to the mailing list oolua.org/mailinglist
Have a look at the documentation for the library, specifically exporting class functions[1]. Personally I would use the minimalist DSL for your functions instead of the expressive, this would then make your proxied functions like the following (however the get should really be a constant function):
OOLUA_MFUNC(setTestMember)
OOLUA_MFUNC(getTestMember)
[1] https://docs.oolua.org/_o_o_lua_proxying.html

same class, different size...?

See the code, then you would understand what I'm confused.
Test.h
class Test {
public:
#ifndef HIDE_VARIABLE
int m_Test[10];
#endif
};
Aho.h
class Test;
int GetSizeA();
Test* GetNewTestA();
Aho.cpp
//#define HIDE_VARIABLE
#include "Test.h"
#include "Aho.h"
int GetSizeA() { return sizeof(Test); }
Test* GetNewTestA() { return new Test(); }
Bho.h
class Test;
int GetSizeB();
Test* GetNewTestB();
Bho.cpp
#define HIDE_VARIABLE // important!
#include "Test.h"
#include "Bho.h"
int GetSizeB() { return sizeof(Test); }
Test* GetNewTestB() { return new Test(); }
TestPrj.cpp
#include "Aho.h"
#include "Bho.h"
#include "Test.h"
int _tmain(int argc, _TCHAR* argv[]) {
int a = GetSizeA();
int b = GetSizeB();
Test* pA = GetNewTestA();
Test* pB = GetNewTestB();
pA->m_Test[0] = 1;
pB->m_Test[0] = 1;
// output : 40 1
std::cout << a << '\t' << b << std::endl;
char temp;
std::cin >> temp;
return 0;
}
Aho.cpp does not #define HIDE_VARIABLE, so GetSizeA() returns 40, but
Bho.cpp does #define HIDE_VARIABLE, so GetSizeB() returns 1.
But, Test* pA and Test* pB both have member variable m_Test[].
If the size of class Test from Bho.cpp is 1, then pB is weird, isn't it?
I don't understand what's going on, please let me know.
Thanks, in advance.
Environment:
Microsoft Visual Studio 2005 SP1 (or SP2?)
You violated the requirements of One Definition Rule (ODR). The behavior of your program is undefined. That's the only thing that's going on here.
According to ODR, classes with external linkage have to be defined identically in all translation units.
Your code exhibits undefined behavior. You are violating the one definition rule (class Test is defined differently in two places). Therefore the compiler is allowed to do whatever it wants, including "weird" behavior.
In addition to ODR.
Most of the grief is caused by including the headers only in the cpp files, allowing you to change the definition between the compilation units.
But, Test* pA and Test* pB both have member variable m_Test[].
No, pB doesn't have m_Test[] however the TestPrj compilation unit doesn't know that and is applying the wrong structure of the class so it will compile.
Unless you compile in debug with capturing of memory overrun you would most times not see a problem.
pB->m_Test[9] = 1; would cause writing to memory not assigned by pB but may or may not be a valid space for you to write.
Like many people told here, you've violated the so-called One Definition Rule (ODR).
It's important to realize how C/C++ programs are assembled. That is, the translation units (cpp files) are compiled separately, without any connection to each other. Next linker assembles the executable according to the symbols and the code pieces generated by the compiler. It doesn't have any high-level type information, hence it's unable (and should not) to detect the problem.
So that you've actually cheated the compiler, beaten yourself, shoot your foot, whatever you like.
One point that I'd like to mention is that actually ODR rule is violated very frequently due to subtle changes in the various include header files and miscellaneous defines, but usually there's no problem and people don't even realize this.
For instance a structure may have a member of LPCTSTR type, which is a pointer to either char or wchar_t, depending on the defines, includes and etc. But this type of violation is "almost ok". As long as you don't actually use this member in differently compiled translation units there's no problem.
There're also many other common examples. Some arise from the in-class implemented member functions (inlined), which actually compile into different code within different translation units (due to different compiler options for different translation units for instance).
However this is usually ok. In your case however the memory layout of the structure has changed. And here we have a real problem.