external linked variable initialized multiple times - c++

I have a small synthetic example whose behaviour I want to change, but I don't quite know how.
Here is what I have:
A common header, statich.h, that contains an external declaration of a variable:
#include <iostream>
struct S {
    S() : x(42) {
        std::cout << "S(), this=" << this << std::endl;
    }
    ~S() {
        std::cout << "~S(), this=" << this << std::endl;
    }
    int x;
};
extern S nakedS;
A static library, libstatic.a, compiled from the source file statich.cpp, which contains the definition of that external variable:
#include "statich.h"
S nakedS;
A dynamic library, libdyn.so, compiled from the source file dyn.cpp and linked with libstatic.a. Here's the source code:
#include "statich.h"
void foo() {
    std::cout << "I'm foo() from dyn! nakedS.x == " << nakedS.x << std::endl;
}
An executable, supertest, compiled from the source file main.cpp and linked with both libraries, static and shared. Here's the source code:
#include "statich.h"
int main() {
    std::cout << "nakedS.x == " << nakedS.x << std::endl;
}
I have a CMakeLists.txt file that builds all of this for me. Here it is:
cmake_minimum_required(VERSION 2.8.12)
set(CMAKE_CXX_FLAGS
"${CMAKE_CXX_FLAGS} -fPIC"
)
add_library( static STATIC "statich.cpp" )
add_library( dyn SHARED "dyn.cpp" )
target_link_libraries( dyn static )
add_executable( supertest main.cpp )
set(DEPS
static
dyn
)
target_link_libraries( supertest ${DEPS} )
The point is, when I run cmake . && make && ./supertest I get this output:
S(), this=0x6012c4
S(), this=0x6012c4
nakedS.x == 42
~S(), this=0x6012c4
~S(), this=0x6012c4
This means the same object is initialized twice, which is not what I want at all. Can I change this behaviour without replacing libdyn.so with a static equivalent? Maybe with some compiler/linker flags? What should I read to learn more about it? Any help would be appreciated.
Also, I get this behaviour with this specific compiler version:
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
On another machine, where I have a different compiler:
gcc version 4.6.4 (Ubuntu/Linaro 4.6.4-1ubuntu1~12.04)
everything works fine.
Thanks in advance!

This is expected behaviour. To work around it, you could define your variable as weak, e.g.
#include "statich.h"
__attribute__((weak)) S nakedS;

Related

Is it correct to reexport the std library from a cpp module?

I want to use C++20 modules with the MSVC compiler and CMake. Setting up compilation of the standard library headers as module units seems pretty hard and not universal, so I did the following:
stdlib.ixx:
export module stdlib;
export import std.core; // std.core is MSVC specific thing
main.cpp:
import stdlib;
int main() {
    std::vector<std::string> vec = { "Hello, world 1!", "Hello, world 2!" };
    for (const auto& v : vec) {
        std::cout << v << std::endl;
    }
    return 0;
}
CMakeLists.txt:
cmake_minimum_required(VERSION 3.24)
project(test1)
set(CMAKE_CXX_STANDARD 23)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
add_compile_options(/experimental:module /translateInclude)
add_executable(test1 main.cpp stdlib.ixx)
This code seems to work, but is it a correct/good/appropriate solution according to modern C++?

shared library error about "undefined symbol" when using `dlopen`

First things first, here's my minimal reproducible program:
CMakeLists.txt:
cmake_minimum_required(VERSION 3.17)
project(untitled4)
set(CMAKE_CXX_STANDARD 17)
add_library(lib1 SHARED lib1.cpp)
add_executable(untitled4 main.cpp)
target_link_libraries(untitled4 PRIVATE dl)
main.cpp:
#include <cstdlib>
#include <iostream>
#include <dlfcn.h>
int Test() {
    return 123456;
}
int main() {
    auto* handler = dlopen("/home/liu/source/untitled4/cmake-build-debug/liblib1.so", RTLD_LAZY|RTLD_GLOBAL);
    if (!handler) {
        std::cerr << dlerror();
        exit(1);
    }
}
and lib1.cpp:
#include <iostream>
extern int Test();
class Foo {
public:
    Foo() {
        std::cout << Test() << std::endl;
    }
};
Foo foo;
Now let me explain:
As you can see, I defined a function called Test in main.cpp, and I want the shared library liblib1.so to call it when it is loaded.
But when I run main(), an error is logged:
/home/liu/source/untitled4/cmake-build-debug/liblib1.so: undefined symbol: _Z4Testv
I checked the symbol using nm untitled4 | grep Test, and the symbol seems to exist:
0000000000000b87 t _GLOBAL__sub_I__Z4Testv
0000000000000aea T _Z4Testv
So what did I do wrong? How do I fix this?
An important thing to note is that in the real case, the builds of lib1 and main.cpp are completely separate; the two builds don't know about each other. I could merge them into one build (very difficult) if that would fix the problem and there's no other way.
P.S. I tried wrapping Test() in extern "C" in both files, but that did not work, so it does not seem to be a C/C++ name-mangling problem.
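With the addition of the linker option -rdynamic, your code does not fail with "undefined symbol". By default the linker does not put the executable's symbols into its dynamic symbol table unless a shared library seen at link time references them, so a library loaded with dlopen cannot resolve Test even though plain nm (which shows the regular symbol table) lists it; -rdynamic exports them.
You can set that with set_target_properties(untitled4 PROPERTIES ENABLE_EXPORTS 1) or target_link_options(untitled4 BEFORE PRIVATE "-rdynamic").
Example: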
cmake_minimum_required(VERSION 3.17)
project(untitled4)
set(CMAKE_CXX_STANDARD 17)
add_library(lib1 SHARED lib1.cpp)
add_executable(untitled4 main.cpp)
set_target_properties(untitled4 PROPERTIES ENABLE_EXPORTS 1)
target_link_libraries(untitled4 PRIVATE dl)

Dynamically linking a shared library from a pybind11-wrapped code

I am trying to add Python bindings to a medium-sized C++ scientific code (some tens of thousands of LOC). I have managed to make it work without too many issues, but I have now run into a problem which I am unable to solve myself. The code is organized as follows:
All the classes and data structures are compiled in a library libcommon.a
Executables are created by linking this library
pybind11 is used to create a core.so python module
The bindings for the "main" parts work fine. Indeed, simulations launched from the standalone code or from python give the exact same results.
However, the code also supports a plugin-like system which can load shared libraries at runtime. These shared libraries contain classes that inherit from interfaces defined in the main code. It turns out that if I try to load these shared libraries from Python I get the infamous "undefined symbol" errors. I have checked that these symbols are in the core.so module (using nm -D). In fact, simulations that perform the dynamic linking with the standalone code work perfectly (within the same folder and with the same input). Somehow, the shared lib cannot find the right symbols when called through Python, but it has no issues when loaded by the standalone code. I am using CMake to build the system.
What follows is a MCE. Copy each file in a folder, copy (or link) the pybind11 folder in the same place and use the following commands:
mkdir build
cd build
cmake ..
make
which will generate a standalone binary and a python module. The standalone executable will produce the correct output. By contrast, using the following commands in python3 (that, at least in my head, should be equivalent) yields an error:
import core
b = core.load_plugin()
main.cpp
#include "Base.h"
#include "plugin_loader.h"
#include <iostream>
int main() {
Base *d = load_plugin();
if(d == NULL) {
std::cerr << "No lib found" << std::endl;
return 1;
}
d->foo();
return 0;
}
Base.h
#ifndef BASE
#define BASE
struct Base {
Base();
virtual ~Base();
virtual void foo();
};
#endif
Base.cpp
#include "Base.h"
#include <iostream>
Base::Base() {}
Base::~Base() {}
void Base::foo() {
std::cout << "Hey, it's Base!" << std::endl;
}
plugin_loader.h
#ifndef LOADER
#define LOADER
#include "Base.h"
Base *load_plugin();
#endif
plugin_loader.cpp
#include "plugin_loader.h"
#include <dlfcn.h>
#include <iostream>
typedef Base* make_base();
Base *load_plugin() {
    void *handle = dlopen("./Derived.so", RTLD_LAZY | RTLD_GLOBAL);
    if(handle == nullptr) {
        std::cerr << "Caught an error while opening shared library: " << dlerror() << std::endl;
        return nullptr;
    }
    // dlsym returns NULL if the "make" entry point is missing from the plugin
    make_base *entry = (make_base *) dlsym(handle, "make");
    if(entry == nullptr) {
        std::cerr << "Caught an error while looking up make: " << dlerror() << std::endl;
        return nullptr;
    }
    return entry();
}
Derived.h
#include "Base.h"
struct Derived : public Base {
Derived();
virtual ~Derived();
void foo() override;
};
extern "C" Base *make() {
return new Derived();
}
Derived.cpp
#include "Derived.h"
#include <iostream>
Derived::Derived() {}
Derived::~Derived() {}
void Derived::foo() {
std::cout << "Hey, it's Derived!" << std::endl;
}
bindings.cpp
#include <pybind11/pybind11.h>
#include "Base.h"
#include "plugin_loader.h"
PYBIND11_MODULE(core, m) {
pybind11::class_<Base, std::shared_ptr<Base>> base(m, "Base");
base.def(pybind11::init<>());
base.def("foo", &Base::foo);
m.def("load_plugin", &load_plugin);
}
CMakeLists.txt
PROJECT(foobar)
# compile the library
ADD_LIBRARY(common SHARED Base.cpp plugin_loader.cpp)
TARGET_LINK_LIBRARIES(common ${CMAKE_DL_LIBS})
SET_TARGET_PROPERTIES(common PROPERTIES POSITION_INDEPENDENT_CODE ON)
# compile the standalone code
ADD_EXECUTABLE(standalone main.cpp)
TARGET_LINK_LIBRARIES(standalone common)
# compile the "plugin"
SET(CMAKE_SHARED_LIBRARY_PREFIX "")
ADD_LIBRARY(Derived SHARED Derived.cpp)
# compile the bindings
ADD_SUBDIRECTORY(pybind11)
INCLUDE_DIRECTORIES( ${PROJECT_SOURCE_DIR}/pybind11/include )
FIND_PACKAGE( PythonLibs 3 REQUIRED )
INCLUDE_DIRECTORIES( ${PYTHON_INCLUDE_DIRS} )
ADD_LIBRARY(_oxpy_lib STATIC bindings.cpp)
TARGET_LINK_LIBRARIES(_oxpy_lib ${PYTHON_LIBRARIES} common)
SET_TARGET_PROPERTIES(_oxpy_lib PROPERTIES POSITION_INDEPENDENT_CODE ON)
pybind11_add_module(core SHARED bindings.cpp)
TARGET_LINK_LIBRARIES(core PRIVATE _oxpy_lib)
You are right: the symbols from the imported library are not visible because core is loaded without the RTLD_GLOBAL flag set. You can fix that with a couple of extra lines on the Python side:
import sys, os
sys.setdlopenflags(os.RTLD_GLOBAL | os.RTLD_LAZY)
import core
b = core.load_plugin()
From sys.setdlopenflags() doc:
To share symbols across extension modules, call as sys.setdlopenflags(os.RTLD_GLOBAL). Symbolic names for the flag values can be found in the os module (RTLD_xxx constants, e.g. os.RTLD_LAZY).

Clang's UBSan & Function Pointer: Is this illegal?

I'm trying to call some C++ functions through a function pointer table which is exported as a C symbol from a shared object. The code actually works, but Clang's undefined behavior sanitizer (UBSan) reports the call I made as illegal, as follows:
==11410==WARNING: Trying to symbolize code, but external symbolizer is not initialized!
path/to/HelloWorld.cpp:25:13: runtime error: call to function (unknown) through pointer to incorrect function type 'foo::CBar &(*)()'
(./libFoo.so+0x20af0): note: (unknown) defined here
According to Clang's undefined behavior sanitizer, it is legal to indirectly call a function which returns a reference to a C++ standard class object through a function pointer, but it's illegal for a user-defined class. Could somebody please tell me what's wrong with it?
I've been trying to build the project on Ubuntu 14.04 with Clang-llvm 3.4-1ubuntu3 and CMake 2.8.12.2. To reproduce the phenomenon, please place the following 5 files in the same directory and invoke build.sh. It will create a makefile and build the project, and run the executable.
Foo.h
#ifndef FOO_H
#define FOO_H
#include <string>
//
#define EXPORT __attribute__ ((visibility ("default")))
namespace foo {
class CBar
{
// empty
};
class CFoo
{
public:
static CBar& GetUdClass();
static std::string& GetStdString();
};
// function pointer table.
typedef struct
{
CBar& (*GetUdClass)();
std::string& (*GetStdString)();
} fptr_t;
//! function pointer table which is exported.
extern "C" EXPORT const fptr_t FptrInFoo;
}
#endif
Foo.cpp
#include "Foo.h"
#include <iostream>
using namespace std;
namespace foo
{
// returns reference of a static user-defined class object.
CBar& CFoo::GetUdClass()
{
cout << "CFoo::GetUdClass" << endl;
return *(new CBar);
}
// returns reference of a static C++ standard class object.
std::string& CFoo::GetStdString()
{
cout << "CFoo::GetStdString" << endl;
return *(new string("Hello"));
}
// function pointer table which is to be dynamically loaded.
const fptr_t FptrInFoo = {
CFoo::GetUdClass,
CFoo::GetStdString,
};
}
HelloWorld.cpp
#include <iostream>
#include <string>
#include <dirent.h>
#include <dlfcn.h>
#include "Foo.h"
using namespace std;
using namespace foo;
int main()
{
// Retrieve a shared object.
const string LibName("./libFoo.so");
void *pLibHandle = dlopen(LibName.c_str(), RTLD_LAZY);
if (pLibHandle != 0) {
cout << endl;
cout << "Info: " << LibName << " found at " << pLibHandle << endl;
// Try to bind a function pointer table:
const string SymName("FptrInFoo");
const fptr_t *DynLoadedFptr = static_cast<const fptr_t *>(dlsym(pLibHandle, SymName.c_str()));
if (DynLoadedFptr != 0) {
cout << "Info: " << SymName << " found at " << DynLoadedFptr << endl;
cout << endl;
// Do something with the functions in the function table pointer.
DynLoadedFptr->GetUdClass(); // Q1. Why Clang UBSan find this is illegal??
DynLoadedFptr->GetStdString(); // Q2. And why is this legal??
} else {
cout << "Warning: Not found symbol" << endl;
cout << dlerror() << endl;
}
} else {
cout << "Warning: Not found library" << endl;
cout << dlerror() << endl;
}
cout << endl;
return 0;
}
CMakeLists.txt
project (test)
if(COMMAND cmake_policy)
cmake_policy(SET CMP0003 NEW)
endif(COMMAND cmake_policy)
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,-rpath,$ORIGIN")
add_library(Foo SHARED Foo.cpp)
add_executable(HelloWorld HelloWorld.cpp)
target_link_libraries (HelloWorld dl)
build.sh
#!/bin/bash
# 1. create a build directory.
if [ -d _build ]; then
rm -rf _build
fi
mkdir _build
cd _build
# 2. generate a makefile.
CC=clang CXX=clang++ CXXFLAGS="-fvisibility=hidden -fsanitize=undefined -O0 -g3" cmake ..
# 3. build.
make
# 4. and run the executable.
./HelloWorld
I've been trying to find a clue to dig into the issue and realized that it is caught by the "function" option of the sanitizer (-fsanitize=function), but that option is not well documented. I'd appreciate it if you could give me a reasonable explanation for this runtime error message, which looks like it came from another planet. Thanks.
What was Clang pointing out as "unknown" in the output?
Below is the output from addr2line to check what was "unknown" for the sanitizer:
$ addr2line -Cfe _build/libFoo.so 0x20af0
foo::CFoo::GetUdClass()
path/to/Foo.cpp:12
Hmm, it really looks like the function I was expecting to call. Can you guess how it looked different to Clang?
CBar's typeinfo needs to have default visibility for the function's type to be considered the same by Clang on Linux across the executable and the dynamic library; change Foo.h to:
class EXPORT CBar
{
...
};

Dynamic loaded libraries and shared global symbols

Since I observed some strange behavior of global variables in my dynamically loaded libraries, I wrote the following test.
First we need a statically linked library: the header base.hpp
#ifndef __BASE_HPP
#define __BASE_HPP
#include <iostream>
class test {
private:
    int value;
public:
    test(int value) : value(value) {
        std::cout << "test::test(int) : value = " << value << std::endl;
    }
    ~test() {
        std::cout << "test::~test() : value = " << value << std::endl;
    }
    int get_value() const { return value; }
    void set_value(int new_value) { value = new_value; }
};
extern test global_test;
#endif // __BASE_HPP
and the source base.cpp
#include "base.hpp"
test global_test = test(1);
Then I wrote a dynamically loaded library: library.cpp
#include "base.hpp"
extern "C" {
test* get_global_test() { return &global_test; }
}
and a client program loading this library: client.cpp
#include <iostream>
#include <dlfcn.h>
#include "base.hpp"
typedef test* get_global_test_t();
int main() {
    global_test.set_value(2); // global_test from libbase.a
    std::cout << "client: " << global_test.get_value() << std::endl;
    void* handle = dlopen("./liblibrary.so", RTLD_LAZY);
    if (handle == NULL) {
        std::cout << dlerror() << std::endl;
        return 1;
    }
    get_global_test_t* get_global_test = NULL;
    void* func = dlsym(handle, "get_global_test");
    if (func == NULL) {
        std::cout << dlerror() << std::endl;
        return 1;
    } else get_global_test = reinterpret_cast<get_global_test_t*>(func);
    test* t = get_global_test(); // global_test from liblibrary.so
    std::cout << "liblibrary.so: " << t->get_value() << std::endl;
    std::cout << "client: " << global_test.get_value() << std::endl;
    dlclose(handle);
    return 0;
}
Now I compile the statically linked library with
g++ -Wall -g -c base.cpp
ar rcs libbase.a base.o
the dynamically loaded library
g++ -Wall -g -fPIC -shared library.cpp libbase.a -o liblibrary.so
and the client
g++ -Wall -g -ldl client.cpp libbase.a -o client
Now I observe: the client and the dynamically loaded library each have their own copy of the variable global_test. But in my project I'm using cmake. The build script looks like this:
CMAKE_MINIMUM_REQUIRED(VERSION 2.6)
PROJECT(globaltest)
ADD_LIBRARY(base STATIC base.cpp)
ADD_LIBRARY(library MODULE library.cpp)
TARGET_LINK_LIBRARIES(library base)
ADD_EXECUTABLE(client client.cpp)
TARGET_LINK_LIBRARIES(client base dl)
Analyzing the generated makefiles, I found that cmake builds the client with
g++ -Wall -g -ldl -rdynamic client.cpp libbase.a -o client
This results in slightly different but fatal behavior: the global_test of the client and of the dynamically loaded library are the same object, but it is destroyed twice at the end of the program.
Am I using cmake the wrong way? Is it possible for the client and the dynamically loaded library to use the same global_test without this double destruction problem?
g++ -Wall -g -ldl -rdynamic client.cpp libbase.a -o client
CMake adds the -rdynamic option, which allows a loaded library to resolve symbols in the loading executable. So you can see that this is exactly what you don't want. Without this option the library just happens not to see the executable's symbol, so it falls back to its own copy.
But you should not do anything like that in the first place. Your libraries and executable should not share symbols unless they really need to be shared.
Always think of dynamic linking as static linking.
If you are using shared libraries, you must mark the things you want to export with a macro like the one here. See the DLL_PUBLIC macro definition there.
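For illustration, an export macro of that kind could look roughly like the sketch below. It assumes GCC or Clang on an ELF platform and that the linked DLL_PUBLIC definition follows the usual visibility-attribute pattern; DLL_LOCAL, internal_answer and library_answer are made-up names that do not come from the question.

```cpp
// Sketch of an export macro for GCC/Clang (assumption: the DLL_PUBLIC macro
// referred to above is built on the visibility attribute like this).
#if defined(__GNUC__) && __GNUC__ >= 4
  #define DLL_PUBLIC __attribute__((visibility("default")))
  #define DLL_LOCAL  __attribute__((visibility("hidden")))
#else
  #define DLL_PUBLIC
  #define DLL_LOCAL
#endif

// Hypothetical internal helper: hidden, so it is not exported from the
// shared library and cannot clash with a symbol in the executable.
DLL_LOCAL int internal_answer() { return 42; }

// Hypothetical public entry point: the only symbol meant to be exported.
extern "C" DLL_PUBLIC int library_answer() { return internal_answer(); }
```

Combined with -fvisibility=hidden on the command line, only the symbols marked DLL_PUBLIC would be visible outside the shared library.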
By default, the linker won't combine a global variable (a 'D') in the base executable with one in a shared library. The base executable is special. There might be an obscure way to do this with one of those obscure control files that ld reads, but I sort of doubt it.
--export-dynamic will cause a.out 'D' symbols to be available to shared libs.
However, consider the process. Step 1: you create a DSO from a .o with a 'U' and a .a with a 'D'. So, the linker incorporates the symbol in the DSO. Step 2, you create the executable with a 'U' in one of the .o files, and 'D' in both a .a and the DSO. It will try to resolve using the left-to-right rule.
Variables, as opposed to functions, pose certain difficulties for the linker across modules in any case. A better practice is to avoid global var references across module boundaries, and use function calls. However, that would still fail for you if you put the same function in both the base executable and a shared lib.
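As a rough sketch of the function-call approach, the global could be wrapped in an accessor like the one below. The class is the one from the question; get_global_test_instance is a hypothetical name, and, as noted above, this only gives a single object if the function is defined in exactly one module rather than linked into both the executable and the shared library.

```cpp
#include <iostream>

class test {
private:
    int value;
public:
    explicit test(int value) : value(value) {
        std::cout << "test::test(int) : value = " << value << std::endl;
    }
    ~test() {
        std::cout << "test::~test() : value = " << value << std::endl;
    }
    int get_value() const { return value; }
    void set_value(int new_value) { value = new_value; }
};

// Hypothetical accessor replacing `extern test global_test;`.
// The object is constructed on the first call and every later call
// returns a reference to that same object.
test& get_global_test_instance() {
    static test instance(1);
    return instance;
}
```

Callers would then write get_global_test_instance().set_value(2) instead of touching global_test directly.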
My first question is whether there is any particular reason why you link the same code both statically and dynamically (via dlopen).
As for your problem: -rdynamic exports the symbols from your program, and what is probably happening is that the dynamic linker resolves all references to your global variable to the first symbol it encounters in the symbol tables. Which one that is, I don't know.
EDIT: given your purpose I would link your program that way:
g++ -Wall -g -ldl client.cpp -llibrary -L. -o client
You may need to fix the order.
I would advise using dlopen(... RTLD_LAZY|RTLD_GLOBAL); to merge the global symbol tables.
I would propose compiling any .a static library that you plan to link into a dynamic library with the -fvisibility=hidden option, like so:
g++ -Wall -fvisibility=hidden -g -c base.cpp