MSVC 2017 creates copies of template function in shared libraries - c++

While trying to replicate the behavior in this question in Visual Studio 2017 I found that instead of linking &FuncTemplate<C> to the exact same address the function template<> FuncTemplate<C>() {} gets copied into dllA and dllB so that the corresponding test program always returns not equal.
The solution was set up fresh with three Win32 projects: one console application and two DLLs. To link the DLLs I added them as references to the console project (linking manually didn't work either). The only change I made to the code was adding __declspec(dllexport) to a() and b().
Is this behavior standard-conformant? It seems like the ODR should apply here and collapse the copies of the function. Is there a way to get the same behavior seen in the other question?
Template.h
#pragma once
typedef void (*FuncPtr)();
template<typename T>
void FuncTemplate() {}
class C {};
a.cpp - dll project 1
#include "Template.h"
__declspec(dllexport) FuncPtr a() {
return &FuncTemplate<C>;
}
b.cpp - dll project 2
#include "Template.h"
__declspec(dllexport) FuncPtr b() {
return &FuncTemplate<C>;
}
main.cpp - console project
#include <iostream>
#include "i.h"
// seems like there is no __declspec(dllimport) needed here
FuncPtr a();
FuncPtr b();
int main() {
std::cout << (a() == b() ? "equal" : "not equal") << std::endl;
return 0;
}

C++ compilation is generally split into two parts, the compiler itself and the linker. It is the job of the linker to find and consolidate all the compilations of an identical function into a single unit and throw away the duplicates. At the end of a linking step, every function should either be part of the linker output or flagged as needing to be resolved at execution time from another DLL. Each DLL will contain a copy of the function if it is being used within that DLL or exported from it.
The process of resolving dynamic links at execution time is outside of the C++ tool chain; it happens at the level of the OS, which doesn't have the ability to consolidate duplicates like the linker does.
I think as far as ODR is concerned, each DLL is considered a separate executable.
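If the goal is simply for a() and b() to return the same pointer, one option is to let exactly one module take the address and hand it out to the others. The sketch below assumes a hypothetical third module (common.cpp) with a hypothetical accessor func_template_c(); neither name is from the question, and the COMMON_API macro is likewise made up for illustration. It is condensed into one file so that it compiles on its own; in a real build the three parts would live in separate DLL projects, and a.cpp and b.cpp would see the accessor through a __declspec(dllimport) declaration.

```cpp
// Template.h (as in the question)
typedef void (*FuncPtr)();
template<typename T>
void FuncTemplate() {}
class C {};

// common.cpp - hypothetical DLL that is the only place taking the address
#if defined(_WIN32)
#define COMMON_API __declspec(dllexport)   // assumption: MSVC/MinGW export syntax
#else
#define COMMON_API
#endif

COMMON_API FuncPtr func_template_c() {
    // Only this module instantiates FuncTemplate<C>, so only one copy exists.
    return &FuncTemplate<C>;
}

// a.cpp - dll project 1: forwards instead of taking the address itself
FuncPtr a() {
    return func_template_c();
}

// b.cpp - dll project 2: same
FuncPtr b() {
    return func_template_c();
}
```

Because neither a.cpp nor b.cpp instantiates the template, there is no second copy whose address could differ.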

Related

std::any across shared library bounding in mingw

I stumbled upon an issue while using libstdc++'s std::any implementation with mingw across a shared library boundary. It produces a std::bad_any_cast where it obviously should not (I believe).
I use mingw-w64, gcc-7 and compile the code with -std=c++1z.
The simplified code:
main.cpp:
#include <any>
#include <string>
// prototype from lib.cpp
void do_stuff_with_any(const std::any& obj);
int main()
{
do_stuff_with_any(std::string{"Hello World"});
}
lib.cpp:
Will be compiled into a shared library and linked with the executable from main.cpp.
#include <any>
#include <iostream>
void do_stuff_with_any(const std::any& obj)
{
std::cout << std::any_cast<const std::string&>(obj) << "\n";
}
This triggers a std::bad_any_cast although the any passed to do_stuff_with_any does contain a string. I dug into gcc's any implementation, and it seems to compare the address of a static inline member function (a manager chosen from a template struct depending on the type of the stored object) to check whether the any holds an object of the requested type.
And the address of this function seems to change across the shared library boundary.
Isn't std::any guaranteed to work across shared library boundaries? Does this code trigger UB somewhere? Or is this a bug in the gcc implementation? I am pretty sure it works on Linux, so is this only a bug in mingw? Is it known, or should I report it somewhere? Any ideas for (temporary) workarounds?
While it is true that this is an issue with how Windows DLLs work, and that the issue still remains as of GCC 8.2.0, it can be easily worked around by changing the __any_caster function inside the any header to this:
template<typename _Tp>
void* __any_caster(const any* __any)
{
if constexpr (is_copy_constructible_v<decay_t<_Tp>>)
{
#if __cpp_rtti
if (__any->type().hash_code() == typeid(_Tp).hash_code())
#else
if (__any->_M_manager == &any::_Manager<decay_t<_Tp>>::_S_manage)
#endif
{
any::_Arg __arg;
__any->_M_manager(any::_Op_access, __any, &__arg);
return __arg._M_obj;
}
}
return nullptr;
}
Or something similar, the only relevant part is the comparison line wrapped in the #if.
To elaborate, there are two copies of the manager function, one in the exe and one in the dll. The passed object contains the address of the exe's copy because that's where it was created, but once it reaches the dll side, the pointer gets compared to the address of the dll's copy, which will never match. The type info hash_codes should be compared instead.
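The same idea can be shown without patching the library header. The following is a minimal sketch of a type-erased holder that identifies the stored type through std::type_info rather than through the address of a per-module function. TinyAny is a hypothetical class written for illustration, not libstdc++'s std::any, and it assumes the runtime compares type_info objects in a way that works across the module boundary.

```cpp
#include <memory>
#include <string>
#include <typeinfo>
#include <utility>

// Hypothetical minimal holder; not the libstdc++ implementation.
class TinyAny {
public:
    template<typename T>
    explicit TinyAny(T value)
        : type_(&typeid(T)),
          data_(std::make_shared<T>(std::move(value))) {}

    // Returns a pointer to the stored object, or nullptr on a type mismatch.
    template<typename T>
    const T* cast() const {
        // Compare the type_info objects themselves, not the address of a
        // function template instantiation that each module may duplicate.
        if (*type_ != typeid(T))
            return nullptr;
        return static_cast<const T*>(data_.get());
    }

private:
    const std::type_info* type_;   // identifies the stored type
    std::shared_ptr<void> data_;   // owns the stored object
};
```

Comparing with type_info's operator== is used here instead of hash_code() because equal hash codes do not by themselves prove that two types are the same.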

different behavior when linking with static library vs using object files in C++

I'm working with some legacy C++ code that is behaving in a way I don't understand. I'm using the Microsoft compiler but I've tried it with g++ (on Linux) as well—same behavior.
I have 4 files listed below. In essence, it's a registry that's keeping track of a list of members. If I compile all files and link the object files into one program, it shows the correct behavior: registry.memberRegistered is true:
>cl shell.cpp registry.cpp member.cpp
>shell.exe
1
So somehow the code in member.cpp gets executed (which I don't really understand, but OK).
However, what I want is to build a static library from registry.cpp and member.cpp, and link that against the executable built from shell.cpp. But when I do this, the code in member.cpp does not get executed and registry.memberRegistered is false:
>cl registry.cpp member.cpp /c
>lib registry.obj member.obj -OUT:registry.lib
>cl shell.cpp registry.lib
>shell.exe
0
My questions: how come it works the first way and not the second and is there a way (e.g. compiler/linker options) to make it work with the second way?
registry.h:
class Registry {
public:
static Registry& get_registry();
bool memberRegistered;
private:
Registry() {
memberRegistered = false;
}
};
registry.cpp:
#include "registry.h"
Registry& Registry::get_registry() {
static Registry registry;
return registry;
}
member.cpp:
#include "registry.h"
int dummy() {
Registry::get_registry().memberRegistered = true;
return 0;
}
int x = dummy();
shell.cpp:
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
int main() {
shell *cf = new shell;
cf->init();
}
You have been hit by what is popularly known as the static initialization order fiasco.
The basic problem is that the order of initialization of static objects across translation units is unspecified.
The call Registry::get_registry().memberRegistered; in "shell.cpp" may happen before the initialization int x = dummy(); in "member.cpp".
EDIT:
Well, x isn't ODR-used. Therefore, the compiler is permitted not to evaluate int x = dummy(); before or after entering main(), or even at all.
Just a quote about it from CppReference (emphasis mine)
It is implementation-defined whether dynamic initialization
happens-before the first statement of the main function (for statics)
or the initial function of the thread (for thread-locals), or deferred
to happen after.
If the initialization is deferred to happen after the first statement
of main/thread function, it happens before the first odr-use of any
variable with static/thread storage duration defined in the same
translation unit as the variable to be initialized. If no variable or function is odr-used from a given translation unit, the non-local variables defined in that translation unit may never be initialized (this models the behavior of an on-demand dynamic library)...
The only way to get your program working as you want is to make sure x is ODR-used
shell.cpp
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
extern int x; //or extern int dummy();
int main() {
shell *cf = new shell;
cf->init();
int k = x; //or dummy();
}
^ Now, your program should work as expected. :-)
This is a result of the way linkers treat libraries: they pick and choose the objects that define symbols left undefined by other objects processed so far. This helps keep executable sizes smaller, but when a static initialization has side effects, it leads to the fishy behavior you've discovered: member.obj / member.o doesn't get linked in to the program at all, although its very existence would do something.
Using g++, you can use:
g++ shell.cpp -Wl,-whole-archive registry.a -Wl,-no-whole-archive -o shell
to force the linker to put all of your library in the program. There may be a similar option for MSVC.
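A variation that needs no linker flags is to replace the side effect of static initialization with an ordinary function that the executable calls. The call leaves an undefined symbol in shell.obj, so the linker has to pull member.obj out of the library to resolve it. This is only a sketch: register_member() and shell_init() are hypothetical names, not part of the original code, and the four files are condensed into one here so the example compiles on its own.

```cpp
// registry.h
class Registry {
public:
    static Registry& get_registry();
    bool memberRegistered;
private:
    Registry() : memberRegistered(false) {}
};

// registry.cpp
Registry& Registry::get_registry() {
    static Registry registry;   // constructed on first use
    return registry;
}

// member.cpp - explicit registration instead of "int x = dummy();"
void register_member() {        // hypothetical name
    Registry::get_registry().memberRegistered = true;
}

// shell.cpp
void register_member();         // would normally come from a header

bool shell_init() {             // hypothetical helper standing in for shell::init()
    register_member();          // references member.cpp, forcing it to be linked
    return Registry::get_registry().memberRegistered;
}
```

The cost is that every member has to be registered by name somewhere, which gives up the self-registration the legacy code relied on.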
Thanks a lot for all the replies. Very helpful.
So both the solution proposed by WhiZTiM (making x ODR-used) and the one proposed by aschepler (forcing the linker to include the whole library) work for me. The latter has my preference since it doesn't require any changes to the code. However, I could not find an MSVC equivalent for --whole-archive (newer MSVC linkers, from Visual Studio 2015 Update 2 on, do offer /WHOLEARCHIVE).
In Visual Studio I managed to solve the problem as follows (I have a project for the registry static library, and one for the shell executable):
In the shell project add a reference to the registry project;
In the linker properties of the shell project under General set "Link Library Dependencies" and "Use Library Dependent Inputs" to "Yes".
If these options are set registry.memberRegistered is properly initialized. However, after studying the compiler/linker commands I concluded that setting these options results in VS simply passing the registry.obj and member.obj files to the linker, i.e.:
>cl /c member.cpp registry.cpp shell.cpp
>lib /OUT:registry.lib member.obj registry.obj
>link /OUT:shell.exe "registry.lib" shell.obj member.obj registry.obj
>shell.exe
1
To my mind, this is essentially the first approach to my original question. If you leave out registry.lib in the linker command it works fine as well.
Anyway, for now, it's good enough for me.
I'm working with CMake, so now I need to figure out how to adjust the CMake settings to make sure that the object files get passed to the linker. Any thoughts?

Is there a way to detect inline function ODR violations?

So I have this code in 2 separate translation units:
// a.cpp
#include <stdio.h>
inline int func() { return 5; }
int proxy();
int main() { printf("%d", func() + proxy()); }
// b.cpp
inline int func() { return 6; }
int proxy() { return func(); }
When compiled normally the result is 10. When compiled with -O3 (inlining on) I get 11.
I have clearly done an ODR violation for func().
It showed up when I started merging sources of different dll's into fewer dll's.
I have tried:
GCC 5.1 -Wodr (which requires -flto)
gold linker with -detect-odr-violations
setting ASAN_OPTIONS=detect_odr_violation=1 before running an instrumented binary with the address sanitizer.
Asan can supposedly catch other ODR violations (global vars with different types or something like that...)
This is a really nasty C++ issue and I am amazed there isn't reliable tooling for detecting it.
Perhaps I have misused one of the tools I tried? Or is there a different tool for this?
EDIT:
The problem remains unnoticed even when I make the 2 implementations of func() drastically different so they don't get compiled to the same amount of instructions.
This also affects class methods defined inside the class body - they are implicitly inline.
// a.cpp
struct A { int data; A() : data(5){} };
// b.cpp
struct A { int data; A() : data(6){} };
Legacy code with lots of copy/paste + minor modifications after that is a joy.
The tools are imperfect.
I think Gold's check will only notice when the symbols have different types or different sizes, which isn't true here (both functions will compile to the same number of instructions, just using a different immediate value).
I'm not sure why -Wodr doesn't work here, but I think it only works for types, not functions, i.e. it will detect two conflicting definitions of a class type T but not your func().
I don't know anything about ASan's ODR checking.
The simplest way to detect such concerns is to copy all the functions into a single compilation unit (create one temporarily if needed). Any C++ compiler will then be able to detect and report duplicate definitions when compiling that file.

Strange link error when #ifdef #else is defined

I face a very strange link problem with VC 2010. Now I am developing a C++ library, and in order to make debug much easier, for some functions the library provides two function interfaces. For example,
class Object
{
public:
int fun(std::vector<int> &auxiliary_variable_for_debug_purpose);
int fun();
};
It is also possible to reorganize this class in this way:
class Object
{
public:
#ifdef DEBUG_INDICATOR
int fun(std::vector<int> &auxiliary_variable_for_debug_purpose);
#else
int fun();
#endif
};
By doing so I expect to give the user a clear interface.
The problem I face now is that both int fun(std::vector<int> &auxiliary_variable_for_debug_purpose); and int fun(); invoke another function called void help_function(), which is declared and defined in separate files.
file.h
void help_function();
and
file.cpp
void help_function()
{
// do something
}
As you can see, void help_function() is the same regardless of whether DEBUG_INDICATOR is defined. If I define DEBUG_INDICATOR, I can compile the class with int fun() function without any problem. However, when I undefine DEBUG_INDICATOR, error LNK2001 occurs, reporting an unresolved external symbol void help_function(). I have tried everything I could think of to figure it out, but failed. Any ideas will be appreciated.
EDIT
The library I have built is a dynamic library. Regardless of whether DEBUG_INDICATOR is defined, the library can be built; the link error only happens when the library is used.
Since you've not posted the exact error message you are getting, this MSDN link might help you.
Tip: Be specific while asking your question if you wish to receive accurate answers.

Weird seg fault problem

Greetings,
I'm having a weird seg fault problem. My application dumps a core file at runtime. After digging into it I found it died in this block:
#include <lib1/c.h>
...
x::c obj;
obj.func1();
I defined class c in a library lib1:
namespace x
{
struct c
{
c();
~c();
void func1();
vector<char *> _data;
};
}
x::c::c()
{
}
x::c::~c()
{
for ( int i = 0; i < _data.size(); ++i )
delete _data[i];
}
I could not figure it out for some time till I ran nm on the lib1.so file: there are more function definitions than I defined:
x::c::c()
x::c::c()
x::c::~c()
x::c::~c()
x::c::func1()
x::c::func2()
After searching in code base I found someone else defined a class with same name in same namespace, but in another library lib2 as follows:
namespace x
{
struct c
{
c();
~c();
void func2();
vector<string> strs_;
};
}
x::c::c()
{
}
x::c::~c()
{
}
My application links to lib2, which has dependency on lib1. This interesting behavior brings several questions:
Why would it even work? I would expect a "multiple definitions" error while linking against lib2 (which depends upon lib1) but never had such. The application seems to be doing what's defined in func1 except it dumps a core at runtime.
After attaching debugger, I found my application calls the ctor of class c in lib2, then calls func1 (defined in lib1). When going out of scope it calls dtor of class c in lib2, where the seg fault occurs. Can anybody teach me how this could even occur?
How can I prevent such problems from happening again? Is there any C++ syntax I can use?
Forgot to mention I'm using g++ 4.1 on RHEL4, thank you very much!
1.
Violations of the "one definition rule" don't have to be diagnosed by your compiler. In fact, they are often only going to be known at link time when you link multiple object files together.
At link time, the information about the original class definitions may not exist any more (they are not needed after the compiler step) so having multiple definitions of a class is typically not easy to flag to the user.
2.
Once you have two distinct definitions pretty much anything can happen, you are in the territory of undefined behaviour. Whatever happens, it's a possible outcome.
3.
The most sensible thing to do is to communicate with the other members of your team. Agree who's going to use which namespaces and you won't get these problems. Otherwise, you point a documentation tool or static analysis tool over your entire project. Many such tools will be able to diagnose multiple inconsistent definitions of classes.
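On the question of syntax: namespaces are the language tool for this. A minimal sketch, assuming each library gets its own nested namespace (the names lib1 and lib2 as namespaces, and the simplified members, are illustrative and not taken from the original code):

```cpp
#include <string>
#include <type_traits>
#include <vector>

namespace x {

namespace lib1 {   // hypothetical per-library namespace
struct c {
    std::vector<std::string> data;
    void func1() { data.push_back("from lib1"); }
};
}  // namespace lib1

namespace lib2 {   // hypothetical per-library namespace
struct c {
    std::vector<std::string> strs;
    void func2() { strs.push_back("from lib2"); }
};
}  // namespace lib2

}  // namespace x

// The two classes are now unrelated types with differently mangled symbols,
// so the linker can no longer mix one class's constructor with the other's
// member functions.
static_assert(!std::is_same<x::lib1::c, x::lib2::c>::value,
              "x::lib1::c and x::lib2::c are distinct types");
```

For a class that is only used inside a single .cpp file, an unnamed namespace has the same effect without needing any agreement on names.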
Just a guess but I don't see any using namespace x; so perhaps it used one namespace instead of the other?
With the advent of templates it became necessary to allow multiple definitions of a body of code with the same name; there was no way for the compiler to know if the same template code had already been generated in another compilation unit i.e. source file. When the linker finds these duplicates, it assumes they are identical. The burden is on you to make sure that they are - this is called the One Definition Rule.
On the linker level this is library interpositioning. The effective symbol bound unfortunately depends on the order of object files on linker command line (this is, sigh, historical).
From what you describe, it looks like lib1 comes first in the linker argument list and lib2 comes second and interposes on symbols from lib1. This explains the calls to the constructor and destructor from lib2 but the call to func1 from lib1 (since there's no func1-derived symbol in lib2, there's no "hiding", and the call is bound to lib1).
The solution to this particular problem is to reverse the order of libraries on the linker invocation command.
There are lots of answers about the one definition rule. However, to me, this looks a lot more like a missing copy constructor.
To elaborate:
If the compiler-generated copy constructor is called on your object, both copies hold the same pointers, so delete will be called on the same set of pointers twice. That double delete is undefined behavior and can crash in the destructor.
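#include <cstddef>
#include <cstring>
#include <vector>
using std::vector;
namespace x
{
    struct c
    {
        c() {
        }
        ~c() {
            for (std::size_t i = 0; i < _data.size(); ++i)
                delete[] _data[i];  // matches the new[] in the copy constructor
        }
        // Deep copy: every element gets its own buffer, so two objects
        // never delete the same pointer.
        c(const c & rhs) {
            for (std::size_t i = 0; i < rhs._data.size(); ++i) {
                std::size_t len = std::strlen(rhs._data[i]);
                char *mem = new char[len + 1];
                std::memcpy(mem, rhs._data[i], len + 1);  // includes the '\0'
                _data.push_back(mem);
            }
        }
        // A matching copy assignment operator is needed as well (rule of three).
        void func1();
        vector<char *> _data;
    };
}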