symbol name in shared object differs from function in .cpp file - c++

In a project environment, I wanted to change a source file for a shared object from c to cpp. I made sure to change its entry in the CMakeLists.txt, too:
add_library(*name* SHARED *mysource*.cpp)
target_link_libraries(*name as target* *item*)
The build process runs fine. Unfortunately, when I try to use it I get an error that the functions inside the .so can not be found.
After checking the dynamic symbol table inside the shared object with objdump -T, I found out that the names of the symbols differ from the ones in the source file.
e.g.
int sr_plugin_init_cb(sr_session_ctx_t *session, void **private_ctx);
becomes
_Z17sr_plugin_init_cbP16sr_session_ctx_sPPv
Inside my visual studio code it says that it can build the object and link the shared library correctly, and it also changed from C to CXX in the output and gives me no errors even though some code is c++ only.
Why do the symbol names change?

Why do the symbol names change?
C++ has a feature called function overload. Basically what happens is that you declare two function that are named the same, but slightly differ:
int sr_plugin_init_cb(sr_session_ctx_t *session, void **private_ctx);
int sr_plugin_init_cb(sr_session_ctx_t *session, void **private_ctx, int some_arg);
or a little worse case:
struct A {
# each of these functions can be different depending on the object
void func();
void func() const;
void func() volatile;
void func() volatile const;
};
Functions are named the same. Linker doesn't see the C++ source, but it still has to differentiate between the two functions to link with them. So C++ compiler "mangles" the function names, so that linker can differentiate between them. For big simplicity it could look like:
sr_plugin_init_cb_that_doesnt_take_int_arg
sr_plugin_init_cb_that_takes_int_arg
A_func
A_func_but_object_is_const
A_func_but_object_is_volatile
A_func_but_object_is_volatile_and_const
The rules of name mangling are complicated, to make the names as short as possible. They have to take into account any number of templates, arguments, objects, names, qualifers, lambdas, overloads, operators etc. and generate an unique name and they have to use only characters that are compatible with the linker on a specific architecture. For example here is a reference for name mangling used by gnu g++ compiler.
The symbol name _Z17sr_plugin_init_cbP16sr_session_ctx_sPPv is the mangled by your compiler name of your function.

Thank You very much for the detailled answer. I now understand the issue.
After a quick search I found a solution for my problem. Encapsulating the function prototypes like this avoids the name mangling.
extern "C" {
// Function prototypes
};

Related

NASM call for external C++ function

I am trying to call external C++ function from NASM. As I was searching on google I did not find any related solution.
C++
void kernel_main()
{
char* vidmem = (char*)0xb8000;
/* And so on... */
}
NASM
;Some calls before
section .text
;nothing special here
global start
extern kernel_main ;our problem
After running compiling these two files I am getting this error: kernel.asm(.text+0xe): undefined reference to kernel_main'
What is wrong here? Thanks.
There is no standardized method of calling C++ functions from assembly, as of now. This is due to a feature called name-mangling. The C++ compiler toolchain does not emit symbols with the names exactly written in the code. Therefore, you don't know what the name will be for the symbol representing the function coded with the name kernel_main or kernelMain, whatever.
Why is name-mangling required?
You can declare multiple entities (classes, functions, methods, namespaces, etc.) with the same name in C++, but under different parent namespaces. This causes symbol conflicts if two entities with the name local name (e.g. local name of class SomeContainer in namespace SymbolDomain is SomeContainer but global name is SymbolDomain::SomeContainer, atleast to talk in this answer, okay) have the same symbol name.
Conflicts also occur with method overloading, therefore, the types of each argument are also emitted (in some form) for methods of classes. To cope with this, the C++ toolchain will somehow mangle the actual names in the ELF binary object.
So, can't I use the C++ mangled name in assembly?
Yes, this is one solution. You can use readelf -s fileName with the object-file for kernel_main. You'll have to search for a symbol having some similarity with kernel_main. Once you think you got it, then confirm that with echo _ZnSymbolName | c++filt which should output kernel_main.
You use this name in assembly instead of kernel_main.
The problem with this solution is that, if for some reason, you change the arguments, return value, or anything else (we don't know what affects name-mangling), your assembly code may break. Therefore, you have to be careful about this. On the other hand, this is not a good practice, as your going into non-standard stuff.
Note that name-mangling is not standardized, and varies from toolchain to toolchain. By depending on it, your sticking to the same compiler too.
Can't I do something standardized?
Yep. You could use a C function in C++ by declaring the function extern "C" like this
extern "C" void kernelMain(void);
This is the best solution in your case, as your kernel_main is already a C-style function with no parent class and namespace. Note that, the C function is written in C++ and still uses C++ features (internally).
Other solutions include using a macro indirection, where a C function calls the C++ function, if you really need to. Something like this -
///
/// Simple class containing a method to illustrate the concept of
/// indirection.
///
class SomeContainer
{
public:
int execute(int y)
{
}
}
#define _SepArg_ , // Comma macro, to pass into args, comma not used directly
///
/// Indirection for methods having return values and arguments (other than
/// this). For methods returning void or having no arguments, make something
/// similar).
///
#define _Generate_Indirection_RetEArgs(ret, name, ThisType, thisArg, eargs) \
extern "C" ret name ( ThisType thisArg, eargs ) \
{ \
return thisArg -> name ( eargs ); \
} \
_Generate_Indirection_RetEArgs(int, execute, SomeContainer, x, int y);

Linker removing static initialiser [duplicate]

I am working on a factory that will have types added to them, however, if the class is not explicitly instiated in the .exe that is exectured (compile-time), then the type is not added to the factory. This is due to the fact that the static call is some how not being made. Does anyone have any suggestions on how to fix this? Below is five very small files that I am putting into a lib, then an .exe will call this lib. If there is any suggestions on how I can get this to work, or maybe a better design pattern, please let me know. Here is basically what I am looking for
1) A factory that can take in types
2) Auto registration to go in the classes .cpp file, any and all registration code should go in the class .cpp (for the example below, RandomClass.cpp) and no other files.
BaseClass.h : http://codepad.org/zGRZvIZf
RandomClass.h : http://codepad.org/rqIZ1atp
RandomClass.cpp : http://codepad.org/WqnQDWQd
TemplateFactory.h : http://codepad.org/94YfusgC
TemplateFactory.cpp : http://codepad.org/Hc2tSfzZ
When you are linking with a static library, you are in fact extracting from it the object files which provide symbols which are currently used but not defined. In the pattern that you are using, there is probably no undefined symbols provided by the object file which contains the static variable which triggers registration.
Solutions:
use explicit registration
have somehow an undefined symbol provided by the compilation unit
use the linker arguments to add your static variables as a undefined symbols
something useful, but this is often not natural
a dummy one, well it is not natural if it is provided by the main program, as a linker argument it main be easier than using the mangled name of the static variable
use a linker argument stating that all the objects of a library have to be included
dynamic libraries are fully imported, thus don't have that problem
As a general rule of thumb, an application do not include static or global variables from a library unless they are implicitly or explicitly used by the application.
There are hundred different ways this can be refactored. One method could be to place the static variable inside function, and make sure the function is called.
To expand on one of #AProgrammer's excellent suggestions, here is a portable way to guarantee the calling program will reference at least one symbol from the library.
In the library code declare a global function that returns an int.
int make_sure_compilation_unit_referenced() { return 0; }
Then in the header for the library declare a static variable that is initialized by calling the global function:
extern int make_sure_compilation_unit_referenced();
static int never_actually_used = make_sure_compilation_unit_referenced();
Every compilation unit that includes the header will have a static variable that needs to be initialized by calling a (useless) function in the library.
This is made a little cleaner if your library has its own namespace encapsulating both of the definitions, then there's less chance of name collisions between the bogus function in your library with other libraries, or of the static variable with other variables in the compilation unit(s) that include the header.

How to force compiler to mangle C names to C++ names

I have .obj with function that has everything it needs to be linked as C++ member function. Problem is, it's in C and thus the class using it expects something uglier than it's normal name. So I figure this can be done in only 2 ways: either mangle the name of the C function and/or add additional symbol to the symbol table that would have the mangled name anyway, but I could still use it's original name too.. so mangle the name basically. Any ideas how to do this or have some completely other way of solving this? Please share but do consider usefulness of saying extern "C" in this particular case :)) thx
Your (C-based) object file has a symbol and you cannot redefine that symbol to have a different name -- that would be a task for the compiler generating that object file. The C compiler doesn't know about C++ and it cannot be made to emit a symbol with C++ linkage and name mangling. So the only way to use that symbol (your C function) is to call it by the symbol it is know for.
You can, of course, use that function to implement a C++ (member) function (the additional level of indirection is optimised away if the call is inline) as in
extern "C" { int my_C_func(void*, int); } // could be in an included header
struct A {
// implement the followind member using the C function
int operator()(int i) { return my_C_func(this,i); }
};
If the C++ class is already declared and its declaration cannot be touched, then you can still implement the member function in the same way in a separate source file. However, this cannot be inline and hence comes at the cost of an additional function call:
in file.cpp:
extern "C" { int my_C_func(void*, int); } // could be in an included header
int A::operator()(int i) { return my_C_func(this,i); }
From your reply to me comment, I conclude that you actually have control of the implementation of the C function. So, why do you need to implement this in C? Why can't you simply implement it in C++? Then you will get the correct linkage and name mangling and you can directly implement the desired member function.

How does linkage and name mangling work?

Lets take this code sample
//header
struct A { };
struct B { };
struct C { };
extern C c;
//code
A myfunc(B&b){ A a; return a; }
void myfunc(B&b, C&c){}
C c;
Lets do this line by line starting from the code section.
When the compiler sees the first myfunc method it does not care about A or B because its use is internal. Each c++ file will know what it takes in, what it returns. Although there needs to be a name for each of the two overload so how is that chosen and how does the linker know which means what?
Next is C c; I once had a bug were the linker wouldnt reconize thus allow me access to C in other C++ files. It was because that cpp didnt know c was extern and i had to mark it as extern in the header before i could link successfully. Now i am not sure if the class type has any involvement with the linker and the variable C. I dont know how RTTI will be involved but i do know C needs to be visible by other files.
How does the linker work and name mangling and such.
We first need to understand where compilation ends and linking begins. Compilation involves taking a compilation unit (a C or C++ source file) and turning it into an object file. Simplistically, this involves generating snippets of machine code for each function as well as a symbol table for all functions and static (global) variables. Placeholders are used for any symbols needed by the compilation unit that are external to the said compilation unit.
The linker is then responsible for loading all the object files and resolving all the place-holder symbols with real addresses (or offsets for machine independent code). This is placed into various sections that can be read by the operating system's dynamic loader when loading an executable.
So for the specifics. In order to avoid errors during linking, the compiler requires you to declare all external symbols that will be used by the current compilation unit. For global variables one must use the extern keyword, for functions this is optional.
All functions and global variables defined in a compilation unit have external linkage (i.e., can be referenced by other compilation units) unless one declares that with the static keyword (or the unnamed namespace in C++). In C++, the vtable will also have a symbol needed for linkage.
Now in C++, since functions can be overloaded, the parameters also form part of the function name. Since machine-code is just addresses and registers, extra information needs to be added to the function name in the symbols table. This extra parameter information comes in the form of a mangled name and ensures that the linker links to the correct version of an overloaded function.
If you really are interested in the gory details take a look at the ELF file format (PDF) used extensively on Linux. Windows has a different format but the principles can be expected to be the same.
Name mangling on the Itanuim (and ARM) platforms can be found here.
http://en.wikipedia.org/wiki/Name_mangling

Static variable initialization over a library

I am working on a factory that will have types added to them, however, if the class is not explicitly instiated in the .exe that is exectured (compile-time), then the type is not added to the factory. This is due to the fact that the static call is some how not being made. Does anyone have any suggestions on how to fix this? Below is five very small files that I am putting into a lib, then an .exe will call this lib. If there is any suggestions on how I can get this to work, or maybe a better design pattern, please let me know. Here is basically what I am looking for
1) A factory that can take in types
2) Auto registration to go in the classes .cpp file, any and all registration code should go in the class .cpp (for the example below, RandomClass.cpp) and no other files.
BaseClass.h : http://codepad.org/zGRZvIZf
RandomClass.h : http://codepad.org/rqIZ1atp
RandomClass.cpp : http://codepad.org/WqnQDWQd
TemplateFactory.h : http://codepad.org/94YfusgC
TemplateFactory.cpp : http://codepad.org/Hc2tSfzZ
When you are linking with a static library, you are in fact extracting from it the object files which provide symbols which are currently used but not defined. In the pattern that you are using, there is probably no undefined symbols provided by the object file which contains the static variable which triggers registration.
Solutions:
use explicit registration
have somehow an undefined symbol provided by the compilation unit
use the linker arguments to add your static variables as a undefined symbols
something useful, but this is often not natural
a dummy one, well it is not natural if it is provided by the main program, as a linker argument it main be easier than using the mangled name of the static variable
use a linker argument stating that all the objects of a library have to be included
dynamic libraries are fully imported, thus don't have that problem
As a general rule of thumb, an application do not include static or global variables from a library unless they are implicitly or explicitly used by the application.
There are hundred different ways this can be refactored. One method could be to place the static variable inside function, and make sure the function is called.
To expand on one of #AProgrammer's excellent suggestions, here is a portable way to guarantee the calling program will reference at least one symbol from the library.
In the library code declare a global function that returns an int.
int make_sure_compilation_unit_referenced() { return 0; }
Then in the header for the library declare a static variable that is initialized by calling the global function:
extern int make_sure_compilation_unit_referenced();
static int never_actually_used = make_sure_compilation_unit_referenced();
Every compilation unit that includes the header will have a static variable that needs to be initialized by calling a (useless) function in the library.
This is made a little cleaner if your library has its own namespace encapsulating both of the definitions, then there's less chance of name collisions between the bogus function in your library with other libraries, or of the static variable with other variables in the compilation unit(s) that include the header.