NASM call for external C++ function

I am trying to call an external C++ function from NASM. Searching on Google, I did not find any related solution.
C++
void kernel_main()
{
char* vidmem = (char*)0xb8000;
/* And so on... */
}
NASM
;Some calls before
section .text
;nothing special here
global start
extern kernel_main ;our problem
After compiling these two files, I get this error: kernel.asm(.text+0xe): undefined reference to `kernel_main'
What is wrong here? Thanks.

There is currently no standardized way of calling C++ functions from assembly. This is due to a feature called name mangling: the C++ compiler toolchain does not emit symbols with exactly the names written in the code. Therefore, you don't know in advance what the symbol for a function written as kernel_main (or kernelMain, or anything else) will be called.
Why is name-mangling required?
You can declare multiple entities (classes, functions, methods, namespaces, etc.) with the same name in C++, as long as they live under different parent namespaces. This would cause symbol conflicts if two entities with the same local name also had the same symbol name. (For the purposes of this answer: the local name of class SomeContainer in namespace SymbolDomain is SomeContainer, while its global name is SymbolDomain::SomeContainer.)
Conflicts would also occur with overloading; therefore, the types of each argument are also encoded (in some form) in the symbols of functions and methods. To cope with this, the C++ toolchain mangles the actual names in the ELF object file.
So, can't I use the C++ mangled name in assembly?
Yes, this is one solution. You can run readelf -s fileName on the object file that contains kernel_main and search for a symbol that resembles kernel_main. Once you think you have found it, confirm it with echo _ZnSymbolName | c++filt, which should output kernel_main().
You then use this mangled name in assembly instead of kernel_main.
The problem with this solution is that if you change the arguments, the return value, or anything else (exactly what affects name mangling depends on the toolchain), your assembly code may break. Therefore, you have to be careful about this. It is also not good practice, as you're relying on non-standard behavior.
Note that name mangling is not standardized and varies from toolchain to toolchain. By depending on it, you're tying yourself to one compiler too.
Can't I do something standardized?
Yep. You can give a C++ function C linkage by declaring it extern "C", like this:
extern "C" void kernel_main(void);
This is the best solution in your case, as your kernel_main is already a C-style function with no parent class or namespace. Note that the function is still written in C++ and can still use C++ features internally.
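As a minimal sketch of the extern "C" approach, assuming a freestanding kernel that writes to VGA text memory at 0xb8000 as in the question: put_string is a hypothetical helper introduced only for illustration, and the attribute byte 0x07 is an assumed choice.

```cpp
#include <cstddef>

// Hypothetical helper (not from the question): writes `text` into a
// VGA-style text buffer, where each cell is a character byte followed
// by an attribute byte. Returns the number of cells written.
std::size_t put_string(volatile char* vidmem, const char* text)
{
    std::size_t cells = 0;
    for (; text[cells] != '\0'; ++cells) {
        vidmem[2 * cells]     = text[cells];
        vidmem[2 * cells + 1] = 0x07;  // assumed attribute: light grey on black
    }
    return cells;
}

// extern "C" turns off name mangling for this function, so on ELF targets
// the emitted symbol is plain `kernel_main`, which is the name that
// `extern kernel_main` in the NASM file refers to.
extern "C" void kernel_main()
{
    volatile char* vidmem = reinterpret_cast<volatile char*>(0xb8000);
    put_string(vidmem, "OK");
}
```

Only kernel_main needs C linkage here; put_string keeps its mangled C++ name because the assembly never refers to it.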
Other solutions include using a macro indirection, where a C function calls the C++ function, if you really need to. Something like this -
///
/// Simple class containing a method to illustrate the concept of
/// indirection.
///
class SomeContainer
{
public:
    int execute(int y)
    {
        return y; // placeholder body
    }
};
///
/// Indirection for methods having a return value and one argument (other
/// than this). For methods returning void or having no arguments, make
/// something similar. The generated C function is named ThisType_name so
/// that it cannot collide with another class's method of the same name.
///
#define GENERATE_INDIRECTION_RET_ARG(ret, name, ThisType, ArgType) \
    extern "C" ret ThisType##_##name(ThisType *thisArg, ArgType arg) \
    { \
        return thisArg->name(arg); \
    }
// Expands to:
// extern "C" int SomeContainer_execute(SomeContainer *thisArg, int arg)
GENERATE_INDIRECTION_RET_ARG(int, execute, SomeContainer, int)

Related

symbol name in shared object differs from function in .cpp file

In a project environment, I wanted to change a source file for a shared object from c to cpp. I made sure to change its entry in the CMakeLists.txt, too:
add_library(*name* SHARED *mysource*.cpp)
target_link_libraries(*name as target* *item*)
The build process runs fine. Unfortunately, when I try to use it I get an error that the functions inside the .so can not be found.
After checking the dynamic symbol table inside the shared object with objdump -T, I found out that the names of the symbols differ from the ones in the source file.
e.g.
int sr_plugin_init_cb(sr_session_ctx_t *session, void **private_ctx);
becomes
_Z17sr_plugin_init_cbP16sr_session_ctx_sPPv
Visual Studio Code reports that it can build the object and link the shared library correctly. The build output also changed from C to CXX, and I get no errors even though some of the code is C++-only.
Why do the symbol names change?
Why do the symbol names change?
C++ has a feature called function overloading. Basically, you can declare two functions that are named the same but differ slightly:
int sr_plugin_init_cb(sr_session_ctx_t *session, void **private_ctx);
int sr_plugin_init_cb(sr_session_ctx_t *session, void **private_ctx, int some_arg);
or a little worse case:
struct A {
// each of these functions can be different depending on the object
void func();
void func() const;
void func() volatile;
void func() volatile const;
};
The functions are named the same. The linker doesn't see the C++ source, but it still has to differentiate between them in order to link against the right one. So the C++ compiler "mangles" the function names so that the linker can tell them apart. As a gross simplification, the mangled names could look like:
sr_plugin_init_cb_that_doesnt_take_int_arg
sr_plugin_init_cb_that_takes_int_arg
A_func
A_func_but_object_is_const
A_func_but_object_is_volatile
A_func_but_object_is_volatile_and_const
The rules of name mangling are complicated, partly in order to keep the names as short as possible. They have to take into account any number of templates, arguments, objects, names, qualifiers, lambdas, overloads, operators etc., generate a unique name, and use only characters that are compatible with the linker on a specific architecture. For example here is a reference for name mangling used by the GNU g++ compiler.
The symbol name _Z17sr_plugin_init_cbP16sr_session_ctx_sPPv is the name of your function as mangled by your compiler.
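If you want to check such a symbol from code rather than by eye, here is a minimal sketch that assumes GCC or Clang (any toolchain using the Itanium C++ ABI), where <cxxabi.h> provides abi::__cxa_demangle. The demangle wrapper is a hypothetical helper name, not part of any library.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or the
// input unchanged if demangling fails.
std::string demangle(const char* mangled)
{
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the returned buffer is allocated with malloc
    return result;
}
```

Calling demangle("_Z17sr_plugin_init_cbP16sr_session_ctx_sPPv") should give back sr_plugin_init_cb(sr_session_ctx_s*, void**), which shows that the parameter types are encoded in the symbol. This does the same job as piping the symbol through c++filt.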
Thank you very much for the detailed answer. I now understand the issue.
After a quick search I found a solution for my problem. Encapsulating the function prototypes like this avoids the name mangling.
extern "C" {
// Function prototypes
};

dlsym Functions inside a class c++ Linux [duplicate]

I am trying to learn and understand name mangling in C++. Here are some questions:
(1) From devx
When a global function is overloaded, the generated mangled name for each overloaded version is unique. Name mangling is also applied to variables. Thus, a local variable and a global variable with the same user-given name still get distinct mangled names.
Are there other examples that are using name mangling, besides overloading functions and same-name global and local variables ?
(2) From Wiki
The need arises where the language allows different entities to be named with the same identifier as long as they occupy a different namespace (where a namespace is typically defined by a module, class, or explicit namespace directive).
I don't quite understand why name mangling is only applied to the cases when the identifiers belong to different namespaces, since overloading functions can be in the same namespace and same-name global and local variables can also be in the same space. How to understand this?
Do variables with same name but in different scopes also use name mangling?
(3) Does C have name mangling? If it does not, how can it deal with the case when some global and local variables have the same name? C does not have overloading functions, right?
Thanks and regards!
C does not do name mangling, though on some platforms (macOS, for example) the toolchain prepends an underscore to C symbol names, so printf(3) is actually _printf in the libc object.
In C++ the story is different. Historically, Stroustrup first created "C with Classes" and Cfront, a compiler that translated early C++ to C. The rest of the tools, the C compiler and the linker, would then be used to produce object code. This implied that C++ names had to be translated to C names somehow. This is exactly what name mangling does. It provides a unique name for each class member and global/namespace function and variable, so namespace and class names (for resolution) and argument types (for overloading) are somehow included in the final linker names.
This is very easy to see with tools like nm(1) - compile your C++ source and look at the generated symbols. The following is on OSX with GCC:
#include <stdexcept>
#include <string>

namespace zoom
{
    void boom( const std::string& s )
    {
        throw std::runtime_error( s );
    }
}
~$ nm a.out | grep boom
0000000100001873 T __ZN4zoom4boomERKSs
In both C and C++ local (automatic) variables produce no symbols, but live in registers or on stack.
Edit:
Local variables do not have names in the resulting object file for the simple reason that the linker does not need to know about them. So no name, no mangling. Everything else (that the linker has to look at) is name-mangled in C++.
Mangling is simply how the compiler keeps the linker happy.
In C, you can't have two functions with the same name, no matter what. So that's what the linker was written to assume: unique names. (You can have static functions in different compilation units, because their names aren't of interest to the linker.)
In C++, you can have two functions with the same name as long as they have different parameter types. So C++ combines the function name with the types in some way. That way the linker sees them as having different names.
The exact manner of mangling is not significant to the programmer, only the compiler, and in fact every compiler does it differently. All that matters is that every function with the same base name is somehow made unique for the linker.
You can see now that adding namespaces and templates to the mix keeps extending the principle.
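To make "combines the function name with the types" concrete, here is a toy sketch of the simplest case of the Itanium C++ ABI scheme used by GCC and Clang: a free function at global scope whose parameters are all builtin types. toy_mangle is a hypothetical name, the caller supplies the one-letter type codes, and real manglers handle far more (namespaces, classes, templates, substitutions). Other compilers use entirely different schemes.

```cpp
#include <string>
#include <vector>

// Simplified sketch, not a real mangler. Encodes
//   _Z <length of name> <name> <one code per parameter>
// where `param_codes` holds builtin type codes such as "i" (int),
// "f" (float) or "c" (char). An empty parameter list is encoded as "v".
std::string toy_mangle(const std::string& name,
                       const std::vector<std::string>& param_codes)
{
    std::string out = "_Z" + std::to_string(name.size()) + name;
    if (param_codes.empty()) {
        out += "v";
    }
    for (const std::string& code : param_codes) {
        out += code;
    }
    return out;
}
```

With this, toy_mangle("foo", {"i"}) yields _Z3fooi and toy_mangle("foo", {"f"}) yields _Z3foof: same base name, different symbols, which is all the linker needs.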
Technically, it's "decorating". It sounds less crude but also mangling sort of implies that CreditInterest might get rearranged into IntCrederestit whereas what actually happens is more like _CreditInterest#4 which is, fair to say, "decorated" more than mangled. That said, I call it mangling too :-) but you'll find more technical info and examples if you search for "C++ name decoration".
Are there other examples that are using name mangling, besides overloading functions and same-name global and local variables?
C++ mangles all symbols, always. It's just easier for the compiler. Typically the mangling encodes something about the parameter list or types as these are the most common causes of mangling being needed.
C does not mangle. Scoping is used to control access to local and global variables of the same name.
Source:http://sickprogrammersarea.blogspot.in/2014/03/technical-interview-questions-on-c_6.html
Name mangling is the process C++ compilers use to give each function in your program a unique name. C++ programs generally have at least a few functions with the same name, so name mangling is an important aspect of C++.
Example:
Commonly, member names are uniquely generated by concatenating the name of the member with that of the class e.g. given the declaration:
class Class1
{
public:
int val;
...
};
val becomes something like:
// a possible member name mangling
val__11Class1
Agner has more information on what name mangling is and how it is done in different compilers.
Name mangling (also called name decoration) is a method used by C++
compilers to add additional information to the names of functions and
objects in object files. This information is used by linkers when a
function or object defined in one module is referenced from another
module. Name mangling serves the following purposes:
1. Make it possible for linkers to distinguish between different versions of overloaded functions.
2. Make it possible for linkers to check that objects and functions are declared in exactly the same way in all modules.
3. Make it possible for linkers to give complete information about the type of unresolved references in error messages.
Name mangling was invented to fulfill purpose 1. The other purposes
are secondary benefits not fully supported by all compilers. The
minimum information that must be supplied for a function is the name
of the function and the types of all its parameters as well as any
class or namespace qualifiers. Possible additional information
includes the return type, calling convention, etc. All this
information is coded into a single ASCII text string which looks
cryptic to the human observer. The linker does not have to know what
this code means in order to fulfill purpose 1 and 2. It only needs to
check if strings are identical.

How to force compiler to mangle C names to C++ names

I have an .obj with a function that has everything it needs to be linked as a C++ member function. The problem is that it's written in C, so the class using it expects something uglier than its normal name. I figure this can be done in only two ways: either mangle the name of the C function, or add an additional symbol to the symbol table that has the mangled name while I can still use the original name too. So, basically, mangle the name. Any ideas how to do this, or is there some completely different way of solving it? Please share, but do consider the usefulness of saying extern "C" in this particular case :)) thx
Your (C-based) object file has a symbol and you cannot redefine that symbol to have a different name; that would be a task for the compiler generating that object file. The C compiler doesn't know about C++ and cannot be made to emit a symbol with C++ linkage and name mangling. So the only way to use that symbol (your C function) is to call it by the symbol it is known by.
You can, of course, use that function to implement a C++ (member) function (the additional level of indirection is optimised away if the call is inline) as in
extern "C" { int my_C_func(void*, int); } // could be in an included header
struct A {
// implement the following member using the C function
int operator()(int i) { return my_C_func(this,i); }
};
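For a version that links on its own, here is a sketch in which the C function is defined in the same file as a stand-in for the real one in your .obj. The member base and the body of my_C_func are assumptions made up for the example.

```cpp
// Declaration with C linkage: the symbol is plain `my_C_func`.
extern "C" int my_C_func(void* self, int i);

struct A {
    int base;  // hypothetical state, only here so the call does something
    // The member function has a mangled C++ name and forwards to the C symbol.
    int operator()(int i) { return my_C_func(this, i); }
};

// Stand-in definition; in the real project this would come from the C
// object file instead.
extern "C" int my_C_func(void* self, int i)
{
    return static_cast<A*>(self)->base + i;
}
```

With A a{40}, the call a(2) goes through the wrapper into my_C_func and returns 42.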
If the C++ class is already declared and its declaration cannot be touched, then you can still implement the member function in the same way in a separate source file. However, this cannot be inline and hence comes at the cost of an additional function call:
in file.cpp:
extern "C" { int my_C_func(void*, int); } // could be in an included header
int A::operator()(int i) { return my_C_func(this,i); }
From your reply to my comment, I conclude that you actually have control over the implementation of the C function. So why do you need to implement it in C? Why can't you simply implement it in C++? Then you will get the correct linkage and name mangling, and you can directly implement the desired member function.

Why I need C++ linkage for a template?

Sometimes I try to follow the logic of some rules, and sometimes the reason why things happen the way they do defeats any rule that I know of.
Typically a template is described as something that lives only during the compilation phase and is exactly equivalent to hand-writing some function foo for any given type T.
So why doesn't this code compile? (I'm using C++11 with gcc and clang at the moment, but I don't think that's relevant in this case.)
#include <iostream>
#include <cstdint>
#include <cstdlib>
extern "C" {
template <typename T>
T foo(T t)
{
return t;
}
}
int main()
{
uint32_t a = 42;
std::cout << foo(a) << '\n';
return EXIT_SUCCESS;
}
What defeats all logic is that the complaint is about the linkage, and the implicit message is that this code doesn't generate a function; it generates something else that, after compilation, is not suitable for C-style linkage.
What is the technical reason why this code doesn't compile ?
Let's look at this from a simple perspective. At the very least, using extern "C" will remove the C++ name mangling. So, we then have your template, and we'll instantiate it twice.
int foo(int val);
float foo(float val);
Under C's naming rules, these are required to have the same name foo from the perspective of the linker. If they have the same name though, we can't distinguish between them, and we'll have an error.
Under C++, the rules for how names are mangled are implementation-defined. So C++ compilers will apply name mangling to these two functions to differentiate them. Perhaps we'll call them foo_int and foo_float.
Because C++ can do this, we have no issues. But extern "C" requires the compiler to apply the C naming rules.
"linkage" is a slightly misleading term. The main thing that extern "C" changes is name mangling. That is, it produces symbol names in the object files that are compatible with the sort of symbols that equivalent C code would produce. That way it can link with C object code.... but it's a different sort of thing than specifying static or extern linkage.
But templates don't have a C equivalent, and name mangling is used to make sure that different instantiations of a given templated function result in different symbol names (so that the linker knows which one to use in a given place).
So there's no way to give templates C linkage; you're asking the compiler to do two fundamentally incompatible things.
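What you can do is keep the template with C++ linkage and export one uniquely named extern "C" function per instantiation you need. A minimal sketch, where foo_u32 and foo_f32 are hypothetical names chosen for the example:

```cpp
#include <cstdint>

// The template itself keeps C++ linkage.
template <typename T>
T foo(T t)
{
    return t;
}

// One non-template wrapper per instantiation. Each has a distinct,
// unmangled symbol name, so C code (or assembly) can link against it.
extern "C" std::uint32_t foo_u32(std::uint32_t t) { return foo(t); }
extern "C" float         foo_f32(float t)         { return foo(t); }
```

Defining the wrappers also forces the two instantiations to be emitted, so they are guaranteed to be present in the object file.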
Well, other answers have explained why it doesn't work from the C++ side.
From the C side there are workarounds, but they are not portable.
You can simply not use extern "C", create name-mangled functions, and then link to the actual mangled names in the C code.
To make that easier you could also use GCC's abi::__cxa_demangle() function combined with a look-up table, so you don't need to know what the mangled function names are (just their demangled signatures).
But it's all a bit of a bodge really.
Of course if you only call the template functions from C code, they'll never get instantiated to begin with. So you would need to make sure they get called in the C++ code to make sure they're present in the object file.

C/C++ linkage convention

When calling C++ algorithms like copy_if, transform etc., which take a unary or binary function as the last argument, can I pass a C library function like atoi or tolower?
For example, the calls below work fine and give the correct output (tried on ideone):
1) transform (foo, foo+5, bar, atoi);
2) transform (foo, foo+5, bar, ptr_fun(atoi));
3) transform(s.begin(),s.end(),s.begin(), static_cast<int (*)(int)>(tolower));
Is this usage guaranteed to work with all C++ compilers ?
The book Thinking in C++ mentions "This works with some compilers, but it is not required to." The reason given is (as I understand it) that transform is a C++ function and expects its last argument to have the same calling convention.
The book also suggests a solution for this problem, which is to create a wrapper function like this in a separate cpp file that does not include the iostreams header file.
// tolower_wrapper.cpp
string strTolower(string s) {
transform(s.begin(), s.end(), s.begin(), tolower);
return s;
}
This works fine, but I do not understand how it resolves the calling convention issue.
transform is still a C++ function and tolower is still a C function inside strTolower, so how are the different calling conventions handled here?
The first thing to note, which is not actually part of your question but which might help explain for someone reading this, is that the algorithms can take either a function pointer or a function object as an argument.
A function pointer is just that - a pointer to a function which expects to take a specific set of parameters and return a specific type.
A function object is an instance of a class which has overloaded operator().
When expanding the algorithm template, the compiler will be able to see which of the two cases applies and will generate appropriate calling code.
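A small sketch of the two cases, with hypothetical names (twice, AddN, apply_all) made up for illustration:

```cpp
#include <algorithm>
#include <vector>

// Plain function: passed to an algorithm as a function pointer.
int twice(int x) { return 2 * x; }

// Function object: a class that overloads operator().
struct AddN {
    int n;
    int operator()(int x) const { return x + n; }
};

// Applies `op` to every element. Op is deduced as int (*)(int) when
// `twice` is passed and as AddN when an AddN instance is passed; the
// same call syntax op(x) works for both inside std::transform.
template <typename Op>
std::vector<int> apply_all(std::vector<int> v, Op op)
{
    std::transform(v.begin(), v.end(), v.begin(), op);
    return v;
}
```

For the input {1, 2, 3}, apply_all(in, twice) gives {2, 4, 6} and apply_all(in, AddN{10}) gives {11, 12, 13}.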
In the case of a C function being used as a unary or binary function in an algorithm, it is a function pointer that you are supplying. You can call a C function from C++, as long as it is declared extern "C" { ... }.
Many compilers come with header files for the C library functions which include something like this:
#ifdef __cplusplus
extern "C" {
#endif
/* function declarations here */
#ifdef __cplusplus
}
#endif
so that if you include a C library header from a C++ program, the contained functions will all be magically available to you. That part, however, is not guaranteed by the standard, which is why your book states it may not work with all compilers.
Another wrinkle is that you are not allowed to cast function pointers to a type with a different language linkage, which in at least some of your examples you are doing, although some compilers do seem to allow that anyway - for example see this GCC Bug.
The other catch, which applies specifically to tolower for example, is that some of the names of C library functions are also names of functions or templates in the C++ std library. For example, the name tolower is also defined in <locale>. This specific case is discussed in this GCC bug report. The use of a wrapper, compiled in a separate compilation unit which does not include the conflicting declarations, would resolve this issue.
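One way to sidestep both the language-linkage question and the overload ambiguity, assuming C++11 or later is available, is to wrap the call in a lambda so that a C++ callable is passed to transform rather than a pointer to the C function. A minimal sketch, with str_to_lower as a hypothetical name:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// The lambda is a C++ function object, so nothing with C language linkage
// is passed to std::transform, and the call inside it unambiguously picks
// the one-argument tolower from <cctype>. Taking the character as
// unsigned char avoids passing a negative value to std::tolower.
std::string str_to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

In the default "C" locale, str_to_lower("Hello, World") returns "hello, world".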