I am trying to learn and understand name mangling in C++. Here are some questions:
(1) From devx
When a global function is overloaded, the generated mangled name for each overloaded version is unique. Name mangling is also applied to variables. Thus, a local variable and a global variable with the same user-given name still get distinct mangled names.
Are there other examples that are using name mangling, besides overloading functions and same-name global and local variables ?
(2) From Wiki
The need arises where the language allows different entities to be named with the same identifier as long as they occupy a different namespace (where a namespace is typically defined by a module, class, or explicit namespace directive).
I don't quite understand why name mangling is only applied to the cases when the identifiers belong to different namespaces, since overloading functions can be in the same namespace and same-name global and local variables can also be in the same space. How to understand this?
Do variables with same name but in different scopes also use name mangling?
(3) Does C have name mangling? If it does not, how can it deal with the case when some global and local variables have the same name? C does not have overloading functions, right?
Thanks and regards!
C does not do name mangling, though it does pre-pend an underscore to function names, so the printf(3) is actually _printf in the libc object.
In C++ the story is different. The history of it is that originally Stroustrup created "C with classes" or cfront, a compiler that would translate early C++ to C. Then rest of the tools - C compiler and linker would we used to produce object code. This implied that C++ names had to be translated to C names somehow. This is exactly what name mangling does. It provides a unique name for each class member and global/namespace function and variable, so namespace and class names (for resolution) and argument types (for overloading) are somehow included in the final linker names.
This is very easy to see with tools like nm(1) - compile your C++ source and look at the generated symbols. The following is on OSX with GCC:
namespace zoom
{
void boom( const std::string& s )
{
throw std::runtime_error( s );
}
}
~$ nm a.out | grep boom
0000000100001873 T __ZN4zoom4boomERKSs
In both C and C++ local (automatic) variables produce no symbols, but live in registers or on stack.
Edit:
Local variables do not have names in resulting object file for mere reason that linker does not need to know about them. So no name, no mangling. Everything else (that linker has to look at) is name-mangled in C++.
Mangling is simply how the compiler keeps the linker happy.
In C, you can't have two functions with the same name, no matter what. So that's what the linker was written to assume: unique names. (You can have static functions in different compilation units, because their names aren't of interest to the linker.)
In C++, you can have two functions with the same name as long as they have different parameter types. So C++ combines the function name with the types in some way. That way the linker sees them as having different names.
The exact manner of mangling is not significant to the programmer, only the compiler, and in fact every compiler does it differently. All that matters is that every function with the same base name is somehow made unique for the linker.
You can see now that adding namespaces and templates to the mix keeps extending the principle.
Technically, it's "decorating". It sounds less crude but also mangling sort of implies that CreditInterest might get rearranged into IntCrederestit whereas what actually happens is more like _CreditInterest#4 which is, fair to say, "decorated" more than mangled. That said, I call it mangling too :-) but you'll find more technical info and examples if you search for "C++ name decoration".
Are there other examples that are using name mangling, besides overloading functions and same-name global and local variables?
C++ mangles all symbols, always. It's just easier for the compiler. Typically the mangling encodes something about the parameter list or types as these are the most common causes of mangling being needed.
C does not mangle. Scoping is used to control access to local and global variables of the same name.
Source:http://sickprogrammersarea.blogspot.in/2014/03/technical-interview-questions-on-c_6.html
Name mangling is the process used by C++ compilers give each function in your program a unique name. In C++, generally programs have at-least a few functions with the same name. Thus name mangling can be considered as an important aspect in C++.
Example:
Commonly, member names are uniquely generated by concatenating the name of the member with that of the class e.g. given the declaration:
class Class1
{
public:
int val;
...
};
val becomes something like:
// a possible member name mangling
val__11Class1
agner has more information on what is a name mangling and how it is done in different compilers.
Name mangling (also called name decoration) is a method used by C++
compilers to add additional information to the names of functions and
objects in object files. This information is used by linkers when a
function or object defined in one module is referenced from another
module. Name mangling serves the following purposes:
Make it possible for linkers to distinguish between different versions of overloaded functions.
Make it possible for linkers to check that objects and functions are declared in exactly the same way in all modules.
Make it possible for linkers to give complete information about the type of unresolved references in error messages.
Name mangling was invented to fulfill purpose 1. The other purposes
are secondary benefits not fully supported by all compilers. The
minimum information that must be supplied for a function is the name
of the function and the types of all its parameters as well as any
class or namespace qualifiers. Possible additional information
includes the return type, calling convention, etc. All this
information is coded into a single ASCII text string which looks
cryptic to the human observer. The linker does not have to know what
this code means in order to fulfill purpose 1 and 2. It only needs to
check if strings are identical.
Related
Is there any way to explicitly do name mangling (also called name decoration) in a library written in c(or cpp).I want all the symbols of my shared library to have their names mangled.
Consider this question:
Two library of different versions in an application
In this if I can explicitly have all their names mangled , I think i can resolve that issue.May be there is some option in gcc compiler itself to do this.
Your question is:
Is there any way to explicitly do name mangling (also called name decoration) in a library written in c(or cpp).I want all the symbols of my shared library to have their names mangled.
However, I suspect that you're using the term name mangling inappropriately. Name mangling has nothing to do with library release version. If you mean to version each object exported in your library, then there are plenty of questions to answer that. Personally, I would use a versioned namespace -- but only because I haven't (yet) been bitten by it. Here's a quick example:
namespace mylibrary {
namespace v1 {
class foo {};
}
using foo = v1::foo;
}
mylibrary::foo f; // mylibrary::v1::foo
...then on a later release...
namespace mylibrary {
namespace v1 {
class foo {};
}
namespace v2 {
class foo;
}
using foo = v2::foo;
}
mylibrary::foo newer_f; // mylibrary::v2::foo
mylibrary::v1::foo older_f;
There are of course many permutations you could have. And there are a lot of caveats, especially if you have templated code or make use of ADL. If you release version 1 of the library with one definition of class foo but then version 2 has a different definition, then the two libraries will not be compatible! That's rather the whole point though.
If however I am incorrect and you truly do want to enforce C++ name mangling in your C++ library (which is odd, because it should be done by default), then the answer is twofold. First, take a look at some related questions:
Why would you use 'extern "C++"'?
"Undefined reference to" error while linking object files
"Undefined reference to" error when linking static C library with C++ code
The reading is related but not causal. The related questions are answering your question in reverse.
Many operating systems are written in C and that is typically why you'd see extern "C" when including system headers. It's also why you sometimes see the linker complaining about missing functions when you try to use things declared in a header whose library was compiled with C instead of C++.
So to go in the other direction (in your direction): in your header file, you can declare your exports to be extern "C++". That tells the compiler to specifically use mangled names when importing or exporting the object.
Using extern "C++" won't by itself be your magic trick. There are some GCC options which control some of the more specific functionality about name mangling. So, secondly, take a look at those. The (external link) to the GCC manual page is here: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html
Any option which mentions ABI, such as -fabi, might impact you. "-fabi" flags relate to "Application Binary Interface". You might want to learn more about these terms too. What is an application binary interface has some excellent answers describing what an ABI is and how you can start to reason about them. "-Wabi" will tell GCC to emit warnings when it detects potential ABI conflicts. But, like all things C++, it's not foolproof. I would not be surprised if there are name mangling issues which might not be detected by it. That's particularly true if you ever mix heterogenous compiler vendors or versions.
Importantly: mixing ABIs is likely going to be a big headache. I'd be very concerned about ABI incompatibilities being forced together and causing very difficult-to-debug undefined behaviors!
I am trying to call external C++ function from NASM. As I was searching on google I did not find any related solution.
C++
void kernel_main()
{
char* vidmem = (char*)0xb8000;
/* And so on... */
}
NASM
;Some calls before
section .text
;nothing special here
global start
extern kernel_main ;our problem
After running compiling these two files I am getting this error: kernel.asm(.text+0xe): undefined reference to kernel_main'
What is wrong here? Thanks.
There is no standardized method of calling C++ functions from assembly, as of now. This is due to a feature called name-mangling. The C++ compiler toolchain does not emit symbols with the names exactly written in the code. Therefore, you don't know what the name will be for the symbol representing the function coded with the name kernel_main or kernelMain, whatever.
Why is name-mangling required?
You can declare multiple entities (classes, functions, methods, namespaces, etc.) with the same name in C++, but under different parent namespaces. This causes symbol conflicts if two entities with the name local name (e.g. local name of class SomeContainer in namespace SymbolDomain is SomeContainer but global name is SymbolDomain::SomeContainer, atleast to talk in this answer, okay) have the same symbol name.
Conflicts also occur with method overloading, therefore, the types of each argument are also emitted (in some form) for methods of classes. To cope with this, the C++ toolchain will somehow mangle the actual names in the ELF binary object.
So, can't I use the C++ mangled name in assembly?
Yes, this is one solution. You can use readelf -s fileName with the object-file for kernel_main. You'll have to search for a symbol having some similarity with kernel_main. Once you think you got it, then confirm that with echo _ZnSymbolName | c++filt which should output kernel_main.
You use this name in assembly instead of kernel_main.
The problem with this solution is that, if for some reason, you change the arguments, return value, or anything else (we don't know what affects name-mangling), your assembly code may break. Therefore, you have to be careful about this. On the other hand, this is not a good practice, as your going into non-standard stuff.
Note that name-mangling is not standardized, and varies from toolchain to toolchain. By depending on it, your sticking to the same compiler too.
Can't I do something standardized?
Yep. You could use a C function in C++ by declaring the function extern "C" like this
extern "C" void kernelMain(void);
This is the best solution in your case, as your kernel_main is already a C-style function with no parent class and namespace. Note that, the C function is written in C++ and still uses C++ features (internally).
Other solutions include using a macro indirection, where a C function calls the C++ function, if you really need to. Something like this -
///
/// Simple class containing a method to illustrate the concept of
/// indirection.
///
class SomeContainer
{
public:
int execute(int y)
{
}
}
#define _SepArg_ , // Comma macro, to pass into args, comma not used directly
///
/// Indirection for methods having return values and arguments (other than
/// this). For methods returning void or having no arguments, make something
/// similar).
///
#define _Generate_Indirection_RetEArgs(ret, name, ThisType, thisArg, eargs) \
extern "C" ret name ( ThisType thisArg, eargs ) \
{ \
return thisArg -> name ( eargs ); \
} \
_Generate_Indirection_RetEArgs(int, execute, SomeContainer, x, int y);
I have been searching through various posts regarding whether symbol table for a C++ code contains functions' name along with the class name. Something which i could find on a post is that it depends on the type of compiler,
if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table
but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.
I could not understand whether it is actually compiler dependent or not? I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes? I don't have such a great/deep knowledge.
Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?
Most compiler textbooks will tell you about symbol tables, and often show you details about a modest complexity langauge such as Pascal. You won't find information about C++ symbol tables in a textbook; it is too arcane.
We offer a complete C++14 front end for our DMS Software Reengineering Toolkit. It parses C++, builds detailed ASTs, and performs name-and-type resolution, which includes building a precise symbol table.
What follows are slides from our tutorial on how to use DMS, focused on the C++ symbol table structures.
OP asked specifically for a view of what happens with classes. The following diagram shows this for the tiny C++ program in the upper left corner. The rest of the diagram shows boxes, which represent what we call "symbol spaces" (or "scopes"), which are essentially hash tables mapping symbol names (each box lists the symbols it owns) to the information that DMS knows about that symbol (source file location of definition, list of AST nodes that reference the definition, and a complex union that represents the type, and that may in turn point to other types). The arrows show how symbol spaces are connected; an arrow from space A to space B means "scope A is contained within scope B". Typically the symbol space lookup process, searching scope A for a symbol x, will continue the search in scope B if x is not found in A. You'll note the arrows are numbered with an integer; this tells the search machinery to look in the least-numbered parent scope first, before trying to search scopes using arrows with larger numbers. This is how scopes are ordered (note Class C inherits from A and B; any lookup of a field in class C such as "b" will be forced to first look in the scope for A, and then in the scope for B. In this way, the C++ lookup rules are achieved.
Note the the class names are recorded in the (unique) global namespace because they is declared at top level. If they had been defined in some explicit namespace, then the namespace would have a corresponding symbol space of its own that recorded the declared classes, and the namespace itself would be recorded in the global symbol space.
OP did not ask what the symbol table looks like for function bodies, but I just so happen to have an illustrative slide for that that, too, below.
The symbol spaces work the same way. What is shown in this slide is the linkage between a symbol space, and the scoped region it represents. That linkage is actually implemented by a pointer associated with the symbol space, to the corresponding AST(s, namespace definitions can be scattered around in multiple places).
Note that in this case, the function name is recorded in the global namespace because it is declared at top level. If it had been defined inside the scope of a class, the function name would have been recorded in the symbol space for the class body (on previous diagram).
As a general rule, the details of how the symbol table is organized is completely dependent on the compiler, and the choices the designers made. In our case, we designed a very general symbol table management package because we planned (and have) used the same package to handle multiple languages (C, C++, Java, COBOL, several legacy languages) in a uniform way.
However, the abstract structures of symbol spaces and inheritance will have to implemented in essentially equivalent ways across C++ compilers; after all, they have to model the same information. I'd expect similar structures in the GCC and Clang compilers (well, the integer-numbered inheritance arcs, maybe not :)
As a practical matter, it doesn't matter how many "passes" your compiler has. It pretty much has to build these structures to remember what it knows about the symbols, within a pass, and across passes.
While building a C++ parser is very hard by itself, building such a symbol table is much harder. The effort dwarfs the effort to build the C++ parser. Our C++ name resolver is some 250K SLOC of attribute-grammar code compiled and executed by DMS. Getting the details rights is an enormous headache; the C++ reference manual is enormous, confusing, the facts are scattered everywhere across the document, and in a variety of places it is contradictory (we try to send complaints about this to the committee) and or inconsistent between compilers (we have versions for GCC and Visual Studio 201x).
Update March 2017: Now have symbol tables for C++2014.
Update June 2018: Now have symbol tables for C++2017.
A symbol table maps names to constructs within the program. As such it is used to record the names of classes, functions, variables, and anything else that has a user-specified name within the program.
(There are two common kinds of symbol table - one that the compiler maintains when it is compiling your program, and another that exists in object file so that it can be linked to other objects. The two are strongly related, but need not have similar representation internally. Typically only some of the symbols from the compiler's symbol table will be output into the object).
Part of what you say makes no sense:
if it compiles code in one-pass then it will not need to store class name and subroutine names in your symbol table
How can the compiler determine to what construct a name refers if it cannot look it up in the symbol table?
but if it is a multi-pass compiler, it could add information about the class(es) it encounters and their subroutines so that it could do argument type checking and issue meaningful error messages.
There's no reason it could not do this in a single pass.
I could not understand whether it is actually compiler dependent or not?
All compilers are going to use a symbol table, but its use will be hidden inside the implementation.
I was assuming that compiler(for C++ code) would put function names with class names in the table whether it is single pass or multi pass compiler. How is it dependent on the passes?
How is what dependent on the passes? All names go in the symbol table - that's what it's for - and usually symbol resolution is important for just about everything else the compiler does, so it needs to be done early (i.e. in the first pass - and in fact the main purpose of the first pass in a multi-pass compiler compiler may well be just to build the symbol table!).
Moreover, could anyone show a sample symbol table for a simple C++ class, how would it look like (function names with class name)?
I'll give it a stab:
class A
{
int a;
void f(int, int);
};
Will yield a symbol table containing symbols "A", "a", and "f". Typically "a" and "f" would be marked with a scope to simplify lookup, eg:
"A" -> (class)
"A::a" -> (class variable member)
"A::f(int,int)" -> (class function member)
It's also possible that the a and f symbols will not be stored in the top-level symbol table, but rather that each name space (including C++ namespaces and classes) will have its own symbol table, containing the symbols defined inside it. But this is, arguably, just a data structure choice. You can still abstractly view the symbol table as a flat table, where a name maps to a construct.
In general the "A::a" symbol would not be output to the object file, since it is not required for linking.
Short answer: yes, using 'nm --demangle' on linux
Long answer: The functions in the symbol table contain the function name plus the return value and if it is belongs to a class, the class name too. But the names,types (not always) and classes are not written with it's fulls names to use less space. This strings called demangle. But you know that this short name is unique and you can parse the full class name from it. To view the symbol table of your program you can use 'nm' on linux.
http://linux.about.com/library/cmd/blcmdl1_nm.htm
It got the --demangle flag to view the original names. You can compile random short programs to see what comes out.
Lets take this code sample
//header
struct A { };
struct B { };
struct C { };
extern C c;
//code
A myfunc(B&b){ A a; return a; }
void myfunc(B&b, C&c){}
C c;
Lets do this line by line starting from the code section.
When the compiler sees the first myfunc method it does not care about A or B because its use is internal. Each c++ file will know what it takes in, what it returns. Although there needs to be a name for each of the two overload so how is that chosen and how does the linker know which means what?
Next is C c; I once had a bug were the linker wouldnt reconize thus allow me access to C in other C++ files. It was because that cpp didnt know c was extern and i had to mark it as extern in the header before i could link successfully. Now i am not sure if the class type has any involvement with the linker and the variable C. I dont know how RTTI will be involved but i do know C needs to be visible by other files.
How does the linker work and name mangling and such.
We first need to understand where compilation ends and linking begins. Compilation involves taking a compilation unit (a C or C++ source file) and turning it into an object file. Simplistically, this involves generating snippets of machine code for each function as well as a symbol table for all functions and static (global) variables. Placeholders are used for any symbols needed by the compilation unit that are external to the said compilation unit.
The linker is then responsible for loading all the object files and resolving all the place-holder symbols with real addresses (or offsets for machine independent code). This is placed into various sections that can be read by the operating system's dynamic loader when loading an executable.
So for the specifics. In order to avoid errors during linking, the compiler requires you to declare all external symbols that will be used by the current compilation unit. For global variables one must use the extern keyword, for functions this is optional.
All functions and global variables defined in a compilation unit have external linkage (i.e., can be referenced by other compilation units) unless one declares that with the static keyword (or the unnamed namespace in C++). In C++, the vtable will also have a symbol needed for linkage.
Now in C++, since functions can be overloaded, the parameters also form part of the function name. Since machine-code is just addresses and registers, extra information needs to be added to the function name in the symbols table. This extra parameter information comes in the form of a mangled name and ensures that the linker links to the correct version of an overloaded function.
If you really are interested in the gory details take a look at the ELF file format (PDF) used extensively on Linux. Windows has a different format but the principles can be expected to be the same.
Name mangling on the Itanuim (and ARM) platforms can be found here.
http://en.wikipedia.org/wiki/Name_mangling
Is it possible to export a function from a C++Builder DLL with a specific mangled name?
I am trying to build a C++Builder DLL to replace an existing VC++ DLL. The problem is that the application that uses the DLL expects one of the functions to have a specific mangled name.
That is, it expects the function to be called:
"?_FUNCTIONNAME_##YAHPAU_PARAM1_##PAU_PARAM2_###Z"
Of course I can export function from my DLL with a mangled name but CodeGear insists on using its own name mangling scheme.
Is it possible to force C++Builder to:
Use a specific mangled name for a function?
or
Use VC++ name mangling for a specific function?
Note that I only want to change the mangling for a specific function, not all the functions in the DLL.
Name mangling is just one of many aspects of the ABI. And it fact it would be the more easy to standardize. Compilers are using purposefully different name manglings when they are using different ABI in order to prevent linking incompatible objects.
One thing you may try for a given function is to mark it extern "C", then it will use the calling convention of C and the same mangling (commonly no mangling at all or just an initial _). Obviously that won't take care of other issues (handling of exceptions, precise content of vtbls, which registers are used for passing parameters, the precise definition of the standard library, ...)
Use a .def file to specify your own naming for the exported function.