In C++ how is function overloading typically implemented? - c++

If there is no function overloading, the function name serves as the address of the function code, and when a function is being called, its address is easy to find using its name. However with function overloading, how exactly can the program find the correct function address? Is there a hidden table similar to virtual tables that stores the overloaded functions with their address? Thanks a lot!

Name mangling.
It's all done at compile time. The C++ compiler actually modifies the function names you give it internally, so that a function like
int foo(int a, float b, char c)
internally gets a name equivalent to
func_foo_int_float_char()
(the real symbol is usually some gobbledygook like ?CFoo#Foo##QAAX_N#Z ).
As you can see, the name is decorated depending on the exact number and types of parameters passed. So, when you call a function, it's easy for the compiler to look at the parameters you are passing, decorate the function name with them, and come up with the correct symbol. For example,
int a, b; float f; char c;
foo(a,f,c) ; // compiler looks for an internal symbol called func_foo_int_float_char
foo(a,b,c) ; // compiler looks for a symbol called func_foo_int_int_char
Again, it's all done completely at compile time.

The compiler can look at the call, and match that against the known existing overloaded implementations, and pick the right one. No need for a dynamic table, it's all perfectly doable statically at compile-time.
Update: removed my attempt at illustrating the concept by showing differently-named functions that the compiler can choose between.

If you are talking about overloaded methods of the same class, like so:
void function(int n);
void function(char *s);
...
objectInstance->function("Hello World")
It is a compile time thingy. The compiler knows (or in some situations, makes a best guess) at this point which method to call.
A comment I made in the question, I repeat here.
People who suggest name mangling are misguided I think. It is not as if the compiler mangles the name and just does a lookup among the mangled names. It needs to infer the proper types from the available methods. Once it does that, it already knows which method to call. It then uses the mangled name as the last step. Name mangling is not a prerequisite for determining which overloaded function to call.

Overloaded functions are resolved at compile-time. The compiler finds a suitable match for the given set of parameters and simply calls the corresponding function by its address (void foo(int) and void foo() are practically two totally independent functions - if you have foo(4) in your code, the compiler knows which function to call).

It is, I believe, achieved through name mangling:
the functions you know as foo(int) and foo(double) are actually named something like int_foo() and double_foo() (or similar, I'm not entirely sure of the particular semantics employed for C++). This means that C++ symbols are usually an order of magnitude larger than the names they are given in code.

C++ compilers use name mangling (different name for each overload) to distinguish between the functions in the object file. For example
int test(int a){}
int test(float a,float b){}
int test(double a){}
int testbam(double a){}
would produce the symbol names __Z4testi, __Z4testff, __Z4testd, __Z7testbamd. This name mangling is highly compiler-dependent (sadly) and one of many reasons why often C is preferred over C++.
When calling the function test, the compiler matches the given argument types and number of arguments against each function overload. The function prototypes are then used to find out which one should be called.

The function signature is composed of the function name + parameter(s) type(s)

Even if no function overload, compilers usually mangle function and variable names. It is called name mangling. It happens in both C and C++. Function name can be decorated by notably (1) calling convention, (2) C++ function overloading, (3) class member function.
GNU binutil c++filt can undecorate this mangled name, and in Windows, there is UnDecorateSymbolName

Related

Why can't C functions be name-mangled?

I had an interview recently and one question asked was what is the use of extern "C" in C++ code. I replied that it is to use C functions in C++ code as C doesn't use name-mangling. I was asked why C doesn't use name-mangling and to be honest I couldn't answer.
I understand that when the C++ compiler compiles functions, it gives a special name to the function mainly because we can have overloaded functions of the same name in C++ which must be resolved at compile time. In C, the name of the function will stay the same, or maybe with an _ before it.
My query is: what's wrong with allowing the C++ compiler to mangle C functions also? I would have assumed that it doesn't matter what names the compiler gives to them. We call functions in the same way in C and C++.
It was sort of answered above, but I'll try to put things into context.
First, C came first. As such, what C does is, sort of, the "default". It does not mangle names because it just doesn't. A function name is a function name. A global is a global, and so on.
Then C++ came along. C++ wanted to be able to use the same linker as C, and to be able to link with code written in C. But C++ could not leave the C "mangling" (or, lack there of) as is. Check out the following example:
int function(int a);
int function();
In C++, these are distinct functions, with distinct bodies. If none of them are mangled, both will be called "function" (or "_function"), and the linker will complain about the redefinition of a symbol. C++ solution was to mangle the argument types into the function name. So, one is called _function_int and the other is called _function_void (not actual mangling scheme) and the collision is avoided.
Now we're left with a problem. If int function(int a) was defined in a C module, and we're merely taking its header (i.e. declaration) in C++ code and using it, the compiler will generate an instruction to the linker to import _function_int. When the function was defined, in the C module, it was not called that. It was called _function. This will cause a linker error.
To avoid that error, during the declaration of the function, we tell the compiler it is a function designed to be linked with, or compiled by, a C compiler:
extern "C" int function(int a);
The C++ compiler now knows to import _function rather than _function_int, and all is well.
It's not that they "can't", they aren't, in general.
If you want to call a function in a C library called foo(int x, const char *y), it's no good letting your C++ compiler mangle that into foo_I_cCP() (or whatever, just made up a mangling scheme on the spot here) just because it can.
That name won't resolve, the function is in C and its name does not depend on its list of argument types. So the C++ compiler has to know this, and mark that function as being C to avoid doing the mangling.
Remember that said C function might be in a library whose source code you don't have, all you have is the pre-compiled binary and the header. So your C++ compiler can't do "it's own thing", it can't change what's in the library after all.
what's wrong with allowing the C++ compiler to mangle C functions also?
They wouldn't be C functions any more.
A function is not just a signature and a definition; how a function works is largely determined by factors like the calling convention. The "Application Binary Interface" specified for use on your platform describes how systems talk to each other. The C++ ABI in use by your system specifies a name mangling scheme, so that programs on that system know how to invoke functions in libraries and so forth. (Read the C++ Itanium ABI for a great example. You'll very quickly see why it's necessary.)
The same applies for the C ABI on your system. Some C ABIs do actually have a name mangling scheme (e.g. Visual Studio), so this is less about "turning off name mangling" and more about switching from the C++ ABI to the C ABI, for certain functions. We mark C functions as being C functions, to which the C ABI (rather than the C++ ABI) is pertinent. The declaration must match the definition (be it in the same project or in some third-party library), otherwise the declaration is pointless. Without that, your system simply won't know how to locate/invoke those functions.
As for why platforms don't define C and C++ ABIs to be the same and get rid of this "problem", that's partially historical — the original C ABIs weren't sufficient for C++, which has namespaces, classes and operator overloading, all of which need to somehow be represented in a symbol's name in a computer-friendly manner — but one might also argue that making C programs now abide by the C++ is unfair on the C community, which would have to put up with a massively more complicated ABI just for the sake of some other people who want interoperability.
MSVC in fact does mangle C names, although in a simple fashion. It sometimes appends #4 or another small number. This relates to calling conventions and the need for stack cleanup.
So the premise is just flawed.
It's very common to have programs which are partially written in C and partially written in some other language (often assembly language, but sometimes Pascal, FORTRAN, or something else). It's also common to have programs contain different components written by different people who may not have the source code for everything.
On most platforms, there is a specification--often called an ABI [Application Binary Interface] which describes what a compiler must do to produce a function with a particular name which accepts arguments of some particular types and returns a value of some particular type. In some cases, an ABI may define more than one "calling convention"; compilers for such systems often provide a means of indicating which calling convention should be used for a particular function. For example, on the Macintosh, most Toolbox routines use the Pascal calling convention, so the prototype for something like "LineTo" would be something like:
/* Note that there are no underscores before the "pascal" keyword because
the Toolbox was written in the early 1980s, before the Standard and its
underscore convention were published */
pascal void LineTo(short x, short y);
If all of the code in a project was compiled using the same compiler, it
wouldn't matter what name the compiler exported for each function, but in
many situations it will be necessary for C code to call functions that were
compiled using other tools and cannot be recompiled with the present compiler
[and may very well not even be in C]. Being able to define the linker name
is thus critical to the use of such functions.
I'll add one other answer, to address some of the tangential discussions that took place.
The C ABI (application binary interface) originally called for passing arguments on the stack in reverse order (i.e. - pushed from right to left), where the caller also frees the stack storage. Modern ABI actually uses registers for passing arguments, but many of the mangling considerations go back to that original stack argument passing.
The original Pascal ABI, in contrast, pushed the arguments from left to right, and the callee had to pop the arguments. The original C ABI is superior to the original Pascal ABI in two important points. The argument push order means that the stack offset of the first argument is always known, allowing functions that have an unknown number of arguments, where the early arguments control how many other arguments there are (ala printf).
The second way in which the C ABI is superior is the behavior in case the caller and callee do not agree on how many arguments there are. In the C case, so long as you don't actually access arguments past the last one, nothing bad happens. In Pascal, the wrong number of arguments is popped from the stack, and the entire stack is corrupted.
The original Windows 3.1 ABI was based on Pascal. As such, it used the Pascal ABI (arguments in left to right order, callee pops). Since any mismatch in argument number might lead to stack corruption, a mangling scheme was formed. Each function name was mangled with a number indicating the size, in bytes, of its arguments. So, on 16 bit machine, the following function (C syntax):
int function(int a)
Was mangled to function#2, because int is two bytes wide. This was done so that if the declaration and definition mismatch, the linker will fail to find the function rather than corrupt the stack at run time. Conversely, if the program links, then you can be sure the correct number of bytes is popped from the stack at the end of the call.
32 bit Windows and onward use the stdcall ABI instead. It is similar to the Pascal ABI, except push order is like in C, from right to left. Like the Pascal ABI, the name mangling mangles the arguments byte size into the function name to avoid stack corruption.
Unlike claims made elsewhere here, the C ABI does not mangle the function names, even on Visual Studio. Conversely, mangling functions decorated with the stdcall ABI specification isn't unique to VS. GCC also supports this ABI, even when compiling for Linux. This is used extensively by Wine, that uses it's own loader to allow run time linking of Linux compiled binaries to Windows compiled DLLs.
C++ compilers use name mangling in order to allow for unique symbol names for overloaded functions whose signature would otherwise be the same. It basically encodes the types of arguments as well, which allows for polymorphism on a function-based level.
C does not require this since it does not allow for the overloading of functions.
Note that name mangling is one (but certainly not the only!) reason that one cannot rely on a 'C++ ABI'.
C++ wants to be able to interop with C code that links against it, or that it links against.
C expects non-name-mangled function names.
If C++ mangled it, it would not find the exported non-mangled functions from C, or C would not find the functions C++ exported. The C linker must get the name it itself expects, because it does not know it is coming from or going to C++.
Mangling the names of C functions and variables would allow their types to be checked at link time. Currently, all (?) C implementations allow you to define a variable in one file and call it as a function in another. Or you can declare a function with a wrong signature (e.g. void fopen(double) and then call it.
I proposed a scheme for the type-safe linkage of C variables and functions through the use of mangling back in 1991. The scheme was never adopted, because, as other have noted here, this would destroy backward compatibility.

Why can I call a function in C without declaring it but not in C++?

In C++, it is a compiler error to call a function before it is declared. But in C, it may compile.
#include<stdio.h>
int main()
{
foo(); // foo() is called before its declaration/definition
}
int foo()
{
printf("Hello");
return 0;
}
I have tried and know that it is correct but I can't get the reason behind it. Can anyone please explain how the compilation process actually takes place and differs in both the languages.
The fact that the code "compiles" as a c program doesn't mean you can do it. The compiler should warn about implicit declaration of the function foo().
In this particular case implicit declaration would declare an identical foo() and nothing bad will happen.
But suppose the following, say this is
main.c
/* Don't include any header, why would you include it if you don't
need prototypes? */
int main(void)
{
printf("%d\n", foo()); // Use "%d" because the compiler will
// implicitly declare `foo()` as
//
// int foo()
//
// Using the "correct" specifier, would
// invoke undefined behavior "too".
return 0;
}
Now suppose foo() was defined in a different compilation unit1 foo.c as
foo.c
double foo()
{
return 3.5;
}
does it work as expected?
You could imagine what would happen if you use malloc() without including stdio.h, which is pretty much the same situation I try to explain above.
So doing this will invoke undefined behavior2, thus the term "Works" is not applicable in the understandable sense in this situation.
The reason this could compile is because in the very old days it was allowed by the c standard, namely the c89 standard.
The c++ standard has never allowed this so you can't compile a c++ program if you call a function that has no prototype ("declaration") in the code before it's called.
Modern c compilers warn about this because of the potential for undefined behavior that can easily occur, and since it's not that hard to forget to add a prototype or to include the appropriate header it's better for the programmer if the compiler can warn about this instead of suddenly having a very unexplicable bug.
1It can't be compiled in the same file because it would be defined with a different return type, since it was already implicitly declared
2Starting with the fact that double and int are different types, there will be undefined behavior because of this.
When C was developed, the function name was all you needed to be able to call it. Matching arguments to function parameters was strictly the business of the programmer; the compiler didn't care if you passed three floats to something that needed just an integer.
However, that turned out to be rather error prone, so later iterations of the C language added function prototypes as a (still optional) additional restriction. In C++ these restrictions have been tightened further: now a function prototype is always mandatory.
We can speculate on why, but in part this is because in C++ it is no longer enough to simply know the function name. There can be multiple functions with the same name, but with different arguments, and the compiler must figure out which one to call. It also has to figure out how to call (direct or virtual?), and it may even have to generate code in case of a template function.
In light of all that I think it makes sense to have the language demand that the function prototype be known at the point where the function is called.
Originally, C had no function prototypes, and C++ did not exist.
If you said
extern double atof();
this said that atof was a function returning double. (Nothing was said about its arguments.)
If you then said
double d = atof("1.3");
it would work. If you said
double d2 = atof(); /* whoops, forgot the argument to atof() */
the compiler would not complain, but something weird would happen if you tried to run it.
In those days, if you wanted to catch errors related to calling functions with the wrong number or type of arguments, that was the job of a separate program, lint, not the C compiler.
Also in those days, if you just called a function out of the blue that the compiler had never heard of before, like this:
int i = atoi("42");
the compiler basically pretended that earlier you had said
extern int atoi();
This was what was called an implicit function declaration. Whenever the compiler saw a call to a function whose name it didn't know, the compiler assumed it was a function returning int.
Fast forward a few years. C++ invents the function prototypes that we know today. Among other things, they let you declare the number and types of the arguments expected by a function, not just its return type.
Fast forward a few more years, C adopts function prototypes, optionally. You can use them if you want, but if you don't, the compiler will still do an implicit declaration on any unknown function call it sees.
Fast forward a few more years, to C11. Now implicit int is finally gone. A compiler is required to complain if you call a function without declaring it first.
But even today, you may be using a pre-C11 compiler that's still happy with implicit int. And a C11-complaint compiler may choose to issue a mere warning (not a compilation-killing error) if you forget to declare a function before calling it. And a C11-compliant compiler may offer an option to turn off those warnings, and quietly accept implicit ints. (For example, when I use the very modern clang, I arrange to invoke it with -Wno-implicit-int, meaning that I don't want warnings about implicit int, because I've still got lots of old, working code that I don't feel like rewriting.)
Why can I call a function in C without declaring it?
Because in C, but not in C++, a function without a prototype is assumed to return an int. This is an implicit declaration of that function. If that assumption turns out to be true (the function is declared later on with return-type of int), then the program compiles just fine.
If that assumption turns out to be false (it was assumed to return an int, but then actually is found to return a double, for example) then you get a compiler error that two functions cannot have the same name. (eg. int foo() and double foo() cannot both exist in the same program)
Note that all of this is C only.In C++, implicit declarations are not allowed. Even if they were, the error message would be different because C++ has function overloading. The error would say that overloads of a function cannot differ only by return type. (overloading happens in the parameter list, not the return-type)

Why do function prototypes include parameter names when they're not required?

I always thought that a function prototype must contain the parameters of the function and their names. However, I just tried this out:
int add(int,int);
int main()
{
std::cout << add(3,1) << std::endl;
}
int add(int x, int y)
{
return x + y;
}
And it worked! I even tried compiling with extreme over-caution:
g++ -W -Wall -Werror -pedantic test.cpp
And it still worked. So my question is, if you don't need parameter names in function prototypes, why is it so common to do so? Is there any purpose to this? Does it have something to do with the signature of the function?
No, these are not necessary, and are mostly ignored by the compiler. You can even give them different names in different declarations; the following is entirely legal:
int foo(int bar);
int foo(int biz);
int foo(int qux) {
...
}
(The compiler does check that each name is used only once in the same argument list: int foo(int bar, int bar); is rejected.)
The reason to put them in is documentation:
If someone reads your header file, they can tell at a glance what each parameter is used for.
If you use a fancy IDE, it can show you the parameter names when you begin typing the function call.
Documentation tools like Doxygen can parse the parameter names and show them in the documentation.
Parameter names are completely optional, and have no effect on compilation. They may be placed there for better readability of code.
You don't need parameter names in declarations. They are purely documentation.
You don't even need names in definitions:
int f(int)
{
return 0;
}
compiles just fine in C++ (though not in C). This is sometimes useful for e.g. inheritance, overloading, function pointers.
You do not need to include parameter names in function prototypes. You only need the complete signature, which includes the types of parameters and (in the case of function template specializations) return values.
However, it is still common to include parameter names in order to write self-documenting code. That is, a function with this declaration:
void foo(int number_of_foos_desired, string foo_replacement);
is easier to understand by looking only at the prototype, perhaps, than this one:
void foo(int, string);
Many modern IDEs will also pop up the parameter names as you type when you write code that calls this function. They may not be able to pop up this information if you dont include the parameter names in the prototype.
It has alot to do with the signature of the function.
One of the benefits of using .h files is that when someone comes along and wants to get a sense of what your program/api does, they can look at your header file and get a sense of what operations are being carried out, their inputs and outputs, how everything is going together, etc.
If you were to come across a method like
int doStuff(int,int)
this would be alot less telling than a method with a signature of say:
int doStuff(int firstNumberToAdd, int secondNumberToAdd);
with the second, you at least get some idea of the operations that are being carried out, and what is happening. This is the idea behind writing self documenting code.
If your interested, you may check out Code Complete by Steve McConnell.
There are different scenarios to it. I found it very helpful in dealing with inheritance and virtual functions. If you're using a virtual function that generates unused warning in sub class, you can omit the variable names.

calling qsort with a pointer to a c++ function

I have found a book that states that if you want to use a function from the C standard library which takes a function pointer as an argument (for example qsort), the function of which you want to pass the function pointer needs to be a C function and therefore declared as extern "C".
e.g.
extern "C" {
int foo(void const* a, void const* b) {...}
}
...
qsort(some_array, some_num, some_size, &foo);
I would not be surprised if this is just wrong information, however - I'm not sure, so: is this correct?
A lot depends on whether you're interested in a practical answer for the compiler you're using right now, or whether you care about a theoretical answer that covers all possible conforming implementations of C++. In theory it's necessary. In reality, you can usually get by without it.
The real question is whether your compiler uses a different calling convention for calling a global C++ function than when calling a C function. Most compilers use the same calling convention either way, so the call will work without the extern "C" declaration.
The standard doesn't guarantee that though, so in theory there could be a compiler that used different calling conventions for the two. At least offhand, I don't know of a compiler like that, but given the number of compilers around, it wouldn't surprise me terribly if there was one that I don't know about.
OTOH, it does raise another question: if you're using C++, why are you using qsort at all? In C++, std::sort is almost always preferable -- easier to use and usually faster as well.
This is incorrect information.
extern C is needed when you need to link a C++ library into a C binary; it allows the C linker to find the function names. This is not an issue with function pointers (as the function is not referenced by name in the C code).

Is there any way to replace a function in a library?

I work with a library which defines its internal division operator for a scripting language. Unfortunately it does not zero-check the divisor. Which leads to lot of headaches. I know the signature of the operator.
double ScriptClass::Divide(double&, double&);
Sadly it isn't even a C function. Is there any way I could make my application use my own Divide function instead of ScriptClass::Divide function?
EDIT:
I was aware of dlopen(NULL,..) and replacing "C" functions with user defined ones. Can this be done for class member functions (Without resorting to using mangled names)?
Various linkers and dynamic linker implementations will provide something that looks like a solution to this, as others have mentioned.
However, if you redefine one C++ function using any of those features (GNU ld's --wrap, ld.so's LD_PRELOAD, etc.), you are violating the one-definition rule and are thus invoking undefined behaviour.
While compiling your library, the compiler is allowed to inline the function in question in any way that it sees fit, which means that your redefinition of the function might not be invoked in all cases.
Consider the following code:
class A
{
public:
void foo();
void bar();
};
void A::foo()
{
std::cout << "Old version.\n";
}
void A::bar()
{
foo();
}
GCC 4.5, when invoked with -O3, will actually decide to inline the definition of foo() into bar(). If you somehow made your linker replace this definition of A::foo() with a definition of your own, A::bar() would still output the string "Old version.\n".
So, in a word: don't.
Generally speaking it's up to the programmer, not the underlying divide operator to prevent division by zero. If you're dividing by zero a lot that seems to indicate a possible flaw in the algorithm being used. Consider reworking the algorithm, or if that's not an option, guard calls to divide with a zero check. You could even do that inside a protected_divide type function.
All that being said, assuming that since it looks like a C++ function you have a C++ library compiled with all the same options you're using to build your application so name mangling matches you might be able to redefine the function into a .so and use LD_PRELOAD to force it to load. If you link statically, I think you can create the function into your own .o file and linking that prior to the library itself will cause the linker to pick up your version.
LD_PRELOAD is your friend. As an example, see:
https://web.archive.org/web/20090130063728/http://ibm.com/developerworks/linux/library/l-glibc.html
There's no getting away from the mangled names, I don't think, but you can use ld's --wrap option to cause a particular function to be given a new name based on its old name. You can then write a new version of it, and forward to the old version too if you like.
Quick overview here:
http://linux.die.net/man/1/ld
I've used this in the past to hook into malloc (etc.) without having to recompile the runtime library, though this wasn't on Linux (it was an embedded thing with no runtime loading). I didn't use it to wrap C++ functions, but if you can handle the C++ calling convention somehow, and you can create a function with the original function's mangled name, and get the compiler to accept a call to a function that has some ugly name with funny chars in it... I don't see why it shouldn't be possible to make it work.
Just short Q,
Cant you just wrap the class with your own code?
It'll be some headache at the start but after than you can simplify a lot of functions.
(Or even just wrap the function with a macro)