calling c code from c++ - c++

how does a call from c++ to c work internally??

The C++ compiler 'does the right thing' and uses the correct calling convention for the C function - a lame sounding answer but I don't know that there's much more that can be said!

It is a heavy duty implementation detail. But most C++ compilers I know don't try to do anything special to differentiate a C function from a non-instance C++ function. Just the plain olden cdecl calling convention for both.
Kinda important because the CRT implementation, with functions like printf(), are just as usable by a C compiler as they are by the C++ compiler from the same vendor. Nobody wants to maintain two versions of it.

Related

Why can't C functions be name-mangled?

I had an interview recently and one question asked was what is the use of extern "C" in C++ code. I replied that it is to use C functions in C++ code as C doesn't use name-mangling. I was asked why C doesn't use name-mangling and to be honest I couldn't answer.
I understand that when the C++ compiler compiles functions, it gives a special name to the function mainly because we can have overloaded functions of the same name in C++ which must be resolved at compile time. In C, the name of the function will stay the same, or maybe with an _ before it.
My query is: what's wrong with allowing the C++ compiler to mangle C functions also? I would have assumed that it doesn't matter what names the compiler gives to them. We call functions in the same way in C and C++.
It was sort of answered above, but I'll try to put things into context.
First, C came first. As such, what C does is, sort of, the "default". It does not mangle names because it just doesn't. A function name is a function name. A global is a global, and so on.
Then C++ came along. C++ wanted to be able to use the same linker as C, and to be able to link with code written in C. But C++ could not leave the C "mangling" (or, lack there of) as is. Check out the following example:
int function(int a);
int function();
In C++, these are distinct functions, with distinct bodies. If none of them are mangled, both will be called "function" (or "_function"), and the linker will complain about the redefinition of a symbol. C++ solution was to mangle the argument types into the function name. So, one is called _function_int and the other is called _function_void (not actual mangling scheme) and the collision is avoided.
Now we're left with a problem. If int function(int a) was defined in a C module, and we're merely taking its header (i.e. declaration) in C++ code and using it, the compiler will generate an instruction to the linker to import _function_int. When the function was defined, in the C module, it was not called that. It was called _function. This will cause a linker error.
To avoid that error, during the declaration of the function, we tell the compiler it is a function designed to be linked with, or compiled by, a C compiler:
extern "C" int function(int a);
The C++ compiler now knows to import _function rather than _function_int, and all is well.
It's not that they "can't", they aren't, in general.
If you want to call a function in a C library called foo(int x, const char *y), it's no good letting your C++ compiler mangle that into foo_I_cCP() (or whatever, just made up a mangling scheme on the spot here) just because it can.
That name won't resolve, the function is in C and its name does not depend on its list of argument types. So the C++ compiler has to know this, and mark that function as being C to avoid doing the mangling.
Remember that said C function might be in a library whose source code you don't have, all you have is the pre-compiled binary and the header. So your C++ compiler can't do "it's own thing", it can't change what's in the library after all.
what's wrong with allowing the C++ compiler to mangle C functions also?
They wouldn't be C functions any more.
A function is not just a signature and a definition; how a function works is largely determined by factors like the calling convention. The "Application Binary Interface" specified for use on your platform describes how systems talk to each other. The C++ ABI in use by your system specifies a name mangling scheme, so that programs on that system know how to invoke functions in libraries and so forth. (Read the C++ Itanium ABI for a great example. You'll very quickly see why it's necessary.)
The same applies for the C ABI on your system. Some C ABIs do actually have a name mangling scheme (e.g. Visual Studio), so this is less about "turning off name mangling" and more about switching from the C++ ABI to the C ABI, for certain functions. We mark C functions as being C functions, to which the C ABI (rather than the C++ ABI) is pertinent. The declaration must match the definition (be it in the same project or in some third-party library), otherwise the declaration is pointless. Without that, your system simply won't know how to locate/invoke those functions.
As for why platforms don't define C and C++ ABIs to be the same and get rid of this "problem", that's partially historical — the original C ABIs weren't sufficient for C++, which has namespaces, classes and operator overloading, all of which need to somehow be represented in a symbol's name in a computer-friendly manner — but one might also argue that making C programs now abide by the C++ is unfair on the C community, which would have to put up with a massively more complicated ABI just for the sake of some other people who want interoperability.
MSVC in fact does mangle C names, although in a simple fashion. It sometimes appends #4 or another small number. This relates to calling conventions and the need for stack cleanup.
So the premise is just flawed.
It's very common to have programs which are partially written in C and partially written in some other language (often assembly language, but sometimes Pascal, FORTRAN, or something else). It's also common to have programs contain different components written by different people who may not have the source code for everything.
On most platforms, there is a specification--often called an ABI [Application Binary Interface] which describes what a compiler must do to produce a function with a particular name which accepts arguments of some particular types and returns a value of some particular type. In some cases, an ABI may define more than one "calling convention"; compilers for such systems often provide a means of indicating which calling convention should be used for a particular function. For example, on the Macintosh, most Toolbox routines use the Pascal calling convention, so the prototype for something like "LineTo" would be something like:
/* Note that there are no underscores before the "pascal" keyword because
the Toolbox was written in the early 1980s, before the Standard and its
underscore convention were published */
pascal void LineTo(short x, short y);
If all of the code in a project was compiled using the same compiler, it
wouldn't matter what name the compiler exported for each function, but in
many situations it will be necessary for C code to call functions that were
compiled using other tools and cannot be recompiled with the present compiler
[and may very well not even be in C]. Being able to define the linker name
is thus critical to the use of such functions.
I'll add one other answer, to address some of the tangential discussions that took place.
The C ABI (application binary interface) originally called for passing arguments on the stack in reverse order (i.e. - pushed from right to left), where the caller also frees the stack storage. Modern ABI actually uses registers for passing arguments, but many of the mangling considerations go back to that original stack argument passing.
The original Pascal ABI, in contrast, pushed the arguments from left to right, and the callee had to pop the arguments. The original C ABI is superior to the original Pascal ABI in two important points. The argument push order means that the stack offset of the first argument is always known, allowing functions that have an unknown number of arguments, where the early arguments control how many other arguments there are (ala printf).
The second way in which the C ABI is superior is the behavior in case the caller and callee do not agree on how many arguments there are. In the C case, so long as you don't actually access arguments past the last one, nothing bad happens. In Pascal, the wrong number of arguments is popped from the stack, and the entire stack is corrupted.
The original Windows 3.1 ABI was based on Pascal. As such, it used the Pascal ABI (arguments in left to right order, callee pops). Since any mismatch in argument number might lead to stack corruption, a mangling scheme was formed. Each function name was mangled with a number indicating the size, in bytes, of its arguments. So, on 16 bit machine, the following function (C syntax):
int function(int a)
Was mangled to function#2, because int is two bytes wide. This was done so that if the declaration and definition mismatch, the linker will fail to find the function rather than corrupt the stack at run time. Conversely, if the program links, then you can be sure the correct number of bytes is popped from the stack at the end of the call.
32 bit Windows and onward use the stdcall ABI instead. It is similar to the Pascal ABI, except push order is like in C, from right to left. Like the Pascal ABI, the name mangling mangles the arguments byte size into the function name to avoid stack corruption.
Unlike claims made elsewhere here, the C ABI does not mangle the function names, even on Visual Studio. Conversely, mangling functions decorated with the stdcall ABI specification isn't unique to VS. GCC also supports this ABI, even when compiling for Linux. This is used extensively by Wine, that uses it's own loader to allow run time linking of Linux compiled binaries to Windows compiled DLLs.
C++ compilers use name mangling in order to allow for unique symbol names for overloaded functions whose signature would otherwise be the same. It basically encodes the types of arguments as well, which allows for polymorphism on a function-based level.
C does not require this since it does not allow for the overloading of functions.
Note that name mangling is one (but certainly not the only!) reason that one cannot rely on a 'C++ ABI'.
C++ wants to be able to interop with C code that links against it, or that it links against.
C expects non-name-mangled function names.
If C++ mangled it, it would not find the exported non-mangled functions from C, or C would not find the functions C++ exported. The C linker must get the name it itself expects, because it does not know it is coming from or going to C++.
Mangling the names of C functions and variables would allow their types to be checked at link time. Currently, all (?) C implementations allow you to define a variable in one file and call it as a function in another. Or you can declare a function with a wrong signature (e.g. void fopen(double) and then call it.
I proposed a scheme for the type-safe linkage of C variables and functions through the use of mangling back in 1991. The scheme was never adopted, because, as other have noted here, this would destroy backward compatibility.

C vs C++ function questions

I am learning C, and after starting out learning C++ as my first compiled language, I decided to "go back to basics" and learn C.
There are two questions that I have concerning the ways each language deals with functions.
Firstly, why does C "not care" about the scope that functions are defined in, whereas C++ does?
For example,
int main()
{
donothing();
return 0;
}
void donothing() { }
the above will not compile in a C++ compiler, whereas it will compile in a C compiler. Why is this? Isn't C++ mostly just an extension on C, and should be mostly "backward compatible"?
Secondly, the book that I found (Link to pdf) does not seem to state a return type for the main function. I check around and found other books and websites and these also commonly do not specify return types for the main function. If I try to compile a program that does not specify a return type for main, it compiles fine (although with some warnings) in a C compiler, but it doesn't compile in a C++ compiler. Again, why is that? Is it better style to always specify the return type as an integer rather than leaving it out?
Thanks for any help, and just as a side note, if anyone can suggest a better book that I should buy that would be great!
Firstly, why does C "not care" about the scope that functions are defined in, whereas C++ does?
Actually, C does care. It’s just that C89 allows implicitly declared functions and infers its return type as int and its parameters from usage. C99 no longer allows this.
So in your example it’s as if you had declared a prototype as
int dosomething();
The same goes for implicit return types: missing return types are inferred as int in C89 but not C99. Compiling your code with gcc -std=c99 -pedantic-errors yields something similar to the following:
main.c: In function 'main':
main.c:2:5: error: implicit declaration of function 'donothing' [-Wimplicit-function-declaration]
main.c: At top level:
main.c:5:6: error: conflicting types for 'donothing'
main.c:2:5: note: previous implicit declaration of 'donothing' was her
For the record, here’s the code I’ve used:
int main() {
donothing();
return 0;
}
void donothing() { }
It's because C++ supports optional parameters. When C++ sees donothing(); it can't tell if donothing is:
void donothing(void);
or
void donothing(int j = 0);
It has to pass different parameters in these two cases. It's also because C++ is more strongly typed than C.
int main() {
donothing();
return 0;
}
void donothing() { }
Nice minimum working example.
With gcc 4.2.1, the above code gets a warning regarding the conflicting types for void donothing() with default compiler settings. That's what the C89 standard says to do with this kind of problem. With clang, the above code fails on void donothing(). The C99 standard is a bit stricter.
It's a good idea to compile your C++ code with warnings enabled and set to a high threshold. This becomes even more important in C. Compile with warnings enabled and treat implicit function declarations as an error.
Another difference between C and C++: In C++ there is no difference between the declarations void donothing(void); and void donothing(); There is a huge difference between these two in C. The first is a function that takes no parameters. The latter is a function with an unspecified calling sequence.
Never use donothing() to specify a function that takes no arguments. The compiler has no choice but to accept donothing(1,2,3) with this form. It knows to reject donothing(1,2,3) when the function is declared as void donothing(void).
he above will not compile in a C++ compiler, whereas it will compile in a C compiler. Why is this?
Because C++ requires a declaration (or definition) of the function to be in scope at the point of the call.
Isn't C++ mostly just an extension on C
Not exactly. It was originally based on a set of C extensions, and it refers to the C standard (with a few modifications) for the definitions of the contents of standard headers from C. The C++ "language itself" is similar to C but is not an extension of it.
and should be mostly "backward compatible"?
Emphasis on "mostly". Most C features are available in C++, and a lot of the ones removed were to make C++ a more strictly typed language than C. But there's no particular expectation that C code will compile as C++. Even when it does, it doesn't always have the same meaning.
I check around and found other books and websites and these also commonly do not specify return types for the main function
The C and C++ standards have always said that main returns int.
In C89, if you omit the return type of a function it is assumed to be int. C++ and C99 both lack this implicit int return type, but a lot of C tutorial books and tutorials (and compilers and code) still use the C89 standard.
C has some allowances for implementations to accept other return types, but not for portable programs to demand them. Both languages have a concept of a "freestanding implementation", which can define program entry and exit any way it likes -- again, because this is specific to an implementation it's not suitable for general teaching of C.
IMO, even if you're going to use a C89 compiler it's worth writing your code to also be valid C99 (especially if you have a C99 compiler available to check it). The features removed in C99 were considered harmful in some way. It's not worth even trying to write code that's both C and C++, except in header files intended for inter-operation between the languages.
I decided to "go back to basics" and learn C.
You shouldn't think of C as a prerequisite or "basic form" of C++, because it isn't. It is a simpler language, though, with fewer features for higher-level programming. This is often cited as an advantage of C by users of C. And an advantage of C++ by users of C++. Sometimes those users are the same people using the languages for different purposes.
Typical coding style in C is different from typical coding style in C++, and so you might well learn certain basics more readily in C than in C++. It is possible to learn low-level programming using C++, and the code you write when you do so may or may not end up looking a lot like C code.
So, what you learn while learning C may or may not inform the way you write C++. If it does, that may or may not be for the better.
C++ has changed these rules on purpose, to make C++ a more typesafe language.
C.1.4 Clause 5: expressions [diff.expr]
5.2.2
Change: Implicit declaration of functions is not allowed
Rationale: The type-safe nature of C++.
Effect on original feature: Deletion of semantically well-defined feature. Note: the original feature was
labeled as “obsolescent” in ISO C.
Difficulty of converting: Syntactic transformation. Facilities for producing explicit function declarations
are fairly widespread commercially.
How widely used: Common.
You can find other similar changes in appendix C of this Draft C++ standard
Isn't C++ mostly just an extension on C
No. If you think of C++ as "C with Classes", you're doing it very, very wrong. Whilst strictly, most valid C is valid C++, there's virtually no good C that's good C++. The reality is that good C++ code is vastly different to what you'd see as good C code.
Firstly, why does C "not care" about the scope that functions are
defined in, whereas C++ does?
Essentially, because not enforcing the same rules as C++ makes doing this in C hideously unsafe and in fact, nobody sane should ever do that. C99 tightened this up, along with implicit-int and other defects in the C language.

different programming language and calling convention

According to the wiki:
Different programming languages use different calling conventions, and
so can different platforms (CPU architecture + operating system). This
can sometimes cause problems when combining modules written in
multiple languages
So am I supposed to be careful when I call C/C++ functions (exported from .so/.dll) in Python?
If yes, what should I be careful of?
Calling between Python and C is a solved problem, so you generally won't have to worry about anything -- not least because Python is written in C.
The problem described there is more an issue when multiple languages on a platform are all developed independently, individually bootstrapped from assembler. For example, there used to famously be problems calling between C and FORTRAN, and between C and Pascal, at a time when all three of those languages coexisted as rough equals. The old Mac Toolbox was written mostly in assembler using Pascal calling conventions, and early application developers used Borland Pascal. But then C compilers like Symatec's THINK C appeared, and those programmers had to specifically worry about how to translate argument types and string conventions (Pascal strings carry a length byte at the beginning, and of course C strings have a 0 at the end.)
The calling convention is not an issue when calling C from python because the python interpreter itself is written in C, and thus uses the same calling convention as the C code.
It becomes an issue when combining code written in different compiled languages, for example C and Ada.
Usually C is the system programming language, and thus the standard. This means that a compiler for any language will generally have special support to inter-operate with C. See here for an example of interoperability between C and Ada. If there is no special support wrappers must be written at the assembly level.
If you need to call C++/Ada from python you'll have to follow a two steps process. The first step is to write a C wrapper around the C++/Ada functions. The second step is to call the C wrapper from python.
Take for example the following C++ class.
class Foo
{
public:
Foo ();
int bar ();
...
};
If you want to make it available to python, you first need to wrap it in C:
extern "C" {
typedef void *FooPtr;
FooPtr foo_new () { return (FooPtr)new Foo(); }
int foo_bar (FooPtr foo) { return ((Foo*)foo)->bar(); }
...
}
Then you can call that from python. (Note, in real life there are tools to automate this, like Boost.Python).
Note that there are two aspects of writing a wrapper: converting between calling conventions and converting between type representations. The later part is usually the most difficult because as soon as you go beyond basic types languages tend to have very different representations of equivalent types.
C and C++ have different calling conventions. C doesn't support calling C++ functions but C++ support calling C functions and supports the implementation of C functions in C++. That is, you can declare functions as extern "C" in C++. I don't know how the python bindings are calling C++ but you should be careful to call C and C++ functions with their respective calling conventions. Generally, C calling conventions are used when calling between different languages. This may mean that you might need to write a wrapper function using C style conventions for C++ style functions.
The calling conventions refers to how functions calls are generated in the resulting assembly. One convention is for the caller to be responsible for creating and cleaning up the stack space for the local variables. The other convention is for the called function to create and clean up the stack space for the local variables. It doesn't matter which one you use, but obviously they have to match or the stack will get corrupted.
You can see more details here: http://en.wikipedia.org/wiki/Calling_convention

calling qsort with a pointer to a c++ function

I have found a book that states that if you want to use a function from the C standard library which takes a function pointer as an argument (for example qsort), the function of which you want to pass the function pointer needs to be a C function and therefore declared as extern "C".
e.g.
extern "C" {
int foo(void const* a, void const* b) {...}
}
...
qsort(some_array, some_num, some_size, &foo);
I would not be surprised if this is just wrong information, however - I'm not sure, so: is this correct?
A lot depends on whether you're interested in a practical answer for the compiler you're using right now, or whether you care about a theoretical answer that covers all possible conforming implementations of C++. In theory it's necessary. In reality, you can usually get by without it.
The real question is whether your compiler uses a different calling convention for calling a global C++ function than when calling a C function. Most compilers use the same calling convention either way, so the call will work without the extern "C" declaration.
The standard doesn't guarantee that though, so in theory there could be a compiler that used different calling conventions for the two. At least offhand, I don't know of a compiler like that, but given the number of compilers around, it wouldn't surprise me terribly if there was one that I don't know about.
OTOH, it does raise another question: if you're using C++, why are you using qsort at all? In C++, std::sort is almost always preferable -- easier to use and usually faster as well.
This is incorrect information.
extern C is needed when you need to link a C++ library into a C binary; it allows the C linker to find the function names. This is not an issue with function pointers (as the function is not referenced by name in the C code).

How does this function definition work?

I generated a hash function with gperf couple of days ago. What I saw for the hash function was alien to me. It was something like this (I don't remember the exact syntax) :
unsigned int
hash(str, size)
register char* str;
register unsigned int size;
{
//Definition
}
Now, when I tried to compile with a C++ compiler (g++) it threw errors at me for not having str and size declared. But this compiled on the C compiler (gcc). So, questions:
I thought C++ was a superset of C. If its so, this should compile with a C++ compiler as well right?
How does the C compiler understand the definition? str and size are undeclared when they first appear.
What is the purpose of declaring str and size after function signature but before function body rather than following the normal approach of doing it in either of the two places?
How do I get this function to compile on g++ so I can use it in my C++ code? Or should I try generating C++ code from gperf? Is that possible?
1. C++ is not a superset, although this is not standard C either.
2/3. This is a K&R function declaration. See What are the major differences between ANSI C and K&R C?
.
4. gperf does in fact have an option, -L, to specify the language. You can just use -L C++ to use C++.
The Old C syntax for the declaration of a function's formal arguments is still supported by some compilers.
For example
int func (x)
int x
{
}
is old style (K&R style) syntax for defining a function.
I thought C++ was a superset of C. If its so, this should compile with a C++ compiler as well right?
Nopes! C++ is not a superset of C. This style(syntax) of function declaration/definition was once a part of C but has never been a part of C++. So it shouldn't compile with a C++ compiler.
This appears to be "old-school" C code. Declaring the types of the parameters outside of the parentheses but before the open curl-brace of the code block is a relic of the early days of C programming (I'm not sure why but I guess it has something to do with variable management on the stack and/or compiler design).
To answer your questions:
Calling C++ a "superset" of C is somewhat a misnomer. While they share basic syntax features, and you can even make all sorts of C library calls from C++, they have striking differences with respect to type safety, warnings vs. errors (C is more permissible), and compiler/preprocessor options.
Most contemporary C compilers understand legacy code (such as this appears to be). The C compiler holds the function parameter names sort of like "placeholders" until their type can be declared immediately following the function header name.
No real "purpose" other than again, this appears to be ancient code, and the style back in the day was like this. The "normal" approach is IMO the better, more intuitive way.
My suggestion:
unsigned int hash(register char *str, register unsigned int size)
{
// Definition
}
A word of advice: Consider abandoning the register keyword - this was used in old C programs as a way of specifying that the variable would be stored in a memory register (for speed/efficiency), but nowadays compilers are better at optimizing away this need. I believe that modern compilers ignore it. Also, you cannot use the & (address of) operator in C/C++ on a register variable.