Is the C++ calling convention constrained by the standard, since the return type of a function does not need to be defined when the fn is declared? - c++

While studying the One Definition Rule in Wikipedia, I became stuck on the following example in the Examples section:
struct S; // declaration of S
...
S f(); // ok, no definition required
...
I know that space on the stack needs to be allotted for the return value, but seeing this example made me think that C++ calling conventions might dictate that stack management for the return value is handled by the code block in which the function is defined, rather than the code block in which it is called. So I investigated "C vs. C++ calling convention" (recalling that the issue of stack return value allocation might be a primary difference), and came across this answer, which indicates that "calling convention" is not defined by the standard.
However, given the apparent requirement that the above code snippet is valid, it seems to me that there must be some constraints on calling convention in order to support the above code snippet.
Am I right? Does the C++ standard implicitly require that stack management for the return value of a function be handled by the code that defines the function, in order to support the syntax above?

As mentioned in the comments
As you have written your example, both struct S and function f are forward declarations. The compiler will indeed complain if you attempt to use either.
EDIT: as noted by Steven Sudit, function f is not a forward declaration but a function prototype.
and
Also, I believe that the default calling convention (and any optional calling conventions) are explicitly implementation-dependent, with the exception of those with external linkage. If you search the C++ standard for "calling convention", it is mentioned only once, in section 7.5, Linkage Specifications.
As to your specific question
Am I right? Does the C++ standard implicitly require that stack management for the return value of a function be handled by the code that defines the function, in order to support the syntax above?
Definitely not. Many compilers support calling conventions where values are not even passed or returned on the stack (fastcall, for example), and conventions differ on who cleans up the stack: the caller under cdecl, the callee under Microsoft's thiscall.
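To see why the snippet in the question places no demand on the calling convention, it may help to note where the compiler actually needs S to be complete: at the definition of f and at every call, not at the bare declaration. A minimal sketch, in which the member `value` and the helpers `call_f` and `check_call_f` are hypothetical names added for illustration:

```cpp
#include <cassert>

struct S;                 // incomplete type: size and layout unknown here
S f();                    // fine: a declaration generates no code

// Calling f() or defining it at this point would be ill-formed, because
// the compiler cannot arrange storage for a return value of unknown size.

struct S { int value; };  // S is complete from here on

S f() { return S{42}; }   // definition: S must be complete here

int call_f() {
    return f().value;     // call site: S must be complete here as well
}

void check_call_f() {
    assert(call_f() == 42);
}
```

By the time any code that touches the return value is generated, the full definition of S is visible, so the implementation is free to return it in registers, on the stack, or through a hidden pointer.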

The C and C++ standards do not define calling conventions. That is the job of compiler vendors to implement on their own, as evidenced by the fact that calling convention keywords start with underscores, indicating they are vendor-provided extensions.
The C/C++ standard defines the base rules (how to assign values to parameters and return values, pass by-value vs by-reference, etc), but the calling conventions dictate how to accomplish those rules in different ways (passing parameters via stack or registers, in which order, which registers, who cleans up the stack, etc).
In the case of x86, vendors have agreed on the semantics of the __cdecl and __stdcall calling conventions for interoperability (although there are some slight variations in __cdecl implementations in some cases), but other calling conventions are vendor-specific (Microsoft's __fastcall/__thiscall, Borland's __fastcall/__safecall/__msfastcall, etc).
In the case of x64, each platform has essentially one calling convention, dictated by the platform ABI (the Microsoft x64 convention on Windows, the System V AMD64 convention on most other systems). The x86 calling convention keywords are silently ignored by x64 compilers, so existing code will still compile and work correctly (as long as it is not using inline assembly to access or manipulate the call stack directly).

Related

Why can't C functions be name-mangled?

I had an interview recently and one question asked was what is the use of extern "C" in C++ code. I replied that it is to use C functions in C++ code as C doesn't use name-mangling. I was asked why C doesn't use name-mangling and to be honest I couldn't answer.
I understand that when the C++ compiler compiles functions, it gives a special name to the function mainly because we can have overloaded functions of the same name in C++ which must be resolved at compile time. In C, the name of the function will stay the same, or maybe with an _ before it.
My query is: what's wrong with allowing the C++ compiler to mangle C functions also? I would have assumed that it doesn't matter what names the compiler gives to them. We call functions in the same way in C and C++.
It was sort of answered above, but I'll try to put things into context.
First, C came first. As such, what C does is, sort of, the "default". It does not mangle names because it just doesn't. A function name is a function name. A global is a global, and so on.
Then C++ came along. C++ wanted to be able to use the same linker as C, and to be able to link with code written in C. But C++ could not leave the C "mangling" (or lack thereof) as is. Check out the following example:
int function(int a);
int function();
In C++, these are distinct functions, with distinct bodies. If neither is mangled, both will be called "function" (or "_function"), and the linker will complain about the redefinition of a symbol. C++'s solution was to mangle the argument types into the function name. So, one is called _function_int and the other is called _function_void (not the actual mangling scheme), and the collision is avoided.
Now we're left with a problem. If int function(int a) was defined in a C module, and we're merely taking its header (i.e. declaration) in C++ code and using it, the compiler will generate an instruction to the linker to import _function_int. When the function was defined, in the C module, it was not called that. It was called _function. This will cause a linker error.
To avoid that error, during the declaration of the function, we tell the compiler it is a function designed to be linked with, or compiled by, a C compiler:
extern "C" int function(int a);
The C++ compiler now knows to import _function rather than _function_int, and all is well.
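A small sketch of the situation described above, with both overloads and an extern "C" function in one C++ file. The names `describe`, `c_visible_add` and `check_linkage_demo` are made up for illustration, and the symbol names in the comments assume the Itanium C++ ABI used by GCC and Clang on most non-Windows platforms; other compilers mangle differently, and some platforms add a leading underscore.

```cpp
#include <cassert>

// Two overloads: legal in C++ only because each gets its own symbol.
int describe(int) { return 1; }   // Itanium ABI symbol: _Z8describei
int describe()    { return 0; }   // Itanium ABI symbol: _Z8describev

// C linkage: the symbol is the plain name, so a C object file can refer to it.
extern "C" int c_visible_add(int a, int b) { return a + b; }  // symbol: c_visible_add

void check_linkage_demo() {
    assert(describe(7) == 1);
    assert(describe() == 0);
    assert(c_visible_add(2, 3) == 5);
}
```

Listing the symbols of the resulting object file (for example with nm) is one way to see the difference for a given compiler.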
It's not that they "can't", they aren't, in general.
If you want to call a function in a C library called foo(int x, const char *y), it's no good letting your C++ compiler mangle that into foo_I_cCP() (or whatever, just made up a mangling scheme on the spot here) just because it can.
That name won't resolve, the function is in C and its name does not depend on its list of argument types. So the C++ compiler has to know this, and mark that function as being C to avoid doing the mangling.
Remember that said C function might be in a library whose source code you don't have; all you have is the pre-compiled binary and the header. So your C++ compiler can't do "its own thing"; it can't change what's in the library, after all.
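A common way to let one header serve both C and C++ compilers is to wrap its declarations in a conditional extern "C" block. The sketch below uses a hypothetical library function `legacy_sum` to stand in for the pre-compiled C code; it is defined in the same file only so that the example links on its own.

```cpp
#include <cassert>

/* --- what would normally live in the library's header --- */
#ifdef __cplusplus
extern "C" {          /* C++ compilers: use C linkage, no C++ mangling */
#endif

int legacy_sum(const int *values, int count);

#ifdef __cplusplus
}                     /* end of the extern "C" block */
#endif

/* --- stand-in for the pre-compiled C library --- */
extern "C" int legacy_sum(const int *values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) total += values[i];
    return total;
}

void check_legacy_sum() {
    const int values[] = {1, 2, 3};
    assert(legacy_sum(values, 3) == 6);
}
```

A C compiler never sees the extern "C" lines because __cplusplus is not defined there, so the same header text works on both sides.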
what's wrong with allowing the C++ compiler to mangle C functions also?
They wouldn't be C functions any more.
A function is not just a signature and a definition; how a function works is largely determined by factors like the calling convention. The "Application Binary Interface" specified for use on your platform describes how systems talk to each other. The C++ ABI in use by your system specifies a name mangling scheme, so that programs on that system know how to invoke functions in libraries and so forth. (Read the C++ Itanium ABI for a great example. You'll very quickly see why it's necessary.)
The same applies for the C ABI on your system. Some C ABIs do actually have a name mangling scheme (e.g. Visual Studio), so this is less about "turning off name mangling" and more about switching from the C++ ABI to the C ABI, for certain functions. We mark C functions as being C functions, to which the C ABI (rather than the C++ ABI) is pertinent. The declaration must match the definition (be it in the same project or in some third-party library), otherwise the declaration is pointless. Without that, your system simply won't know how to locate/invoke those functions.
As for why platforms don't define C and C++ ABIs to be the same and get rid of this "problem", that's partially historical: the original C ABIs weren't sufficient for C++, which has namespaces, classes and operator overloading, all of which need to somehow be represented in a symbol's name in a computer-friendly manner. One might also argue that making C programs now abide by the C++ ABI is unfair on the C community, which would have to put up with a massively more complicated ABI just for the sake of some other people who want interoperability.
MSVC in fact does mangle C names, although in a simple fashion. It sometimes appends @4 or another small number. This relates to calling conventions and the need for stack cleanup.
So the premise is just flawed.
It's very common to have programs which are partially written in C and partially written in some other language (often assembly language, but sometimes Pascal, FORTRAN, or something else). It's also common to have programs contain different components written by different people who may not have the source code for everything.
On most platforms, there is a specification, often called an ABI (Application Binary Interface), which describes what a compiler must do to produce a function with a particular name which accepts arguments of some particular types and returns a value of some particular type. In some cases, an ABI may define more than one "calling convention"; compilers for such systems often provide a means of indicating which calling convention should be used for a particular function. For example, on the Macintosh, most Toolbox routines use the Pascal calling convention, so the prototype for something like "LineTo" would be something like:
/* Note that there are no underscores before the "pascal" keyword because
the Toolbox was written in the early 1980s, before the Standard and its
underscore convention were published */
pascal void LineTo(short x, short y);
If all of the code in a project was compiled using the same compiler, it wouldn't matter what name the compiler exported for each function, but in many situations it will be necessary for C code to call functions that were compiled using other tools and cannot be recompiled with the present compiler (and may very well not even be in C). Being able to define the linker name is thus critical to the use of such functions.
I'll add one other answer, to address some of the tangential discussions that took place.
The C ABI (application binary interface) originally called for passing arguments on the stack in reverse order (i.e. pushed from right to left), where the caller also frees the stack storage. Modern ABIs actually use registers for passing arguments, but many of the mangling considerations go back to that original stack argument passing.
The original Pascal ABI, in contrast, pushed the arguments from left to right, and the callee had to pop the arguments. The original C ABI is superior to the original Pascal ABI in two important points. The first is that the argument push order means the stack offset of the first argument is always known, allowing functions that take an unknown number of arguments, where the early arguments control how many other arguments there are (à la printf).
The second way in which the C ABI is superior is the behavior in case the caller and callee do not agree on how many arguments there are. In the C case, so long as you don't actually access arguments past the last one, nothing bad happens. In Pascal, the wrong number of arguments is popped from the stack, and the entire stack is corrupted.
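As an illustration of early arguments controlling how many others are read, here is a minimal printf-style sketch using the standard <cstdarg> macros. The function `sum_by_format` and its one-letter format codes are invented for this example and are not part of any library.

```cpp
#include <cassert>
#include <cstdarg>

// Sums a variable number of arguments. The first (fixed) argument is a
// format string: 'i' means the next argument is an int, 'd' a double.
double sum_by_format(const char *format, ...) {
    va_list args;
    va_start(args, format);
    double total = 0.0;
    for (const char *p = format; *p != '\0'; ++p) {
        if (*p == 'i') {
            total += va_arg(args, int);
        } else if (*p == 'd') {
            total += va_arg(args, double);
        }
    }
    va_end(args);
    return total;
}

void check_sum_by_format() {
    assert(sum_by_format("iid", 1, 2, 0.5) == 3.5);
    // An extra argument the format never asks for is simply not read.
    assert(sum_by_format("i", 1, 99) == 1.0);
}
```

The callee never learns how many arguments were actually passed; it only reads as many as the format string tells it to, which is why cleanup has to be left to the caller.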
The original Windows 3.1 ABI was based on Pascal. As such, it used the Pascal ABI (arguments in left to right order, callee pops). Since any mismatch in argument number might lead to stack corruption, a mangling scheme was formed. Each function name was mangled with a number indicating the size, in bytes, of its arguments. So, on a 16-bit machine, the following function (C syntax):
int function(int a)
Was mangled to function@2, because int is two bytes wide. This was done so that if the declaration and definition mismatch, the linker will fail to find the function rather than corrupt the stack at run time. Conversely, if the program links, then you can be sure the correct number of bytes is popped from the stack at the end of the call.
32 bit Windows and onward use the stdcall ABI instead. It is similar to the Pascal ABI, except push order is like in C, from right to left. Like the Pascal ABI, the name mangling mangles the arguments byte size into the function name to avoid stack corruption.
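A rough sketch of how such a decorated name can be derived, assuming the 32-bit x86 stdcall decoration of a leading underscore, the function name, an @ sign and the total argument size in bytes. The helper `stdcall_decorated` is hypothetical, and rounding each argument up to a 4-byte stack slot is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Builds a decorated name of the form "_name@N", where N is the total
// argument size in bytes.
// Assumption: each argument occupies a whole number of 4-byte stack slots.
std::string stdcall_decorated(const std::string &name,
                              std::initializer_list<std::size_t> arg_sizes) {
    std::size_t total = 0;
    for (std::size_t size : arg_sizes) {
        total += (size + 3) / 4 * 4;  // round up to the 4-byte slot size
    }
    return "_" + name + "@" + std::to_string(total);
}

void check_stdcall_decorated() {
    assert(stdcall_decorated("function", {4}) == "_function@4");
    assert((stdcall_decorated("mixed", {1, 8}) == "_mixed@12"));
}
```

If the caller's declaration lists a different set of arguments than the definition, the two decorated names differ and the link fails instead of the stack being corrupted at run time.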
Unlike claims made elsewhere here, the C ABI does not mangle the function names, even on Visual Studio. Conversely, mangling functions decorated with the stdcall ABI specification isn't unique to VS. GCC also supports this ABI, even when compiling for Linux. This is used extensively by Wine, which uses its own loader to allow run time linking of Linux compiled binaries to Windows compiled DLLs.
C++ compilers use name mangling in order to allow for unique symbol names for overloaded functions, whose names would otherwise be the same. It basically encodes the types of the arguments as well, which allows for overloading (ad hoc polymorphism) at the function level.
C does not require this since it does not allow for the overloading of functions.
Note that name mangling is one (but certainly not the only!) reason that one cannot rely on a 'C++ ABI'.
C++ wants to be able to interop with C code that links against it, or that it links against.
C expects non-name-mangled function names.
If C++ mangled it, it would not find the exported non-mangled functions from C, or C would not find the functions C++ exported. The C linker must get the name it itself expects, because it does not know it is coming from or going to C++.
Mangling the names of C functions and variables would allow their types to be checked at link time. Currently, all (?) C implementations allow you to define a variable in one file and call it as a function in another. Or you can declare a function with a wrong signature (e.g. void fopen(double)) and then call it.
I proposed a scheme for the type-safe linkage of C variables and functions through the use of mangling back in 1991. The scheme was never adopted because, as others have noted here, this would destroy backward compatibility.

What is the state of the registers after a function call?

I have limited knowledge in assembly, but I can at least read through it and match with the corresponding C or C++ code. I can see that the function arguments are passed either by pushing them to the stack or by registers, and the function body uses some registers to do its operations. But it also seems to use the same registers that were used in the caller. Does this mean that the caller has no guarantee that the state of the registers will be the same after a function call? What if the whole body of the function is unknown during compilation? How does the compiler deal with this?
The compiler-generated assembler code follows some calling convention. A calling convention typically specifies
how arguments are passed to the function
how return values are passed from the called function to the caller
which registers should be saved within a function call and which can be modified
If all functions being called follow the same calling convention, no problems with using the same registers should occur.
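The case of a function whose body is unknown at compile time can be sketched with a call through a function pointer. The names `apply_twice`, `increment` and `check_apply_twice` are made up for this example; what the comments say about registers is a general description, not the output of any particular compiler.

```cpp
#include <cassert>

using Callback = int (*)(int);

// The compiler cannot see what cb points at, so it relies on the calling
// convention: `first` must either sit in a register the callee is required
// to preserve, or be saved to the stack before the second call.
int apply_twice(Callback cb, int x) {
    int first = cb(x);
    return first + cb(first);
}

int increment(int v) { return v + 1; }

void check_apply_twice() {
    assert(apply_twice(increment, 1) == 5);  // 2 + 3
}
```

Asking the compiler for assembly output (for instance with the -S option of GCC or Clang) shows which of the two strategies it picked for a given target.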
As the comments allude to, the fact is that there is no standard for this. It is left entirely to the implementors of the particular C++ compiler you are using.
A more explicit question would be something like this: "when compiling on version N of compiler A with compiler options B, calling a function signature of C, for target CPU D, using ABI E, what are the guarantees vis-a-vis register preservation?"
In that case, an expert (or the manual) on that particular toolset can answer.
As you can probably infer, for any kind of industrial-strength project, it's the wrong question to ask, because as your compiler evolves the answer will change, and you don't want that fact to impact the reliability of your program.
It's a good question, because it's nice to know what the compiler is doing under the hood - it aids learning.
But on the whole, the golden rule is to express clear uncomplicated logic to the compiler in your program, and allow the compiler to handle the details of turning that logic into optimised machine code, at which modern compilers are excellent.

The standard C function declaration syntax (WINAPI) [duplicate]

I know you might think this question has already been answered, but it has not, or at least the answer was not very clear to me.
int WINAPI WinMain (){}
This is a pseudo form of the famous winmain function.
My question is about the calling convention WINAPI, in particular its placement between the "return type" and the "function name". Is this Standard C? Because I referenced the Brian W. Kernighan and Dennis M. Ritchie book and I didn't see this form.
I also have searched for its meaning and they said it's a macro to place _stdcall instead. So please don't tell me the question is duplicated.
And here is one of the questions that might be very close to mine
What does "WINAPI" in main function mean?
I want a clear answer about WINAPI: is it standard C? Can I place a calling convention after the return type in any function declaration and then give it to any C compiler in the world? Or is it something that will work only on Microsoft compilers? And if so, can anyone impose their rules on the C syntax?
I'm sorry, I know my question might be trivial for many of you, but I searched everywhere for the function declaration syntax, and none of the sources allowed a calling convention in this position.
The essential answer: No. A function declaration, as defined by the C language standard, has no elements between the return type and the function name. So int __bootycall myFunc(int qux) is not standard C (or C++), even though C implementations are allowed to reserve __customIdentifiers for their own exclusive use.
However.
The need for calling-convention specifiers (e.g. __cdecl) is clear; a lot of (especially early non-UNIX [especially MS-DOS]) platforms had more than one calling convention to choose from, and specifying the calling convention of a function was as important as, if not more important than, the parameter list of that function. Hence the need to slot a little something extra in there.
At that time (even before C89), there was no provision made for architecture-specific function attributes (presumably because C, designed for the sole purpose of implementing UNIX utilities, didn't need any). This would later be addressed by attribute syntax (first as compiler extensions such as GCC's __attribute__, and eventually as standard attributes in C++11 and C23), and if such syntax had existed at that point, it's likely that __cdecl et al. would have been function attributes, not random identifiers shoved in there. But as it was, when the need arose to specify non-default calling conventions, there were four reasonable places to put it: before the return type, between the return type and the function name, between the function name and the opening parenthesis of the argument list, and after the argument list.
I'm speculating here, but it seems like the second option would have made the most sense. This was pre-C++, remember; there was no post-arglist-const, and the only thing that could show up before the return type was static, which specified linkage rather than anything about the function per se. That left before or after the function name, and separating the function name from its argument list would have reduced readability. That left the slightly unusual position between the return type and the function name as the best of a bad bunch.
The rest is history. Later compilers took advantage of the nascent __attribute__ syntax to put the calling convention keyword in a more appropriate place, but DOS-based compilers (of which Microsoft C was one of the first) shoved it after the return type.
Both the __stdcall and the position of the keyword are Microsoft specific. Any compiler vendor is able to add non-standard syntax to their implementation.
At the very top of this MSDN article:
Microsoft Specific
It also mentions the WINAPI macro at the end of the page:
In the following example, use of __stdcall results in all WINAPI function types being handled as a standard call: [...]
This form works on both the Microsoft C++ compiler and the MinGW toolchain, which is a port of GCC to Windows.
But in general GCC uses this other form, using its attributes:
// In a function definition, GCC expects the attribute before the declarator.
__attribute__((stdcall)) int WinMain() // or WINAPI if using the macro
{}
It's possible, however, that in the future we will have these in a more standard syntax (the stdcall part still being platform-specific) by using the C++11 generalized attributes, such as:
[[ms::stdcall]]
int WinMain() {}
In fact, both GCC and Clang already support the standard generalized attribute syntax as an alternative to the compiler-specific attribute syntax.
The answer to your clear question:
WINAPI, Is it a standard C?
is No. __stdcall is a Microsoft extension.
WINAPI is a macro defined in windows.h, which expands to __stdcall. Both windows.h and __stdcall are Windows-specific -- no industry-wide standard defines any aspect of their meaning.
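A sketch of how a project might wrap the vendor-specific spelling in its own macro so that the same source builds elsewhere. DEMO_WINAPI is a hypothetical stand-in and not the real WINAPI macro, the function `twice` is invented for the example, and the preprocessor checks only cover MSVC and GCC-compatible compilers targeting 32-bit x86.

```cpp
#include <cassert>

// DEMO_WINAPI is a hypothetical stand-in for WINAPI, not the real macro.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_WINAPI __stdcall                 // Microsoft keyword
#elif defined(__GNUC__) && defined(__i386__)
#  define DEMO_WINAPI __attribute__((stdcall))  // GCC/Clang attribute
#else
#  define DEMO_WINAPI                           // other targets: expands to nothing
#endif

// The macro sits between the return type and the function name,
// just as WINAPI does in Windows headers.
int DEMO_WINAPI twice(int x) { return 2 * x; }

void check_twice() {
    assert(twice(21) == 42);
}
```

On targets where the macro expands to nothing, the declaration is ordinary standard C++, which is the portable fallback.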
The C and C++ standards do define keywords that have related effects on a function definition: inline, _Noreturn (C2011), and static. All of these keywords are normally placed before the return type, but if I'm reading C2011 correctly, this is not actually required by the syntax: you could perfectly well write
int static foo(void) { return 42; }
These keywords are called function specifiers and storage class specifiers.
Do not confuse them with type specifiers and type qualifiers, which can also appear in this position, but modify the return type when they do.
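The difference between the two groups can be seen in a few one-line definitions. This is only a sketch with made-up function names; it compiles as C++, and the first two definitions put the specifier after the type purely to show that the order is not fixed.

```cpp
#include <cassert>

int static answer() { return 42; }           // storage class specifier after the type
int inline doubled(int x) { return 2 * x; }  // function specifier after the type

// Here const is a type qualifier: it applies to the pointed-to char in the
// return type and says nothing about the function itself.
char const *greeting() { return "hi"; }

void check_specifier_order() {
    assert(answer() == 42);
    assert(doubled(4) == 8);
    assert(greeting()[0] == 'h');
}
```

Putting the specifier first (static int, inline int) remains the conventional style; the reversed order is shown only to make the grammar point.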

In C++, do variadic functions (those with ... at the end of the parameter list) necessarily follow the __cdecl calling convention?

I know that __stdcall functions can't have ellipses, but I want to be sure there are no platforms that support the stdarg.h functions for calling conventions other than __cdecl or __stdcall.
The calling convention has to be one where the caller clears the arguments from the stack (because the callee doesn't know what will be passed).
That doesn't necessarily correspond to what Microsoft calls "__cdecl" though. Just for example, on a SPARC, it'll normally pass the arguments in registers, because that's how the SPARC is designed to work -- its registers basically act as a call stack that gets spilled to main memory if the calls get deep enough that they won't fit into registers anymore.
Though I'm less certain about it, I'd expect roughly the same on IA64 (Itanium) -- it also has a huge register set (a couple hundred if memory serves). If I'm not mistaken, it's a bit more permissive about how you use the registers, but I'd expect it to be used similarly at least a lot of the time.
Why does this matter to you? The point of using stdarg.h and its macros is to hide differences in calling convention from your code, so it can work with variable arguments portably.
Edit, based on comments: Okay, now I understand what you're doing (at least enough to improve the answer). Given that you already (apparently) have code to handle the variations in the default ABI, things are simpler. That only leaves the question of whether variadic functions always use the "default ABI", whatever that happens to be for the platform at hand. With "stdcall" and "default" as the only options, I think the answer to that is yes. Just for example, on Windows, wsprintf and wprintf break the rule of thumb and use the cdecl calling convention instead of stdcall.
The most definitive way that you can determine this is to analyze the calling conventions. For variadic functions to work, your calling convention needs a couple of attributes:
The callee must be able to access the parameters that aren't part of the variable argument list from a fixed offset from the top of the stack. This requires that the compiler push the parameters onto the stack from right to left. (This includes such things as the first parameter to printf, the format specification. Also, the address of the variable argument list itself must also be derived from a known location.)
The caller must be responsible for removing the parameters off the stack once the function has returned, because only the compiler, while generating the code for the caller, knows how many parameters were pushed onto the stack in the first place. The variadic function itself does not have this information.
stdcall won't work because the callee is responsible for popping parameters off the stack. In the old 16-bit Windows days, pascal wouldn't work because it pushed parameters onto the stack from left to right.
Of course, as the other answers have alluded to, many platforms don't give you any choice in terms of calling convention, making this question irrelevant for those ones.
Consider the following function on an x86 system:
void __stdcall something(char *, ...);
The function declares itself as __stdcall, which is a callee-clean convention. But a variadic function cannot be callee-clean since the callee does not know how many parameters were passed, so it doesn’t know how many it should clean.
The Microsoft Visual Studio C/C++ compiler resolves this conflict by silently converting the calling convention to __cdecl, which is the only supported variadic calling convention for functions that do not take a hidden this parameter.
Why does this conversion take place silently rather than generating a warning or error?
My guess is that it’s to make the compiler options /Gr (set default calling convention to __fastcall) and /Gz (set default calling convention to __stdcall) less annoying.
Automatic conversion of variadic functions to __cdecl means that you can just add the /Gr or /Gz command line switch to your compiler options, and everything will still compile and run (just with the new calling convention).
Another way of looking at this is not by thinking of the compiler as converting variadic __stdcall to __cdecl but rather by simply saying “for variadic functions, __stdcall is caller-clean.”
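The conflict can be reproduced with a small variadic function. In this sketch, DEMO_STDCALL is a hypothetical macro that expands to __stdcall only for MSVC targeting 32-bit x86 and to nothing elsewhere, so the file stays compilable on other toolchains; `sum_all` is an invented example function.

```cpp
#include <cassert>
#include <cstdarg>

#if defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_STDCALL __stdcall  // only meaningful for MSVC targeting 32-bit x86
#else
#  define DEMO_STDCALL            // elsewhere the keyword is left out
#endif

// Marked __stdcall on 32-bit MSVC, yet variadic: the caller knows exactly
// how many arguments it passed, the callee only learns count at run time.
int DEMO_STDCALL sum_all(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

void check_sum_all() {
    assert(sum_all(3, 1, 2, 3) == 6);
    assert(sum_all(0) == 0);
}
```

Because only the call site knows the real argument count, cleanup has to happen there, whichever keyword appears in the declaration.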
AFAIK, the diversity of calling conventions is unique to DOS/Windows on x86. Most other platforms had compilers come with the OS and standardize the convention.
Do you mean "platforms supported by MSVC" or as a general rule? Even if you confine yourself to the platforms supported by MSVC, you still have situations like IA64 and AMD64 where there is only "one" calling convention, and that calling convention is called __stdcall, but it's certainly not the same __stdcall you get on x86.

Switching callstack for C++ functions

Here's my previous question about switching C callstacks. However, C++ uses a different calling convention (thiscall) and may require some different asm code. Can someone explain the differences and point to or supply some code snippets that switch C++ callstacks (preferably in GCC inline asm)?
Thanks,
James
The code given in the previous question should work fine.
The thiscall calling convention differs only in who is responsible for popping the arguments off the stack. Under the thiscall calling convention, the callee pops the arguments (and additionally, the this pointer is passed in ecx); under the C calling convention, the caller pops the arguments. This does not affect context switches.
However, if you're going to do context switches yourself, note that you need to save and restore the registers as well (probably on the stack) in addition to switching stacks.
Note, by the way, that C++ doesn't always use thiscall -- it's only used for methods with a fixed number of arguments (and apart from that, it's a Microsoftism... g++ doesn't use it).
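What a member-function convention has to arrange, beyond an ordinary call, is the hidden this argument. A sketch with a made-up `Counter` type, showing a member function next to a free function that spells the same parameter out explicitly:

```cpp
#include <cassert>

struct Counter {
    int value;
    // Receives a hidden `this` pointer in addition to n. Where that pointer
    // travels (ECX under MSVC's x86 thiscall, an ordinary first argument
    // elsewhere) is up to the implementation.
    int add(int n) { value += n; return value; }
};

// Roughly the same function with `this` written as an explicit parameter.
int add_free(Counter *self, int n) {
    self->value += n;
    return self->value;
}

void check_counter() {
    Counter c{1};
    assert(c.add(2) == 3);
    assert(add_free(&c, 2) == 5);
}
```

Both functions do the same work; they differ only in how the object pointer is handed over, which is invisible at the source level.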
Note the ABI for C++ is not explicitly defined.
The idea was that compiler manufacturers are able to use the optimal calling convention for the situation and thus make C++ faster.
The downside of this is that each compiler has its own calling convention, so code from different compilers is not compatible (even code from different versions of the same compiler, or built with different optimization flags, can be incompatible).