I've come across calling conventions while studying game states for game development with C++.
In a previous question someone stated that MSDN doesn't explain __stdcall very well - I agree.
What are the primary purposes of calling conventions like __stdcall? Does it matter in what order the arguments are placed on the stack? How does it reduce the size of the code on x86 (as someone else stated)?
The reason for having some calling convention is pretty simple: so that the caller and the callee agree on how things will work. Without it, the caller doesn't know where to put arguments when it's calling a particular function.
As for why Microsoft decided on the specific details of __stdcall, that's largely historical. On MS-DOS, all system calls were register based, so all OS calls required assembly language, or strange extensions to most higher-level languages.
When they first did Windows, they used the cdecl calling convention, mostly because that's what the compiler did by default. At least according to rumor, shortly before they got ready to release Windows 1.0, they switched to the Pascal calling convention because it was efficient enough that (among other things) it allowed Windows to fit on one fewer floppy disc. Regardless of the precise details, the Pascal calling convention did make code a little smaller, because the called function cleaned the arguments off the stack instead of them needing to be cleaned up at every place the function was called. For any function called from at least 2 different places, that's a win (and it's a tie anywhere else).
Then they started work on OS/2, and invented yet another calling convention (syscall).
Then, of course, came Win32. There wasn't really a lot wrong with syscall from a technical viewpoint, but (I'd guess) everything associated with OS/2 was considered tainted, so syscall had to go. The result was something just different enough to justify a new name. In fairness, that's a bit of an exaggeration: they did make one genuinely useful addition: they encoded the number of bytes of arguments into each function name, so if (for example) you supplied an incorrect prototype for a function, the code would fail to link rather than ending up with a mismatch between caller and callee that could lead to much more serious problems.
For the most part, it really comes back to the original point though: the exact details of the calling convention don't matter all that much, as long as you don't make a complete mess of it. Most of what matters is that the caller and callee agree on the same thing, so if a compiler knows what parameters a function accepts, it knows how to generate code to get those parameters to the function correctly (and, likewise, they both agree on how stack cleanup is handled, etc.)
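To make the caller/callee agreement concrete, here is a minimal sketch using MSVC-style annotations (the spelling of these keywords varies by toolchain, and the function names are made up):

// __cdecl: the *caller* removes the arguments from the stack after each call,
// so every call site carries its own cleanup instruction (e.g. add esp, 8 on 32-bit x86).
int __cdecl add_cdecl(int a, int b) { return a + b; }

// __stdcall: the *callee* removes the arguments (e.g. ret 8 on 32-bit x86),
// so the cleanup code exists exactly once, inside the function.
int __stdcall add_stdcall(int a, int b) { return a + b; }

int caller()
{
    // Both calls look identical in the source; only the machine code emitted
    // around (or inside) them differs.
    return add_cdecl(1, 2) + add_stdcall(3, 4);
}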
Related
We have a lot of VCL-based applications written in C++. All the VCL methods (under the __published class modifier) require the __fastcall calling convention. However, for whatever reason, developers have been adding __fastcall to other non-VCL functions that are private, protected, or public.
Based on this article, this makes no sense to me, as it unnecessarily complicates the code and might even be a performance hit (probably negligible, though). Nonetheless, after suggesting we remove it in some places, I was told "we've always done it that way, so be consistent" and that it's just a question of style. I think it actually confuses people if it isn't necessary, so it's bad practice.
My question is, when is it appropriate to use the __fastcall calling convention?
A good optimizing compiler that supports whole-program optimization (aka link-time code generation) doesn't care about the calling convention for internal functions*. It will use whatever calling convention is the fastest/best in that situation, including inventing a custom calling convention or inlining the function altogether.
The only time a calling convention matters is for functions that form part of a public API. And in that case, __fastcall is probably a poor choice. Use a more standard calling convention like __cdecl or __stdcall, widely supported by Windows toolchains. __fastcall is an especially poor choice for interoperability, since it was never standardized and therefore is implemented differently by different vendors. This becomes a nightmare the minute you try to use your DLL with an application compiled with a different toolchain, much less in a different language.
Except, of course, when you're working with the VCL APIs that are documented as requiring the __fastcall convention. For example, the documentation says that member functions for VCL classes use the __fastcall convention, so you need to use the same calling convention in all of your overrides.
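For example, a hedged sketch in C++Builder style (the form and handler names are made up; the only point is the __fastcall on the published members and the constructor, which the VCL documentation requires):

class TMainForm : public TForm        // hypothetical VCL form class
{
__published:
    TButton *OkButton;
    void __fastcall OkButtonClick(TObject *Sender);   // VCL event handlers use __fastcall
public:
    __fastcall TMainForm(TComponent *Owner);          // VCL-style constructor
};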
Or when you need caller clean-up, e.g., to support variadic arguments. Then you need __cdecl.
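As a minimal sketch of why: a variadic function cannot know at compile time how many bytes of arguments the caller pushed, so only the caller can clean them up, which is what __cdecl provides (the function name here is made up):

#include <cstdarg>
#include <cstdio>

// Must use caller clean-up (__cdecl): only the call site knows how many
// arguments were actually passed.
int __cdecl log_message(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    int written = vprintf(fmt, args);
    va_end(args);
    return written;
}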
If you do want to use a particular calling convention for internal functions (i.e., those that are not part of a public API), you should really prefer to specify that globally with a compiler switch. This will then specify the calling convention to be used for all functions whose prototypes do not specifically override it. This has several advantages. For one, it avoids cluttering your code with a bunch of calling-convention boilerplate. Second, it allows you to easily make changes later (for example, if profiling reveals that your original choice of calling convention is a bottleneck that the optimizer is unable to resolve).
Anecdotally, __stdcall is superior to __cdecl because of a reduction in binary size, made possible by the fact that the callee adjusts the stack instead of the caller (and there are fewer callees than callers), but as the article you linked mentions, __fastcall may not always be faster than __stdcall. The article doesn't go into any technical details, but the issue is basically the extremely limited number of registers available on 32-bit x86. Passing values in registers instead of on the stack is generally a performance win, but can become a pessimization in certain cases when the function is large and runs out of registers, forcing it to spill the arguments back to the stack, doing double work (which incurs a speed penalty) and further inflating the code (which incurs a cache penalty and, indirectly, a speed penalty). It is also a pessimization in cases where the values are already on the stack but need to be moved into registers in order to make a function call, hindering the optimization potential in both places.
Do note that this all becomes irrelevant when you start targeting 64-bit x86 architectures. The calling convention is finally standardized there for all Windows applications, regardless of vendor. The x64 calling convention is somewhat akin to __fastcall, but works much better there because of the larger number of available registers. The optimizer is not required to go through as many contortions to free up registers for passing parameters as it is on x86-32.
* Note that when I say "internal" functions here, I am referring not to a particular access modifier, but to functions that live within a single compilation unit and/or are never called by external code.
The question is NOT about the Linux kernel. It is NOT a C vs. C++ debate either.
I did some research, and it seems to me that C++ lacks tool support when it comes to exception handling and memory allocation for embedded systems:
Why is the linux kernel not implemented in C++?
Besides the accepted answer, see also Ben Collins' answer.
Linus Torvalds on C++:
"[...] anybody who designs his kernel modules for C++ is [...]
(b) a C++ bigot that can't see what he is writing is really just C anyway"
" - the whole C++ exception handling thing is fundamentally broken. It's especially broken for kernels.
- any compiler or language that likes to hide things like memory allocations behind your back just isn't a good choice for a kernel."
JOINT STRIKE FIGHTER AIR VEHICLE C++ CODING STANDARDS:
"AV Rule 208 C++ exceptions shall not be used"
Are the exception handling and the memory allocation the only points where C++ apparently lacks tool support (in this context)?
To fix the exception handling issue, does one have to provide a bound on the time between an exception being thrown and it being caught?
Could you please explain why memory allocation is an issue? How can one overcome this issue; what has to be done?
As I see it, in both cases one has to provide an upper bound at compile time on something nontrivial that happens and depends on things at run time.
Answer:
No; dynamic casts were also an issue, but that has been solved.
Basically yes. The time needed to handle exceptions has to be bounded by analyzing all the throw paths.
See solution on slides "How to live without new" in Embedded systems programming. In short: pre-allocate (global objects, stacks, pools).
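A hedged sketch of the pre-allocation idea (hypothetical names; real embedded pools also deal with destruction, alignment policies, and concurrency):

#include <cstddef>
#include <new>

// A fixed pool: all storage is reserved statically, so no dynamic
// allocation ever happens at run time and the capacity bound is known
// at compile time.
template <typename T, std::size_t N>
class StaticPool {
public:
    // Construct an object in the next free slot, or return nullptr when
    // the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (used_ == N) return nullptr;
        return new (&storage_[used_++]) T(static_cast<Args&&>(args)...);
    }
private:
    alignas(T) unsigned char storage_[N][sizeof(T)];
    std::size_t used_ = 0;
};

// Usage: one global pool sized for the worst case, decided up front.
struct Packet { int id; };
static StaticPool<Packet, 64> g_packets;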
Well, there are a couple of things. First, you have to remember that the STL is completely built on OS routines, the C standard library, and dynamic allocation. When you're writing a kernel, there is no dynamic memory allocation for you (you're providing it), there is no C standard library (you have to provide one built on top of your kernel), and you are providing the system calls. Then there is the fact that C interoperates very well and easily with assembly, whereas C++ is very difficult to interface with assembly because the ABI isn't necessarily constant, nor are the names. Because of name mangling, you get a whole new level of complication.
Then, you have to remember that when you are building an OS, you need to know and control every aspect of the memory used by the kernel. In C++, there are quite a few hidden structures that you have no control over (vtables, RTTI, exceptions) that would severely interfere with your work.
In other words, what Linus is saying is that with C, you can easily understand the assembly being generated and it is simple enough to run directly on the machine. C++ can too, but you will always have to set up quite a bit of context and still write some C to interface between the assembly and the C++. Another reason is that in systems programming you need to know exactly how methods are being called. C has the very well documented C calling conventions, but in C++ you have the this pointer to deal with, name mangling, etc.
In short, it's because C++ does things without you asking.
Per Josh's comment below, another thing C++ does behind your back is constructors and destructors. They add overhead to entering and exiting stack frames and, most of all, make assembly interop even harder, since when you destroy a C++ stack frame you have to call the destructor of every object in it. This gets ugly quickly.
Why do certain kernels refuse C++ code in their code base? Politics and preference, but I digress.
Some parts of modern OS kernels are written in certain subsets of C++. In these subsets mainly exceptions and RTTI are disabled (sometimes multiple inheritance and templates are disallowed, too).
This is the case too in C. Certain features should not be used in a kernel environment (e.g. VLAs).
Beyond exceptions and RTTI, certain other C++ features are heavily critiqued when we are talking about kernel code (or embedded code).
These are vtables and constructors/destructors. They bring a bit of code in under the hood, and that seems to be deemed 'bad'. If you don't want a constructor, then don't implement one. If you worry about using a class with a constructor, then worry too about a function you have to remember to call to initialize a struct. The upside in C++ is that you cannot really forget to run a destructor, short of forgetting to deallocate the memory.
But what about vtables?
When you implement an object that provides extension points (e.g. a Linux filesystem driver), you implement something like a class with virtual methods. So why is it so bad to have a vtable? You have to control the placement of that vtable when you have requirements on which pages it resides in. As far as I recall, this is irrelevant for Linux, but under Windows code pages can be paged out, and if you call a paged-out function from too high an IRQL, you crash. But you really have to watch which functions you call at a high IRQL anyway, whatever kind of function it is, and you don't need to worry if you don't make a virtual call in that context. In embedded software this can be worse, because (very rarely) you need to directly control which code page your code goes into, but even there you can influence what your linker does.
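To make the comparison concrete, here is a hedged sketch (hypothetical types, not real kernel interfaces): the C-style table and the C++ virtual class are morally the same thing, an indirect call through a table of function pointers.

// C-style extension point, as used by many kernels: an explicit table of
// function pointers that each driver fills in.
struct file_ops {
    int (*open)(const char* path);
    int (*read)(char* buf, int len);
};

// C++-style extension point: the compiler builds the table (the vtable)
// and the indirect call for you.
class FileDriver {
public:
    virtual int open(const char* path) = 0;
    virtual int read(char* buf, int len) = 0;
    virtual ~FileDriver() = default;
};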
So why are so many people so adamant about 'use C in the kernel'?
Because they either got burned by a toolchain problem, or got burned by overenthusiastic developers using the latest stuff in kernel mode.
Maybe the kernel-mode developers are rather conservative, and C++ is a too newfangled thing ...
Why are exceptions not used in kernel-mode code?
Because exception support has to generate some code per function and introduces complexity into every code path, and an unhandled exception is fatal for a kernel-mode component: it kills the system.
In C++, when an exception is thrown, the stack must be unwound and the corresponding destructors must be called. This involves at least a bit of overhead. It is mostly negligible, but it does incur a cost, which may not be something you want. (Note I do not know how much a table-based unwind actually costs; I think I read that there is no cost when no exception is in flight, but ... I guess I have to look it up.)
A code path which cannot throw exceptions can be reasoned about much more easily than one which may.
So:

// g(), f3() and h() are assumed to be declared elsewhere
int g();
void f3();
int h();

int f( int a )
{
    if( a == 0 )
        return -1;     // exit path 1
    if( g() < 0 )
        return -2;     // exit path 2
    f3();
    return h();        // exit path 3
}
We can reason about every exit path in this function, because we can easily see all the returns; but when exceptions are enabled, the functions it calls may throw and we cannot guarantee which path the function actually takes. This is exactly the point about code doing something we cannot see at once. (And this is bad C++ code when exceptions are enabled.)
The third point is that you want user-mode applications to crash when something unexpected occurs (e.g. when memory runs out): a user-mode application should crash (after freeing resources) to allow the developer to debug the problem or at least get a good error message. You should never have an uncaught exception in a kernel-mode module.
Note that all of this can be overcome; there are SEH exceptions in the Windows kernel, so points 2 and 3 are not really good points for the NT kernel.
There are no memory management problems with C++ in the kernel. For example, the NT kernel headers provide overloads of new and delete that let you specify the pool type of your allocation, but are otherwise exactly the same as the new and delete in a user-mode application.
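A hedged sketch of the typical pattern behind such wrappers (the exact overloads shipped in the NT kernel headers may differ; ExAllocatePoolWithTag/ExFreePoolWithTag are the classic WDK allocation routines, and the pool tag here is made up):

// Sketch only: a placement-style operator new that routes allocations to a
// specific kernel pool, in the spirit of kernel-mode C++ wrappers.
void* operator new(size_t size, POOL_TYPE poolType)
{
    return ExAllocatePoolWithTag(poolType, size, 'XMPL');   // 'XMPL' is a made-up pool tag
}

void operator delete(void* p)
{
    if (p != nullptr)
        ExFreePoolWithTag(p, 'XMPL');
}

// Hypothetical usage in a driver:
//   MyObject* obj = new (NonPagedPool) MyObject();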
I don't really like language wars, and have voted to close this again. But anyway...
Well, there are a couple of things. First, you have to remember that the STL is completely built on OS routines, the C standard library, and dynamic allocation. When you're writing a kernel, there is no dynamic memory allocation for you (you're providing it), there is no C standard library (you have to provide one built on top of your kernel), and you are providing the system calls. Then there is the fact that C interoperates very well and easily with assembly, whereas C++ is very difficult to interface with assembly because the ABI isn't necessarily constant, nor are the names. Because of name mangling, you get a whole new level of complication.
No, with C++ you can declare functions having an extern "C" (or optionally extern "assembly") calling convention. That makes the names compatible with everything else on the same platform.
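A minimal sketch of that boundary (the function name is made up):

// Compiled as C++, but exposed with C linkage: the symbol is not mangled and
// uses the platform's C calling convention, so assembly or C code can simply
// call it by the name "kernel_entry".
extern "C" void kernel_entry(unsigned long boot_info)
{
    // ...the implementation behind this stable boundary is free to use C++...
    (void)boot_info;
}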
Then, you have to remember that when you are building an OS, you need to know and control every aspect of the memory used by the kernel. In C++, there are quite a few hidden structures that you have no control over (vtables, RTTI, exceptions) that would severely interfere with your work.
You have to be careful when coding kernel features, but that is not limited to C++. Sure, you cannot use std::vector<byte> as the base for your memory allocation, but neither can you use malloc for that. You don't have to use virtual functions, multiple inheritance, and dynamic allocation for all C++ classes, do you?
In other words, what Linus is saying is that with C, you can easily understand the assembly being generated and it is simple enough to run directly on the machine. C++ can too, but you will always have to set up quite a bit of context and still write some C to interface between the assembly and the C++. Another reason is that in systems programming you need to know exactly how methods are being called. C has the very well documented C calling conventions, but in C++ you have the this pointer to deal with, name mangling, etc.
Linus is possibly claiming that he can spot every call to f(x) and immediately see that it is calling g(x), h(x), and q(x) 20 levels deep. Still, MyClass M(x); is apparently a great mystery, as it might be calling some unknown code behind his back. Lost me there.
In short, it's because C++ does things without you asking.
How? If I write a constructor and a destructor for a class, it is because I am asking for the code to be executed. Don't tell me that C can magically copy an object without executing some code!
Per Josh's comment below, another thing C++ does behind your back is constructors and destructors. They add overhead to entering and exiting stack frames and, most of all, make assembly interop even harder, since when you destroy a C++ stack frame you have to call the destructor of every object in it. This gets ugly quickly.
Constructors and destructors do not add code behind your back; they are only there if needed. Destructors are called only when it is required, like when dynamic memory needs to be deallocated. Don't tell me that C code works without this.
One reason for the lack of C++ support in both Linux and Windows is that a lot of the guys working on the kernels have been doing this since long before C++ was available. I have seen posts from the Windows kernel developers arguing that C++ support isn't really needed, as there are very few device drivers written in C++. Catch-22!
Are the exception handling and the memory allocation the only points where C++ apparently lacks tool support (in this context)?
In places where this is not properly handled, just don't use it. You don't have to use multiple inheritance, dynamic allocation, and throwing exceptions everywhere. If returning an error code works, fine. Do that!
To fix the exception handling issue, does one have to provide a bound on the time between an exception being thrown and it being caught?
No, but you just cannot use application-level features in the kernel. Implementing dynamic memory using a std::vector<byte> isn't a good idea, but who would really try that?
Could you please explain why memory allocation is an issue? How can one overcome this issue; what has to be done?
Using standard library features that depend on memory allocation at a layer below the functions implementing the memory management would be a problem. Implementing malloc using calls to malloc would be just as silly. But who would try that?
Function parameters are placed on the stack, but compilers can optimize this by passing them in registers instead. It would make sense for this optimization to kick in when there are only 1-2 parameters, and not when there are 256 (not that one would want to have the maximum number of parameters).
How can one find out the parameter limit (number of parameters) for a certain compiler (such as gcc) where one can be sure that this optimization will be used?
Function parameters are placed on the stack, but compilers can optimize this by passing them in registers instead.
As FrankH says in his comments and as I'm going to say in my answer, the application binary interface for the system in question determines how arguments are passed to functions - this is called the calling convention for that platform.
To complicate matters, 32-bit x86 actually has several. This is historical and comes from the fact that when 32-bit Windows arrived, everyone went crazy doing different things.
So, yes, you can "optimise" by writing function calls in such a way, but no, you shouldn't. You should follow the standards for your platform, because the honest truth is that the speed of stack access probably isn't slowing your code down to such an extent that you need to be binary-incompatible with everyone else on your system.
Why the need for ABIs/standard calling conventions? Well, in terms of using the processor registers, the stack, etc., applications must agree on what means what and where it should go. If one function decided all its arguments were in registers and another that some were on the stack, how would they be interoperable? Moreover, you might come across the term scratch registers, meaning those registers you don't have to restore. What happens if you call a function expecting it to leave some registers alone?
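A small illustration of that agreement at the source level (hypothetical names; __stdcall is used only as the example convention): when the code that installs a callback and the code that invokes it compile against the same annotated type, a convention mismatch is caught at compile time on 32-bit x86 instead of corrupting the stack at run time.

// The callback type spells out the convention, so both sides must agree on it.
typedef void (__stdcall *EventCallback)(int eventCode);

void __stdcall on_event(int eventCode)       { /* handle the event */ }
void __cdecl   on_event_cdecl(int eventCode) { /* same signature, different convention */ }

EventCallback ok = &on_event;            // fine: the conventions match
// EventCallback bad = &on_event_cdecl;  // rejected by a 32-bit compiler: mismatch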
Anyway, as for what you asked for, here's some ABI documentation:
The difference between x86 and x64 on windows.
x86_64 ABI used for Unix-like platforms.
Wikipedia's x86 calling conventions.
A document on compiler calling conventions.
The last one is my favourite. To quote it:
In the days of the old DOS operating system, it was often possible to combine development tools from different vendors with few compatibility problems. With 32-bit Windows, the situation has gone completely out of hand. Different compilers use different data representations, different function calling conventions, and different object file formats. While static link libraries have traditionally been considered compiler-specific, the widespread use of dynamic link libraries (DLL's) has made the distribution of function libraries in binary form more common.
So whatever you're trying to do with optimising by modifying the function calling method: don't. Find another way to optimise. Profile your code. Study the optimisations available for your compiler (-OX) if you think it helps, and dump the assembly to check, if the speed is really that crucial.
For publicly visible functions, this is documented in the ABI standard. For functions that are not referenceable from the outside, all bets are off anyway.
You would have to read the fine manual for the compiler. If you were lucky, you would find it there in a description of function calling conventions. Otherwise, for an OSS compiler such as gcc you would probably have to read its source-code.
In C/C++ (specifically, I'm using MSVS), in what situation would one ever need to worry about specifying a calling convention for a function definition? Are they ever important? Isn't the compiler capable of choosing the optimal convention when necessary (i.e. fastcall, etc.)?
Maybe my understanding is lacking, but I just do not see when there would be a case where the programmer would need to care about things like the order in which the arguments are placed on the stack and so forth. I also do not see why the compiler's optimizer would not be able to choose whatever scheme works best for that particular function. Any knowledge anyone could provide me with would be great. Thanks!
In general terms, the calling convention is important when you're integrating code that's being compiled by different compilers. For example, if you're publishing a DLL that will be used by your customers, you will want to make sure that the functions you export all have a consistent, expected calling convention.
You are correct that within a single program, the compiler can generally choose which calling convention to use for each function (and the rules are usually pretty simple).
You do not need to care for 64-bit applications, since there is only one calling convention.
You do need to care for 32-bit applications in the following cases:
You interact with 3rd-party libraries and the headers for these libraries do not declare the correct calling convention.
You are creating a library or DLL for someone else to use. You need to decide on a calling convention so that other code would use the correct calling convention when calling your code.
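For the DLL case, here is a minimal sketch of a public header (MYLIB_API, MYLIB_EXPORTS, and the function name are made up) that pins down both the linkage and the calling convention, so that clients built with a different toolchain still agree with the implementation:

// mylib.h - public API header shipped with the DLL (names are hypothetical).
#ifdef MYLIB_EXPORTS
#  define MYLIB_API extern "C" __declspec(dllexport)
#else
#  define MYLIB_API extern "C" __declspec(dllimport)
#endif

// The explicit __stdcall (plus extern "C") makes the contract independent of
// whatever default calling convention the client's compiler happens to use.
MYLIB_API int __stdcall mylib_compute(int input);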
I always assumed that Fortran passed entities "by reference" to a dummy argument. Then I got this answer (the answer was actually about a related issue, not this one specifically):
The standard never specifies that and, indeed, goes quite a lot out of its way to avoid such a specification. Although yours is a common misconception, it was not strictly accurate even in most older compilers, particularly with optimization turned on. A strict pass-by-reference would kill many common optimizations.

With recent standards, pass-by-reference is all but disallowed in some cases. The standard doesn't use those words in its normative text, but there are things that would be impractical to implement with pass-by-reference.

When you start getting into things like pointers, the error of assuming that everything is pass-by-reference will start making itself more evident than before. You'll have to drop that misconception or many things will confuse you.

I think other people have answered the rest of the post adequately. Some also addressed the above point, but I wanted to emphasize it.
See here for attribution.
According to this answer, then, there's nothing in the standard specifying how data are shipped from the caller to the callee. In practical terms, how should this be interpreted when actually working with it (regardless of the practical effects resulting from how compilers implement the standard), in particular when it comes to the intent() specification?
Edit: I'd like to clarify my question. What I am trying to understand is how the standard expects you to work when you perform calls. Given that the strategy a compiler actually uses to pass entities is not defined by the standard, you cannot in principle (according to the standard) expect that passing an argument to a procedure will behave as "pass-by-reference", with all its related side effects, because this behavior is compiler- and optimization-dependent. I assume, therefore, that the standard imposes a programming style you have to follow in order to work regardless of the actual implementation strategy.
Yes, that's true. The Fortran standard doesn't specify the exact evaluation strategy for the different INTENT attributes. For example, for a dummy argument with INTENT(OUT), a Fortran compiler can use call-by-reference or call-by-copy-restore evaluation.
But I do not understand why you really need to know which evaluation strategy is used. What is the reason? I think Fortran does it right. The human's concern is (high-level) argument intent, and the computer's concern is (low-level) evaluation strategy. "Render unto man the things which are man's and unto the computer the things which are the computer's." N. Wiener
I think the standard contains enough information on the usage of different INTENT attributes. Section "5.1.2.7 INTENT attribute" of Final Committee Draft of Fortran 2003 standard (PDF, 5 MB) is a good starting point.
As the others have said, you don't have to know how the compiler works, as long as it works correctly. Fortran >=90 allows you to specify the purpose of an argument: input, output or both, and then the compiler takes care of passing the argument. As long as the compiler is correctly implemented so that both sides (caller and callee) use the same mechanism, why should the programmer care? So the Fortran standard tries not to limit how compiler writers implement their compilers, allowing them to design and optimize as they wish.
In terms of programming style, decide the purpose of your procedure arguments, and declare them with the appropriate intent. Then, for example, if you accidentally try to modify an intent(in) argument in a procedure, the compiler will issue an error message, preventing you from making this mistake. It might even do this before emitting object code, so the actual argument passing mechanism has nothing to do with enforcing this requirement of the language.
C is a lower-level language and the programmer needs to choose between passing by value or by reference for various reasons. But for problems such as scientific programming this is an unnecessary aspect to have to control. It can be essential in embedded or device driver programming.
If there is a reason that you must control the argument passing mechanism, such as interfacing to another language, Fortran 2003 provides the ISO C Binding (which is already widely implemented). With this you can match the C method of argument passing.
I'm not sure I understand your question, but one way of seeing things is that as long as one writes standard-conforming code, one does not need to be concerned with how argument passing is implemented. The compiler makes sure (well, barring compiler bugs, of course) that arguments are passed in a way that conforms to the standard.
Now, if you play tricks e.g. with C interop or such, say by comparing argument addresses you might be able to determine whether the compiler uses pass by reference, copy-in/copy-out or something else in some particular case. But generally speaking, at that point you're outside the scope of the standard and all bets are off.