In C/C++ (specifically, I'm using MSVS), in what situation would one ever need to worry about specifying a calling convention for a function definition? Are they ever important? Isn't the complied capable of choosing the optimal convention when necessary (ie fastcall, etc).
Maybe my understanding is lacking, but I just do not see when their would be a case that the programmer would need to care about things like the order that the arguments are placed on the stack and so forth. I also do not see why the compiler's optimization would not be able to choose whatever scheme would work best for that particular function. Any knowledge anyone could provide me with would be great. Thanks!
In general terms, the calling convention is important when you're integrating code that's being compiled by different compilers. For example, if you're publishing a DLL that will be used by your customers, you will want to make sure that the functions you export all have a consistent, expected calling convention.
You are correct that within a single program, the compiler can generally choose which calling convention to use for each function (and the rules are usually pretty simple).
You do not need to care for 64-bit applicatins since there is only one calling convention.
You do need to care for 32-bit applications in the following cases:
You interact with 3rd party libraries and the headers for these libraries did not declare the correct calling convention.
You are creating a library or DLL for someone else to use. You need to decide on a calling convention so that other code would use the correct calling convention when calling your code.
Related
We have a lot of VCL-based applications written in C++. All the VCL methods (under the __published class modifier require the __fastcall calling convention. However, for whatever reason, developers have been adding __fastcall to other non-VCL functions which are private, protected, or public.
Based on this article, this makes no sense to me as it unnecessarily complexifies the code and might even be a performance hit (probably neglible though). Nonetheless, after suggesting we remove it in some places I was told we've always done it that way so be consistent and it's just a question of style. I think it actually confuses people if it isn't necessary, so it's bad practice.
My question is, when is it appropriate to use the __fastcall calling convention?
A good optimizing compiler that supports whole-program optimization (aka link-time code generation) doesn't care about the calling convention for internal functions*. It will use whatever calling convention is the fastest/best in that situation, including inventing a custom calling convention or inlining the function altogether.
The only time a calling convention matters is for functions that form part of a public API. And in that case, __fastcall is probably a poor choice. Use a more standard calling convention like __cdecl or __stdcall, widely supported by Windows toolchains. __fastcall is an especially poor choice for interoperability, since it was never standardized and therefore is implemented differently by different vendors. This becomes a nightmare the minute you try to use your DLL with an application compiled with a different toolchain, much less in a different language.
Except, of course, when you're working with the VCL APIs that are documented as requiring the __fastcall convention. For example, the documentation says that member functions for VCL classes use the __fastcall convention, so you need to use the same calling convention in all of your overrides.
Or when you need caller clean-up, e.g., to support variadic arguments. Then you need __cdecl.
If you do want to use a particular calling convention for internal functions (i.e., those that are not part of a public API), you should really prefer to specify that globally with a compiler switch. This will then specify the calling convention to be used for all functions whose prototypes do not specifically override it. This has several advantages. For one, it avoids cluttering your code with a bunch of calling-convention boilerplate. Second, it allows you to easily make changes later (for example, if profiling reveals that your original choice of calling convention is a bottleneck that the optimizer is unable to resolve).
Anecdotally, __stdcall is superior to __cdecl because of a reduction of binary size, made possible by the fact that the callee adjusts the stack instead of the caller (and there are fewer callees than callers), but as the article you linked mentions, __fastcall may not always be faster than __stdcall. The article doesn't go into any technical details, but the issue is basically the extremely limited numbers of registers available on 32-bit x86. Passing values in registers instead of on the stack is generally a performance win, but can become a pessimization in certain cases when the function is large and runs out of registers, forcing it to spill the arguments back to the stack, doing double work (which evokes a speed penalty) and further inflating the code (which evokes a cache penalty and, indirectly, a speed penalty). It is also a pessimization in cases where the values are already on the stack, but need to be moved into registers in order to make a function call, hindering the optimization potential in both places.
Do note that this all becomes irrelevant when you start targeting 64-bit x86 architectures. The calling convention is finally standardized there for all Windows applications, regardless of vendor. The x64 calling convention is somewhat akin to __fastcall, but works much better there because of the larger number of available registers. The optimizer is not required to go through as many contortions to free up registers for passing parameters as it is on x86-32.
* Note that when I say "internal" functions here, I refer not to a particular access modifier, but rather to functions that are within a single compiland and/or those that are never called into by external code.
I was wondering, it is possible to explicitly specify a custom calling convention, but considering the maturity and amount of optimizations found in the compiler, when no calling convention is specified, can I expect the compiler to pick the best one for the particular function, for example if parameters are few and primitive use fastcall and so on...
It is a "convention" for a reason. Everybody has to follow the convention or you couldn't call your function from another module.
However, if the function is not visible then GCC has options. It may inline the function or call it however it wants to. It might even split it into "hot" and "cold" parts and inline the hot code path. That usually only happens when building with profile guided optimization.
If you want GCC to make optimizations like that, work on hiding your functions. If you are building an executable look at -fwhole-program. If you are building libraries look at -fvisibility=hidden. Also look into -flto.
I've come across calling conventions whilst studying states for game making with C++.
In a previous question someone stated that MSDN doesn't explain _stdcall very well - I agree.
What are the primary purposes for calling conventions like _stdcall? Does it matter what order the arguments are placed on the stack? How does it reduce the size of the code in X86 (as someone else stated)?
The reason for having some calling convention is pretty simple: so that the caller and the callee agree on how things will work. Without it, the caller doesn't know where to put arguments when it's calling a particular function.
As for why Microsoft decided on the specific details of _stdcall, that's largely historical. On MS-DOS, all calls were register based, so all OS calls required assembly language, or strange extensions to most higher-level languages.
When they first did Windows, they used the cdecl calling convention, mostly because that's what the compiler did by default. At least according to rumor, shortly before they got ready to release Windows 1.0, they switched to the Pascal calling convention because it was enough more efficient that (among other things) it allowed Windows to fit on one fewer floppy disc. Regardless of the precise details, the Pascal calling convention did make code a little smaller, because the called function cleaned up the arguments from the stack instead of needing to clean them up everywhere the function was called. For any function that was called from at least 2 different places, that's a win (and if it's tie anywhere else).
Then they started work on OS/2, and invented yet another calling convention (syscall).
Then, of course, came Win32. There wasn't really a lot wrong with syscall from a technical viewpoint, but (I'd guess) everything associated with OS/2 was considered tainted, so syscall had to go. The result was something just enough different to justify a new name. In fairness, that's a little bit of an exaggeration: they did add one truly useful addition: they encoded the number of bytes of arguments into each function name, so if (for example) you supplied an incorrect prototype for a function, the code wouldn't link rather than ending up with a mismatch between caller and callee that could lead to much more serious problems.
For the most part, it really comes back to the original point though: the exact details of the calling convention don't matter all that much, as long as you don't make a complete mess of it. Most of what matters is that the caller and callee agree on the same thing, so if a compiler knows what parameters a function accepts, it knows how to generate code to get those parameters to the function correctly (and, likewise, they both agree on how stack cleanup is handled, etc.)
I use and Application compiled with the Visual C++ Compiler. It can load plugins in form of a .dll. It is rather unimportant what exactly it does, fact is:
This includes calling functions from the .dll that return a pointer to an object of the Applications API, etc.
My question is, what problems may appear when the Application calls a function from the .dll, retrieves a pointer from it and works with it. For example, something that comes into my mind, is the size of a pointer. Is it different in VC++ and G++? If yes, this would probably crash the Application?
I don't want to use the Visual Studio IDE (which is unfortunately the "preferred" way to use the Applications SDK). Can I configure G++ to compile like VC++?
PS: I use MINGW GNU G++
As long as both application and DLL are compiled on the same machine, and as long as they both only use the C ABI, you should be fine.
What you can certainly not do is share any sort of C++ construct. For example, you mustn't new[] an array in the main application and let the DLL delete[] it. That's because there is no fixed C++ ABI, and thus no way in which any given compiler knows how a different compiler implements C++ data structures. This is even true for different versions of MSVC++, which are not ABI-compatible.
All C++ language features are going to be entirely incompatible, I'm afraid. Everything from the name-mangling to memory allocation to the virtual-call mechanism are going to be completely different and not interoperable. The best you can hope for is a quick, merciful crash.
If your components only use extern "C" interfaces to talk to one another, you can probably make this work, although even there, you'll need to be careful. Both C++ runtimes will have startup and shutdown code, and there's no guarantee that whichever linker you use to assemble the application will know how to include this code for the other compiler. You pretty much must link g++-compiled code with g++, for example.
If you use C++ features with only one compiler, and use that compiler's linker, then it gets that much more likely to work.
This should be OK if you know what you are doing. But there's some things to watch out for:
I'm assuming the interface between EXE and DLL is a "C" interface or something COM like where the only C++ classes exposed are through pure-virutal interfaces. It gets messier if you are exporting a concrete class through a DLL.
32-bit vs. 64bit. The 32-bit app won't load a 64-bit DLL and vice-versa. Make sure they match.
Calling convention. __cdecl vs __stdcall. Often times Visual Studio apps are compiled with flags that assuming __stdcall as the default calling convention (or the function prototype explicitly says so). So make sure that the g++ compilers generates code that matches the calling type expected by the EXE. Otherwise, the exported function might run, but the stack can get trashed on return. If you debug through a crash like this, there's a good chance the cdecl vs stdcall convention was incorrectly specified. Easy to fix.
C-Runtimes will not likely be shared between the EXE and DLL, so don't mix and match. A pointer allocated with new or malloc in the EXE should not be released with delete or free in the DLL (and vice versa). Likewise, FILE handles returned by fopen() can not be shared between EXE and DLL. You'll likely crash if any of this happens.... which leads me to my next point....
C++ header files with inline code cause enough headaches and are the source of issues I called out in #3. You'll be OK if the interface between DLL And EXE is a pure "C" interface.
Name mangling issues. If you run into issues where the function name exported doesn't match because of name mangling or because of a leading underscore, you can fix that up in a .DEF file. At least that's what I've done in the past with Visual Studio. Not sure if the equivalent exists in g++/MinGW. Example below. Learn to use "dumpbin.exe /exports" to you can validate your DLL is exporting function with the right name. Using extern "C" will also help fix this as well.
EXPORTS
FooBar=_Foobar#12
BlahBlah=??BlahBlah##QAE#XZ #236 NONAME
Those are the issues that I know of. I can't tell you much more since you didn't explain the interface between the DLL and EXE.
The size of a pointer won't vary; that is dependent on the platform and module bitness and not the compiler (32-bit vs 64-bit and so on).
What can vary is the size of basically everything else, and what will vary are templates.
Padding and alignment of structs tends to be compiler-dependent, and often settings-within-compiler dependent. There are so loose rules, like pointers typically being on a platform-bitness-boundary and bools having 3 bytes after them, but it's up to the compiler how to handle that.
Templates, particularly from the STL (which is different for each compiler) may have different members, sizes, padding, and mostly anything. The only standard part is the API, the backend is left to the STL implementation (there are some rules, but compilers can still compile templates differently). Passing templates between modules from one build is bad enough, but between different compilers it can often be fatal.
Things which aren't standardized (name mangling) or are highly specific by necessity (memory allocation) will also be incompatible. You can get around both of those issues by only destroying from the library that creates (good practice anyway) and using STL objects that take a deleter, for allocation, and exporting using undecorated names and/or the C style (extern "C") for exported methods.
I also seem to remember a catch with how the compilers handle virtual destructors in the vtable, with some small difference.
If you can manage to only pass references of your own objects, avoid externally visible templates entirely, work primarily with pointers and exported or virtual methods, you can avoid a vast majority of the issues (COM does precisely this, for compatibility with most compilers and languages). It can be a pain to write, but if you need that compatibility, it is possible.
To alleviate some, but not all, of the issues, using an alternate to the STL (like Qt's core library) will remove that particular problem. While throwing Qt into any old project is a hideous waste and will cause more bloat than the "boost ALL THE THINGS!!!" philosophy, it can be useful for decoupling the library and the compiler to a greater extent than using a stock STL can.
You can't pass C runtime objects between them. For example you can not open a FILE buffer in one and pass it to be used in the other. You can't free memory allocated on the other side.
The main problems are the function signatures and way parameters are passed to library code. I've had great difficulty getting VC++ dll's to work in gnu based compilers in the past. This was way back when VC++ always cost money and mingw was the free solution.
My experience was with DirectX API's. Slowly a subset got it's binaries modified by enthusiasts but it was never as up-to-date or reliable so after evaluating it I switched to a proper cross platform API, that was SDL.
This wikipedia article describes the different ways libraries can be compiled and linked. It is rather more in depth than I am able to summarise here.
Function parameters are placed on the stack, but compilers can optimize this task by the use of optional registers. It would make sense that this optimization will kick in if there are only 1-2 parameters, and not when there are 256 (not that one would want to have the max number of parameters).
How can one find out the parameter limit (number of parameters) for a certain compiler (such as gcc) where one can be sure that this optimization will be used?
Function parameters are placed on the stack, but compilers can optimize this task by the use of optional registers.
As FrankH says in his comments and as I'm going to say in my answer, the application binary interface for the system in question determines how arguments are passed to functions - this is called the calling convention for that platform.
To complicate matters, x86 32-bit actually has several. This is historical and comes from the fact that when Win32 bit arrived, everyone went crazy doing different things.
So, yes, you can "optimise" by writing function calls in such a way, but no, you shouldn't. You should follow the standards for your platform. Because the honest truth is, the speed of stack access probably isn't slowing your code down to that great an extent that you need to be binary-incompatible from everyone else on your system.
Why the need for ABIs/standard calling conventions? Well, in terms of using the processor registers, stack etc, applications must agree on what means what and where it shoudl go. If one function decided all its arguments were in registers and another that some were on the stack, how would they be interoperable? Moreover, you might come across the term scratch registers to mean those registers you don't have to restore. What happens if you call a function expecting it to leave some registers alone?
Anyway, as for what you asked for, here's some ABI documentation:
The difference between x86 and x64 on windows.
x86_64 ABI used for Unix-like platforms.
Wikipedia's x86 calling conventions.
A document on compiler calling conventions.
The last one is my favourite. To quote it:
In the days of the old DOS operating system, it was often possible to combine development
tools from different vendors with few compatibility problems. With 32-bit Windows, the
situation has gone completely out of hand. Different compilers use different data
representations, different function calling conventions, and different object file formats.
While static link libraries have traditionally been considered compiler-specific, the
widespread use of dynamic link libraries (DLL's) has made the distribution of function
libraries in binary form more common.
So whatever you're trying to do with optimising via modifying the function calling method, don't. Find another way to optimise. Profile your code. Study the compiler optimisations you've got for your compiler (-OX) if you think it helps and dump the assembly to check, if the speed is really that crucial
For publically visible functions, this is documented in the ABI standard. For functions that are not referencible from the outside, all bets are off anyway.
You would have to read the fine manual for the compiler. If you were lucky, you would find it there in a description of function calling conventions. Otherwise, for an OSS compiler such as gcc you would probably have to read its source-code.