__cdecl results in larger executable than __stdcall? - c++

I found this:
Because the stack is cleaned by the called function, the __stdcall
calling convention creates smaller executables than __cdecl, in which
the code for stack cleanup must be generated for each function call.
Suppose I got 2 functions:
void __cdecl func1(int x)
{
//do some stuff using x
}
void __stdcall func2(int x, int y)
{
//do some stuff using x, y
}
and here in the main():
int main()
{
func1(5);
func2(5, 6);
}
IMO, it is main()'s responsibility to clean up the stack of the call to func1(5), and func2 will clean up the stack of the call to func2(5,6), right?
Four questions:
1. For the call to func1 in main(), it's main's responsibility to clean up the stack, so will the compiler insert some code (to clean up the stack) before and after the call to func1? Like this:
int main()
{
before_call_to_cdecl_func(); //compiler generated code for stack-clean-up of cdecl-func-call
func1(5);
after_call_to_cdecl_func(); //compiler generated code for stack-clean-up of cdecl-func-call
func2(5, 6);
}
2. For the call to func2 in main(), it's func2's own job to clean up the stack, so I presume no code will be inserted in main() before or after the call to func2, right?
3. Because func2 is __stdcall, I presume the compiler will automatically insert code (to clean up the stack) like this:
void __stdcall func1(int x, int y)
{
before_call_to_stdcall_func(); //compiler generated code for stack-clean-up of stdcall-func-call
//do some stuff using x, y
after_call_to_cdecl_func(); //compiler generated code for stack-clean-up of stdcall-func-call
}
Do I presume right?
4. Finally, back to the quoted words: why does __stdcall result in a smaller executable than __cdecl? And there is no such thing as __stdcall on Linux, right? Does that mean a Linux ELF will always be larger than a Windows EXE?

1. It'll only insert code after the call, to reset the stack pointer, and only if there were call arguments.*
2. __stdcall generates no cleanup code at the call site. Note, however, that compilers can merge the stack cleanup from multiple __cdecl calls into one cleanup, or delay the cleanup to prevent pipeline stalls.
3. Ignoring the inverted order in this example, no: it'll only insert code to clean up the __cdecl function. Setting up function arguments is something different (different compilers generate/prefer different methods).
4. __stdcall was more a Windows thing, see this. The size of the binary depends on the number of calls to the __cdecl functions: more calls mean more cleanup code, whereas a __stdcall function has only a single instance of cleanup code. However, you shouldn't see much of a size increase, as at most you have a few bytes per call.
*It's important to distinguish between cleanup and setting up call parameters.
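To put rough numbers on "a few bytes per call", here is a back-of-the-envelope sketch. The instruction sizes are the standard 32-bit x86 encodings. The model itself (one add esp per __cdecl call site, one widened ret per __stdcall function) and all function names are simplifying assumptions; real compilers may merge or skip cleanups.

```cpp
#include <cassert>  // for the checks in the usage note
#include <cstddef>

// Encoded sizes on 32-bit x86:
//   add esp, imm8  -> 83 C4 ib  = 3 bytes (caller cleanup after a cdecl call)
//   ret            -> C3        = 1 byte  (plain return)
//   ret imm16      -> C2 iw     = 3 bytes (callee cleanup in a stdcall function)
constexpr std::size_t kAddEspImm8 = 3;
constexpr std::size_t kRet = 1;
constexpr std::size_t kRetImm16 = 3;

// Assumed model: every cdecl call site with arguments pays one add esp.
constexpr std::size_t cdecl_cleanup_bytes(std::size_t call_sites) {
    return call_sites * kAddEspImm8;
}

// Assumed model: every stdcall function pays the wider ret exactly once.
constexpr std::size_t stdcall_cleanup_bytes(std::size_t functions) {
    return functions * (kRetImm16 - kRet);
}
```

Under these assumptions, 100 call sites spread over 10 functions cost 300 bytes of cleanup with __cdecl and 20 bytes with __stdcall, which is small next to the size of a typical executable.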

Historically, the first C++ compilers used the equivalent of
__stdcall. From a quality of implementation point of view, I'd expect
the C compiler to use the __cdecl conventions, and the C++ compiler
the __stdcall (which were known as the Pascal conventions back then).
This is one thing that the early Zortech compilers got right.
Of course, vararg functions must still use __cdecl conventions. The
callee can't clean up the stack if it doesn't know how much to clean up.
(Note that the C standard was carefully designed to allow the
__stdcall conventions in C as well. I only know of one compiler which
took advantage of this, however; the amount of existing code at the time
which called vararg functions without a prototype in view was enormous,
and while the standard declared it broken, compiler implementors didn't
want to break their clients' code.)
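A minimal sketch of why the callee cannot do the cleanup for a vararg function: the only way sum_ints (a hypothetical function, not from any library) learns how many arguments exist is the count the caller chooses to pass.

```cpp
#include <cassert>
#include <cstdarg>

// The callee reads exactly `count` ints. Nothing in the call itself tells
// it how many arguments, and therefore how many bytes, the caller passed.
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(2, 10, 20, 30) is legal and simply leaves the third value unread, so only the caller knows how much stack to release.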
In many circles, there seems to be a very strong tendency to insist
that the C and the C++ conventions be the same, that one can take the
address of an extern "C++" function, and pass it to a function written
in C which calls it. IIRC, for example, g++ doesn't treat
extern "C" void f();
and
void f();
as having two different types (although the standard requires it), and
allows passing the address of a static member function to
pthread_create, for example. The result is that such compilers use
the exact same conventions everywhere, and on Intel, they are the
equivalent of __cdecl.
Many compilers have extensions to support other conventions. (Why they
don't use the standard extern "xxx", I don't know.) The syntax for
these extensions is very varied, however. Microsoft puts the attribute
directly before the function name:
void __stdcall func( int, int );
while g++ puts it in a special attribute clause after the function
declaration:
void func( int, int ) __attribute__((stdcall));
C++11 added a standard way of specifying attributes:
[[gnu::stdcall]] void func( int, int );
The standard doesn't define stdcall as an attribute, but it does specify that
additional attributes (other than those defined in the standard) may be
specified, and are implementation dependent. g++ accepts its own attributes
in this syntax under the gnu:: prefix when C++11 is activated; as far as I
know, VC++ only offers the __stdcall keyword for this. The exact spelling
(__stdcall, stdcall, gnu::stdcall, etc.) varies, so you probably want to
wrap this in a macro.
Finally: in a modern compiler with optimization turned on, the
difference in the calling conventions is probably negligible.
Attributes like const (not to be confused with the C++ keyword
const), regparm or noreturn will probably have a larger impact,
both in terms of executable size and performance.

This crowd of calling conventions is history with the new 64-bit ABI.
http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions
There is also the ABI side of things for different architectures (like ARM).
Not everything works the same way on all architectures, so don't bother thinking about this calling convention thing!
http://en.wikipedia.org/wiki/Calling_convention
The EXE size improvement is insignificant (maybe nonexistent), so don't bother...
__cdecl is much more flexible than __stdcall: it supports a variable number of arguments, the cleanup code is insignificant (one instruction), and a __cdecl function can be called with the wrong number of arguments without necessarily causing a serious problem. The same situation with __stdcall always goes wrong!

Others have answered the other parts of your question, so I'll just add my answer about the size:
4. Finally, back to the quoted words: why does __stdcall result in a smaller executable than __cdecl?
That appears to not be true. I tested it by compiling libudis with and without the stdcall calling convention. First without:
$ clang -target i386-pc-win32 -DHAVE_CONFIG_H -Os -I.. -I/usr/include -fPIC -c *.c && strip *.o
$ du -cb *.o
6524 decode.o
95932 itab.o
1434 syn-att.o
1706 syn-intel.o
2288 syn.o
1245 udis86.o
109129 total
And with it. The -mrtd switch is what enables stdcall:
$ clang -target i386-pc-win32 -DHAVE_CONFIG_H -Os -I.. -I/usr/include -fPIC -mrtd -c *.c && strip *.o
$ du -cb *.o
7084 decode.o
95932 itab.o
1502 syn-att.o
1778 syn-intel.o
2296 syn.o
1305 udis86.o
109897 total
As you can see, cdecl beats stdcall by a few hundred bytes. It could be that my testing methodology is flawed, or that clang's stdcall code generator is weak. But I think that with modern compilers the extra flexibility afforded by caller cleanup means that they will generally produce better code with cdecl than with stdcall.

Related

MSVC optimizer saves and restores XMM SIMD registers on an early-out path through a function. Why? [duplicate]

In C, if I have a function call that looks like
// main.c
...
do_work_on_object(object, arg1, arg2);
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
if(object == NULL)
{
return;
}
// do lots of work
}
then the compiler will generate a lot of stuff in main.o to save state, pass parameters (hopefully in registers in this case), and restore state.
However, at link time it can be observed that arg1 and arg2 are not used in the quick-return path, so the clean-up and state restoration can be short-circuited. Do linkers tend to do this kind of thing automatically, or would one need to turn on link-time optimization (LTO) to get that kind of thing to work?
(Yes, I could inspect the disassembled code, but I'm interested in the behaviours of compilers and linkers in general, and on multiple architectures, so hoping to learn from others' experience.)
Assuming that profiling shows this function call is worth optimizing, should we expect the following code to be noticeably faster (e.g. without the need to use LTO)?
// main.c
...
if(object != NULL)
{
do_work_on_object(object, arg1, arg2);
}
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
assert(object != NULL); // generates no code in release build
// do lots of work
}
Some compilers (like GCC and clang) are able to do "shrink-wrap" optimization to delay saving call-preserved regs until after a possible early-out, if they're able to spot the pattern. But some don't, e.g. apparently MSVC 16.11 still doesn't.
I don't think any do partial inlining of just the early-out check into the caller, to avoid even the overhead of arg-passing and the call / ret itself.
Since compiler/linker support for this is not universal and not always successful even for shrink-wrapping, you can write your code in a way that gets much of the benefit, at the cost of splitting the logic of your function into two places.
If you have a fast-path that takes hardly any code, but happens often enough to matter, put that part in a header so it gets inlined, with a fallback to calling the rest of the function (which you make private, so it can assume that any checks in the inlined part are already done).
e.g. par2's routine that processes a block of data has a fast-path for when the galois16 factor is zero. (dst[i] += 0 * src[i] is a no-op, even when * is a multiply in Galois16, and += is a GF16 add (i.e. a bitwise XOR)).
Note how the commit in question renames the old function to InternalProcess, and adds a new template<class g> inline bool ReedSolomon<g>::Process that checks for the fast-path, and otherwise calls InternalProcess. (as well as making a bunch of unrelated whitespace changes, and some ifdefs... It was originally a 2006 CVS commit.)
The comment in the commit claims an overall 8% speed gain for repairing.
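The pattern described above can be sketched as follows. This is not par2's actual code: process, internal_process and slow_path_calls are hypothetical names, and the multiply is a plain integer stand-in for the Galois-field arithmetic. In a real project only the inline wrapper and a declaration of the internal function would live in the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static int slow_path_calls = 0;  // instrumentation for the demo only

// Slow path: normally defined out of line in a .cpp file.
void internal_process(std::uint16_t factor, const std::uint16_t* src,
                      std::uint16_t* dst, std::size_t n) {
    assert(factor != 0);  // the inline wrapper has already filtered this out
    ++slow_path_calls;
    for (std::size_t i = 0; i < n; ++i) {
        // Stand-in for a GF(2^16) multiply followed by an XOR accumulate.
        std::uint32_t product = static_cast<std::uint32_t>(factor) * src[i];
        dst[i] ^= static_cast<std::uint16_t>(product);
    }
}

// Fast path, small enough to inline into every caller: a zero factor
// leaves dst unchanged, so the call and its argument setup are skipped.
inline bool process(std::uint16_t factor, const std::uint16_t* src,
                    std::uint16_t* dst, std::size_t n) {
    if (factor == 0) {
        return true;
    }
    internal_process(factor, src, dst, n);
    return true;
}
```

Because the check is visible at every call site, the compiler can drop the call entirely on the early-out path without needing LTO.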
Neither the setup nor the cleanup code can be short-circuited, because the resulting compiled code is static, and it doesn't know what will happen when the program gets executed. So the compiler will always have to set up the whole parameter stack.
Think of two situations: in one, object is NULL; in the other, it is not. How would the assembly code know whether to put the rest of the arguments on the stack? Especially as the caller is the one responsible for placing the arguments at their proper location (stack or registers).

How to know whether a calling convention is in use with a Qt project?

You know, when you define a function, you can define its calling convention at the same time, like this:
int __stdcall foo(int) { return 1; };
int __cdecl bar(int) { return 1; };
And if you want to use such a function to deduce template parameters, you have to deal with the calling convention, e.g.:
template<typename T> void tmpl_func( T( __stdcall f )(T) ) { other_func<T>(); };
With this template, you can only use "foo" as the parameter. If you use "bar", it can't be compiled.
tmpl_func(foo); // ok
tmpl_func(bar); // err
So, if you have a function that uses __cdecl, you have to define the template functions with both calling conventions, like this:
template<typename T> void tmpl_func( T( __stdcall f )(T) ) { other_func<T>(); };
template<typename T> void tmpl_func( T( __cdecl f )(T) ) { other_func<T>(); };
Then foo instantiates the __stdcall version and bar instantiates the __cdecl version, with no redefinition.
In general this works well, but recently I ran into a problem:
In a Qt project, when I define the template functions as described above, the compiler says that the __cdecl template function has already been defined.
And if I define only the __stdcall version of tmpl_func, it also accepts bar as its parameter.
Does this mean that all the calling convention identifiers I wrote simply have no effect, and all functions are called the __cdecl way?
Why does this happen, and how can I know whether it will? Can I check it using macros or compile options?
Sorry for my bad English; I hope I have made it clear, as this has troubled me for some time.
You know, when you define a function, you can define its calling convention at the same time
This is wrong. What you describe is a (vendor specific) extension to C++. For example, on my GCC compiler on Linux, it probably won't work.
In a Qt project,
Don't use any explicit __stdcall or __cdecl annotation at all in your Qt code.
(using that in your Qt code is shooting yourself in your foot: it hurts badly; stick to standard C++11 code for a Qt5 project)
If you need to call some external weird function (with a strange & explicit calling convention) write some extern "C" (or some static inline one, if it is short) wrapping function doing that (and your wrapping function has a usual signature, without explicit calling convention annotation).
Pragmatically, when coding a Qt project, you compile it with all warnings & debug info (so g++ -Wall -g if using GCC), you debug it with your gdb debugger, and you later optimize (e.g. compile with g++ -Wall -g -O2) it -e.g. for benchmarking and production code- and you trust the compiler to optimize well, inline many function calls, and choose good enough calling conventions. Don't mess with calling conventions in your Qt code.
(implicitly you are asking if and how __stdcall or __cdecl is changing the type system of C++; and C++ types are complex enough already, you don't want even more mess)
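On the question of how to check this at compile time: one possible approach is to ask the type system whether the two annotations yield different function-pointer types. The sketch below rests on an assumption I believe matches the MSVC documentation, namely that __stdcall is accepted but ignored when targeting x64 or ARM, so only 32-bit x86 builds see two distinct types. All DEMO_* names and call_with are hypothetical, and non-MSVC compilers are deliberately left unannotated here.

```cpp
#include <cassert>
#include <type_traits>

#if defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_CDECL   __cdecl
#  define DEMO_STDCALL __stdcall
#  define DEMO_HAS_CONVENTIONS 1
#else
#  define DEMO_CDECL
#  define DEMO_STDCALL
#  define DEMO_HAS_CONVENTIONS 0
#endif

// True only when the two annotations produce different pointer types.
constexpr bool conventions_are_distinct =
    !std::is_same<int (DEMO_CDECL*)(int), int (DEMO_STDCALL*)(int)>::value;

template <typename T>
T call_with(T (DEMO_STDCALL* f)(T), T arg) { return f(arg); }

#if DEMO_HAS_CONVENTIONS
// Second overload: legal only because the pointer types differ here.
// Elsewhere it would redefine the template above.
template <typename T>
T call_with(T (DEMO_CDECL* f)(T), T arg) { return f(arg); }
#endif

int DEMO_STDCALL foo(int) { return 1; }
int DEMO_CDECL   bar(int) { return 2; }
```

With this guard, call_with(foo, 0) and call_with(bar, 0) both compile on every target, using one overload where the conventions collapse into a single type and two where they do not.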

Patch C/C++ function to just return without execution

I want to prevent one system function from executing in a large project. It is impossible to redefine it or add some ifdef logic, so I want to patch its code down to just a ret instruction.
The functions are:
void __cdecl _wassert(const wchar_t *, const wchar_t *, unsigned);
and:
void __dj_assert(const char *, const char *, int, const char *) __attribute__((__noreturn__));
So I need to patch the first one on Visual C++ compiler, and the second one on GCC compiler.
Can I just write the ret instruction directly at the address of the _wassert/__dj_assert function, for x86/x64?
UPDATE:
I just wanna modify function body like this:
*_wassert = `ret`;
Or maybe copy another function body like this:
void __cdecl _wassert_empty(const wchar_t *, const wchar_t *, unsigned)
{
}
for (int i = 0; i < sizeof(void*); i++) {
((char*)_wassert)[i] = ((char*)_wassert_empty)[i];
}
UPDATE 2:
I really don't understand why there are so many objections to silencing asserts. In fact, there are no asserts in RELEASE mode, and nobody cares. I just want to be able to turn the asserts on and off in DEBUG mode.
You need to understand the calling conventions for your particular processor ISA and system ABI. See this for x86 & x86-64 calling conventions.
Some calling conventions require more than a single ret machine instruction in the epilogue, and you have to account for that. BTW, the code of a function usually resides in a read-only code segment, and you'll need some dirty tricks to patch it and write inside it.
You could compile a no-op function of the same signature, and ask the compiler to show the emitted assembler code (e.g. with gcc -O -Wall -fverbose-asm -S if using GCC....)
On Linux you might use dynamic linker LD_PRELOAD tricks. If using a recent GCC you might perhaps consider customizing it with MELT, but I don't think it is worthwhile in your particular case...
However, you apparently have some assert failure. It is very unlikely that your program could continue without any undefined behavior. So practically speaking, your program will very likely crash elsewhere with your proposed "fix", and you'll lose more of your time with it.
Better take enough time to correct the original bug, and improve your development process. Your way is postponing a critical bug correction, and you are extremely likely to spend more time avoiding that bug fix than dealing with it properly (and finding it now, not later) as you should. Avoid increasing your technical debt and making your code base even more buggy and rotten.
My feeling is that you are going nowhere (except to a big failure) with your approach of patching the binary to avoid assert-s. You should find out why they are violated, and improve the code (either remove the obsolete assert, or improve it, or correct the bug elsewhere that the assert has detected).
On GNU/Linux you can use the --wrap option like this:
gcc source.c -Wl,--wrap,functionToPatch -o prog
and your source must add the wrapper function:
void __wrap_functionToPatch(void) {} // simply returns
Parameters and return values as needed for your function.
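A slightly fuller sketch of such a wrapper, written for a C++ source file. report_failure is a hypothetical stand-in for the function to silence, and suppressed_reports is only there to make the effect observable. The __wrap_ and __real_ prefixes are the ones GNU ld expects; as far as I know, --wrap only redirects references that are undefined in the object file making the call.

```cpp
#include <cassert>  // for the check in the usage note

// Hypothetical function whose calls should be swallowed.
extern "C" void report_failure(const char* message);

static int suppressed_reports = 0;

// When linking with -Wl,--wrap=report_failure, calls to report_failure
// from other object files land here instead; the original stays reachable
// under the name __real_report_failure.
extern "C" void __wrap_report_failure(const char* message) {
    (void)message;         // ignored on purpose
    ++suppressed_reports;  // record the call, then simply return
}
```

The extern "C" linkage matters: without it the C++ compiler would mangle the name and the linker would not find __wrap_report_failure.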

Calling convention for dynamically created function in (Visual) C++

I use the following types to create a new function at runtime:
typedef int (*pfunc)(int);
union funcptr {
pfunc x;
byte* y;
};
This enables me to write instructions in y and afterwards call the function like this:
byte* p = (byte*)VirtualAllocEx(GetCurrentProcess(), 0, 1<<16, MEM_COMMIT, PAGE_EXECUTE_READWRITE );
// Write some instructions to p
funcptr func;
func.y = p;
int ret = func.x(arg1); // Call the generated function
It is crucial to know how C++ prepares arguments (the calling convention), so I looked up the project properties (Visual C++) and can see it uses __cdecl. It should put arguments on the stack according to http://msdn.microsoft.com/en-us/library/aa271989(v=vs.60).aspx and http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl but when I look at the generated assembly, the argument is moved to the EAX register.
I want to be absolutely certain how the arguments are prepared. So have I overlooked something about cdecl, or is Visual C++ optimizing the call, and if so, how do I ensure it doesn't happen?
Best regards, Lasse Espeholt
The EAX register is used for the return value of the function. You state in the comments that you are compiling using /Gd and so the function will use __cdecl. All the same it would make sense in my view to mark the declaration of your function type pfunc with an explicit __cdecl so that there can be no scope for confusion and mis-match.
Of course, there's nothing to stop you using one of the other calling conventions supported by your compiler. The most important point is that whatever calling convention you settle on, you should explicitly specify the calling convention for the function pointer since the compiler is only responsible for one half of the interface.
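Marking the pointer type explicitly might look like the sketch below. GEN_CDECL, twice and call_generated are hypothetical names; twice is an ordinary compiled function standing in for the bytes that would be generated at runtime.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#  define GEN_CDECL __cdecl
#elif defined(__GNUC__) && defined(__i386__)
#  define GEN_CDECL __attribute__((cdecl))
#else
#  define GEN_CDECL  // only one convention on this target
#endif

// The convention is now part of the pointer type, so it no longer depends
// on project-wide compiler switches such as /Gd, /Gz or /Gr.
typedef int (GEN_CDECL *pfunc)(int);

// Stand-in for the generated code.
int GEN_CDECL twice(int value) { return 2 * value; }

int call_generated(pfunc f, int arg) {
    assert(f != nullptr);
    return f(arg);  // the compiler emits a cdecl call sequence here
}
```

The generated machine code then has to follow the same convention: read its argument from the stack, leave the result in EAX, and return with a plain ret.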
At least on Linux, you probably want to use libffi (foreign function interface), and it has even been ported to other systems (including Windows).
And if you want to do machine code generation at runtime, consider using GNU lightning, DotGnu's libjit, LLVM, LuaJit's dynasm, etc. You could also generate C code into foo.c, get it compiled by forking a gcc -fPIC -shared foo.c -o foo.so command, and dlopen("./foo.so", RTLD_GLOBAL) (and Windows has an equivalent ability).

What is the meaning and usage of __stdcall?

I've come across __stdcall a lot these days.
MSDN doesn't explain very clearly what it really means, when and why should it be used, if at all.
I would appreciate if someone would provide an explanation, preferably with an example or two.
This answer covers 32-bit mode. (Windows x64 only uses 2 conventions: the normal one (which is called __fastcall if it has a name at all) and __vectorcall, which is the same except for how SIMD vector args like __m128i are passed).
Traditionally, C function calls are made with the caller pushing some parameters onto the stack, calling the function, and then popping the stack to clean up those pushed arguments.
/* example of __cdecl */
push arg3 ; arguments are pushed right to left
push arg2
push arg1
call function
add esp,12 ; effectively "pop; pop; pop"
Note: The default convention, shown above, is known as __cdecl.
The other most popular convention is __stdcall. In it the parameters are again pushed by the caller, but the stack is cleaned up by the callee. It is the standard convention for Win32 API functions (as defined by the WINAPI macro in <windows.h>), and it's also sometimes called the "Pascal" calling convention.
/* example of __stdcall */
push arg3
push arg2
push arg1
call function ; no stack cleanup - callee does this
This looks like a minor technical detail, but if there is a disagreement on how the stack is managed between the caller and the callee, the stack will be destroyed in a way that is unlikely to be recovered.
Since __stdcall does stack cleanup, the (very tiny) code to perform this task is found in only one place, rather than being duplicated in every caller as it is in __cdecl. This makes the code very slightly smaller, though the size impact is only visible in large programs.
(Optimizing compilers can sometimes leave space for args allocated across multiple cdecl calls made from the same function and mov args into it, instead of always add esp, n / push. That saves instructions but can increase code-size. For example gcc -maccumulate-outgoing-args always does this, and was good for performance on older CPUs before push was efficient.)
Variadic functions like printf() are impossible to get right with __stdcall, because only the caller really knows how many arguments were passed in order to clean them up. The callee can make some good guesses (say, by looking at a format string), but it's legal in C to pass more args to printf than the format-string references (they'll be silently ignored). Hence only __cdecl supports variadic functions, where the caller does the cleanup.
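The point about surplus arguments can be seen in a small sketch. format_pair is a hypothetical helper that always passes two ints, whatever the format string asks for.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats with a caller-supplied format string but always passes two ints.
// If the format consumes fewer arguments, the extras are evaluated and
// then ignored, so the callee cannot infer the argument count from it.
std::string format_pair(const char* format, int first, int second) {
    assert(format != nullptr);
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, format, first, second);
    return std::string(buffer);
}
```

A call like format_pair("%d", 7, 9) formats only the 7, yet both ints were passed, and only the caller knows that both must be removed from the stack afterwards.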
Linker symbol name decorations:
As mentioned above, calling a function with the "wrong" convention can be disastrous, so Microsoft has a mechanism to prevent this from happening. It works well, though it can be maddening if one does not know what the reasons are.
They have chosen to resolve this by encoding the calling convention into the low-level function names with extra characters (which are often called "decorations"), and these are treated as unrelated names by the linker. The default calling convention is __cdecl, but each one can be requested explicitly with the /G? parameter to the compiler.
__cdecl (cl /Gd ...)
All function names of this type are prefixed with an underscore, and the number of parameters does not really matter because the caller is responsible for stack setup and stack cleanup. It is possible for a caller and callee to be confused over the number of parameters actually passed, but at least the stack discipline is maintained properly.
__stdcall (cl /Gz ...)
These function names are prefixed with an underscore and appended with @ plus the number of bytes of parameters passed. By this mechanism, it's not possible to call a function with the wrong amount of parameters. The caller and callee definitely agree on returning with a ret 12 instruction for example, to pop 12 bytes of stack args along with the return address.
You'll get a link-time or runtime DLL error instead of having a function return with ESP pointing somewhere the caller isn't expecting. (For example if you added a new arg and didn't recompile both the main program and the library. Assuming you didn't fool the system by making an earlier arg narrower, like int64_t -> int32_t.)
__fastcall (cl /Gr ...)
These function names start with an @ sign and are suffixed with the @bytes count, much like __stdcall. The first 2 args are passed in ECX and EDX, the rest are passed on the stack. The byte count includes the register args. As with __stdcall, a narrow arg like char still uses up a 4-byte arg-passing slot (a register, or a dword on the stack).
Examples:
Declaration                         -----------------------> decorated name
void __cdecl foo(void);             -----------------------> _foo
void __cdecl foo(int a);            -----------------------> _foo
void __cdecl foo(int a, int b);     -----------------------> _foo
void __stdcall foo(void);           -----------------------> _foo@0
void __stdcall foo(int a);          -----------------------> _foo@4
void __stdcall foo(int a, int b);   -----------------------> _foo@8
void __fastcall foo(void);          -----------------------> @foo@0
void __fastcall foo(int a);         -----------------------> @foo@4
void __fastcall foo(int a, int b);  -----------------------> @foo@8
Note that in C++, the normal name-mangling mechanism that allows function overloading is used instead of @8, not in addition to it. So you'll only see actual numbers in extern "C" functions. For an example, see https://godbolt.org/z/v7EaWs.
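The decoration rules for extern "C" names on 32-bit Windows are mechanical enough to write down as a function. This is only an illustration of the scheme, using the @ separator that the Microsoft toolchain emits; decorate and Convention are hypothetical names, not a compiler or linker API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Convention { Cdecl, Stdcall, Fastcall };

// Builds the decorated name of an extern "C" function for 32-bit x86.
// arg_bytes is the total size of the argument slots (4 bytes per int),
// including any arguments that __fastcall passes in registers.
std::string decorate(const std::string& name, Convention convention,
                     std::size_t arg_bytes) {
    switch (convention) {
        case Convention::Cdecl:
            return "_" + name;  // byte count is not encoded
        case Convention::Stdcall:
            return "_" + name + "@" + std::to_string(arg_bytes);
        case Convention::Fastcall:
            return "@" + name + "@" + std::to_string(arg_bytes);
    }
    assert(false && "unknown convention");
    return name;
}
```

For instance, decorate("foo", Convention::Stdcall, 8) yields _foo@8, which is the symbol the linker would look for when resolving a call to void __stdcall foo(int, int).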
All functions in C/C++ have a particular calling convention. The point of a calling convention is to establish how data is passed between the caller and callee and who is responsible for operations such as cleaning out the call stack.
The most popular calling conventions on Windows are:
__stdcall: pushes parameters on the stack, in reverse order (right to left)
__cdecl: pushes parameters on the stack, in reverse order (right to left)
__clrcall: loads parameters onto the CLR expression stack in order (left to right)
__fastcall: parameters are stored in registers, then pushed on the stack
__thiscall: parameters are pushed on the stack; the this pointer is stored in ECX
Adding this specifier to the function declaration essentially tells the compiler that you want this particular function to have this particular calling convention.
The calling conventions are documented here
https://learn.microsoft.com/en-us/cpp/cpp/calling-conventions
Raymond Chen also did a long series on the history of the various calling conventions (5 parts) starting here.
https://devblogs.microsoft.com/oldnewthing/20040102-00/?p=41213
__stdcall is a calling convention: a way of determining how parameters are passed to a function (on the stack or in registers) and who is responsible for cleaning up after the function returns (the caller or the callee).
Raymond Chen wrote a blog about the major x86 calling conventions, and there's a nice CodeProject article too.
For the most part, you shouldn't have to worry about them. The only case in which you should is if you're calling a library function that uses something other than the default -- otherwise the compiler will generate the wrong code and your program will probably crash.
Unfortunately, there is no easy answer for when to use it and when not.
__stdcall means that the arguments to a function are pushed onto the stack from last to first (right to left) and that the callee removes them when it returns. This is as opposed to __cdecl, which pushes the arguments in the same order but leaves the cleanup to the caller, and __fastcall, which places the first two arguments in registers (ECX and EDX with Microsoft's 32-bit compilers), and the rest go on the stack.
You just need to know what the callee expects, or if you are writing a library, what your callers are likely expect, and make sure you document your chosen convention.
That's a calling convention that WinAPI functions need to be called properly. A calling convention is a set of rules on how the parameters are passed into the function and how the return value is passed from the function.
If the caller and the called code use different conventions you run into undefined behaviour (like such a strange-looking crash).
C++ compilers don't use __stdcall by default - they use other conventions. So in order to call WinAPI functions from C++ you need to specify that they use __stdcall - this is usually done in the Windows SDK header files, and you also do it when declaring function pointers.
It specifies a calling convention for a function. A calling convention is a set of rules how parameters are passed to a function: in which order, per address or per copy, who is to clean up the parameters (caller or callee) etc.
__stdcall denotes a calling convention (see this PDF for some details). This means it specifies how function arguments are pushed and popped from the stack, and who is responsible.
__stdcall is just one of several calling conventions, and is used throughout the WINAPI. You must use it if you provide function pointers as callbacks for some of those functions. In general, you do not need to denote any specific calling convention in your code, but just use the compiler's default, except for the case noted above (providing callbacks to 3rd party code).
Simply put, when you call a function, its arguments get loaded onto the stack or into registers. __stdcall is one convention for doing so (rightmost argument first, then the next one to its left, and so on); __cdecl is another.
If you specify one, you instruct the compiler to use that specific way of passing the arguments, and hence you will not get a mismatch or crash.
Otherwise the callee and the caller might use different conventions, causing the program to crash.
__stdcall is the calling convention used for the function. This tells the compiler the rules that apply for setting up the stack, pushing arguments and getting a return value. There are a number of other calling conventions like __cdecl, __thiscall, __fastcall and __naked.
__stdcall is the standard calling convention for Win32 system calls.
More details can be found on Wikipedia.