Compiler ignores __stdcall - c++

It seems to me that MSVS ignores the __stdcall directive on my functions. I'm cleaning up the stack manually, but the compiler still appends ADD ESP instructions after each CALL.
This is how I declare the function:
extern "C" void * __stdcall core_call(int addr, ...);
#define function(...) (DWORD WINAPI) core_call(12345, __VA_ARGS__)
return function("Hello", 789);
And this is what the output looks like:
(screenshot of the disassembly; source: server4u.cz)
I've marked with arrows the redundant ADD instructions, which MSVS automatically appends after each call, despite the fact that cleaning the stack is the callee's responsibility (reference: http://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions), and this causes my program to crash. If I manually replace the ADD instructions with NOPs, the program works as it is supposed to. So, my question is: is there a way to force the compiler to stop adding these instructions?
Thanks.

The problem is here: , ...).
Functions with variable number of arguments cannot be __stdcall.
__stdcall functions must remove all their stack arguments from the stack at the end, but they can't know in advance how much stuff they will receive as parameters.
The same holds for __fastcall functions.
The only applicable calling convention for functions with variable number of arguments is __cdecl, where the caller has to remove the stack parameters after the call. And that's what the compiler uses despite your request to use __stdcall.
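To illustrate why the callee cannot do the cleanup, here is a minimal sketch of a variadic function. The name sum_ints and its count parameter are made up for this example. The callee only learns how many arguments exist from what the caller tells it, so the declaration leaves the calling convention at the compiler's default (__cdecl on 32-bit MSVC) and lets the caller remove the arguments.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical helper: sums `count` ints passed as variadic arguments.
// No calling convention is written here, so the compiler default applies
// (__cdecl on 32-bit MSVC) and the caller removes the arguments afterwards.
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        // The callee trusts `count`; it cannot see how many bytes were pushed.
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) pushes four ints on x86-32, and only the call site knows that.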

Related

MSVS 2010 C++ Compiler and Stack alignment issue?

My problem is that the MSVS 2010 C++ compiler generates code such that, after returning from a call to a function resolved at runtime (GetProcAddress+GetModuleHandle) from another DLL, it tries to align the stack this way:
CALL DWORD PTR DS:[2000367C] ; apiresolvedinruntime.dll
ADD ESP,12 ; <- this is the stack alignment
This of course overwrites the return address and my program crashes. Can someone explain to me why the compiler is aligning the stack when it really shouldn't?
You didn't call the runtime loaded function using the correct calling convention. Calling convention specifies the default handling of what happens to the stack. Most likely, the DLL was compiled using the __stdcall calling convention (which is what e.g. the Windows DLLs use), which specifies that the called function is supposed to clean up the stack, but the calling code was declared with a function pointer using the __cdecl calling convention (which is the default). Under __cdecl, functions support variadic arguments, so the caller needs to do the cleanup of the stack, because the called function does not know how many arguments are passed.
You need to verify that the DLL and the calling code are compiled using the same calling conventions.
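As a rough sketch of the fix, the calling convention can be written into the function pointer type so that the caller and the callee agree. Everything named here (DEMO_STDCALL, add3_fn, add3, call_through_pointer) is hypothetical, and the local add3 merely stands in for the function exported by the DLL, which is assumed to be __stdcall. The macro expands to nothing on targets that have only one convention, so the sketch builds anywhere.

```cpp
#include <cassert>

// Hypothetical portability macro: only 32-bit x86 distinguishes __stdcall.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_STDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#  define DEMO_STDCALL __attribute__((stdcall))
#else
#  define DEMO_STDCALL
#endif

// Pointer type that carries the convention the DLL is assumed to use.
typedef int (DEMO_STDCALL *add3_fn)(int, int, int);

// Stand-in for the function that would normally live in the DLL.
int DEMO_STDCALL add3(int a, int b, int c) {
    return a + b + c;
}

// Because the convention is part of add3_fn, an x86-32 compiler emits
// no ADD ESP after this call: the callee has already popped the arguments.
int call_through_pointer(add3_fn fn) {
    assert(fn != nullptr);
    return fn(1, 2, 3);
}
```

In real code the pointer would come from casting the result of GetProcAddress to a type like add3_fn rather than from a local function.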

__cdecl results in larger executable than __stdcall?

I found this:
Because the stack is cleaned by the called function, the __stdcall
calling convention creates smaller executables than __cdecl, in which
the code for stack cleanup must be generated for each function call.
Suppose I got 2 functions:
void __cdecl func1(int x)
{
//do some stuff using x
}
void __stdcall func2(int x, int y)
{
//do some stuff using x, y
}
and here in the main():
int main()
{
func1(5);
func2(5, 6);
}
IMO, it is main()'s responsibility to clean up the stack of the call to func1(5), and func2 will clean up the stack of the call to func2(5,6), right?
Four questions:
1.For the call to func1 in main(), it's main's responsibility to clean up the stack, so will compiler insert some code (code to clean up the stack) before and after the call to func? Like this:
int main()
{
before_call_to_cdecl_func(); //compiler generated code for stack-clean-up of cdecl-func-call
func1(5);
after_call_to_cdecl_func(); //compiler generated code for stack-clean-up of cdecl-func-call
func2(5, 6);
}
2.For the call to func2 in main(), it's func2's own job to clean up the stack, so I presume, no code will be inserted in main() before or after the call to func2, right?
3.Because func2 is __stdcall, so I presume, compiler will automatically insert code (to clean up the stack) like this:
void __stdcall func1(int x, int y)
{
before_call_to_stdcall_func(); //compiler generated code for stack-clean-up of stdcall-func-call
//do some stuff using x, y
after_call_to_cdecl_func(); //compiler generated code for stack-clean-up of stdcall-func-call
}
I presume right?
4.Finally, back to the quoted words, why does __stdcall result in a smaller executable than __cdecl? And there is no such thing as __stdcall in linux, right? Does it mean a linux elf will always be larger than an exe in win?
1. It'll only insert code after the call, which is to reset the stack pointer, so long as there were call arguments.*
2. __stdcall generates no cleanup code at the call site. However, it should be noted that compilers can accrue stack cleanup from multiple __cdecl calls into one cleanup, or delay the cleanup to prevent pipeline stalls.
3. Ignoring the inverted order in this example, no, it'll only insert code to clean up the __cdecl function. Setting up function arguments is something different (different compilers generate/prefer different methods).
4. __stdcall was more of a Windows thing, see this. The size of the binary depends on the number of calls to the __cdecl functions: more calls means more cleanup code, whereas __stdcall has only a single instance of cleanup code. However, you shouldn't see much of a size increase, as at most you have a few bytes per call.
*It's important to distinguish between cleanup and setting up call parameters.
Historically, the first C++ compilers used the equivalent of
__stdcall. From a quality of implementation point of view, I'd expect
the C compiler to use the __cdecl conventions, and the C++ compiler
the __stdcall (which were known as the Pascal conventions back then).
This is one thing that the early Zortech compilers got right.
Of course, vararg functions must still use __cdecl conventions. The
callee can't clean up the stack if it doesn't know how much to clean up.
(Note that the C standard was carefully designed to allow the
__stdcall conventions in C as well. I only know of one compiler which
took advantage of this, however; the amount of existing code at the time
which called vararg functions without a prototype in view was enormous,
and while the standard declared it broken, compiler implementors didn't
want to break their clients' code.)
In a lot of milieus, there seems to be a very strong tendency to insist
that the C and the C++ conventions be the same, that one can take the
address of an extern "C++" function, and pass it to a function written
in C which calls it. IIRC, for example, g++ doesn't treat
extern "C" void f();
and
void f();
as having two different types (although the standard requires it), and
allows passing the address of a static member function to
pthread_create, for example. The result is that such compilers use
the exact same conventions everywhere, and on Intel, they are the
equivalent of __cdecl.
Many compilers have extensions to support other conventions. (Why they
don't use the standard extern "xxx", I don't know.) The syntax for
these extensions is very varied, however. Microsoft puts the attribute
directly before the function name:
void __stdcall func( int, int );
while g++ puts it in a special attribute clause after the function
declaration:
void func( int, int ) __attribute__((stdcall));
The C++11 has added a standard way of specifying attributes:
void [[stdcall]] func( int, int );
It doesn't specify stdcall as an attribute, but it does specify that
additional attributes (other than those defined in the standard) may be
specified, and are implementation dependent. I expect that both g++ and
VC++ accept this syntax in their most recent versions, at least if C++11
is activated. The exact name of the attribute (__stdcall, stdcall,
etc.) may vary, however, so you probably want to wrap this in a macro.
Finally: in a modern compiler with optimization turned on, the
difference in the calling conventions is probably negligible.
Attributes like const (not to be confused with the C++ keyword
const), regparm or noreturn will probably have a larger impact,
both in terms of executable size and performance.
This crowd of calling conventions has been made history by the new 64-bit ABIs.
http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions
There is also the ABI side of things for different architectures (like ARM).
Not everything executes the same way on all architectures, so do not bother thinking too much about calling conventions!
http://en.wikipedia.org/wiki/Calling_convention
The EXE size improvement is insignificant (maybe nonexistent), so do not bother...
__cdecl is much more flexible than __stdcall. It allows a variable number of arguments, its cleanup code (one instruction) is insignificant, and a __cdecl function can be called with the wrong number of arguments without this necessarily causing a serious problem! The same situation with __stdcall always goes wrong!
Others have answered the other parts of your question, so I'll just add my answer about the size:
4.Finally, back to the quoted words, why __stdcall results in smaller executable than __cdecl?
That appears to not be true. I tested it by compiling libudis with and without the stdcall calling convention. First without:
$ clang -target i386-pc-win32 -DHAVE_CONFIG_H -Os -I.. -I/usr/include -fPIC -c *.c && strip *.o
$ du -cb *.o
6524 decode.o
95932 itab.o
1434 syn-att.o
1706 syn-intel.o
2288 syn.o
1245 udis86.o
109129 totalt
And with. It is the -mrtd switch that enables stdcall:
$ clang -target i386-pc-win32 -DHAVE_CONFIG_H -Os -I.. -I/usr/include -fPIC -mrtd -c *.c && strip *.o
7084 decode.o
95932 itab.o
1502 syn-att.o
1778 syn-intel.o
2296 syn.o
1305 udis86.o
109897 totalt
As you can see, cdecl beats stdcall by a few hundred bytes. It could be that my testing methodology is flawed, or that clang's stdcall code generator is weak. But I think that with modern compilers the extra flexibility afforded by caller cleanup means that they will always generate better code with cdecl than with stdcall.

stdcall and cdecl

There are (among others) two types of calling conventions - stdcall and cdecl. I have a few questions on them:
When a cdecl function is called, how does a caller
know if it should free up the stack ? At the call site, does the
caller know if the function being called is a cdecl or a stdcall
function ? How does it work ? How does the caller know if it should
free up the stack or not ? Or is it the linkers responsibility ?
If a function which is declared as stdcall calls a function(which
has a calling convention as cdecl), or the other way round, would
this be inappropriate ?
In general, can we say that which call will be faster - cdecl or
stdcall ?
Raymond Chen gives a nice overview of what __stdcall and __cdecl do.
(1) The caller "knows" to clean up the stack after calling a function because the compiler knows the calling convention of that function and generates the necessary code.
void __stdcall StdcallFunc() {}
void __cdecl CdeclFunc()
{
// The compiler knows that StdcallFunc() uses the __stdcall
// convention at this point, so it generates the proper binary
// for stack cleanup.
StdcallFunc();
}
It is possible to mismatch the calling convention, like this:
LRESULT MyWndProc(HWND hwnd, UINT msg,
WPARAM wParam, LPARAM lParam);
// ...
// Compiler usually complains but there's this cast here...
windowClass.lpfnWndProc = reinterpret_cast<WNDPROC>(&MyWndProc);
So many code samples get this wrong it's not even funny. It's supposed to be like this:
// CALLBACK is #define'd as __stdcall
LRESULT CALLBACK MyWndProc(HWND hwnd, UINT msg,
WPARAM wParam, LPARAM lParam);
// ...
windowClass.lpfnWndProc = &MyWndProc;
However, assuming the programmer doesn't ignore compiler errors, the compiler will generate the code needed to clean up the stack properly since it'll know the calling conventions of the functions involved.
(2) Both ways should work. In fact, this happens quite frequently at least in code that interacts with the Windows API, because __cdecl is the default for C and C++ programs according to the Visual C++ compiler and the WinAPI functions use the __stdcall convention.
(3) There should be no real performance difference between the two.
In CDECL, arguments are pushed onto the stack in reverse order, the caller clears the stack, and the result is returned via a processor register (later I will call it "register A"). In STDCALL there is one difference: the caller doesn't clear the stack, the callee does.
You are asking which one is faster. Neither. You should use the native calling convention as long as you can. Change the convention only if there is no way around it, for example when using external libraries that require a certain convention.
Besides, there are other conventions that a compiler may be told to use as its default, e.g. the Visual C++ compiler can be switched to FASTCALL (the /Gr option), which is theoretically faster because it makes more extensive use of processor registers.
Usually you must give a proper calling convention signature to callback functions passed to some external library, e.g. a callback passed to qsort from the C library must be CDECL (if the compiler uses another convention by default, then we must mark the callback as CDECL), and various WinAPI callbacks must be STDCALL (the whole WinAPI is STDCALL).
Another common case is storing pointers to external functions, e.g. to create a pointer to a WinAPI function, its type definition must be marked with STDCALL.
And below is an example showing how the compiler does it:
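The qsort case can be sketched as follows. DEMO_CDECL, compare_ints and sort_ints are hypothetical names; the macro is an assumption that maps to __cdecl on MSVC and to nothing elsewhere. Marking the comparator explicitly keeps it compatible with qsort even if the project's default convention has been changed.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical macro: explicit cdecl on MSVC, nothing on other compilers.
#if defined(_MSC_VER)
#  define DEMO_CDECL __cdecl
#else
#  define DEMO_CDECL
#endif

// Comparator for qsort, marked explicitly so it still matches the C
// library's expectation when the compiler default is not cdecl.
int DEMO_CDECL compare_ints(const void* lhs, const void* rhs) {
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);   // negative, zero or positive, without overflow
}

// Sorts `count` ints in ascending order using the C library's qsort.
void sort_ints(int* values, std::size_t count) {
    assert(values != nullptr);
    std::qsort(values, count, sizeof(int), compare_ints);
}
```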
/* 1. calling function in C++ */
i = Function(x, y, z);
/* 2. function body in C++ */
int Function(int a, int b, int c) { return a + b + c; }
CDECL:
/* 1. calling CDECL 'Function' in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
call (jump to function body, after function is finished it will jump back here, the address where to jump back is pushed onto the stack)
move contents of register A to 'i' variable
pop all from the stack that we have pushed (copy of x, y and z)
/* 2. CDECL 'Function' body in pseudo-assembler */
/* Now copies of 'a', 'b' and 'c' variables are pushed onto the stack */
copy 'a' (from stack) to register A
copy 'b' (from stack) to register B
add A and B, store result in A
copy 'c' (from stack) to register B
add A and B, store result in A
jump back to caller code (a, b and c still on the stack, the result is in register A)
STDCALL:
/* 1. calling STDCALL in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then a copy of 'y', then a copy of 'x'
call
move contents of register A to 'i' variable
/* 2. STDCALL 'Function' body in pseudo-assembler */
pop 'a' from stack to register A
pop 'b' from stack to register B
add A and B, store result in A
pop 'c' from stack to register B
add A and B, store result in A
jump back to caller code (a, b and c are no more on the stack, result in register A)
I noticed a posting that says that it does not matter if you call a __stdcall from a __cdecl or vice versa. It does.
The reason: with __cdecl the arguments passed to the called function are removed from the stack by the calling function; with __stdcall, they are removed by the called function. If you call a __cdecl function as if it were __stdcall, the stack is not cleaned up at all, so the caller, when it next uses a stack-based reference for its arguments or return address, will read stale data at the current stack pointer. If you call a __stdcall function as if it were __cdecl, the __stdcall function cleans up the arguments on the stack, and then the caller does it again, possibly removing the calling function's return information.
The Microsoft convention for C tries to circumvent this by mangling the names. A __cdecl function is prefixed with an underscore. A __stdcall function is prefixed with an underscore and suffixed with an at sign "@" and the number of bytes to be removed. E.g. __cdecl f(x) is linked as _f, and __stdcall f(int x) is linked as _f@4 (where sizeof(int) is 4 bytes).
If you manage to get past the linker, enjoy the debugging mess.
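The two failure modes can be seen with a little bookkeeping. The sketch below is only a model, not real stack manipulation: esp_drift is a made-up function that tracks how far a simulated stack pointer ends up from where it started after one call, given who performs the cleanup.

```cpp
#include <cassert>

// Model only: returns the net change (in bytes) of a simulated ESP after
// one call. 0 means the stack is balanced again.
int esp_drift(bool callee_pops, bool caller_pops, int arg_bytes) {
    assert(arg_bytes >= 0);
    int esp = 0;
    esp -= arg_bytes;          // caller pushes the arguments (stack grows down)
    if (callee_pops) {
        esp += arg_bytes;      // __stdcall callee: ret n
    }
    if (caller_pops) {
        esp += arg_bytes;      // __cdecl caller: add esp, n
    }
    return esp;
}
```

With 12 bytes of arguments, a matching pair gives 0, a __stdcall callee combined with __cdecl-style caller cleanup gives +12 (the double cleanup described above), and a __cdecl callee with no caller cleanup gives -12 (arguments left behind).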
I want to improve on @adf88's answer. I feel that the pseudocode for STDCALL does not reflect how it happens in reality. 'a', 'b', and 'c' aren't popped from the stack in the function body. Instead they are popped by the ret instruction (ret 12 would be used in this case), which in one swoop jumps back to the caller and at the same time pops 'a', 'b', and 'c' from the stack.
Here is my version corrected according to my understanding:
STDCALL:
/* 1. calling STDCALL in pseudo-assembler (similar to what the compiler outputs) */
push on the stack a copy of 'z', then copy of 'y', then copy of 'x'
call
move contents of register A to 'i' variable
/* 2. STDCALL 'Function' body in pseudo-assembler */
copy 'a' (from stack) to register A
copy 'b' (from stack) to register B
add A and B, store result in A
copy 'c' (from stack) to register B
add A and B, store result in A
jump back to caller code and at the same time pop 'a', 'b' and 'c' off the stack (a, b and
c are removed from the stack in this step, result in register A)
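The same sequence can be modelled in runnable form. This is a simulation under stated assumptions, not what a compiler emits: SimStack is a plain vector standing in for the machine stack, the return address is not modelled, and all function names are invented. It shows that both callees read their arguments in place, and that the only difference is where the three slots get released.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> SimStack;   // back of the vector = top of the stack

// Reads the three arguments in place, without removing them.
static int sum_top3(const SimStack& stack) {
    assert(stack.size() >= 3);
    const std::size_t n = stack.size();
    return stack[n - 1] + stack[n - 2] + stack[n - 3];
}

// cdecl-style callee: plain "ret", arguments stay on the stack.
int callee_cdecl(SimStack& stack) {
    return sum_top3(stack);
}

// stdcall-style callee: "ret 12" releases the three slots on return.
int callee_stdcall(SimStack& stack) {
    const int result = sum_top3(stack);
    stack.resize(stack.size() - 3);
    return result;
}

int call_cdecl(SimStack& stack, int x, int y, int z) {
    stack.push_back(z); stack.push_back(y); stack.push_back(x);
    const int result = callee_cdecl(stack);
    stack.resize(stack.size() - 3);   // caller's "add esp, 12"
    return result;
}

int call_stdcall(SimStack& stack, int x, int y, int z) {
    stack.push_back(z); stack.push_back(y); stack.push_back(x);
    return callee_stdcall(stack);     // no cleanup at the call site
}
```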
It's specified in the function type. When you have a function pointer, it's assumed to be cdecl if not explicitly stdcall. This means that if you get a stdcall pointer and a cdecl pointer, you can't exchange them. The two function types can call each other without issues, it's just getting one type when you expect the other. As for speed, they both perform the same roles, just in a very slightly different place, it's really irrelevant.
The caller and the callee need to use the same convention at the point of invocation - that's the only way it could reliably work. Both the caller and the callee follow a predefined protocol - for example, who needs to clean up the stack. If the conventions mismatch, your program runs into undefined behavior - likely it just crashes spectacularly.
This is only required per invocation site - the calling code itself can be a function with any calling convention.
You shouldn't notice any real difference in performance between those conventions. If that becomes a problem you usually need to make fewer calls - for example, change the algorithm.
Those things are Compiler- and Platform-specific. Neither the C nor the C++ standard say anything about calling conventions except for extern "C" in C++.
how does a caller know if it should free up the stack ?
The caller knows the calling convention of the function and handles the call accordingly.
At the call site, does the caller know if the function being called is a cdecl or a stdcall function ?
Yes.
How does it work ?
It is part of the function declaration.
How does the caller know if it should free up the stack or not ?
The caller knows the calling conventions and can act accordingly.
Or is it the linkers responsibility ?
No, the calling convention is part of a function's declaration so the compiler knows everything it needs to know.
If a function which is declared as stdcall calls a function(which has a calling convention as cdecl), or the other way round, would this be inappropriate ?
No. Why should it?
In general, can we say that which call will be faster - cdecl or stdcall ?
I don't know. Test it.
a) When a cdecl function is called by the caller, how does a caller know if it should free up the stack?
The cdecl modifier is part of the function prototype (or function pointer type etc.) so the caller get the info from there and acts accordingly.
b) If a function which is declared as stdcall calls a function(which has a calling convention as cdecl), or the other way round, would this be inappropriate?
No, it's fine.
c) In general, can we say that which call will be faster - cdecl or stdcall?
In general, I would refrain from any such statements. The distinction matters eg. when you want to use va_arg functions. In theory, it could be that stdcall is faster and generates smaller code because it allows to combine popping the arguments with popping the locals, but OTOH with cdecl, you can do the same thing, too, if you're clever.
The calling conventions that aim to be faster usually do some register-passing.
Calling conventions have nothing to do with the C/C++ programming languages and are rather specifics on how a compiler implements the given language. If you consistently use the same compiler, you never need to worry about calling conventions.
However, sometimes we want binary code compiled by different compilers to inter-operate correctly. When we do so we need to define something called the Application Binary Interface (ABI). The ABI defines how the compiler converts the C/C++ source into machine code. This includes calling conventions, name mangling, and v-table layout. cdecl and stdcall are two different calling conventions commonly used on x86 platforms.
By placing the information on the calling convention into the source header, the compiler will know what code needs to be generated to inter-operate correctly with the given executable.

What is the meaning and usage of __stdcall?

I've come across __stdcall a lot these days.
MSDN doesn't explain very clearly what it really means, when and why should it be used, if at all.
I would appreciate if someone would provide an explanation, preferably with an example or two.
This answer covers 32-bit mode. (Windows x64 only uses 2 conventions: the normal one (which is called __fastcall if it has a name at all) and __vectorcall, which is the same except for how SIMD vector args like __m128i are passed).
Traditionally, C function calls are made with the caller pushing some parameters onto the stack, calling the function, and then popping the stack to clean up those pushed arguments.
/* example of __cdecl */
push arg1
push arg2
push arg3
call function
add esp,12 ; effectively "pop; pop; pop"
Note: The default convention — shown above — is known as __cdecl.
The other most popular convention is __stdcall. In it the parameters are again pushed by the caller, but the stack is cleaned up by the callee. It is the standard convention for Win32 API functions (as defined by the WINAPI macro in <windows.h>), and it's also sometimes called the "Pascal" calling convention.
/* example of __stdcall */
push arg1
push arg2
push arg3
call function // no stack cleanup - callee does this
This looks like a minor technical detail, but if there is a disagreement on how the stack is managed between the caller and the callee, the stack will be destroyed in a way that is unlikely to be recovered.
Since __stdcall does stack cleanup, the (very tiny) code to perform this task is found in only one place, rather than being duplicated in every caller as it is in __cdecl. This makes the code very slightly smaller, though the size impact is only visible in large programs.
(Optimizing compilers can sometimes leave space for args allocated across multiple cdecl calls made from the same function and mov args into it, instead of always add esp, n / push. That saves instructions but can increase code-size. For example gcc -maccumulate-outgoing-args always does this, and was good for performance on older CPUs before push was efficient.)
Variadic functions like printf() are impossible to get right with __stdcall, because only the caller really knows how many arguments were passed in order to clean them up. The callee can make some good guesses (say, by looking at a format string), but it's legal in C to pass more args to printf than the format-string references (they'll be silently ignored). Hence only __cdecl supports variadic functions, where the caller does the cleanup.
Linker symbol name decorations:
As mentioned in a bullet point above, calling a function with the "wrong" convention can be disastrous, so Microsoft has a mechanism to avoid this from happening. It works well, though it can be maddening if one does not know what the reasons are.
They have chosen to resolve this by encoding the calling convention into the low-level function names with extra characters (which are often called "decorations"), and these are treated as unrelated names by the linker. The default calling convention is __cdecl, but each one can be requested explicitly with the /G? parameter to the compiler.
__cdecl (cl /Gd ...)
All function names of this type are prefixed with an underscore, and the number of parameters does not really matter because the caller is responsible for stack setup and stack cleanup. It is possible for a caller and callee to be confused over the number of parameters actually passed, but at least the stack discipline is maintained properly.
__stdcall (cl /Gz ...)
These function names are prefixed with an underscore and appended with @ plus the number of bytes of parameters passed. By this mechanism, it's not possible to call a function with the wrong amount of parameters. The caller and callee definitely agree on returning with a ret 12 instruction for example, to pop 12 bytes of stack args along with the return address.
You'll get a link-time or runtime DLL error instead of having a function return with ESP pointing somewhere the caller isn't expecting. (For example if you added a new arg and didn't recompile both the main program and the library. Assuming you didn't fool the system by making an earlier arg narrower, like int64_t -> int32_t.)
__fastcall (cl /Gr ...)
These function names start with an @ sign and are suffixed with the @bytes count, much like __stdcall. The first 2 args are passed in ECX and EDX, the rest are passed on the stack. The byte count includes the register args. As with __stdcall, a narrow arg like char still uses up a 4-byte arg-passing slot (a register, or a dword on the stack).
Examples:
Declaration -----------------------> decorated name
void __cdecl foo(void); -----------------------> _foo
void __cdecl foo(int a); -----------------------> _foo
void __cdecl foo(int a, int b); -----------------------> _foo
void __stdcall foo(void); -----------------------> _foo@0
void __stdcall foo(int a); -----------------------> _foo@4
void __stdcall foo(int a, int b); -----------------------> _foo@8
void __fastcall foo(void); -----------------------> @foo@0
void __fastcall foo(int a); -----------------------> @foo@4
void __fastcall foo(int a, int b); -----------------------> @foo@8
Note that in C++, the normal name-mangling mechanism that allows function overloading is used instead of @8, not in addition to it. So you'll only see actual byte counts in extern "C" functions. See https://godbolt.org/z/v7EaWs for an example.
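The decoration rules described above are simple enough to write down as a function. This is a sketch of the rules as stated in this answer for extern "C" names on 32-bit x86, not a reimplementation of any compiler. Convention, stack_bytes and decorate are hypothetical names, and each argument is assumed to occupy a whole number of 4-byte slots.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Convention { Cdecl, Stdcall, Fastcall };

// Total argument bytes, with every argument rounded up to a 4-byte slot.
std::size_t stack_bytes(const std::vector<std::size_t>& arg_sizes) {
    std::size_t total = 0;
    for (std::size_t size : arg_sizes) {
        total += (size + 3) / 4 * 4;
    }
    return total;
}

// Builds the decorated linker name for an extern "C" function.
std::string decorate(const std::string& name, Convention conv,
                     const std::vector<std::size_t>& arg_sizes) {
    switch (conv) {
        case Convention::Cdecl:
            return "_" + name;   // argument count is not encoded
        case Convention::Stdcall:
            return "_" + name + "@" + std::to_string(stack_bytes(arg_sizes));
        case Convention::Fastcall:
            return "@" + name + "@" + std::to_string(stack_bytes(arg_sizes));
    }
    assert(false && "unknown convention");
    return name;
}
```

For example, decorate("foo", Convention::Stdcall, {4, 4}) yields _foo@8, matching the table above.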
All functions in C/C++ have a particular calling convention. The point of a calling convention is to establish how data is passed between the caller and callee and who is responsible for operations such as cleaning out the call stack.
The most popular calling conventions on windows are
__stdcall, Pushes parameters on the stack, in reverse order (right to left)
__cdecl, Pushes parameters on the stack, in reverse order (right to left)
__clrcall, Load parameters onto CLR expression stack in order (left to right).
__fastcall, Stored in registers, then pushed on stack
__thiscall, Pushed on stack; this pointer stored in ECX
Adding this specifier to the function declaration essentially tells the compiler that you want this particular function to have this particular calling convention.
The calling conventions are documented here
https://learn.microsoft.com/en-us/cpp/cpp/calling-conventions
Raymond Chen also did a long series on the history of the various calling conventions (5 parts) starting here.
https://devblogs.microsoft.com/oldnewthing/20040102-00/?p=41213
__stdcall is a calling convention: a way of determining how parameters are passed to a function (on the stack or in registers) and who is responsible for cleaning up after the function returns (the caller or the callee).
Raymond Chen wrote a blog about the major x86 calling conventions, and there's a nice CodeProject article too.
For the most part, you shouldn't have to worry about them. The only case in which you should is if you're calling a library function that uses something other than the default -- otherwise the compiler will generate the wrong code and your program will probably crash.
Unfortunately, there is no easy answer for when to use it and when not.
__stdcall means that the arguments to a function are pushed onto the stack from last to first (right to left) and that the callee cleans up the stack. __cdecl pushes the arguments in the same order, but leaves the cleanup to the caller. __fastcall places the first two arguments in registers (ECX and EDX), and the rest go on the stack.
You just need to know what the callee expects, or if you are writing a library, what your callers are likely expect, and make sure you document your chosen convention.
That's a calling convention that WinAPI functions need to be called properly. A calling convention is a set of rules on how the parameters are passed into the function and how the return value is passed from the function.
If the caller and the called code use different conventions you run into undefined behaviour (like such a strange-looking crash).
C++ compilers don't use __stdcall by default - they use other conventions. So in order to call WinAPI functions from C++ you need to specify that they use __stdcall - this is usually done in Windows SDK header files and you also do it when declaring function pointers.
It specifies a calling convention for a function. A calling convention is a set of rules how parameters are passed to a function: in which order, per address or per copy, who is to clean up the parameters (caller or callee) etc.
__stdcall denotes a calling convention (see this PDF for some details). This means it specifies how function arguments are pushed and popped from the stack, and who is responsible.
__stdcall is just one of several calling conventions, and is used throughout the WINAPI. You must use it if you provide function pointers as callbacks for some of those functions. In general, you do not need to denote any specific calling convention in your code, but just use the compiler's default, except for the case noted above (providing callbacks to 3rd party code).
Simply put, when you call a function, its arguments get loaded onto the stack or into registers. __stdcall is one convention for doing this (rightmost argument first, then the one to its left, and so on), and __cdecl is another.
If you specify one, you instruct the compiler to use that specific way to pass the arguments and clean up afterwards, so the caller and the callee agree and you do not get a mismatch or crash.
Otherwise the callee and the caller might use different conventions, causing the program to crash.
__stdcall is the calling convention used for the function. This tells the compiler the rules that apply for setting up the stack, pushing arguments and getting a return value. There are a number of other calling conventions like __cdecl, __thiscall, __fastcall and __naked.
__stdcall is the standard calling convention for Win32 system calls.
More details can be found on Wikipedia.

Why do thread functions need to be declared as '__cdecl'?

Sample code that shows how to create threads using MFC declares the thread function as both static and __cdecl. Why is the latter required? Boost threads don't bother with this convention, so is it just an anachronism?
For example (MFC):
static __cdecl UINT MyFunc(LPVOID pParam)
{
...
}
CWinThread* pThread = AfxBeginThread(MyFunc, ...);
Whereas Boost:
static void func()
{
...
}
boost::thread t;
t.create(&func);
(the code samples might not be 100% correct as I am nowhere near an IDE).
What is the point of __cdecl? How does it help when creating threads?
__cdecl tells the compiler to use the C calling convention (as opposed to the stdcall, fastcall or whatever other calling convention your compiler supports). VC++ uses cdecl by default, unless a compiler option such as /Gz changes that.
The calling convention affects things such as how arguments are pushed onto the stack (or registers, in the case of fastcall) and who pops arguments off the stack (caller or callee).
In the case of Boost. I believe it uses template specialization to figure out the appropriate function type and calling convention.
Look at the prototype for AfxBeginThread():
CWinThread* AfxBeginThread(
AFX_THREADPROC pfnThreadProc,
LPVOID pParam,
int nPriority = THREAD_PRIORITY_NORMAL,
UINT nStackSize = 0,
DWORD dwCreateFlags = 0,
LPSECURITY_ATTRIBUTES lpSecurityAttrs = NULL
);
AFX_THREADPROC is a typedef for UINT(AFX_CDECL*)(LPVOID). When you pass a function to AfxBeginThread(), it must match that prototype, including the calling convention.
The MSDN pages on __cdecl and __stdcall (as well as __fastcall and __thiscall) explain the pros and cons of each calling convention.
The boost::thread constructor uses templates to allow you to pass a function pointer or callable function object, so it doesn't have the same restrictions as MFC.
Because your thread is going to be called by a runtime function that manages this for you, and that function expects it to be that way. Boost designed it a different way.
Put a breakpoint at the start of your thread function and look at the stack when it gets called, you'll see the runtime function that calls you.
C/C++ compilers by default use the C calling convention (pushing the rightmost param first on the stack), because it allows functions with a variable number of arguments, such as printf.
The Pascal calling convention pushes the leftmost param first and has the callee clean up the stack. This gives slightly smaller code, though it costs you easy variable-argument functions (I read somewhere they're still possible, though you need to use some tricks).
For that reason the 16-bit Windows and classic MacOS APIs used the Pascal convention by default, except in certain cases. Win32 uses __stdcall, which keeps the callee cleanup but pushes params right to left.
If that function has only one param, in theory using either calling convention would be legal, though the compiler may enforce the same calling convention is used to avoid any problem.
The boost libraries were designed with an eye on portability, so they should be agnostic as to which caller convention a particular compiler is using.
The real answer has to do with how windows internally calls the thread proc routine, and it is expecting the function to abide by a specific calling convention, which in this case is a macro, WINAPI, which according to my system is defined as:
#define WINAPI __stdcall
This means that the called function is responsible for cleaning up the stack. The reason why boost::thread is able to support arbitrary functions is that it passes a pointer to the function object used in the call to thread::create function to CreateThread. The threadproc associated with the thread simply calls operator() on the function object.
The reason MFC requires __cdecl therefore has to do with the way it internally calls the function passed in to the call to AfxBeginThread. There is no good reason to do this unless they were planning on allowing vararg parameters...
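The trampoline idea described above can be sketched like this. It is an illustration of the pattern, not Boost's actual implementation: DEMO_STDCALL, thread_proc, trampoline, run_like_thread and run_task are all hypothetical, and run_like_thread simply calls the procedure on the current thread as a stand-in for CreateThread. Only the trampoline has to honor the fixed convention, while the user's callable can be anything.

```cpp
#include <cassert>
#include <functional>

// Hypothetical portability macro: only 32-bit x86 distinguishes __stdcall.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_STDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#  define DEMO_STDCALL __attribute__((stdcall))
#else
#  define DEMO_STDCALL
#endif

// Fixed-signature entry point, shaped like a Win32 thread procedure.
typedef unsigned long (DEMO_STDCALL *thread_proc)(void*);

// The only function that must follow the fixed calling convention.
unsigned long DEMO_STDCALL trampoline(void* param) {
    assert(param != nullptr);
    std::function<void()>* task = static_cast<std::function<void()>*>(param);
    (*task)();   // calls operator() on the user's function object
    return 0;
}

// Stand-in for CreateThread: runs the procedure synchronously.
unsigned long run_like_thread(thread_proc proc, void* param) {
    return proc(param);
}

// Accepts any callable, whatever its own calling convention is.
void run_task(std::function<void()> task) {
    run_like_thread(&trampoline, &task);
}
```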