I was wondering: it is possible to explicitly specify a custom calling convention, but considering the maturity and the number of optimizations found in the compiler, can I expect the compiler to pick the best convention for a particular function when none is specified? For example, if the parameters are few and primitive, would it use fastcall, and so on?
It is a "convention" for a reason. Everybody has to follow the convention or you couldn't call your function from another module.
However, if the function is not visible outside its translation unit, then GCC has options. It may inline the function or call it however it wants. It might even split it into "hot" and "cold" parts and inline only the hot code path; that usually only happens when building with profile-guided optimization.
If you want GCC to make optimizations like that, work on hiding your functions. If you are building an executable, look at -fwhole-program. If you are building libraries, look at -fvisibility=hidden. Also look into -flto.
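For illustration, here is a minimal sketch (file and function names are mine) of the kind of function GCC is free to treat this way once nothing outside the translation unit can see it:

    // sketch.cpp - a file-local helper; GCC may inline it or invoke it
    // with whatever internal convention it likes, because no other
    // translation unit can call it.
    static int add3(int a, int b, int c) {
        return a + b + c;
    }

    int entry(int x) {
        // With -O2 this call typically disappears entirely (inlined),
        // at which point no calling convention is involved at all.
        return add3(x, x + 1, x + 2);
    }

    // Inspect the result:      g++ -O2 -S sketch.cpp
    // Whole-program build:     g++ -O2 -fwhole-program ...
    // Shared library build:    g++ -O2 -fvisibility=hidden -flto ...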
At the moment I'm working on an anticheat. I added a way to detect any hooking of the DirectX functions, since that is what most cheats do.
The problem is that a lot of programs, such as OBS, Fraps and many other programs that hook DirectX, get their hooks detected too.
To hook DirectX, you will most probably have to call VirtualProtect. If I could determine what address it is being called from, I could loop through all the DLLs in memory, find which module the call came from, and send that information to the server, perhaps even taking an MD5 hash and sending it to the server for validation.
I could also hook the DirectX functions that the cheats hook and check where those get called from (since most of them use MS Detours).
I looked it up, and apparently you can inspect the call stack, but none of the examples I found seemed to help me.
This (getting the caller's address) is not possible in standard C++. And many C++ compilers might optimize some calls (e.g. by inlining them even when you don't specify inline, or because there is no longer a frame pointer, e.g. with the compiler option -fomit-frame-pointer for 32-bit x86 GCC, or by optimizing a tail call...) to the point that the question might not make any sense.
With some implementations, some C or C++ standard libraries, and some (but not all) compiler options (in particular, don't ask the compiler to optimize too much*), you might get it: e.g. on Linux, use backtrace from GNU glibc, Ian Taylor's libbacktrace (from inside the GCC implementation), or GCC's return-address builtins.
I don't know how difficult it would be to port these to Windows (perhaps Cygwin did it). The GCC builtins might somehow work, if you don't optimize too much.
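As a minimal sketch of those two approaches on Linux with glibc (not standard C++; it relies on GCC builtins and <execinfo.h>, and you should link with -rdynamic to get readable symbol names):

    #include <execinfo.h>   // glibc's backtrace facilities
    #include <cstdio>
    #include <cstdlib>

    void who_called_me() {
        // GCC builtin: the address this function will return to.
        void* ret = __builtin_return_address(0);
        std::printf("return address: %p\n", ret);

        // Walk the whole call stack with glibc's backtrace().
        void* frames[16];
        int n = backtrace(frames, 16);
        char** names = backtrace_symbols(frames, n);  // malloc'ed array
        for (int i = 0; i < n; ++i)
            std::printf("  %s\n", names[i]);
        std::free(names);
    }

    // Build as suggested in the note below:
    //   g++ -Wall -g -O1 -rdynamic caller.cpp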
Read also about continuations. See also this answer to a related question.
Note *: on Linux, it is better to compile all the code (including external libraries!) with at most g++ -Wall -g -O1: you don't want too much optimization, and you want the debug information (in particular for libbacktrace).
Raymond Chen's blog 'The Old New Thing' covers using return addresses to make security decisions, and why it's a pretty pointless thing:
https://devblogs.microsoft.com/oldnewthing/20060203-00/?p=32403
https://devblogs.microsoft.com/oldnewthing/20040101-00/?p=41223
Basically, it's pretty easy to fake (by injecting code or using a manually constructed fake stack to trick you). It's Windows-centric, but the basic concepts are generally applicable.
I want to measure the performance and, in general, the behaviour (how much assembly is generated, etc.) of some inline functions I use around my project. Other than profiling timings, is it possible to look at the overall code expansion in the functions that use those inline ones?
I tried Visual C++ and MinGW (through NetBeans), looking at the Disassembly panel during debugging. With a debug build, every inline function use is a call in the assembly, so they are not inlined. If I activate optimizations, the assembly is changed so much that I cannot even put breakpoints inside those functions.
Do you know of any compiler settings (in GCC or VC, for example, to optimize only inline functions), books (I have "Efficient C++", which talks about measuring inlining timings), or anything else that would help me understand the topic better?
Here is the link to the compiler switch in VS. If you just want to test inlining, enable only this optimization.
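For example, assuming the switch in question is MSVC's /Ob inline-expansion control, you could isolate inlining from the other optimizations like this (function names are just placeholders):

    // inline_test.cpp - a tiny sketch for observing inlining in isolation.
    inline int square(int x) { return x * x; }

    int use(int v) { return square(v) + 1; }   // does a 'call' remain here?

    // GCC/MinGW: emit assembly and warn whenever an 'inline' request fails:
    //   g++ -O2 -Winline -S inline_test.cpp
    // MSVC: restrict inline expansion to functions marked inline (/Ob1,
    // versus /Ob2 for any suitable function) and dump an assembly listing:
    //   cl /O2 /Ob1 /FA inline_test.cpp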
Tools like Intel VTune can profile inlined functions. They use the debugging info in the binary to map the instruction pointers back to the function "from which the code was derived" even when there is no actual call anymore in the assembly.
You can see this effect while looking at annotated assembly with some tools as well - the source code for several functions will be mixed together which reflects the inlining.
This process isn't perfect, since the question "to what function does this particular instruction belong" almost becomes a philosophical one rather than a technical one for some types of inlining (indeed, instructions may effectively be "shared" among several functions).
I am wondering whether there is any difference between inlining functions on a linker level or compiler level in terms of execution speed?
e.g. if I have all my functions in .cpp files and rely on the linker to do the inlining, will this inlining potentially be less efficient than, say, defining some functions in the headers for selected inlining at the compiler level, or a unity build without any linking and all inlining done by the compiler?
If the linker is just as efficient, why would one then still bother inlining functions explicitly at the compiler level? Is that just for convenience, say when there is a one-line constructor and one can't be bothered with a .cpp file?
I suppose this might depend on the compiler, in which case I would be most interested in Visual C++ (Windows) and gcc (Linux).
Thanks
The general rule is that, all else being equal, the closer to execution (compiling -> linking -> (maybe JIT) -> execution) an optimization is performed, the more data the optimizer has and the better the optimization it can do. So unless the optimizer is dumb, you should expect better results when inlining is done by the linker - the linker will know more about the invocation context and can do better optimization.
Generally, by the time the linker is run, your source has already been compiled into machine code. The linker's job is to take all the code fragments and link them together (possibly fixing up addresses along the way). In such a case, there is no room for performing inlining.
But all is not lost. GCC provides a mechanism for link-time optimization (using the -flto option when compiling and linking). This causes GCC to produce a byte code that the linker can then compile and link into a single executable. Since the byte code contains more information than optimized machine code, the linker can perform radical optimization on the whole codebase - something the compiler alone cannot do.
See here for more details on GCC. Not too sure about VC++, though.
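A minimal sketch of the difference in practice (file and function names are made up):

    // a.cpp - helper defined in its own translation unit, not in a header.
    int helper(int x) { return x * 2 + 1; }

    // main.cpp - only sees an ordinary declaration.
    int helper(int x);
    int main() { return helper(20); }

    // Without LTO the compiler must emit a real call to helper().
    // With LTO the "linker" sees both bodies and can inline across files:
    //   g++ -O2 -flto -c a.cpp main.cpp
    //   g++ -O2 -flto a.o main.o -o prog
    // The VC++ equivalent is link-time code generation: cl /O2 /GL
    // when compiling, then link /LTCG.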
Inlining is normally performed within a single translation unit (.cpp file). When you call functions in another file, they’re never inlined.
Link Time Optimization (LTO) changes this, allowing inlining to work across translation units. It should always be equal to or better than regular linking (sometimes very significantly so) in terms of how efficient the generated code is.
The reason both options are still available is that LTO can take a large amount of RAM and CPU – I’ve had VC++ take several minutes on linking a large C++ project before. Sometimes it’s not worth it to enable until you ship. You could also run out of address space with a large enough project, as it has to load all that bytecode into RAM.
For writing efficient code, nothing changes – all the same rules apply with LTO. It is potentially more efficient to explicitly define an inline function in a header file versus depending on LTO to inline it. The inline keyword only provides a hint so there’s no guarantee, but it might nudge it into being inlined where normally (with or without LTO) it wouldn’t be.
If the function is inlined, there would be no difference.
I believe the main reason for having inline functions defined in the headers is history. Another is portability. Until recently, most compilers did not do link-time code generation, so having the functions in the headers was a necessity. That, of course, affects code bases started more than a couple of years ago.
Also, if you still target some compilers that don't support link-time code generation, you don't have a choice.
As an aside, in one case I was forced to add a pragma to ask one specific compiler not to inline an init() function that was defined in one .cpp file but potentially called from many places.
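For reference, a sketch of how such a no-inline request is commonly spelled (these are vendor extensions, and the exact pragma that compiler wanted may well have differed):

    // Ask the compiler NOT to inline init(), e.g. to avoid duplicating a
    // large one-time setup routine at every call site.
    #if defined(_MSC_VER)
    __declspec(noinline)
    #elif defined(__GNUC__)
    __attribute__((noinline))
    #endif
    void init() {
        // large one-time setup...
    }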
Function parameters are placed on the stack, but compilers can optimize this by using registers instead. It would make sense for this optimization to kick in if there are only 1-2 parameters, and not when there are 256 (not that one would want to have the maximum number of parameters).
How can one find out the parameter limit (number of parameters) for a certain compiler (such as gcc) below which one can be sure that this optimization will be used?
Function parameters are placed on the stack, but compilers can optimize this task by the use of optional registers.
As FrankH says in his comments and as I'm going to say in my answer, the application binary interface for the system in question determines how arguments are passed to functions - this is called the calling convention for that platform.
To complicate matters, 32-bit x86 actually has several. This is historical and comes from the fact that when Win32 arrived, everyone went crazy doing different things.
So, yes, you can "optimise" by writing function calls in such a way, but no, you shouldn't. You should follow the standards for your platform, because the honest truth is that the speed of stack access probably isn't slowing your code down to such an extent that it's worth being binary-incompatible with everyone else on your system.
Why the need for ABIs/standard calling conventions? Well, in terms of using the processor registers, the stack, etc., applications must agree on what means what and where it should go. If one function decided all its arguments were in registers and another that some were on the stack, how would they be interoperable? Moreover, you might come across the term scratch registers, meaning those registers you don't have to restore. What happens if you call a function expecting it to leave some registers alone?
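For concreteness, here is a short sketch of the competing 32-bit x86 conventions mentioned above (the keywords are MSVC/MinGW extensions, not standard C++):

    // Explicit x86 (32-bit) calling-convention annotations:
    int __cdecl    f_cdecl(int a, int b);   // caller cleans the stack (C default)
    int __stdcall  f_std(int a, int b);     // callee cleans the stack (Win32 API)
    int __fastcall f_fast(int a, int b);    // first two args in ECX/EDX

    // Mismatching these across a module boundary corrupts the stack,
    // which is exactly why the platform ABI, not the optimizer, decides.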
Anyway, as for what you asked for, here's some ABI documentation:
The difference between x86 and x64 on Windows.
x86_64 ABI used for Unix-like platforms.
Wikipedia's x86 calling conventions.
A document on compiler calling conventions.
The last one is my favourite. To quote it:
In the days of the old DOS operating system, it was often possible to combine development tools from different vendors with few compatibility problems. With 32-bit Windows, the situation has gone completely out of hand. Different compilers use different data representations, different function calling conventions, and different object file formats. While static link libraries have traditionally been considered compiler-specific, the widespread use of dynamic link libraries (DLL's) has made the distribution of function libraries in binary form more common.
So whatever you're trying to do to optimise by modifying the function calling method: don't. Find another way to optimise. Profile your code. Study the compiler optimisations available for your compiler (-OX) if you think it will help, and dump the assembly to check, if the speed really is that crucial.
For publicly visible functions, this is documented in the ABI standard. For functions that are not referenceable from the outside, all bets are off anyway.
You would have to read the fine manual for the compiler. If you were lucky, you would find it there in a description of the function calling conventions. Otherwise, for an open-source compiler such as gcc, you would probably have to read its source code.
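As a concrete illustration of what such a manual documents - on x86-64 System V (Linux), the first six integer arguments travel in RDI, RSI, RDX, RCX, R8 and R9, and only the seventh and beyond go on the stack - here is a quick sketch you can verify yourself:

    // regs.cpp - inspect the generated code with: g++ -O2 -S regs.cpp
    long sum7(long a, long b, long c, long d, long e, long f, long g) {
        return a + b + c + d + e + f + g;   // 'g' arrives via the stack
    }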
In C/C++ (specifically, I'm using MSVS), in what situation would one ever need to worry about specifying a calling convention for a function definition? Are they ever important? Isn't the compiler capable of choosing the optimal convention when necessary (e.g. fastcall, etc.)?
Maybe my understanding is lacking, but I just do not see when there would be a case where the programmer would need to care about things like the order in which the arguments are placed on the stack and so forth. I also do not see why the compiler's optimizer would not be able to choose whatever scheme works best for that particular function. Any knowledge anyone could provide me with would be great. Thanks!
In general terms, the calling convention is important when you're integrating code that's being compiled by different compilers. For example, if you're publishing a DLL that will be used by your customers, you will want to make sure that the functions you export all have a consistent, expected calling convention.
You are correct that within a single program, the compiler can generally choose which calling convention to use for each function (and the rules are usually pretty simple).
You do not need to care for 64-bit applications, since there is only one calling convention.
You do need to care for 32-bit applications in the following cases:
You interact with 3rd party libraries and the headers for these libraries did not declare the correct calling convention.
You are creating a library or DLL for someone else to use. You need to decide on a calling convention so that other code would use the correct calling convention when calling your code.
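As a sketch of that second case (all names here are hypothetical), spelling the convention out in the public header is what guarantees every 32-bit client compiles the correct call sequence:

    // mylib.h - hypothetical export header for a 32-bit DLL.
    #ifdef BUILDING_MYLIB
    #  define MYLIB_API extern "C" __declspec(dllexport)
    #else
    #  define MYLIB_API extern "C" __declspec(dllimport)
    #endif

    MYLIB_API int __stdcall mylib_compute(int input);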