Introduction:
I have been creating some simple wrapper classes. By chance I found out that an inline function still compiled into a function call (or at least appears to). I created an example class to test things out, and this is what I found:
Consider the following class:
//compile with MSVC
class InlineTestClass
{
public:
int InternalInt;
int GetInt() {return InternalInt;}
inline int GetInt_Inl() {return InternalInt;}
//__forceinline -Forces the compiler to implement the function as inline
__forceinline int GetInt_ForceInl() {return InternalInt;}
};
This class has 3 functions for reference.
The GetInt function is a standard function.
The GetInt_Inl function is an inline function.
The GetInt_ForceInl function is a forced-inline function, in case the compiler decides not to implement GetInt_Inl as an inline function.
Implemented like so:
InlineTestClass itc;
itc.InternalInt = 3;
int myInt;
myInt = itc.InternalInt; //No function
myInt = itc.GetInt(); //Normal function
myInt = itc.GetInt_Inl(); //Inline function
myInt = itc.GetInt_ForceInl(); //Forced inline function
The resulting assembler code for each assignment to myInt (taken from the disassembler):
451 myInt = itc.InternalInt;
0x7ff6fe0d4cae <+0x003e> mov eax,dword ptr [rsp+20h]
0x7ff6fe0d4cb2 <+0x0042> mov dword ptr [rsp+38h],eax
452 myInt = itc.GetInt();
0x7ff6fe0d4cb6 <+0x0046> lea rcx,[rsp+20h]
0x7ff6fe0d4cbb <+0x004b> call nD_Render!ILT+2125(?GetIntInlineTestClassQEAAHXZ) (00007ff6`fe0d1852)
0x7ff6fe0d4cc0 <+0x0050> mov dword ptr [rsp+38h],eax
453 myInt = itc.GetInt_Inl();
0x7ff6fe0d4cc4 <+0x0054> lea rcx,[rsp+20h]
0x7ff6fe0d4cc9 <+0x0059> call nD_Render!ILT+1885(?GetInt_InlInlineTestClassQEAAHXZ) (00007ff6`fe0d1762)
0x7ff6fe0d4cce <+0x005e> mov dword ptr [rsp+38h],eax
454 myInt = itc.GetInt_ForceInl();
0x7ff6fe0d4cd2 <+0x0062> lea rcx,[rsp+20h]
0x7ff6fe0d4cd7 <+0x0067> call nD_Render!ILT+715(?GetInt_ForceInlInlineTestClassQEAAHXZ) (00007ff6`fe0d12d0)
0x7ff6fe0d4cdc <+0x006c> mov dword ptr [rsp+38h],eax
As shown above the setting (of myInt) from the member of InlineTestClass directly is (as expected) 2 mov instructions long.
Setting from the GetInt function results in a function call (as expected), however both of the GetInt_Inl and GetInt_ForceInl (inline functions) also result in a function call.
It appears as if the inline function has been compiled as a normal function ignoring the inlining completely (correct me if I am wrong).
This is strange, because according to the MSVC documentation:
The inline and __inline specifiers instruct the compiler to insert a
copy of the function body into each place the function is called.
Which (I think) would result in:
inline int GetInt_Inl() {return InternalInt;} //The function body
myInt = itc.GetInt_Inl(); //Call site
//Should result in
myInt = itc.InternalInt; //Identical to setting from the member directly
Which means that the assembler code should be also identical to the one of setting from the class member directly but it isn't.
Questions:
Am I missing something or implementing the functions incorrectly?
Am I misinterpreting the function of the inline keyword? What does it actually do?
Why do these inline functions result in a function call?
Functions defined within a class definition are implicitly inline, so the inline keyword there does absolutely nothing. Also, the compiler is always free to overrule the programmer's keyword regardless. It's merely advisory.
From the C++17 draft (page 147):
The inline specifier indicates to the implementation that inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism. An implementation is not required to perform this inline substitution at the point of call; however, even if this inline substitution is omitted, the other rules for inline functions specified in this subclause shall still be respected.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf
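To make the point concrete, here is a minimal sketch based on the class from the question. SumOfReads is a hypothetical helper, not part of the original code. All three reads are equivalent as far as the language is concerned. Whether the getter calls are expanded in place is up to the compiler and its settings. The ILT+ thunks in the listing suggest an incrementally linked and probably unoptimized build, and with MSVC's /Ob0 (the default for debug builds) no inline expansion is performed, not even for __forceinline.

```cpp
#include <cassert>

// In-class definitions are implicitly inline, so the keyword on GetInt_Inl
// does not change how calls to it are compiled.
class InlineTestClass {
public:
    int InternalInt = 0;
    int GetInt() { return InternalInt; }            // implicitly inline
    inline int GetInt_Inl() { return InternalInt; } // redundant keyword
};

// Hypothetical helper: reads the member in three ways and adds the results.
// An optimizing build is free to reduce all three reads to plain loads;
// an unoptimized build is free to emit real calls for the two getters.
inline int SumOfReads(InlineTestClass& itc) {
    int direct = itc.InternalInt;
    int viaGetter = itc.GetInt();
    int viaInline = itc.GetInt_Inl();
    assert(direct == viaGetter && viaGetter == viaInline);
    return direct + viaGetter + viaInline;
}
```

Compiling this with optimizations enabled (for example /O2 on MSVC) and comparing the disassembly with a debug build should show the difference.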
TL;DR : Should we use fn(Interface* pMaybeNull) or fn(Interface& maybeNullObject) -- specifically in the case of "optional" function arguments of a virtual/abstract base class?
Our code base contains various forms of the following pattern:
struct CallbackBase {
virtual ~CallbackBase() = default;
virtual void Hello(/*omitted ...*/) = 0;
};
...
void DoTheThing(..., CallbackBase* pOpt) {
...
if (pOpt) { pOpt->Hello(...); }
}
where the usage site would look like:
... {
auto greet = ...;
...
DoTheThing(..., &greet);
// or if no callback is required from call site:
DoTheThing(..., nullptr);
}
It has been proposed that, going forward, we should use a form of the Null Object pattern, like so:
struct NoopCall : public CallbackBase {
virtual void Hello(/*omitted ...*/) { /*noop*/ }
};
void DoTheThing2(..., CallbackBase& opt) {
...
opt.Hello(...);
}
... {
NoopCall noop;
// if no callback is required from call site:
DoTheThing2(..., noop);
}
Note: Search variations yield lots of results regarding Null Object (many not in the C++ space) and a lot of very basic treatment of pointers vs. references. If you include the word "optional", as in "the parameter is optional", you obviously get a lot of hits regarding std::optional which, afaik, is unsuitable for this virtual interface use case.
I couldn't find a decent comparison of the two variants present here, so here goes:
Given C++17/C++20 and a halfway modern compiler, is there any expected difference in the runtime characteristics of the two approaches? (this factor being just a corollary to the overall design choice.)
The "Null Object" approach certainly "seems" more modern and safer to me -- is there anything in favor of the pointer approach?
Note:
I think it is orthogonal to the question posed, whether it stands as posted, or uses a variant of overloading or default arguments.
That is, the question should be valid, regardless of:
//a
void DoTheThing(arg);
// vs b
void DoTheThing(arg=nullthing);
// vs c
void DoTheThing(arg); // overload1
void DoTheThing(); // overload0 (calling 1 internally)
Performance:
I inspected the code on godbolt and while MSVC shows "the obvious", the gcc output is interesting (see below).
// Gist for a MCVE.
"The obvious" is that the version with the Noop object contains an unconditional virtual call to Hello and the pointer version has an additional pointer test, eliding the call if the pointer is null.
So, if the function is "always" called with a valid callback, the pointer version is a pessimization, paying an additional null check.
If the function is "never" called with a valid callback, the NullObject version is a (worse) pessimization, paying a virtual call that does nothing.
However, the object version in the gcc code contains this:
WithObject(int, CallbackBase&):
...
mov rax, QWORD PTR [rsi]
...
mov rax, QWORD PTR [rax+16]
(!) cmp rax, OFFSET FLAT:NoopCaller::Hello(HelloData const&)
jne .L31
.L25:
...
.L31:
mov rdi, rsi
mov rsi, rsp
call rax
jmp .L25
And while my understanding of assembly is close to nonexistent, this looks like gcc is comparing the call pointer to the NoopCaller::Hello function, and eliding the call in this case!
Conclusion
In general, the pointer version should produce more optimal code on the micro-level. However, compiler optimizations might make any difference nearly unobservable.
Think about using the pointer version if you have a very hot path where the callback is null.
Use the null object version otherwise, as it is arguably safer and more maintainable.
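For reference, the two variants can be put side by side in a self-contained sketch. The int parameter of Hello, the CountingCall type and the RunBothVariants helper are assumptions made for illustration; the original code omits the parameter list and the concrete callbacks.

```cpp
#include <cassert>

struct CallbackBase {
    virtual ~CallbackBase() = default;
    virtual void Hello(int value) = 0;  // parameter list is an assumption
};

// Null object: a callback that deliberately does nothing.
struct NoopCall : CallbackBase {
    void Hello(int) override {}
};

// Hypothetical callback that records how often it was invoked.
struct CountingCall : CallbackBase {
    int calls = 0;
    int last = 0;
    void Hello(int value) override { ++calls; last = value; }
};

// Pointer variant: one null test, no call at all when pOpt is null.
inline void DoTheThing(int value, CallbackBase* pOpt) {
    if (pOpt) { pOpt->Hello(value); }
}

// Reference variant: unconditional virtual call, no null test.
inline void DoTheThing2(int value, CallbackBase& opt) {
    opt.Hello(value);
}

// Exercises both variants, with and without a "real" callback,
// and returns how many times the counting callback was reached.
inline int RunBothVariants() {
    CountingCall counter;
    NoopCall noop;
    DoTheThing(1, &counter);   // reaches the callback
    DoTheThing(2, nullptr);    // skipped by the null test
    DoTheThing2(3, counter);   // reaches the callback
    DoTheThing2(4, noop);      // virtual call that does nothing
    assert(counter.last == 3);
    return counter.calls;
}
```

Both variants are observably equivalent for the caller. The difference is only where the "nothing to do" case is paid for: a branch in the pointer version, a virtual call in the null object version.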
I have a function with inline assembly that has the following definition:
void __declspec(naked) func()
{
__asm
{
//...
JMP [address]
//...
}
}
This address variable is known only at run time, at main I have:
int main()
{
//...
DWORD address = getAddress();
func();
//...
}
As written, the code does not compile and gives the following error message:
error C2094: label 'address' was undefined
How can I work around this problem, knowing that I cannot pass address as a parameter to the func() function?
Could I define address in a namespace? Would it be good practice? Can namespaces be used to promote the scope of a variable (using this variable in different functions/scopes)?
You are in C++: destructors must run when you leave a block that holds class instances on the stack, and that will not happen with your JMP, to say nothing of the values of the stack and frame pointers.
C++ has exceptions, so use them. For instance, pass your assembly portion the address of a function that takes no arguments and throws the exception you want, call it from there, and place a try-catch at the destination you want to reach.
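On the namespace part of the question: error C2094 most likely appears because address is a local variable of main and therefore not in scope inside func. A variable at file or namespace scope is visible to both functions. The following is a portable sketch of that idea only. It uses an ordinary call through a function pointer instead of a naked function with JMP, and the names hook::target, Destination and g_arrivals are hypothetical. Whether a qualified name is accepted inside an __asm block is something to verify with your compiler; a plain file-scope variable is the simplest case.

```cpp
#include <cassert>

namespace hook {
    // Set at run time before func() is used. Being at namespace scope,
    // it is visible to every function in this translation unit.
    void (*target)() = nullptr;
}

int g_arrivals = 0;

// Hypothetical destination, standing in for the run-time address.
void Destination() { ++g_arrivals; }

// Takes no parameters, as required in the question, and still reaches
// the run-time target through the namespace-scope variable.
void func() {
    assert(hook::target != nullptr);
    hook::target();
}
```

In main, the variable would be assigned from getAddress() before func() is called. Note that a shared mutable variable like this is effectively global state, so it needs care if several threads or several hooks are involved.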
I have written an instrumenter in C++ that logs function entry and exit by hooking the enter and exit calls. It works as expected with a legacy code base. However, when I hook a project that I downloaded from git, the function addresses that I save in an extern variable in the subject code come out different in the profiler library. That breaks the function pointer comparison between the hooked and the saved functions.
Function address in subject code main file, breakpoint is inside the _penter hook function in the profiler code currently
The same entry is showing a different address with a "_" preceding the function name, in the profiler code
I have no idea how it is changing the addresses and want to know if I am doing something wrong.
The way I do it: I have an extern array of function pointers (and their names) that is initialized with references to the subject code's functions in the subject main file (where all functions are available). In the library's hook function (_penter), I get the address of the function just entered, compare it with the addresses in the extern array, and log the entered function if there is a match.
SNIPPET FROM PROFILE.H (profiler)
extern Signature FuncTable[3000];
SNIPPET FROM PROFILE.CPP (profiler)
void _stdcall EnterFunc0(unsigned * pStack)
{
void * pCaller;
pCaller = (void *)(pStack[0] - 5); // the instruction for calling _penter is 5 bytes long
Signature * funct = FuncTable; //the table that has references to functions and their names
funct = FuncTable;
while (funct->function)
{
//const BYTE * func = (const BYTE *)funct->function;
if ((void *)(pStack[0] - 5) == (void *)(funct->function))
{
int a = 0;
linesBuffer = linesBuffer + "Entering " + funct->signature + ";";
linesBuffer = linesBuffer + "\n";
WriteToFile(false); //function buffers 100kb before writing
break;
}
funct++;
}
}
extern "C" __declspec(naked) void __cdecl _penter()
{
_asm
{
pushad // save all general purpose registers
mov eax, esp // current stack pointer
add eax, 32 // stack pointer before pushad
push eax // push pointer to return address as parameter to EnterFunc0
call EnterFunc0
popad // restore general purpose registers
ret // start executing original function
}
}
SNIPPET FROM main.c (subject code main file)
#include "../Profile/Profile.h"
Signature FuncTable[] = {
{ (int)TetrisView_ProcessPauseMenu, "TetrisView_ProcessPauseMenu" },
{ NULL }
};
I think it is because of Incremental Linking. When it is turned on, you'll get an Incremental Linking Table (ILT). ILT contains a jump table. When a function is called, it is called via this ILT.
In FuncTable, you'll get an address which is in ILT, it won't be the address of the actual function. But in _penter, its return address will be the actual function (this is what is put in pCaller).
Turn off incremental linking, and you'll be fine.
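If incremental linking has to stay on, another option is to resolve the table entry to the real function before comparing. The sketch below assumes that each ILT entry is an x86 `jmp rel32` instruction (opcode 0xE9 followed by a 32-bit displacement relative to the next instruction). ResolveJmpThunk is a hypothetical helper, and the thunk layout is an assumption that should be checked in the disassembler for your build.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: if `addr` points at a `jmp rel32` thunk, return the
// jump destination; otherwise return `addr` unchanged.
inline const std::uint8_t* ResolveJmpThunk(const std::uint8_t* addr) {
    assert(addr != nullptr);
    if (addr[0] != 0xE9) {
        return addr;  // not a near jump, assume it is the function itself
    }
    std::int32_t rel = 0;
    std::memcpy(&rel, addr + 1, sizeof rel);  // unaligned-safe read
    return addr + 5 + rel;  // displacement is relative to the next instruction
}
```

With this, each funct->function value from FuncTable could be passed through the helper once (for example at start-up) so that the comparison in EnterFunc0 sees the same address that _penter reports.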
Along the lines with the first answer here I tried to encapsulate some assembly code in a C++ function.
When I put this code in a function (inline or not) and pass the shellcode as an argument to the function it gives me an access violation 0xC0000005 at the call instruction, with or without DEP enabled. However, when I define the shellcode inside the function just before VirtualProtect, it works fine.
Current function code:
inline void ExecuteShellcode(char shellcode[])
{
/*char shellcode[] = \
"shellcode"; // If I use this local variable instead of the argument it works tho
*/
DWORD tempstore;
if (VirtualProtect(shellcode, sizeof(shellcode), PAGE_EXECUTE_READWRITE, &tempstore))
{
__asm lea eax, shellcode;
__asm call eax; Access violation 0xC0000005
}
}
Why does __asm call not work with non-local variables in this instance?
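A likely explanation is array-to-pointer decay. As a function parameter, `char shellcode[]` is just another spelling of `char* shellcode`. That means sizeof(shellcode) is the size of a pointer, and `lea eax, shellcode` yields the address of the pointer variable on the stack rather than the address of the buffer (loading the pointer's value would need a mov instead). With a local array, the same name refers to the buffer itself, which is why that variant works. The sketch below only demonstrates the decay in portable C++; BufferSize and SizeThroughPointer are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// As a parameter, an array type is adjusted to a pointer type.
static_assert(std::is_same<void(char[]), void(char*)>::value,
              "array parameters decay to pointers");

// Hypothetical helper: taking the array by reference keeps its length.
template <std::size_t N>
constexpr std::size_t BufferSize(char (&)[N]) {
    return N;
}

// Hypothetical helper: through a pointer parameter only the pointer's own
// size is visible, never the size of the buffer it points to.
inline std::size_t SizeThroughPointer(char* shellcode) {
    assert(shellcode != nullptr);
    return sizeof(shellcode);
}
```

A function that needs the buffer length therefore has to receive it explicitly, either as a separate size argument or through a reference-to-array parameter as in BufferSize.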
I have a hack program; it injects some functions into a target process to control it. The program is written in C++ with inline assembly.
class GameProcMain {
// this is just a class
};
GameProcMain* mainproc; // there is no problem I can do =(GameProcMain*)0xC1EA90
Now I want to define a class function (which set ecx to class pointer) instead of writing assembly.
PPLYDATA GetNearblyMob(__Vector3* cordinate) {
__asm {
mov ecx, 0xC1EA90
push cordinate
mov edi, 0x4A8010
call edi
}
}
I want to define it and call it like.
PPLYDATA (DLPL::*GetNearblyMob)(__Vector3* cordinate);
mainproc->GetNearblyMob(ADDR_CHRB->kordinat)
When I try GetNearblyMob=(PPLYDATA (DLPL::*)(__Vector3*)) 0x4A8010;
It says something like error: invalid type conversion: "int" to "PPLYDATA (DLPL::*)(int, int)"
but I can do this to set the pointer:
void initializeHack() {
__asm {
LEA edi, GetNearblyMob
MOV eax, 0x4A8010
MOV [edi], eax
}
}
Now I want to learn "how I can set GetNearblyMob without using assembly and legitimately in C++".
The problem is that member functions automatically get an extra parameter for the this pointer. Sometimes you can cast between member and non-member functions, but I don't see the need to cast anything.
Typically it's easier to reverse-engineer into C functions than into C++. C typically has a more straightforward ABI, so you can keep the data structures straight as you work them out.
So, I would recommend
PPLYDATA (*GetNearblyMob)(DLPL *main_obj, __Vector3* cordinate) = reinterpret_cast<PPLYDATA (*)(DLPL*, __Vector3*)>(0x12345UL);
and then define your own function
class DLPL {
public:
PPLYDATA GetNearblyMob( __Vector3* cordinate ) {
// forward to the free function pointer, passing this explicitly
return ::GetNearblyMob( this, cordinate );
}
// ... other program functions
};
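The wrapper idea can be tried out in isolation with a stand-in for the target function. In this sketch the simplified names Vector3, PlayerData, GetNearbyMobFn, g_getNearbyMob and FakeGetNearbyMob are all hypothetical. In the real program the pointer would be initialized from the numeric address with reinterpret_cast, and the function pointer type would also have to match the calling convention of the target (the assembly in the question passes the object in ecx), which this sketch does not model.

```cpp
#include <cassert>

struct Vector3 { float x, y, z; };
struct PlayerData { int id; };

class DLPL;

// Plain function pointer that takes the object explicitly as first argument.
using GetNearbyMobFn = PlayerData* (*)(DLPL*, Vector3*);
GetNearbyMobFn g_getNearbyMob = nullptr;

class DLPL {
public:
    // Member wrapper: call sites use normal member syntax, while the
    // real work is forwarded through the free function pointer.
    PlayerData* GetNearbyMob(Vector3* coordinate) {
        assert(g_getNearbyMob != nullptr);
        return g_getNearbyMob(this, coordinate);
    }
};

// Stand-in for the function that lives in the target process.
PlayerData g_fakeMob{42};
PlayerData* FakeGetNearbyMob(DLPL*, Vector3*) { return &g_fakeMob; }
```

Once g_getNearbyMob is set, the call reads mainproc->GetNearbyMob(...) just as the question asks, without any pointer-to-member casts.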
I am a bit surprised that it won't let you cast like that.
You can try to do something like
GetNearblyMob=reinterpret_cast<PPLYDATA (DLPL::*)(__Vector3*)> (0x4A8010);
If that still does not work, try
*(int*)(&GetNearblyMob) = 0x4A8010;