In my C++ JNI agent project I am implementing a function that takes a variable number of parameters and passes execution on to another function:
// address of theOriginalFunction
static void* originalfunc;
void* interceptor(JNIEnv *env, jclass clazz, ...){
// add 4 to the function address to skip "push ebp / mov ebp esp"
asm volatile("jmp *%0;"::"r" (originalfunc+4));
// will not get here anyway
return NULL;
}
The function above just needs to jump to:
JNIEXPORT void JNICALL Java_main_Main_theOriginalFunction(JNIEnv *env, jclass clazz, jboolean p1, jbyte p2, jshort p3, jint p4, jlong p5, jfloat p6, jdouble p7, jintArray p8, jbyteArray p9){
// Do something
}
The code above works perfectly, the original function can read all the parameters correctly (tested with 9 parameters of different types including arrays).
However, before jumping into the original function from the interceptor I need to do some computations, and here I observe some interesting behavior.
void* interceptor(JNIEnv *env, jclass clazz, ...){
int x = 10;
int y = 20;
int summ = x + y;
// NEED TO RESTORE ESP TO EBP SO THAT ORIGINAL FUNCTION READS PARAMETERS CORRECTLY
asm (
"movl %ebp, %esp;"
"mov %rbp, %rsp"
);
// add 4 to the function address to skip "push ebp / mov ebp esp"
asm volatile("jmp *%0;"::"r" (originalfunc+4));
// will not get here anyway
return NULL;
}
This still works fine: I am able to do some basic computations, then reset the stack pointer and jump to my original function, and the original function still reads the parameters from the varargs correctly. However, if I replace the basic int operations with malloc or printf("any string");, then after the jump my parameters get messed up and the original function ends up reading wrong values...
I have tried to debug this behavior and inspected the memory regions to see what is going wrong... Right before the jump everything looks fine there; ebp is followed by the function parameters.
If I jump without complicated computations, everything works fine: the memory region behind ebp doesn't get changed and the original function reads correct values...
Now if I jump after doing printf (for example), the parameters read by the original method get corrupted...
What is causing this strange behavior? printf doesn't even store any local variables in my method... OK, it does store some literals in registers, but why does my stack get corrupted only after the jump and not already before it?
For this project I use the g++ 4.9.1 compiler running on a Windows machine.
And yes, I am aware of the std::forward and template options, but they just do not work in my case... And yes, I know that jumping into other methods is a bit hacky, but that's my only idea of how to get the JNI interceptor to work...
******************** EDIT ********************
As discussed, I am adding the generated assembly code alongside the source functions.
Function without printf (which works fine):
void* interceptor(JNIEnv *env, jclass clazz, ...){
//just an example
int x=8;
// restoring stack pointers
asm (
"movl %ebp, %esp;"
"mov %rbp, %rsp"
);
// add 4 to the function address to skip "push ebp / mov ebp esp"
asm volatile("jmp *%0;"::"r" (originalfunc+4));
// will not get here anyway
return NULL;
}
void* interceptor(JNIEnv *env, jclass clazz, ...){
// first when interceptor is called, probably some parameter restoring...
push %rbp
mov %rsp, %rbp
sub $0x30, %rsp
mov %rcx, 0x10(%rbp)
mov %r8, 0x20(%rbp)
mov %r9, 0x28(%rbp)
mov %rdx, 0x18(%rbp)
// int x = 8;
movl $0x8, -0x4(%rbp)
// my inline asm restoring stack pointers
mov %ebp, %esp
mov %rbp, %rsp
// asm volatile("jmp *%0;"::"r" (originalfunc+4))
mov 0xa698b(%rip),%rax // store originalfunc in rax
add $0x4, %rax
jmpq *%rax
// return NULL;
mov $0x0, %eax
}
Now asm output for printf variant...
void* interceptor(JNIEnv *env, jclass clazz, ...){
//just an example
int x=8;
printf("hey");
// restoring stack pointers
asm (
"movl %ebp, %esp;"
"mov %rbp, %rsp"
);
// add 4 to the function address to skip "push ebp / mov ebp esp"
asm volatile("jmp *%0;"::"r" (originalfunc+4));
// will not get here anyway
return NULL;
}
void* interceptor(JNIEnv *env, jclass clazz, ...){
// first when interceptor is called, probably some parameter restoring...
push %rbp
mov %rsp, %rbp
sub $0x30, %rsp
mov %rcx, 0x10(%rbp)
mov %r8, 0x20(%rbp)
mov %r9, 0x28(%rbp)
mov %rdx, 0x18(%rbp)
// int x = 8;
movl $0x8, -0x4(%rbp)
// printf("hey");
lea 0x86970(%rip), %rcx // stores "hey" in rcx???
callq 0x6b701450 // calls the print function, i guess
// my inline asm restoring stack pointers
mov %ebp, %esp
mov %rbp, %rsp
// asm volatile("jmp *%0;"::"r" (originalfunc+4))
mov 0xa698b(%rip),%rax // store originalfunc in rax
add $0x4, %rax
jmpq *%rax
// return NULL;
mov $0x0, %eax
}
And here is the asm code for the printf function:
printf(char const*, ...)
push %rbp
push %rbx
sub $0x38, %rsp
lea 0x80(%rsp), %rbp
mov %rdx, -0x28(%rbp)
mov %r8, -0x20(%rbp)
mov %r9, -0x18(%rbp)
mov %rcx, -0x30(%rbp)
lea -0x28(%rbp), %rax
mov %rax, -0x58(%rbp)
mov -0x58(%rbp), %rax
mov %rax, %rdx
mov -0x30(%rbp), %rcx
callq 0x6b70ff60 // (__mingw_vprintf)
mov %eax, %ebx
mov %ebx, %eax
add $0x38, %rsp
pop %rbx
pop %rbp
retq
It looks like printf does many operations on rbp, but I cannot see anything wrong with it...
And here is the asm code of the intercepted function.
push %rbp // 1 byte
mov %rsp, %rbp // 3 bytes, need to skip them
sub $0x50, %rsp
mov %rcx, 0x10(%rbp)
mov %rdx, 0x18(%rbp)
mov %r8d, %ecx
mov %r9d, %edx
mov 0x30(%rbp), %eax
mov %cl, 0x20(%rbp)
mov %dl, 0x28(%rbp)
mov %ax, -0x24(%rbp)
************* EDIT 2 **************
I thought it would be useful to see how memory changes at the run-time:
The first picture shows the memory layout right after entering the interceptor function:
The second picture shows the same memory region after the problematic code (like printf and so on).
The third picture shows the memory layout right after jumping to the original function.
As you can see, right after calling printf the stack looks fine; however, when I jump into the original function, it gets messed up...
Looking at the screenshots, I am pretty sure that all the parameters lie on the stack in memory and that the parameters are not passed in registers.
Arguments are passed according to a fixed calling convention. In this case, they are passed in registers, beginning with %rcx. Any modification to the registers used by the calling convention will change the arguments seen by the function you subsequently jmp to.
Calling printf before your jmp changes the value of %rcx from env to a pointer to the constant "hey". After the call you need to restore %rcx (and any other argument register the call may have overwritten) to its previous value. The following code should work:
void* interceptor(JNIEnv *env, jclass clazz, ...){
//just an example
int x=8;
printf("hey");
// restoring stack pointers
asm (
"movl %ebp, %esp;"
"mov %rbp, %rsp"
);
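// restore the argument registers from the slots the prologue saved them to
// (AT&T syntax: source first, destination second)
asm volatile(
"mov 0x10(%rbp), %rcx;"
"mov 0x18(%rbp), %rdx;"
"mov 0x20(%rbp), %r8;"
"mov 0x28(%rbp), %r9"
);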
// add 4 to the function address to skip "push ebp / mov ebp esp"
asm volatile("jmp *%0;"::"r" (originalfunc+4));
// will not get here anyway
return NULL;
}
What architecture is this? From the register names, it appears to be x64.
You say the parameters are wrong. I agree. You jump from there to believing the stack is wrong. Probably not. x64 passes some parameters in registers, but not varargs. So the function signature for your forwarder is simply incompatible with the function you are trying to call.
Post the assembly for a direct call to Java_main_Main_theOriginalFunction and then for a call to your forwarder using the exact same parameters; you'll see a terrible difference in how the arguments are passed.
Most likely any function you call before your forwarding destroys the structure that is needed to handle the variable argument list (in your assembly there is still the mingw_printf call of which you didn't show the disassembly).
To understand better what's going on you might want to have a look at this question.
To solve your problem you could consider adding another level of indirection. I think that the following might work (but I haven't tested it).
void *forward_interceptor(JNIEnv *env, jclass clazz, ...) {
// add 4 to the function address to skip "push ebp / mov ebp esp"
asm volatile("jmp *%0;"::"r" (originalfunc+4));
// will not get here anyway
return NULL;
}
void* interceptor(JNIEnv *env, jclass clazz, ...){
//do your preparations
...
va_list args;
va_start(args, clazz);
forward_interceptor(env, clazz, args);
va_end(args);
}
IMHO the important thing is that you need the va_list/va_start/va_end setup to make sure that the parameters are properly passed on to the next function.
However, since you seem to know the signature of the function you are forwarding to and it doesn't seem to accept a variable number of arguments, why not extract the arguments, and call the function properly like:
void* interceptor(JNIEnv *env, jclass clazz, ...){
//do your preparations
...
va_list args;
va_start(args, clazz);
// jboolean, jbyte and jshort are promoted to int when passed through "..."
jboolean p1 = (jboolean) va_arg(args, int);
jbyte p2 = (jbyte) va_arg(args, int);
jshort p3 = (jshort) va_arg(args, int);
...
Java_main_Main_theOriginalFunction(env, clazz, p1, p2, ...
va_end(args);
return NULL;
}
Note, however, that va_arg cannot check whether the parameter has the correct type or is present at all.
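To illustrate the va_arg approach in isolation, here is a minimal sketch that compiles without <jni.h>. The my_j* typedefs, Seen, original, interceptor and check_forwarding are hypothetical stand-ins, not the real JNI names. It only shows how a variadic C++ function reads promoted arguments; whether the JVM's call into a native method matches the variadic convention on your platform is a separate question that this sketch does not answer.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdint>

// Hypothetical stand-ins for jboolean, jbyte, jshort, jint and jdouble.
using my_jboolean = std::uint8_t;
using my_jbyte    = std::int8_t;
using my_jshort   = std::int16_t;
using my_jint     = std::int32_t;
using my_jdouble  = double;

struct Seen {
    my_jboolean p1;
    my_jbyte    p2;
    my_jshort   p3;
    my_jint     p4;
    my_jdouble  p5;
};

// Hypothetical fixed-signature target, standing in for the original function.
Seen original(my_jboolean p1, my_jbyte p2, my_jshort p3, my_jint p4, my_jdouble p5) {
    return Seen{p1, p2, p3, p4, p5};
}

// Variadic forwarder. Arguments narrower than int arrive promoted to int,
// so they are read as int and narrowed afterwards.
Seen interceptor(int tag, ...) {
    va_list args;
    va_start(args, tag);  // tag is only the anchor for va_start
    my_jboolean p1 = static_cast<my_jboolean>(va_arg(args, int));
    my_jbyte    p2 = static_cast<my_jbyte>(va_arg(args, int));
    my_jshort   p3 = static_cast<my_jshort>(va_arg(args, int));
    my_jint     p4 = static_cast<my_jint>(va_arg(args, int));
    my_jdouble  p5 = va_arg(args, double);
    va_end(args);
    return original(p1, p2, p3, p4, p5);
}

// Small self-check: every value should survive the round trip.
void check_forwarding() {
    Seen s = interceptor(0, my_jboolean(1), my_jbyte(-2), my_jshort(300),
                         my_jint(70000), 1.5);
    assert(s.p1 == 1 && s.p2 == -2 && s.p3 == 300 && s.p4 == 70000 && s.p5 == 1.5);
}
```

Reading a promoted argument with its narrow type, such as va_arg(args, my_jshort), is undefined behavior, which is why every small integer is fetched as int first.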
I am on Ubuntu 16.04, x86_64, kernel version 4.15.0-39-generic.
GCC 8.1.0
I tried to rewrite these functions (from the first post of https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/qHDCU73cEFc) from Intel syntax to AT&T syntax, and I did not succeed.
namespace atomic {
__declspec(naked)
static void*
ldptr_acq(void* volatile*) {
_asm {
MOV EAX, [ESP + 4]
MOV EAX, [EAX]
RET
}
}
__declspec(naked)
static void*
stptr_rel(void* volatile*, void* const) {
_asm {
MOV ECX, [ESP + 4]
MOV EAX, [ESP + 8]
MOV [ECX], EAX
RET
}
}
}
Then I wrote a simple program that should return the same pointer I pass in. I installed GCC 8.1, which supports the naked attribute for functions (https://gcc.gnu.org/gcc-8/changes.html: "The x86 port now supports the naked function attribute").
As far as I remember, this attribute tells the compiler not to generate the function's prologue and epilogue, so I can take the parameters from the stack and return the result myself.
Code (does not work, segfaults):
#include <cstdio>
#include <cstdlib>
__attribute__ ((naked))
int *get_num(int*) {
__asm__ (
"movl 4(%esp), %eax\n\t"
"movl (%eax), %eax\n\t"
"ret"
);
}
int main() {
int *i =(int*) malloc(sizeof(int));
*i = 5;
int *j = get_num(i);
printf("%d\n", *j);
free(i);
return 0;
}
Then I tried using 64-bit registers (this segfaults as well):
__asm__ (
"movq 4(%rsp), %rax\n\t"
"movq (%rax), %rax\n\t"
"ret"
);
Only after I took the value from the rdi register did it all work:
__asm__ (
"movq %rdi, %rax\n\t"
"ret"
);
Why did I fail to pass the argument through the stack? I probably made a mistake somewhere. Where is it?
Because the x86-64 System V calling convention passes args in registers, not on the stack, unlike the old inefficient i386 System V calling convention.
You always have to write asm that matches the calling convention if you're writing the whole function in asm, as with a naked function or a stand-alone .S file.
GNU C extended asm allows you to use operands to specify the inputs to an asm statement, and the compiler will generate instructions to make that happen. (I wouldn't recommend using it until you understand asm and how compilers turn C into asm with optimization enabled, though.)
Also note that movq %rdi, %rax implements long *foo(long*p){return p;} not return *p. Perhaps you meant mov (%rdi), %rax to dereference the pointer arg?
And BTW, you definitely don't need and shouldn't use inline asm for this. https://gcc.gnu.org/wiki/DontUseInlineAsm, and see https://stackoverflow.com/tags/inline-assembly/info
In GNU C, you can cast a pointer to volatile uint64_t*. Or you can use __atomic_load_n (ptr, __ATOMIC_ACQUIRE) to get basically everything you were getting from that asm, without the overhead of a function call or any of the cost for the optimizer at the call-site of having all the call-clobbered registers be clobbered.
You can use them on any object: https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html Unlike C++11 where you can only do atomic ops on a std::atomic<T>.
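As a rough sketch of that suggestion, the two naked functions from the question could be written with the GNU __atomic builtins instead of asm. The namespace atomic_sketch and the helper check_round_trip are names made up for this example, and the code assumes GCC or Clang.

```cpp
#include <cassert>

namespace atomic_sketch {

// Acquire-load of a pointer, in place of the hand-written ldptr_acq.
inline void* ldptr_acq(void* volatile* p) {
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

// Release-store of a pointer, in place of the hand-written stptr_rel.
inline void stptr_rel(void* volatile* p, void* v) {
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

// Small self-check: a stored pointer is read back unchanged.
inline void check_round_trip() {
    int value = 5;
    void* slot = nullptr;
    stptr_rel(&slot, &value);
    assert(ldptr_acq(&slot) == &value);
}

}  // namespace atomic_sketch
```

Because these are ordinary inline functions, the compiler can inline them at the call site and does not have to treat the call-clobbered registers as clobbered, unlike with a naked asm function.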
I know that C++ compilers optimize empty (static) functions.
Based on that knowledge I wrote a piece of code that should get optimized away whenever some identifier is defined (using the -D option of the compiler).
Consider the following dummy example:
#include <iostream>
#ifdef NO_INC
struct T {
static inline void inc(int& v, int i) {}
};
#else
struct T {
static inline void inc(int& v, int i) {
v += i;
}
};
#endif
int main(int argc, char* argv[]) {
int a = 42;
for (int i = 0; i < argc; ++i)
T::inc(a, i);
std::cout << a;
}
The desired behavior would be the following:
Whenever the NO_INC identifier is defined (using -DNO_INC when compiling), all calls to T::inc(...) should be optimized away (due to the empty function body). Otherwise, the call to T::inc(...) should trigger an increment by some given value i.
I got two questions regarding this:
Is my assumption correct that calls to T::inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?
I wonder if the variables (a and i) are still loaded into the cache when T::inc(a, i) is called (assuming they are not there yet) although the function body is empty.
Thanks for any advice!
Compiler Explorer is a very useful tool for looking at the assembly generated for your program, because inspecting the assembly is the only way to know for sure whether the compiler optimized something away. Demo.
With actually incrementing, your main looks like:
main: # #main
push rax
test edi, edi
jle .LBB0_1
lea eax, [rdi - 1]
lea ecx, [rdi - 2]
imul rcx, rax
shr rcx
lea esi, [rcx + rdi]
add esi, 41
jmp .LBB0_3
.LBB0_1:
mov esi, 42
.LBB0_3:
mov edi, offset std::cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xor eax, eax
pop rcx
ret
As you can see, the compiler completely inlined the call to T::inc and does the incrementing directly.
For an empty T::inc you get:
main: # #main
push rax
mov edi, offset std::cout
mov esi, 42
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
xor eax, eax
pop rcx
ret
The compiler optimized away the entire loop!
Is my assumption correct that calls to t.inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized?
Yes.
If my assumption holds, does it also hold for more complex function bodies (in the #else branch)?
No, for some definition of "complex". Compilers use heuristics to determine whether it's worth inlining a function or not, and base their decision on those heuristics alone.
I wonder if the variables (a and i) are still loaded into the cache when t.inc(a, i) is called (assuming they are not there yet) although the function body is empty.
No, as demonstrated above, the loop doesn't even exist.
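If you want to compare both variants side by side in one file (for example in Compiler Explorer), one option is to select the body with a template flag instead of the NO_INC macro. This is only a sketch; Counter, run and check_variants are hypothetical names that are not in the question, and whether the loop really disappears still depends on your compiler and optimization level.

```cpp
#include <cassert>

// Hypothetical variant of T: the flag picks the empty or the real body.
template <bool Enabled>
struct Counter {
    static inline void inc(int& v, int i) {
        if (Enabled) v += i;  // constant condition, trivially foldable
    }
};

// Same loop as in main() from the question, parameterised on the flag.
template <bool Enabled>
int run(int n) {
    int a = 42;
    for (int i = 0; i < n; ++i)
        Counter<Enabled>::inc(a, i);
    return a;
}

// Small self-check of the observable behaviour of both variants.
inline void check_variants() {
    assert(run<false>(5) == 42);                      // empty body: a is untouched
    assert(run<true>(5) == 42 + 0 + 1 + 2 + 3 + 4);   // real body: sum of 0..4 added
}
```

Compiling run<false> and run<true> with optimization enabled and looking at the two functions should show the same difference as the two listings above.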
Is my assumption correct that calls to t.inc(...) do not affect the performance negatively when I specify the -DNO_INC option because the call to the empty function is optimized? If my assumption holds, does it also hold for more complex function bodies (in the #else branch)?
You are right. I have modified your example (i.e. removed cout, which clutters the assembly) in Compiler Explorer to make it more obvious what happens.
The compiler optimizes everything away and outputs
main: # #main
movl $42, %eax
retq
Only 42 is loaded into eax and returned.
For the more complex case, however, more instructions are needed to compute the return value. See here
main: # #main
testl %edi, %edi
jle .LBB0_1
leal -1(%rdi), %eax
leal -2(%rdi), %ecx
imulq %rax, %rcx
shrq %rcx
leal (%rcx,%rdi), %eax
addl $41, %eax
retq
.LBB0_1:
movl $42, %eax
retq
I wonder if the variables (a and i) are still loaded into the cache when t.inc(a, i) is called (assuming they are not there yet) although the function body is empty.
They are only loaded when the compiler cannot prove that they are unused. See the second example in Compiler Explorer.
By the way: you do not need to create an instance of T (i.e. T t;) in order to call a static function of a class. That defeats the purpose. Call it like T::inc(...) rather than t.inc(...).
Because the inline keyword is used, you can safely assume your first point: using these functions shouldn't negatively affect performance.
Running your code through
g++ -c -Os -g
objdump -S
confirms this; An extract:
int main(int argc, char* argv[]) {
T t;
int a = 42;
1020: b8 2a 00 00 00 mov $0x2a,%eax
for (int i = 0; i < argc; ++i)
1025: 31 d2 xor %edx,%edx
1027: 39 fa cmp %edi,%edx
1029: 7d 06 jge 1031 <main+0x11>
v += i;
102b: 01 d0 add %edx,%eax
for (int i = 0; i < argc; ++i)
102d: ff c2 inc %edx
102f: eb f6 jmp 1027 <main+0x7>
t.inc(a, i);
return a;
}
1031: c3 retq
(I replaced the cout with return for better readability)
In assembly, many functions begin with the following prologue:
00000001004010e0: main(int, char**)+0 push %rbp
00000001004010e1: main(int, char**)+1 mov %rsp,%rbp
Some functions, like the one below, do not:
int MainEntry(){
MainEntry():
0000000100401104: MainEntry()+0 push %rbp
0000000100401105: MainEntry()+1 push %rbx
0000000100401106: MainEntry()+2 sub $0x48,%rsp
000000010040110a: MainEntry()+6 lea 0x80(%rsp),%rbp
vector<int> v;
0000000100401112: MainEntry()+14 lea -0x60(%rbp),%rax
0000000100401116: MainEntry()+18 mov %rax,%rcx
0000000100401119: MainEntry()+21 callq 0x100401b00 <std::vector<int, std::allocator<int> >::vector()>
return 0;
000000010040111e: MainEntry()+26 mov $0x0,%ebx
0000000100401123: MainEntry()+31 lea -0x60(%rbp),%rax
0000000100401127: MainEntry()+35 mov %rax,%rcx
000000010040112a: MainEntry()+38 callq 0x100401b20 <std::vector<int, std::allocator<int> >::~vector()>
000000010040112f: MainEntry()+43 mov %ebx,%eax
}
Here is the C++ code that compiles into this:
int main(int c, char** args){
MainEntry();
return 0;
}
int MainEntry(){
vector<int> v;
return 0;
}
So here are my two questions:
In the MainEntry function, there is a push %rbp, and then a push %rbx. Why is RBX pushed onto the stack?
If I understand correctly, sub $0x48, %rsp allocates 0x48 bytes on the stack, and lea 0x80(%rsp), %rbp moves 0x80 bytes down on the stack and assigns that as the base. Where is RBP going to end up in the local stack frame and how did it get there?
rbx is pushed onto the stack because the calling convention says it is preserved across calls.
This function is compiled without frame pointers. rbp is just another general purpose register when compiling without frame pointers.
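To see where rbp ends up, the offsets from the MainEntry listing can be added up by hand. The sketch below does this with constants, all relative to the value of rsp on entry (assumed here to point at the return address). The constant names are made up for this example.

```cpp
#include <cstdint>

// All values are byte offsets relative to rsp at function entry.
constexpr std::int64_t entry_rsp     = 0;
constexpr std::int64_t after_pushes  = entry_rsp - 8 - 8;     // push %rbp; push %rbx
constexpr std::int64_t final_rsp     = after_pushes - 0x48;   // sub $0x48,%rsp
constexpr std::int64_t rbp_value     = final_rsp + 0x80;      // lea 0x80(%rsp),%rbp
constexpr std::int64_t vector_object = rbp_value - 0x60;      // lea -0x60(%rbp),%rax

static_assert(final_rsp == -0x58, "rsp drops 0x58 bytes in total");
static_assert(rbp_value == 0x28, "rbp points 0x28 bytes above the return address");
static_assert(vector_object == final_rsp + 0x20, "v sits 0x20 bytes above the final rsp");
```

So rbp does not point into the function's own frame at all: it ends up 0x28 bytes above the return address, and the local vector is reached through a negative offset from it, which places it 0x20 bytes above the final stack pointer.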
About the question in the title (now improved)
The push rsp, rbp instruction doesn't exist. push always takes one argument. Perhaps you meant to ask why rbp isn't pushed. The answer is that nothing uses it and so no instructions are required to preserve it.
C++, unlike some other languages, allows static data to be of any arbitrary type, not just plain-old-data. Plain-old-data is trivial to initialize (the compiler just writes the value at the appropriate address in the data segment), but the other, more complex types, are not.
How is initialization of non-POD types typically implemented in C++? In particular, what exactly happens when the function foo is executed for the first time? What mechanisms are used to keep track of whether str has already been initialized or not?
#include <string>
void foo() {
static std::string str("Hello, Stack Overflow!");
}
C++11 requires the initialization of function local static variables to be thread-safe. So at least in compilers that are compliant, there'll typically be some sort of synchronization primitive in use that'll need to be checked each time the function is entered.
For example, here's the assembly listing for the code from this program:
#include <string>
void foo() {
static std::string str("Hello, Stack Overflow!");
}
int main() {}
.LC0:
.string "Hello, Stack Overflow!"
foo():
cmpb $0, guard variable for foo()::str(%rip)
je .L14
ret
.L14:
pushq %rbx
movl guard variable for foo()::str, %edi
subq $16, %rsp
call __cxa_guard_acquire
testl %eax, %eax
jne .L15
.L1:
addq $16, %rsp
popq %rbx
ret
.L15:
leaq 15(%rsp), %rdx
movl $.LC0, %esi
movl foo()::str, %edi
call std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
movl guard variable for foo()::str, %edi
call __cxa_guard_release
movl $__dso_handle, %edx
movl foo()::str, %esi
movl std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string(), %edi
call __cxa_atexit
jmp .L1
movq %rax, %rbx
movl guard variable for foo()::str, %edi
call __cxa_guard_abort
movq %rbx, %rdi
call _Unwind_Resume
main:
xorl %eax, %eax
ret
The __cxa_guard_acquire, __cxa_guard_release etc. are guarding initialization of the static variable.
The implementation that I've seen uses a hidden boolean variable to check whether the variable is initialized. Modern compilers will do this thread-safely, but IIRC, some older compilers did not, and if the function was called from several threads at the same time you could get the constructor called twice.
Something along the lines of:
static bool __str_initialized = false;
static char __mem_for_str[...]; //std::string str("Hello, Stack Overflow!");
void foo() {
if (!__str_initialized)
{
lock();
__str_initialized = true;
new (__mem_for_str) std::string("Hello, Stack Overflow!");
unlock();
}
}
Then, in the finalization code of the program:
if (__str_initialized)
reinterpret_cast<std::string*>(__mem_for_str)->~basic_string();
It's implementation specific.
Typically, there'll be a flag (statically initialised to zero) to indicate whether it's initialised, and (in C++11, or earlier thread-safe implementations) some kind of mutex, also statically initialisable, to protect against multiple threads trying to initialise it.
The generated code would typically behave something along the lines of
static __atomic_flag_type __initialised = false;
static __mutex_type __mutex = __MUTEX_INITIALISER;
if (!__initialised) {
__lock_type __lock(__mutex);
if (!__initialised) {
__initialise(str);
__initialised = true;
}
}
You can check what your compiler does by generating an assembler listing.
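For experimenting, the flag-plus-mutex scheme described above can be written out by hand in portable C++11. This is only a sketch of the idea, not what any particular compiler emits; every name in guard_sketch is made up, and the constructions counter exists purely so the behaviour can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <mutex>
#include <new>
#include <string>

namespace guard_sketch {

// Raw storage for the object plus a hand-rolled guard (hypothetical names).
alignas(std::string) unsigned char storage[sizeof(std::string)];
std::string* object = nullptr;
std::atomic<bool> initialised{false};
std::mutex guard_mutex;
int constructions = 0;  // bookkeeping only

void destroy_at_exit() { object->~basic_string(); }

std::string& str() {
    if (!initialised.load(std::memory_order_acquire)) {      // cheap check on every call
        std::lock_guard<std::mutex> lock(guard_mutex);
        if (!initialised.load(std::memory_order_relaxed)) {  // re-check under the lock
            object = new (storage) std::string("Hello, Stack Overflow!");
            ++constructions;
            std::atexit(destroy_at_exit);                    // destructor runs at exit
            initialised.store(true, std::memory_order_release);
        }
    }
    return *object;
}

// Small self-check: two calls yield the same object, constructed once.
void check_single_construction() {
    std::string& a = str();
    std::string& b = str();
    assert(&a == &b);
    assert(constructions == 1);
}

}  // namespace guard_sketch
```

The flag is set only after the constructor has finished, so a second thread either sees a fully constructed object or waits on the mutex.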
MSVC2008 in debug mode generates this code (excluding exception handling prolog/epilog etc):
mov eax, DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA
and eax, 1
jne SHORT $LN1@foo
mov eax, DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA
or eax, 1
mov DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA, eax
mov DWORD PTR __$EHRec$[ebp+8], 0
mov esi, esp
push OFFSET ??_C@_0BH@ENJCLPMJ@Hello?0?5Stack?5Overflow?$CB?$AA@
mov ecx, OFFSET ?str@?1??foo@@YA_NXZ@4V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A
call DWORD PTR __imp_??0?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@QAE@PBD@Z
cmp esi, esp
call __RTC_CheckEsp
push OFFSET ??__Fstr@?1??foo@@YA_NXZ@YAXXZ ; `foo'::`2'::`dynamic atexit destructor for 'str''
call _atexit
add esp, 4
mov DWORD PTR __$EHRec$[ebp+8], -1
$LN1@foo:
That is, there is a static flag variable, ?$S1@?1??foo@@YA_NXZ@4IA, whose lowest bit is tested. If the bit is already set, the code branches to the label $LN1@foo. Otherwise it ORs 1 into the flag, constructs the string at a known location, and registers a call to its destructor at program exit using atexit. Then the function continues as normal.
Let's take this for example:
class TestClass {
public:
int functionInline();
int functionComplex();
};
inline int TestClass::functionInline()
{
// a single instruction
return functionComplex();
}
int TestClass::functionComplex()
{
/* many complex
instructions
*/
}
void myFunc()
{
TestClass testVar;
testVar.functionInline();
}
Supposing that the comments stand in for real code (a single instruction in one case, many complex instructions in the other), would the equivalent code after compilation be:
void myFunc()
{
TestClass testVar;
// a single instruction
return functionComplex();
}
or would be:
void myFunc()
{
TestClass testVar;
// a single instruction
/* many complex
instructions
*/
}
In other words, would a normal function be inserted inline if called inside an inline function or not?
If the compiler can see that the function is not called anywhere else (e.g. it is static in the case of a free function), then at least gcc has inlined it for a long time.
Of course, this also assumes the compiler can actually "see" the source code of the function - only if you use "whole program optimisation" (available in at least MS and GCC compilers), does it inline functions that aren't either in the source file or headers included in the source.
Obviously, inlining a "large" function has very little benefit (because the overhead of making the call is such a small portion of the total runtime), and if the function gets called more than once (or "may be called more than once" by not being static), the compiler will almost certainly not inline a "large" function.
In summary: maybe the large function is inline, but quite likely not.
Please check the assembly code that I generated both for VC++ 2010 and g++.
Neither compiler actually inlines either function in this example.
Code:
class TestClass {
public:
int functionInline();
int functionComplex();
};
inline int TestClass::functionInline()
{
// a single instruction
return functionComplex();
}
int TestClass::functionComplex()
{
/* many complex
instructions
*/
return 0;
}
int main(){
TestClass t;
t.functionInline();
return 0;
}
VC++ 2010:
int main(){
01372E50 push ebp
01372E51 mov ebp,esp
01372E53 sub esp,0CCh
01372E59 push ebx
01372E5A push esi
01372E5B push edi
01372E5C lea edi,[ebp-0CCh]
01372E62 mov ecx,33h
01372E67 mov eax,0CCCCCCCCh
01372E6C rep stos dword ptr es:[edi]
TestClass t;
t.functionInline();
01372E6E lea ecx,[t]
01372E71 call TestClass::functionInline (1371677h)
return 0;
01372E76 xor eax,eax
}
Linux G++:
main:
.LFB3:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $16, %rsp
leaq -1(%rbp), %rax
movq %rax, %rdi
call _ZN9TestClass14functionInlineEv
movl $0, %eax
leave
ret
.cfi_endproc
Both the lines
01372E71 call TestClass::functionInline (1371677h)
and
call _ZN9TestClass14functionInlineEv
indicate that the function functionInline is not inlined.
Now have a look at functionInline assembly:
inline int TestClass::functionInline()
{
01372E00 push ebp
01372E01 mov ebp,esp
01372E03 sub esp,0CCh
01372E09 push ebx
01372E0A push esi
01372E0B push edi
01372E0C push ecx
01372E0D lea edi,[ebp-0CCh]
01372E13 mov ecx,33h
01372E18 mov eax,0CCCCCCCCh
01372E1D rep stos dword ptr es:[edi]
01372E1F pop ecx
01372E20 mov dword ptr [ebp-8],ecx
// a single instruction
return functionComplex();
01372E23 mov ecx,dword ptr [this]
01372E26 call TestClass::functionComplex (1371627h)
}
Hence, functionComplex is not inlined either.
No, it would not be inlined. It is impossible because the compiler does not have the body of a non-inline function available when that function is defined in another translation unit. I take "normal function" to mean a non-inline function.
No, if you want your complex function be inserted inline, you must specify the inline keyword too.
In practice, use the __forceinline keyword with MSVC (or __attribute__((always_inline)) with GCC); otherwise the compiler may ignore the inline keyword if the function contains a lot of instructions.
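A minimal sketch of how such a request can be spelled portably is shown below. FORCE_INLINE is a macro name made up for this example; it assumes __forceinline on MSVC and __attribute__((always_inline)) on GCC and Clang. Even these are requests that the compiler can refuse in some situations, so checking the generated assembly is still the only way to be sure.

```cpp
#include <cassert>

// FORCE_INLINE is a hypothetical helper macro for this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline __attribute__((always_inline))
#endif

class TestClass {
public:
    int functionInline();
    int functionComplex();
};

// Both definitions are visible in this translation unit before they are used.
FORCE_INLINE int TestClass::functionComplex() {
    return 0;  // stands in for "many complex instructions"
}

FORCE_INLINE int TestClass::functionInline() {
    return functionComplex();
}

// Small self-check of the observable behaviour.
inline void check_call() {
    TestClass t;
    assert(t.functionInline() == 0);
}
```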
First of all, inline is just a hint to the compiler. It is not guaranteed that the compiler will do the inlining.
Secondly, when you specify a function as inline, it tells the compiler two things:
1) The function might be a candidate for inlining. Whether it is actually going to be inlined is not guaranteed.
2) The function may be defined in more than one translation unit, as long as every definition is identical. Note that this is not internal linkage: an inline function still has external linkage unless it is also declared static. This holds irrespective of whether the function is actually inlined or not.
In your case, functionInline is specified as inline but functionComplex is not. Both have external linkage, and linkage by itself does not stop the compiler from inlining a call; what matters is whether the function's definition is visible at the call site.
So a plain answer to your question is "not necessarily": a normal function (one without the inline keyword, defined outside the class) can be inlined when its body is visible and the optimizer considers it worthwhile, but nothing guarantees it.