Jump/tailcall to another function - C++

I have two functions, looking like this in C++:
void f1(...);
void f2(...);
I can change the body of f1, but f2 is defined in another library I cannot change. I absolutely have to (tail) call f2 inside f1, and I must pass all arguments provided to f1 on to f2, but as far as I know this is impossible in pure C or C++. Unfortunately, there is no variant of f2 that accepts a va_list. The call to f2 happens last in the function, so I need some form of tailcall.
I decided to use assembly to pop the stack frame of the current function, then jump to f2 (it is actually received as a function pointer stored in a variable, which is why I first load it into a register):
__asm {
mov eax, f2
leave
jmp eax
}
In MSVC++, in Debug, it appears to work at first, but it somehow messes with the return values of other functions, and sometimes it crashes. In Release, it always crashes.
Is this assembly code incorrect, or do some optimizations of the compiler somehow break this code?

The compiler makes no guarantees once you start digging around in its stack frames. A trampoline function might work, but you have to save state between the calls, and it involves a lot of low-level fiddling.
Here is a skeleton, but you will need to know a lot about calling conventions, class method invocation, etc...
/* argn, ..., arg0, retaddr */
trampoline:
push < all volatile regs >
call <get thread local storage >
copy < volatile regs and ret addr > to < local storage >
pop < volatile regs >
remove ret addr
call f2
call < get thread local storage >
restore < volatile regs and ret addr>
jmp f1
ret

You have to write f1 in pure asm for it to be guaranteed-safe.
In all the major x86 calling conventions, the callee "owns" the args, and can modify the stack-space that held them. (Whether or not the C source changes them and whether or not they're declared const).
e.g. void foo(int x) { x += 1; bar(x); } might modify the stack space above the return address that holds x, if compiled with optimization disabled. Making another call with the same args requires storing them again unless you know the callee hasn't stepped on them. The same argument applies for tailcalling from the end of one function.
I checked on the Godbolt compiler explorer; both MSVC and gcc do in fact modify x on the stack in debug builds. gcc uses add DWORD PTR [ebp+8], 1 before pushing [ebp+8].
Compilers in practice may not actually take advantage of this for variadic functions, though, so depending on the definitions of your functions, you might get away with it if you can convince them to make a tailcall.
Note that void bar(...); is not a valid prototype in C, though:
# gcc -xc on Godbolt to force compiling as C, not C++
<source>:1:10: error: ISO C requires a named argument before '...'
It is valid in C++, or at least g++ accepts it while gcc doesn't. MSVC accepts it in C++ mode, but not in C mode. (Godbolt has a whole separate C mode with a different set of compilers, which you can use to get MSVC to compile code as C instead of C++. I don't know a command-line option to flip it to C mode the way gcc has -xc and -xc++)
Anyway, it might work (in optimized builds) to write f2(); at the end of f1, but that's nasty and completely lying to the compiler about what args are passed. And of course it only works for a calling convention with no register args. (But you were showing 32-bit asm, so you might well be using a calling convention with no register args.)
Any decent compiler will use jmp f2 to make an optimized tail-call in this case, because they both return void. (For non-void, you would return f2();)
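As a concrete sketch of that hack (not a recommendation), assuming a stack-args-only convention such as 32-bit cdecl, that the library's real prototype for f2 is not visible in this translation unit, and that nothing in f1 writes to its own arguments:
extern "C" void f2(...);      // the lie: accepted by C++ (not C); the real f2 lives in the library

void f1(int a, int b, int c)  // stand-in for f1's real parameter list
{
    // ... work that must not modify a, b, or c ...
    f2();                     // no args passed; an optimizing compiler can emit "jmp f2",
                              // so f2 reuses f1's own incoming stack args
}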
BTW, if mov eax, f2 works, then jmp f2 will also work.
Your code can't work in an optimized build, though, because you're assuming that the compiler made a legacy stack-frame, and that the function won't inline anywhere.
It's unsafe even in a debug build because the compiler may have pushed some call-preserved registers that need to be popped before leaving the function (and before running leave to destroy the stack frame).
The trampoline idea that #mevets showed could maybe be simplified: if there's a reasonable fixed upper size limit on the args, you can copy maybe 64 or 128 bytes of potential-args from your incoming args into args for f1. A few SIMD vectors will do it. Then you can call f1 normally, then tail-call f2 from your asm wrapper.
If there are potentially register args, save them to stack space before the args you copy, and restore them before tailcalling.
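For example, here is an untested sketch of that simplified trampoline for MSVC 32-bit, assuming all-stack args (no register args) and at most 64 bytes of them. The names and placeholder parameter lists are hypothetical: f1_impl stands for the original C++ body of f1, renamed so that this naked wrapper can take f1's place, and f2 stands for the library function (for a function pointer you would jmp through the variable instead):
extern "C" void f2();                             // the library function to tail-call
extern "C" void f1_impl(/* f1's real params */);  // f1's own work, compiled as normal C++

extern "C" __declspec(naked) void f1(/* f1's real params */)
{
    __asm {
        sub  esp, 64                  // room for a fresh copy of up to 64 bytes of potential args
        mov  ecx, 16                  // 16 dwords = 64 bytes
    copy_args:
        mov  eax, [esp + ecx*4 + 64]  // read the incoming arg dwords...
        mov  [esp + ecx*4 - 4], eax   // ...into the outgoing-arg area for f1_impl
        dec  ecx
        jnz  copy_args
        call f1_impl                  // f1_impl may trash the copy; the originals stay intact
        add  esp, 64                  // drop the copy; esp now points at the original return address
        jmp  f2                       // f2 sees the untouched original args and returns to f1's caller
    }
}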

MSVC optimizer saves and restores XMM SIMD registers on an early-out path through a function. Why? [duplicate]

In C, if I have a function call that looks like
// main.c
...
do_work_on_object(object, arg1, arg2);
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
    if(object == NULL)
    {
        return;
    }
    // do lots of work
}
then the compiler will generate a lot of stuff in main.o to save state, pass parameters (hopefully in registers in this case), and restore state.
However, at link time it can be observed that arg1 and arg2 are not used in the quick-return path, so the clean-up and state restoration can be short-circuited. Do linkers tend to do this kind of thing automatically, or would one need to turn on link-time optimization (LTO) to get that kind of thing to work?
(Yes, I could inspect the disassembled code, but I'm interested in the behaviours of compilers and linkers in general, and on multiple architectures, so hoping to learn from others' experience.)
Assuming that profiling shows this function call is worth optimizing, should we expect the following code to be noticeably faster (e.g. without the need to use LTO)?
// main.c
...
if(object != NULL)
{
    do_work_on_object(object, arg1, arg2);
}
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
    assert(object != NULL); // generates no code in release build
    // do lots of work
}
Some compilers (like GCC and clang) are able to do "shrink-wrap" optimization to delay saving call-preserved regs until after a possible early-out, if they're able to spot the pattern. But some don't, e.g. apparently MSVC 16.11 still doesn't.
I don't think any do partial inlining of just the early-out check into the caller, to avoid even the overhead of arg-passing and the call / ret itself.
Since compiler/linker support for this is not universal and not always successful even for shrink-wrapping, you can write your code in a way that gets much of the benefit, at the cost of splitting the logic of your function into two places.
If you have a fast-path that takes hardly any code, but happens often enough to matter, put that part in a header so it gets inlined, with a fallback to calling the rest of the function (which you make private, so it can assume that any checks in the inlined part are already done).
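Sketched out with hypothetical names, that split looks like this; the cheap check lives in the header and inlines into every caller, while the heavy part stays out-of-line:
/* object.h */
#include <stddef.h>
struct object_t;
void do_work_on_object_slow(struct object_t *object, int arg1, int arg2); /* the big part */

static inline void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
    if (object == NULL)   /* fast path, inlined into the caller: no call, no arg passing, */
        return;           /* no save/restore of call-preserved registers                  */
    do_work_on_object_slow(object, arg1, arg2);
}

/* object.c */
void do_work_on_object_slow(struct object_t *object, int arg1, int arg2)
{
    /* object is known to be non-NULL here; do lots of work */
}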
e.g. par2's routine that processes a block of data has a fast-path for when the galois16 factor is zero. (dst[i] += 0 * src[i] is a no-op, even when * is a multiply in Galois16, and += is a GF16 add (i.e. a bitwise XOR)).
Note how the commit in question renames the old function to InternalProcess, and adds a new template<class g> inline bool ReedSolomon<g>::Process that checks for the fast-path, and otherwise calls InternalProcess. (as well as making a bunch of unrelated whitespace changes, and some ifdefs... It was originally a 2006 CVS commit.)
The comment in the commit claims an overall 8% speed gain for repairing.
Neither the setup nor the cleanup code can be short-circuited, because the resulting compiled code is static: it doesn't know what will happen when the program gets executed. So the compiler will always have to set up the whole parameter stack.
Think of two situations: in one, object is NULL; in the other it is not. How would the assembly code know whether to put the rest of the arguments on the stack? Especially as the caller is the one responsible for placing the arguments in their proper locations (stack or registers).

Can I make a C++ method in external assembly (function.asm)?

I am writing a program that requires one function in assembly. It would be pretty helpful to encapsulate the assembly function in a C++ class, so its own data is isolated and I can create multiple instances.
If I create a class and call an external function from a C++ method, the function is reentrant as long as it keeps its own data as local "variables" in its stack frame.
Is there some way to make the assembly function a C++ method, maybe using name mangling, so the function is implemented in assembly but the prototype is declared inside the C++ class?
If that is not possible, is there some way to create multiple instances (dynamically) of the assembly function even though it is not part of the class? Something like cloning the function in memory and just calling it, obviously using relocatable code (adding a delta displacement for variables and data if required)...
I am writing a program that requires one function in assembly.
Then, by definition, your program becomes much less portable. And depends upon the calling conventions and ABI of your C++ implementation and your operating system.
It would then be coherent to use some compiler-specific features (which are not in portable standard C++11, e.g. in n3337).
My recommendation is then to take advantage of GCC extended assembly. Read the chapter on using assembly language with C (it of course also applies to C++).
By directly embedding some extended asm inside a C++ member function, you avoid the hassle of calling a separate function. Your assembler code is probably really short and executes quickly, so it is better to embed it in C or C++ functions and avoid the cost of a function-call prologue and epilogue.
NB: in 2019 there is no economic sense in spending effort writing large amounts of assembly code: most optimizing compilers produce better assembly than a reasonable programmer can write in a reasonable time. So the incentive is to use small chunks of assembly inside larger C++ or C functions.
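For instance, here is a minimal, contrived sketch of that approach with GCC/Clang extended asm on x86-64 (the class and its imul micro-kernel are invented purely to show the embedding; a real compiler would of course generate this multiply by itself):
class Accumulator {
    long long total = 0;
public:
    void add_scaled(long long value, long long scale) {
        long long product;
        asm("imulq %2, %0"              // product = value * scale, done "by hand"
            : "=r"(product)             // output register, preloaded with value below
            : "0"(value), "r"(scale)    // "0" ties value to the same register as the output
            : "cc");                    // the multiply clobbers the flags
        total += product;
    }
    long long get() const { return total; }
};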
Yes, you can. Either define it as an inline wrapper that passes all the args (including the implicit this pointer) to an external function, or figure out the name-mangling to define the right symbol for the function entry point in asm.
An example of the wrapper way:
extern "C" int asm_function(myclass *p, int a, double b);
class myclass {
int q, r, member_array[4];
int my_method(int a, double b) { return asm_function(this, a, b); }
};
A stand-alone definition of my_method for x86-64 would be just jmp asm_function, a tailcall, because the args are identical. So after inlining, you'll have call asm_function instead of call _Zmyclass_mymethodZd or whatever the actual name mangling is. (I made that up).
In GNU C / C++, there's also the asm keyword to set the asm symbol name for a function, instead of letting the normal name-mangling rules generate it from the class and member-function name, and arg types. (Or with extern "C", usually just a leading underscore or not, depending on the platform.)
class myclass {
    int q, r, member_array[4];
public:
    int my_method(int a, double b)
        asm("myclass_my_method_int_double"); // symbol name for separate asm
};
Then in your .asm file (e.g. NASM syntax, for the x86-64 System V calling convention)
global myclass_my_method_int_double
myclass_my_method_int_double:
;; inputs: myclass *this in RDI, int a in ESI, double b in XMM0
cvtsd2si eax, xmm0
add eax, [rdi+4] ;; this->r
imul eax, esi
ret
(You can pick any name you want for your asm function; it doesn't have to encode the args. But doing that will let you overload it without conflicting symbol names.)
Example on Godbolt of a test caller calling the asm("") way:
void foo(myclass *p){
p->my_method(1, 1.0);
}
compiles to
foo(myclass*):
movsd xmm0, qword ptr [rip + .LCPI0_0] # xmm0 = mem[0],zero
mov esi, 1
jmp myclass_my_method_int_double # TAILCALL
Note that the caller emitted jmp myclass_my_method_int_double, using your name, not a mangled name.

inside naked function - how to do simple assignment

This is the beginning of a function that already exists and works; the commented line is my addition and its purpose is to toggle a pin.
inline __attribute__((naked))
void CScheduler::SwapToThread(void* pNew, void* pPrev)
{
    //*(volatile DWORD*)0x400FF08C = (1 << 14);
    if (pPrev != NULL)
    {
        if (pPrev == this) // Special case to save scheduler stack on startup
        {
            asm("mov lr,%0"::"p"(&CScheduler_Run_Exit)); // load lr with the scheduler's end-thread address
            asm("orr lr, 1");
When I uncomment my addition, my hard fault handler executes. I get it has something to do with this being a naked function but I don't understand why a simple assignment causes a problem.
Two questions:
Why does this line trigger the hard fault?
How can I perform this assignment inside this function?
It was only luck that your previous version of the function happened to work without crashing.
The only thing that can safely be put inside a naked function is pure Basic Asm statements (https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html). You can split it up into multiple Basic Asm statements, or write one statement out of concatenated strings like asm("insn1 \n\t" "insn2 \n\t" ...);, but either way you have to write the entire function in asm yourself.
While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.
If you want to run C++ code from a naked function, you could call a regular function (with bl on ARM, jal on MIPS, etc.), following the standard calling convention.
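A minimal sketch of that approach for this case (GCC/Clang, ARM/Thumb-2; toggle_pin is a hypothetical helper holding the store from the question, and a free function is shown instead of the real member function for brevity; for the member function, this arrives in r0 and the two pointers in r1/r2):
#include <cstdint>

extern "C" void toggle_pin()                    // ordinary function: a safe place for C++ code
{
    *(volatile uint32_t *)0x400FF08C = (1u << 14);
}

__attribute__((naked)) void SwapToThread(void* pNew, void* pPrev)
{
    asm("push {r0-r3, r12, lr} \n\t"  // save the argument regs and return address
        "bl   toggle_pin       \n\t"  //   (r12 is included to keep the stack 8-byte aligned)
        "pop  {r0-r3, r12, lr} \n\t"  // pNew/pPrev are back in r0/r1 afterwards
        // the rest of the hand-written context switch replaces this plain return:
        "bx   lr               \n\t");
}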
As for the specific reason in this case? Maybe creating that address in a register stepped on the function args, leading to the branches going wrong? Inspect the generated asm if you want, but it's 100% unsupported.
Or maybe it ended up using more registers, and since it's naked didn't properly save/restore call-preserved registers? I haven't looked at the code-gen myself for naked functions.
Are you sure this function needs to be naked? I guess that's because you manipulate lr to return to the new context.
If you don't want to just write more logic in asm, maybe have this function's caller do more work (and maybe pass it pointer and/or boolean args telling it more simply what it needs to do, so your inputs are already in registers, and you don't need to access globals).

Order of function signature, call and definition

I want to ask about the order of function signature, call and definition:
that is, which one does the computer look at first, second and third?
So:
#include <iostream>
using namespace std;
void max(void);
void min(void);
int main() {
    max();
    min();
    return 0;
}
void max() {
    return;
}
void min() {
    return;
}
So this is what I think:
the computer will go to main and look at the function call, then it will look at the
function signature, and last of all it will look at the definition.
Is it right?
Thanks
Is it right?
No.
You need to understand the difference between function declarations and function definitions, the difference between compilation, linking, and execution, and the difference between non-virtual and virtual functions.
Function declarations
This is a function declaration: void max(void);. It doesn't tell the compiler anything about what the function does. What it does is to tell the compiler how to call the function and how to interpret the result. When the compiler is compiling the body of some function, call it function A, the compiler doesn't need to know what other functions do. All it needs to know is what to do with the functions that function A calls. The compiler might generate code in assembly or some intermediate language that corresponds to your C++ function calls. Or it might reject your C++ code because your code doesn't make sense.
Determining whether your code makes sense is another key purpose of those function declarations. This is particularly important in C++ where multiple functions can have the same name. How would the compiler know which of the half dozen or so max functions to call if it didn't know about those functions? When your C++ code calls some function, the compiler must find one best match (possibly involving type conversions) with one of those function declarations. Your code doesn't make sense if the compiler can't find a match at all, or if it finds more than one match but can't distinguish one as the best match.
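For example, given these two declarations (hypothetical, just to show how matching can fail as well as succeed):
void f(long x);
void f(double x);

int main() {
    f(1L);   // fine: exact match for f(long)
    f(1);    // error: ambiguous, because int converts equally well to long and to double
}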
When the compiler does find a best match, the generated code will be in the form of a call to an undefined external reference to that function. Where that function lives is not the job of the compiler.
Function definitions
That void max(void) was a function declaration. The corresponding void max() {...} is the definition of that function. When the compiler is processing void max() {...} it doesn't have to worry about what other functions have called it; it just has to worry about processing void max() {...}. The body of this function becomes assembly or intermediate-language code that is inserted into some compiled object file, and the address of the entry point to this generated code is marked as such.
Compilation versus linking
So far I've talked about what the compiler does: it generates chunks of low-level code that correspond to your C++ code. That generated code is not ready for prime time because of those external references. Resolving those undefined external references is the job of the linker. The linker is what builds your executable from multiple object files and libraries. It keeps track of where it has put those chunks of code in the executable. What about those undefined external references? If the linker has already placed the definition for that reference in the executable, it simply fills in the placeholder for the reference. If it hasn't yet come across the definition, it puts the reference and its placeholder onto a list of still-unresolved references. Every time the linker adds a chunk of code to the executable, it checks that list to see whether it can fix up any of those still-unresolved references. At the end, either all references are resolved or some are still outstanding. The latter is an error; the former means you have an executable.
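You can see that division of labour directly: a translation unit that calls a declared-but-never-defined function compiles cleanly, and the problem only surfaces at link time (the message shown is typical GNU ld wording; other linkers phrase it differently):
// one_file.cpp : compiles without complaint on its own
void max(void);      // declaration: all the compiler needs

int main() {
    max();           // becomes a call to an undefined external reference
    return 0;
}
// Linking fails because no object file or library defines max():
//   undefined reference to `max()'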
Execution
When your code runs, those function calls are really just some stack management wrapped around the machine language equivalent of that evil goto statement. There's no examining your function declarations; those don't even exist by the time the code is executed. Return? That's a goto also.
Non-virtual versus virtual functions
What I said above pertains to non-virtual functions. Run-time dispatching does occur for virtual functions. That run-time dispatching has nothing to do with examining function declarations. Those virtual functions are perhaps an issue for a different question.
One last thing:
Get out of the habit of writing using namespace std;. Think of it as akin to smoking: it's a bad habit.
As you may know, the compiler converts the program into machine code (via several intermediate steps). Here is the disassembly of the machine code for main() when compiled with Visual Studio 2012 in debug mode on Windows 8:
int main() {
00C24400 push ebp # Setup stack frame
00C24401 mov ebp,esp
00C24403 sub esp,0C0h
00C24409 push ebx
00C2440A push esi
00C2440B push edi
00C2440C lea edi,[ebp-0C0h] # Fill with guard bytes
00C24412 mov ecx,30h
00C24417 mov eax,0CCCCCCCCh
00C2441C rep stos dword ptr es:[edi]
max();
00C2441E call max (0C21302h) # Call max
min();
00C24423 call min (0C2126Ch) # Call min
return 0;
00C24428 xor eax,eax
}
00C2442A pop edi # Restore stack frame
00C2442B pop esi
00C2442C pop ebx
00C2442D add esp,0C0h
00C24433 cmp ebp,esp
}
00C24435 call __RTC_CheckEsp (0C212D5h) # Check for memory corruption
00C2443A mov esp,ebp
00C2443C pop ebp
00C2443D ret
The exact details will vary from compiler to compiler and operating system to operating system. If min() or max() had arguments or return values, they would be passed as appropriate for the architecture. The key point is that the compiler has already worked out what the arguments and return values are and created machine code that just passes or accepts them.
You can learn more details if you wish to help with debugging or to do low level calls but be aware that the machine code emitted can be highly variable. For example, here is the same code compiled on the same system in release mode (i.e. with optimizations on):
return 0;
01151270 xor eax,eax
}
01151272 ret
As you can see, it has detected that min() and max() do nothing and removed them completely. Since there is now no stack frame to set up and restore, that is gone too, leaving a single instruction that sets eax to 0 and then returns (since the return value is in the eax register).

C++ custom calling convention

While reverse engineering, I came across a very odd program that uses a calling convention that passes one argument in eax (very odd compiler??). I want to call that function now, and I don't know how to declare it. IDA defines it as
bool __usercall foo<ax>(int param1<eax>, int param2);
where param1 is passed in the eax register. I tried something like
bool MyFoo(int param1, int param2)
{
__asm mov eax, param1;
return reinterpret_cast<bool(__stdcall *)(int)>(g_FooAddress)(param2);
}
However, unfortunately my compiler makes use of the eax register when pushing param2 onto the stack. Is there any way I can do this cleanly without writing the whole call in inline assembler? (I am using Visual Studio, if that matters.)
There are "normal" calling conventions which pass arguments via registers. If you are using MSVC for example, __fastcall.
http://en.wikipedia.org/wiki/X86_calling_conventions#fastcall
You cannot define your own calling conventions, but I would suggest that you create a wrapper function which does its own calling / cleanup via inline assembly. This is probably the most practical way to achieve this effect, though you could probably also make it faster by using __fastcall, doing a bit of register swapping, then jmp-ing to the correct function.
There's more to a calling convention than argument passing, though, so option #1 (the inline-assembly wrapper) is probably the best, as you'll get full control over how the caller acts.
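For what it's worth, a sketch of that wrapper in MSVC 32-bit inline asm might look like the following. g_FooAddress is the variable from the question, and it assumes (as the question's __stdcall cast did) that the callee pops its own stack argument; if the stack arg is actually caller-cleaned, add an add esp, 4 after the call:
extern void *g_FooAddress;            // address of the target function, as in the question

bool MyFoo(int param1, int param2)
{
    bool result;
    __asm {
        push param2                   // the stack argument
        mov  edx, g_FooAddress        // load the target address into a scratch reg (not eax)
        mov  eax, param1              // the argument the callee expects in eax
        call edx                      // callee is assumed to pop its own stack argument
        mov  result, al               // the bool result comes back in al
    }
    return result;
}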