inside naked function - how to do simple assignment - c++

This is the beginning of a function that already exists and works; the commented line is my addition and its purpose is to toggle a pin.
inline __attribute__((naked))
void CScheduler::SwapToThread(void* pNew, void* pPrev)
{
//*(volatile DWORD*)0x400FF08C = (1 << 14);
if (pPrev != NULL)
{
if (pPrev == this) // Special case to save scheduler stack on startup
{
asm("mov lr,%0"::"p"(&CScheduler_Run_Exit)); // load r1 with schedulers End thread
asm("orr lr, 1");
When I uncomment my addition, my hard fault handler executes. I get it has something to do with this being a naked function but I don't understand why a simple assignment causes a problem.
Two questions:
Why does this line trigger the hard fault?
How can I perform this assignment inside this function?

It was only luck that your previous version of the function happened to work without crashing.
The only thing that can safely be put inside a naked function is Basic Asm: https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html. You can split it up into multiple Basic Asm statements instead of one long asm("insn \n\t" "insn2 \n\t" ...);, but you have to write the entire function in asm yourself.
While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.
If you want to run C++ code from a naked function, you could call a regular function from the asm (with bl on ARM, jal on MIPS, etc.), following the standard calling convention.
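A minimal sketch of that idea, assuming GCC targeting 32-bit ARM/Thumb. The names WriteReg32, TogglePin and NakedEntry are hypothetical and not from the question's code; only the address 0x400FF08C and the bit (1 << 14) are taken from it. The naked function contains nothing but Basic Asm, and the C++ store lives in an ordinary function reached with bl:

```cpp
#include <cstdint>

// Ordinary (non-naked) helper: the compiler emits a normal prologue/epilogue,
// so plain C++ is safe here.
extern "C" void WriteReg32(volatile std::uint32_t* reg, std::uint32_t value) {
    *reg = value;
}

#if defined(__GNUC__) && defined(__arm__)
// The assignment from the question, moved into a regular function.
extern "C" void TogglePin() {
    WriteReg32(reinterpret_cast<volatile std::uint32_t*>(0x400FF08C), 1u << 14);
}

// Naked function: Basic Asm only. r0-r3 and lr are saved because the callee
// may clobber them; r4 is pushed as well only to keep the stack 8-byte aligned.
extern "C" __attribute__((naked)) void NakedEntry() {
    asm("push {r0-r4, lr}\n\t"
        "bl TogglePin\n\t"
        "pop {r0-r4, pc}");   // restore and return in one instruction
}
#endif
```

In a context-switch routine like the one in the question, the asm after the bl would continue with the hand-written switching logic instead of returning immediately.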
As for the specific reason in this case? Maybe creating that address in a register stepped on the function args, leading to the branches going wrong? Inspect the generated asm if you want, but it's 100% unsupported.
Or maybe it ended up using more registers, and since it's naked didn't properly save/restore call-preserved registers? I haven't looked at the code-gen myself for naked functions.
Are you sure this function needs to be naked? I guess that's because you manipulate lr to return to the new context.
If you don't want to just write more logic in asm, maybe have this function's caller do more work (and maybe pass it pointer and/or boolean args telling it more simply what it needs to do, so your inputs are already in registers, and you don't need to access globals).

Related

MSVC optimizer saves and restores XMM SIMD registers on an early-out path through a function. Why? [duplicate]

In C, if I have a function call that looks like
// main.c
...
do_work_on_object(object, arg1, arg2);
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
if(object == NULL)
{
return;
}
// do lots of work
}
then the compiler will generate a lot of stuff in main.o to save state, pass parameters (hopefully in registers in this case), and restore state.
However, at link time it can be observed that arg1 and arg2 are not used in the quick-return path, so the clean-up and state restoration can be short-circuited. Do linkers tend to do this kind of thing automatically, or would one need to turn on link-time optimization (LTO) to get that kind of thing to work?
(Yes, I could inspect the disassembled code, but I'm interested in the behaviours of compilers and linkers in general, and on multiple architectures, so hoping to learn from others' experience.)
Assuming that profiling shows this function call is worth optimizing, should we expect the following code to be noticeably faster (e.g. without the need to use LTO)?
// main.c
...
if(object != NULL)
{
do_work_on_object(object, arg1, arg2);
}
...
// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
assert(object != NULL); // generates no code in release build
// do lots of work
}
Some compilers (like GCC and clang) are able to do "shrink-wrap" optimization to delay saving call-preserved regs until after a possible early-out, if they're able to spot the pattern. But some don't, e.g. apparently MSVC 16.11 still doesn't.
I don't think any do partial inlining of just the early-out check into the caller, to avoid even the overhead of arg-passing and the call / ret itself.
Since compiler/linker support for this is not universal and not always successful even for shrink-wrapping, you can write your code in a way that gets much of the benefit, at the cost of splitting the logic of your function into two places.
If you have a fast-path that takes hardly any code, but happens often enough to matter, put that part in a header so it gets inlined, with a fallback to calling the rest of the function (which you make private, so it can assume that any checks in the inlined part are already done).
e.g. par2's routine that processes a block of data has a fast-path for when the galois16 factor is zero. (dst[i] += 0 * src[i] is a no-op, even when * is a multiply in Galois16, and += is a GF16 add (i.e. a bitwise XOR)).
Note how the commit in question renames the old function to InternalProcess, and adds a new template<class g> inline bool ReedSolomon<g>::Process that checks for the fast-path, and otherwise calls InternalProcess. (as well as making a bunch of unrelated whitespace changes, and some ifdefs... It was originally a 2006 CVS commit.)
The comment in the commit claims an overall 8% speed gain for repairing.
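The split described above can be sketched as follows. This is an illustration under assumptions, not par2's actual code: Object, DoWork and DoWorkInternal are hypothetical names, and the "lots of work" is replaced by a trivial computation.

```cpp
#include <cassert>

struct Object {
    int total = 0;
    int slow_calls = 0;   // counts how often the slow path actually ran
};

// Slow path. In a real project this would live in a .cpp file and be private;
// it may assume the inlined check has already been done.
void DoWorkInternal(Object* object, int arg1, int arg2) {
    assert(object != nullptr);
    ++object->slow_calls;
    object->total += arg1 * arg2;   // stand-in for "lots of work"
}

// Fast path. In a real project this would live in a header so that it is
// inlined into every caller; the early-out then costs one compare and branch.
inline void DoWork(Object* object, int arg1, int arg2) {
    if (object == nullptr) {
        return;
    }
    DoWorkInternal(object, arg1, arg2);
}
```

The cost is that the function's logic now lives in two places, so the check in the header and the assumption in the private function have to be kept in sync by hand.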
Neither the setup nor the cleanup code can be short-circuited, because the compiled code is static: it doesn't know what will happen when the program gets executed. So the compiler will always have to set up the whole parameter stack.
Think of two situations: in one, object is NULL; in the other, it is not. How would the assembly code know whether to put the rest of the arguments on the stack? Especially as the caller is the one responsible for placing the arguments in their proper locations (stack or registers).

ARM assembly - access parameter vs return value?

I have a function prototype int Palindrome(const char *c_style_string);
In ARM v8 assembly, I believe that the parameter is stored in register w0. However, isn't this also the register that ret outputs the value of?
If so, what do I need to do so that values do not get overwritten? I was thinking something like mov w0, w1 at the beginning of my code so that I refer to c_style_string as w1 whenever I parse through it, and then edit w0 to store an int...would this be right?
Thank you!
You may want to write your assembly code in compliance with the ABI for ARM 64-bit Architecture.
In the example above, you could keep the address of c_style_string in a callee-saved register (X19-X29) and copy it to x0 every time you call Palindrome() (pointers are 64-bit, so use x0 rather than w0). I am assuming here that Palindrome() is a C function, and is therefore itself compliant with the ARM 64-bit ABI.
A desirable side effect is that your C code could always call your assembly code, and vice versa.
IMHO, your best solution is to write the C function, or minimal function, then tell the compiler to output the assembly language. This will show the calling interface for functions.
You could also look up the register passing convention in your compiler's documentation.
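As a starting point for that approach, here is a minimal version of the function to feed to the compiler. The body is an assumption (the question only gives the prototype), and the exact assembly you get depends on the compiler and flags. Compiling it for AArch64 with the -S option of GCC or clang should show the pointer arriving in x0 and the int result being left in w0 before ret:

```cpp
#include <cstddef>
#include <cstring>

// Returns 1 if the string reads the same forwards and backwards, else 0.
// extern "C" keeps the symbol name unmangled so hand-written asm can call it.
extern "C" int Palindrome(const char* c_style_string) {
    const std::size_t len = std::strlen(c_style_string);
    if (len == 0) {
        return 1;   // treat the empty string as a palindrome
    }
    std::size_t left = 0;
    std::size_t right = len - 1;
    while (left < right) {
        if (c_style_string[left] != c_style_string[right]) {
            return 0;
        }
        ++left;
        --right;
    }
    return 1;
}
```

Because the argument and the return value share register 0, the generated code typically copies or consumes the pointer before it writes the result, which is the same thing a hand-written version has to do.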
If you want to preserve register values, you should use the PUSH instruction (or its equivalent, depending on ARM mode or Thumb mode). Also remember to POP the registers before the end of the function.

Will every line in a program (except variable declarations) ultimately use at least one system call?

I was thinking about system calls and the code that we write. Let's say I have a program like the one below:
#include<stdio.h>
int main()
{
int a=0,b=2,c;
c=a+b;
printf("The value of c is %d", c);
return 0;
}
Let's take the case of c = a+b; if it were a C++ compiler, then I believe there would be a call to an operator+() function. The compiler of course might optimize it by replacing the function call with the actual code that performs the addition in the generated assembly.
And printf will have to use a system call in order to write to the various hardware buffers. So I believe most of the APIs provided by the language would use system calls to accomplish their function. I am not sure if my understanding is correct. Please do correct me if I am wrong.
No, not at all. I'm unsure if you have your definition of a system call correct. Stealing a definition from Wikipedia:
In computing, a system call is how a program requests a service from an operating system's kernel.
This means that the kinds of operations that result in system calls are I/O, timing, etc -- not math, assignments, (most) memory assignments, ...
Even malloc() is usually implemented so you don't always need a system call. In general: only actions that affect or interact with the program's surrounding environment require a system call. Registers, program variables, etc. do not count as part of the surrounding environment.
Adding to Ethereal's answer, even if you mean "call" (as in to a function) rather than "system call", the answer is still no. For example, c=a+b is likely to generate inline code similar to the following pseudo-assembly:
mov reg1, [a]
mov reg2, [b]
add reg1, reg2
mov [c], reg1
No calls needed!
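The same holds in C++ when operator+ really is a function, as with a user-defined type. In the sketch below (Meters and AddInts are hypothetical names), the static_assert is evaluated entirely by the compiler, so that particular addition cannot involve any system call, or even any code at run time:

```cpp
// A user-defined type whose operator+ is an ordinary function.
struct Meters {
    int value;
};

constexpr Meters operator+(Meters a, Meters b) {
    return Meters{a.value + b.value};
}

// Computed at compile time: no run-time call, let alone a system call.
static_assert((Meters{0} + Meters{2}).value == 2,
              "addition is evaluated by the compiler");

// For built-in int there is no operator+ function at all; this normally
// compiles down to a single add instruction.
int AddInts(int a, int b) {
    return a + b;
}
```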

how to create a trampoline function using DetourAttachEx? (with MS detours)

I have a DLL and I wish to create a detour to one of its exported functions.
The DLL is not part of Windows.
I need to be able to call the real function after my detour (call the real function from the detoured one).
I know the exact signature of the function.
I have already been able to detour the function, but right now I can't call the real one.
I realize I need to use a trampoline function; I've seen examples online.
The problem is: all those examples show how to detour a Windows API function, and I need to do the same for a function I get through a DLL import.
Any help would be welcome.
--edit
Just to clarify, I have attempted to call the original function by its pointer, but that does not work.
I also tried using the method from this Stack Overflow article.
That doesn't even crash, but it looks like it goes into an infinite loop (I assume because in the original function there is a jump to the detoured one).
edit -- solved!
Not sure what solved it.
Used this as reference.
Stopped using GetProcAddress and started using DetourFindFunction instead.
Cleaned up the code (pretty sure I cleaned out whatever caused the issue).
Works.
Thanks anyway.
I don't use Detours (I actually detest it!), but detouring any non-hot-patchable function can be done in a generic manner, like so:
Step 1:
Insert a JMP <your code> at the start of the function. This takes 5 bytes, probably a little more to align to the next instruction boundary. As an example,
the start of the function to hook:
SUB ESP,3C
PUSH EDI
PUSH ESI
//more code
would become:
JMP MyFunc
//more code
One would do this by writing 0xE9 at the first byte, then writing the relative displacement, function_addr - (patch_addr + 5), in the following DWORD (the displacement is measured from the end of the 5-byte JMP). Writing should be done using WriteProcessMemory after setting read/write/execute permissions with VirtualProtectEx.
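The patch bytes can be built as follows. This is only a sketch for a 32-bit x86 process; MakeJmpRel32 is a hypothetical helper name, and it only fills a buffer (actually writing it into the target is left to WriteProcessMemory as described). The displacement of an E9 jump is relative to the address of the instruction that follows it, i.e. target_addr - (patch_addr + 5):

```cpp
#include <array>
#include <cstdint>

// Builds the 5 bytes of "JMP rel32" that, when placed at patch_addr,
// transfer control to target_addr. Addresses are 32-bit (x86 assumption).
std::array<std::uint8_t, 5> MakeJmpRel32(std::uint32_t patch_addr,
                                         std::uint32_t target_addr) {
    // Unsigned arithmetic wraps modulo 2^32, which yields the correct
    // two's-complement encoding for backward jumps as well.
    const std::uint32_t rel = target_addr - (patch_addr + 5u);
    return {{
        0xE9,                                            // JMP rel32 opcode
        static_cast<std::uint8_t>(rel & 0xFF),           // little-endian
        static_cast<std::uint8_t>((rel >> 8) & 0xFF),
        static_cast<std::uint8_t>((rel >> 16) & 0xFF),
        static_cast<std::uint8_t>((rel >> 24) & 0xFF),
    }};
}
```

For example, a jump placed at 0x1000 that targets 0x2000 gets the displacement 0x2000 - 0x1005 = 0x0FFB.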
Step 2:
next, we create an assembly interface:
void __declspec(naked) MyFunc()
{
    __asm
    {
        call Check                  ; call our filter func
        test eax, eax               ; test if we let the call through
        je _EXIT
        sub esp, 3Ch                ; it's gone through, so we replicate what we overwrote
        push edi
        push esi
        jmp NextExecutionAddress    ; now we jump back to the location just after our jump
    _EXIT:
        retn                        ; note, this must have the correct stack cleanup
    }
}
NextExecutionAddress will need to be filled at run time using ModuleBase + RVA.
To be honest, it's way easier, and better(!), to just EAT (Export Address Table) hook the export table of the DLL, or IAT (Import Address Table) hook the import tables of whatever is calling the functions you want to filter. Detours should have functions for these types of hooks; if not, there are other freely available libs to do it.
The other way would be to use Detours to hook every call in the apps using the DLL and reroute them to a proxy function in your own code. This has the advantage of allowing one to filter only certain calls, and not everything across a binary (it is possible to do the same using _ReturnAddress, but that's more work). The disadvantage, though, is capturing the locations to patch (I use OllyDbg + a custom patching engine), and it won't work on functions with non-regular calling conventions (like those made with #pragma aux in Watcom or the optimized calls generated by VC7+).
One important thing to note: if you're hooking a multithreaded app, your patches need to be done with the app suspended, or be done atomically using InterlockedExchange, InterlockedExchange64 and InterlockedExchangePointer (I use the latter for all IAT/EAT hooks, especially when hooking from a 'third party process').
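The atomic-swap idea can be illustrated portably. The sketch below deliberately uses std::atomic as a stand-in rather than the Windows Interlocked functions, and models a table entry as a single function-pointer slot; RealFunc, HookedFunc, g_slot and the other names are hypothetical. It also shows how the hook can still reach the original function through the saved pointer:

```cpp
#include <atomic>

using TargetFn = int (*)(int);

// Stand-in for the function exported by the DLL.
int RealFunc(int x) { return x + 1; }

// Stand-in for one IAT/EAT entry: callers always go through this slot.
std::atomic<TargetFn> g_slot{&RealFunc};

// Where the hook remembers the original entry so it can forward to it.
std::atomic<TargetFn> g_original{nullptr};

// The replacement: calls the original, then post-processes the result.
int HookedFunc(int x) {
    return g_original.load()(x) * 10;
}

void InstallHook() {
    // Save the original first, so the hook never runs with a null pointer,
    // then swap the slot in a single atomic operation.
    g_original.store(g_slot.load());
    g_slot.exchange(&HookedFunc);
}

// What a caller of the imported function effectively does.
int CallThroughSlot(int x) {
    return g_slot.load()(x);
}
```

Other threads calling through the slot see either the old pointer or the new one, never a half-written value, which is the property the Interlocked functions provide for a real table entry.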
Looking at the post you link to, the method there is horrible in my opinion, mainly due to the assembly :P But how are you calling this pointer you obtain, and how is it obtained?

Using __asm to call a function from hex offset

I don't know assembly so I'm not sure how to go about this.
I have a program which is hooking into another. I have obtained the offset to where the function is located within the hooked program's .exe
#define FuncToCall 0x00447E5D
So now how can I use __asm{} to call that function?
Well, the short answer is that if you do not know assembly you should not be doing this, haha.
But if you are so intent on wall hacking, I mean, modifying the operation of a legitimate program, you can't just take an address and call it good.
You need to look up the symbol (if in happy Linux land) or use sig scanning (or both D=) to find the actual function.
Once you do that, it's relatively simple: you just need to write a mov and a jmp. Assuming you have access to the running process already, and your sig scanner found the right address, this bit of code will get you what you want:
mov eax, 0xdeadbeef
jmp eax
Now, if this function you want is a class method, you need to do some more studying. But that bit of assembly will run whatever static function you want.
There is some mess to deal with different calling conventions too, so no commenters try to call me out on that; that is far too advanced for this question.
EDIT:
By the way, I do not use call because when using call you have to worry about stack frames and other very messy things. This code will jump to any address and start executing.
If you want to return to your code, that's another story, but that WILL get your target function going as long as it's the right calling convention, not a class method, etc.
I think you could also cast that address to a function pointer and call it that way. That might be better.
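A sketch of the function-pointer approach. The signature int(int) and the default calling convention are assumptions made for illustration; the real function at 0x00447E5D has whatever signature and convention the target program gave it, and both must match exactly. CallAtAddress and SampleTarget are hypothetical names:

```cpp
#include <cstdint>

// Assumed signature of the function living at the given address.
using TargetFn = int (*)(int);

// Casts a raw address to a function pointer and calls it. The compiler then
// generates the argument passing, the call, and the return handling.
int CallAtAddress(std::uintptr_t address, int arg) {
    auto fn = reinterpret_cast<TargetFn>(address);
    return fn(arg);
}

// A local function used only to demonstrate the round trip.
int SampleTarget(int x) { return x * 2; }
```

Here the address would come from `reinterpret_cast<std::uintptr_t>(&SampleTarget)`; in the hooking scenario it would be the constant from the #define, and the call only works from code running inside the target process.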
Thanks for answers, but I figured it out. This is what I'm doing:
#define FuncToCall 0x00447E5D
DWORD myfunc = FuncToCall;
__asm call dword ptr [myfunc];
If it works don't fix it, and by golly it works.
Here is a tricky one:
You can use it with parameters and return value too. It simply forwards everything to the function you intend to call that is given by a pointer (FuncToCall) to the function.
void call_FuncToCall(.......)
{
__asm__
("call label1\n label1:\n"
"pop %eax\n"
"movl FuncToCall, %eax\n"
"leave\n"
"jmp *%eax");
}