ARM assembly - access parameter vs return value? - c++

I have a function prototype int Palindrome(const char *c_style_string);
In ARM v8 assembly, I believe that the parameter is stored in register w0. However, isn't this also the register that ret outputs the value of?
If so, what do I need to do so that values do not get overwritten? I was thinking something like mov w0, w1 at the beginning of my code so that I refer to c_style_string as w1 whenever I parse through it, and then edit w0 to store an int...would this be right?
Thank you!

You may want to write your assembly code in compliance with the ABI for ARM 64-bit Architecture.
In the example above, you could keep the address of c_style_string in a callee-saved register (x19-x29) and copy it back to x0 each time you call Palindrome(). I am assuming here that Palindrome() is a C function and is therefore itself compliant with the ARM 64-bit ABI. Note that the pointer is 64 bits wide, so it arrives in x0, while the int result is returned in w0, the low half of the same register.
A desirable side effect would be that your C code could always call your assembly code, and vice versa.

IMHO, your best solution is to write the C function, or minimal function, then tell the compiler to output the assembly language. This will show the calling interface for functions.
You could also look up the register passing convention in your compiler's documentation.
If you want to preserve register values, save them on the stack with the PUSH instruction (or its equivalent, depending on ARM mode or Thumb mode; AArch64 has no PUSH/POP and uses stp/ldp with sp instead). Also remember to POP the registers before the end of the function.
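As a starting point for the "write it in C and read the compiler output" suggestion, here is a minimal sketch of the prototype from the question. The function body is my own assumption about what Palindrome() should do (return 1 for a palindrome, 0 otherwise); only the prototype comes from the question. The register comments describe what an AArch64 compiler typically does, not a guarantee.

```cpp
#include <cassert>
#include <cstring>

// Reference version of the prototype from the question. extern "C" keeps the
// symbol name unmangled so that an assembly file could define or call it.
extern "C" int Palindrome(const char *c_style_string)
{
    assert(c_style_string != nullptr);

    // On AArch64 the pointer arrives in x0. The compiler is free to keep
    // working copies in other registers while it scans the string.
    const char *left = c_style_string;
    const char *right = c_style_string + std::strlen(c_style_string);

    while (left < right) {
        --right;                 // step back onto the last unchecked character
        if (*left != *right) {
            return 0;            // the result typically reaches w0 only here
        }
        ++left;
    }
    return 1;                    // likewise, w0 is written just before ret
}
```

Compiling this with the -S flag of GCC or Clang for an AArch64 target shows where the compiler moves the incoming pointer and at which point it finally overwrites w0 with the result.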

Related

MSVC: Reading a specific 64 or 32 bit register (e.g. R10) in 64 bit code?

Is there any way with MSVC to read a specific 64 (or 32) bit register directly in a normal C++ function?
For example, can I read the contents of r10 somehow via any intrinsics or such?
For context:
I'm implementing a variadic function (let's call it my_func), which needs to forward its call to another variadic function and add one more argument along the way (an ID, if you will; any numeric type will do, for example a 16, 32 or 64 bit integer).
I need to do this forwarding in as few instructions as possible, so I can't process the variadic list in the initial function and just forward the va_list or such.
So I've implemented my_func in assembly:
; This function needs to be as compact as possible
my_func PROC
; assume 123 is the ID to be passed along with the arguments that my_func is called with
mov r10, 123
jmp address_of_the_real_target_function
my_func ENDP
I just jump to the target function, and pass the ID in a separate register - R10 in this case.
ARG* the_real_target_function(ARG* arg0, ...)
{
auto id = ReadRegister();
// ... do stuff ...
}
This works well so far - only nuisance being that I needed another assembly helper function to read R10 back in the proper C++ function,
ReadRegister PROC
mov rax, r10
ret
ReadRegister ENDP
which is a bit annoying as that call won't get inlined.
Hence the question - is there any way to read this register directly in C++?
(Otherwise, I was thinking of maybe utilizing SSE registers, which should be readable via intrinsics - but curious if there's a way to do this with just 64 - or 32 - bit registers)
Thanks
--
edit: I believe this is not a duplicate of the linked topic. The solutions listed there are specific to other compilers or, in the case of MSVC, 32-bit only (inline assembly is not supported on x64).
--
edit 2: For more context on why I'm trying to do this.
This is intended to be an Excel add-in (which will host plugins and expose their functions to Excel, basically).
In order to register a function in Excel, I need to bind it to a specific function exported by my DLL. I don't know in advance (= at compile time) how many, or what plugin functions need to be registered and called.
So I need to implement loads of exported functions - thousands. Enough to always have registration slots for all plugins available.
In order to keep the overall size of the DLL in check, I need the registered functions to be very slim, and ideally also capable of dealing with variadic args (as I don't know what shape the plugin functions have at compile time; and due to the space constraints, I want to avoid creating callbacks for every possible arity of arguments).
And for even more added fun, it needs to work in x64 and x86 - though in the latter case, the function is called by Excel via stdcall convention, so the usual C++ variadic args won't work. But, at least at runtime I can find out the number (and type) of args passed to the function, so I should be able to handle the stack myself.
So bottom line, my idea is to have these slim trampoline functions, which will forward all arguments, plus their ID, to some central handler (as per above in X64; and via stack in X86).
The handler then gets things a bit into order - i.e. creates some standardized iterator for the arguments, calls the actual plugin function registered via that ID etc.
A static thread_local variable would take a few instructions to write and read, so it is not as slim as you may want.
Yet it would be fully portable.
There is a less portable but more instruction-efficient way.
Notice the arbitrary data slot in the TEB.
So __readfsdword(0x14)/__writefsdword(0x14) on x86 and __readgsqword(0x28)/__writegsqword(0x28) on x64 may do the trick, as long as no one else is using the same slot for another purpose.
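The portable thread_local variant could look roughly like the sketch below. All names except my_func and the_real_target_function are hypothetical, the signatures are simplified to a single int argument, and the trampoline is written in C++ only to keep the example self-contained; in the real add-in it would stay in assembly and store to the thread-local slot instead of r10.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-thread slot that replaces r10 as the carrier for the ID.
thread_local std::uint64_t g_call_id = 0;

// Stand-in for the central handler: it reads the ID from thread-local
// storage, so no ReadRegister helper call is needed.
std::uint64_t the_real_target_function(int arg0)
{
    assert(arg0 >= 0);                     // keeps the arithmetic below simple
    const std::uint64_t id = g_call_id;
    // Combine ID and argument so a caller can see that both arrived.
    return id * 1000 + static_cast<std::uint64_t>(arg0);
}

// Stand-in for one trampoline slot with the hard-coded ID 123.
std::uint64_t my_func(int arg0)
{
    g_call_id = 123;                       // was: mov r10, 123
    return the_real_target_function(arg0); // was: jmp to the real target
}
```

The read in the handler can be inlined by the compiler, which was the original complaint about ReadRegister. The cost moves into the trampoline, where a thread-local store is larger than a single mov to r10.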

How to call a function and pass arguments to it in x86 assembly

Intel CPU
Windows 10 64bit
C++
x86 assembly
I have two programs, both written by me in C++. For the sake of simplicity I will refer to them as program A and program B. They do not do anything special really, I am just using them to test things out and have some fun in the process.
The idea is that program A injects code into program B and that injected code will set the parameters of a function in program B and will call a function in program B.
I must say I have learned a lot from this experiment. As I needed to open up a handle to a process with proper permissions and then construct assembly code to inject, call it with CreateRemoteThread and clean up afterwards.
I've managed to do this and call a function in program B that takes one parameter of type UINT64.
I do this by injecting the following assembly code:
b9 paramAddr
e8 funcAddr
c3
By calling this code snippet from program A with CreateRemoteThread in program B I manage to call a function at an address and with a parameter passed. And this works fine. Nothing too complex just call a function that takes one param. One thing to note here is that I have injected the parameter prior to this code and just provided a parameter address to b9.
Now what I am failing to do is call a function in program B from program A that takes two parameters.
Function Example:
myFunction(uint num1, int num2)
The procedure for injection is the same, and all of that works just fine; the Windows API provides plenty of well-documented functionality.
What I do not seem to be able to do is pass the two parameters to the function. This is where my troubles begin. I have been looking at x86 assembly function calling conventions, and what they do is either just
push param2
push param1
call functAddr
retn
or
perform a mov to esi
Could anyone please clarify, explain, and provide a clear example of how to call a function in x86 assembly that takes two parameters of type uint and int?
Thank you all for your time and effort.
Since you are looking for a way to understand what is happening internally, I recommend starting by generating an assembler file for the specific machine you are working with. If you are using gcc or g++, you can use the -S flag to generate the associated assembler files. To begin, implement a function with two arguments and call it from your main function. The assembler files should give you a good picture of how the stack is filled before your function is called and where the return value is put. As a next step, compare what you see in the assembler file with the x86 calling convention.
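The experiment suggested above could start from a sketch like this one. The body of myFunction and the call_site helper are hypothetical, since the question only gives the parameter list; the comments describe what the common conventions do, and the generated assembler file is the authority for your particular compiler and flags.

```cpp
#include <cassert>

// Hypothetical body. extern "C" keeps the symbol name readable in the
// generated assembler file.
extern "C" int myFunction(unsigned int num1, int num2)
{
    return static_cast<int>(num1) + num2;
}

int call_site()
{
    // 32-bit cdecl and stdcall: num2 is pushed first, then num1, then call.
    // 64-bit Windows: num1 goes in ecx and num2 in edx, nothing is pushed.
    // In all of these the int result comes back in eax.
    const int result = myFunction(40u, 2);
    assert(result == 42);   // sanity check on the sketch itself
    return result;
}
```

Compiling with g++ -S at -O0 keeps the call visible instead of letting the optimizer inline it. Comparing a 32-bit and a 64-bit build shows why a sequence that only sets ecx is enough for one parameter in some conventions but not for two.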

Inline Assembly - Display a register in decimal using printf?

I just had a really quick question that I saw someone mention something about in another question, but I didn't want to necro-post on it.
I'm coding in inline assembly with c++, and need to display a register value in decimal. I was searching ways to do this, and saw someone mention "If you're using inline c, just call printf." But they didn't go much further into explanation on it than that.
Is it possible the call printf can be used to get a register value in decimal format without needing to write a conversion section of the code? And if so, how would that work? Say after some computations to a user entered integer, the value now lies in the AX register. Would I simply put call printf in the code after it? Or does it print values from the stack? Or is it maybe even possible to do something like:
AX printf
I apologize for my ignorance on this, our book does not cover inline assembly, and I'd like to avoid having to write a massive segment of code to convert if I can. Plus I can't really seem to find answers on how exactly printf works. Thank you for any help, I really appreciate it!
The easiest way to accomplish this is to use inline assembler to copy your register to some variable, and then print that variable.
short registerValue;
__asm mov registerValue, ax;
printf("ax: %hd", registerValue);
The exact assembler invocation will depend on your compiler and syntax; the above likely won't work with a compiler other than cl.
If you want to actually call printf from assembler, you'll need to figure out its calling convention and how that calling convention passes variadic function arguments.
Depending on the compiler, there may be predefined pseudo-symbols which directly access the registers. This was especially convenient with Turbo C and its descendants:
_some_magic_function ();
printf ("es:bx = %0x:%0x\n", _ES, _BX);
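Once the register value has been copied into a variable, the printing side is ordinary C or C++. The sketch below leaves out the compiler-specific inline assembly and takes the 16-bit value as a parameter; format_ax is a hypothetical helper, and it uses snprintf rather than printf only so that the result can be inspected as a string.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// raw_ax stands in for the value that was copied out of the AX register.
std::string format_ax(unsigned short raw_ax)
{
    char buffer[64];
    // %hd treats the 16 bits as signed, %hu as unsigned, %hx as hexadecimal.
    const short as_signed = static_cast<short>(raw_ax);
    const int written = std::snprintf(buffer, sizeof buffer,
                                      "ax: %hd (signed), %hu (unsigned), 0x%04hx",
                                      as_signed, raw_ax, raw_ax);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    return std::string(buffer);
}
```

Whether AX should be read as signed or unsigned depends on what the preceding computation meant; printf only follows the conversion specifier it is given.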

C++ inline assembly (Intel compiler): LEA and MOV behaving differently in Windows and Linux

I am converting a huge Windows dll to work on both Windows and Linux. The dll has a lot of assembly (and SSE2 instructions) for video manipulation.
The code now compiles fine on both Windows and Linux using Intel compiler included in Intel ComposerXE-2011 on Windows and Intel ComposerXE-2013 SP1 on Linux.
The execution, however, crashes in Linux when trying to call a function pointer. I traced the code in gdb and indeed the function pointer doesn't point to the required function (whereas in Windows it does). Almost everything else works fine.
This is the sequence of code:
...
mov rdi, this
lea rdx, [rdi].m_sSomeStruct
...
lea rax, FUNCTION_NAME # if replaced by 'mov', works in Linux but crashes in Windows
mov [rdx].m_pfnFunction, rax
...
call [rdx].m_pfnFunction # crash in Linux
where:
1) 'this' has a struct member m_sSomeStruct.
2) m_sSomeStruct has a member m_pfnFunction, which is a pointer to a function.
3) FUNCTION_NAME is a free function in the same compilation unit.
4) All those pure assembly functions are declared as naked.
5) 64-bit environment.
What is confusing me the most is that if I replace the 'lea' instruction that is supposed to load the function's address into rax with a 'mov' instruction, it works fine on Linux but crashes on Windows. I traced the code in both Visual Studio and gdb and apparently in Windows 'lea' gives the correct function address, whereas in Linux 'mov' does.
I tried looking into the Intel assembly reference but didn't find much to help me there (unless I wasn't looking in the right place).
Any help is appreciated. Thanks!
Edit More details:
1) I tried using square brackets
lea rax, [FUNCTION_NAME]
but that didn't change the behaviour in Windows nor in Linux.
2) I looked at the disassembly in gdb and on Windows; both seem to show the same instructions that I actually wrote. What's even worse is that I tried putting lea and mov one after the other, and when I look at them in the gdb disassembly, the address printed after the # sign following each instruction (which I'm assuming is the address that is going to be stored in the register) is actually the same, and is NOT the correct address of the function.
It looked like this in gdb disassembler
lea 0xOffset1(%rip), %rax # 0xSomeAddress
mov 0xOffset2(%rip), %rax # 0xSomeAddress
where both SomeAddress values were identical and the two offsets differed by exactly the distance between the lea and mov instructions.
But somehow, when I check the contents of the registers after each instruction executes, mov seems to put in the correct value!
3) The member variable m_pfnFunction is of type LOAD_FUNCTION which is defined as
typedef void (*LOAD_FUNCTION)(const void*, void*);
4) The function FUNCTION_NAME is declared in the .h (within a namespace) as
void FUNCTION_NAME(const void* , void*);
and implemented in .cpp as
__declspec(naked) void namespace_name::FUNCTION_NAME(const void* , void*)
{
...
}
5) I tried turning off optimizations by adding
#pragma optimize("", off)
but I still have the same issue
Off hand, I suspect that the way linking to DLLs works in the latter case is that FUNCTION_NAME is a memory location that actually will be set to the loaded address of the function. That is, it's a reference (or pointer) to the function, not the entry point.
I'm familiar with Win (not the other), and I've seen how calling a function might either
(1) generate a CALL to that address, which is filled in at link time. Normal enough for functions in the same module, but if it's discovered at link time that it's in a different DLL, then the Import Library is a stub that the linker treats the same as any normal function, but is nothing more than JMP [????]. The table of addresses to imported functions is arranged to have bytes that code a JMP instruction just before the field that will hold the address. The table is populated at DLL Load time.
(2) If the compiler knows that the function will be in a different DLL, it can generate more efficient code: It codes an indirect CALL to the address located in the import table. The stub function shown in (1) has a symbol name associated with it, and the actual field containing the address has a symbol name too. They both are named for the function, but with different "decorations". In general, a program might contain fixup references to both.
So, I conjecture that the symbol name you used matches the stub function on one compiler, and (that it works in a similar way) matches the pointer on the other platform. Maybe the assembler assigns the unmangled name to one or the other depending on whether it is declared as imported, and the options are different on the two toolchains.
Hope that helps. I suppose you could look at run-time in a debugger and see if the above helps you interpret the address and the stuff around it.
After reading the difference between mov and lea here What's the purpose of the LEA instruction? it looks to me like on Linux there is one additional level of indirection added into the function pointer. The mov instruction causes that extra level of indirection to be passed through, while on Windows without that extra indirection you would use lea.
Are you by any chance compiling with PIC on Linux? I could see that adding the extra indirection layer.
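If the conjecture about an extra level of indirection is right, the difference between the two instructions can be mimicked in plain C++. In this sketch function_name_slot is a hypothetical stand-in for a loader-filled table entry (such as a GOT slot in position-independent code), and the two helpers only imitate what lea and mov compute; nothing here is taken from the Intel compiler's actual output.

```cpp
#include <cassert>

typedef void (*LOAD_FUNCTION)(const void*, void*);

// Stand-in for FUNCTION_NAME; it sets a flag so that a call can be observed.
void function_name(const void*, void* out) { *static_cast<int*>(out) = 1; }

// Hypothetical table entry that holds the address of the function.
LOAD_FUNCTION function_name_slot = &function_name;

// `lea rax, symbol` yields the address of the symbol itself.
const void* address_like_lea(const LOAD_FUNCTION* symbol) { return symbol; }

// `mov rax, [symbol]` yields the 8 bytes stored at the symbol.
LOAD_FUNCTION load_like_mov(const LOAD_FUNCTION* symbol) { return *symbol; }

int call_through_slot()
{
    int flag = 0;
    // When the symbol names the slot, only the mov-style load gives a
    // pointer that is safe to call.
    LOAD_FUNCTION fn = load_like_mov(&function_name_slot);
    assert(fn == &function_name);
    fn(nullptr, &flag);
    return flag;
}
```

When the symbol refers directly to the function's entry point, as it apparently does in the Windows build, the roles flip: lea gives the callable address and mov would load the first bytes of the function's code instead.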

Will every line in a program (except variable declarations) ultimately use at least one system call?

I was thinking about system calls and the code that we write. Let's say I have a program like the one below:
#include<stdio.h>
int main()
{
int a=0,b=2,c;
c=a+b;
printf("The value of c is %d", c);
return 0;
}
Let's take the case of c = a+b; if it was a C++ compiler, then I believe there would be a call to an operator+() function. The compiler of course might optimize it by replacing it with the actual code that performs the addition rather than a function call within the assembly code.
And printf will have to use a system call in order to write to the various hardware buffers. So I believe most of the APIs provided by the language would use a system call to accomplish their function. I am not sure if my understanding is correct. Please do correct me if I am wrong.
No, not at all. I'm unsure if you have your definition of a system call correct. Stealing a definition from Wikipedia:
In computing, a system call is how a program requests a service from an operating system's kernel.
This means that the kinds of operations that result in system calls are I/O, timing, etc -- not math, assignments, (most) memory assignments, ...
Even malloc() is usually implemented so you don't always need a system call. In general: only actions that affect or interact with the program's surrounding environment require a system call. Registers, program variables, etc. do not count as part of the surrounding environment.
Adding to Ethereal's answer, even if you mean "call" (as in to a function) rather than "system call" the answer is still no. For example, c=a+b is likely to compile to inline machine code similar to the following pseudo-assembly:
mov reg1, [a]
mov reg2, [b]
add reg1, reg2
mov [c], reg1
No calls needed!
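One way to convince yourself that the addition involves neither a system call nor a function call is to let the compiler evaluate it. The sketch below is an illustration with hypothetical helper names (add, compute_c): the static_assert is checked at compile time, before any process or kernel interaction exists.

```cpp
#include <cassert>

// Built-in int addition; for int there is no operator+() function to call.
constexpr int add(int a, int b) { return a + b; }

// Evaluated entirely by the compiler, so no operating system service is used.
static_assert(add(0, 2) == 2, "plain arithmetic needs no system call");

int compute_c()
{
    int a = 0, b = 2, c;
    c = a + b;                 // compiles to a few register instructions
    assert(c == add(a, b));
    return c;
}
```

The printf line from the question is different: it eventually has to hand the bytes to the kernel, although the C library usually buffers output, so not every printf call results in a system call of its own.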