asm subroutine handling int and char from c++ file - c++

How are an int and a char handled in an asm subroutine after being linked with a C++ program? E.g. extern "C" void LCD_byte(char byte, int cmd_data); How does LCD_byte receive "byte" and "cmd_data", and how do I access them in the assembly code?

This very much depends on the microprocessor you use. If it is x86, the char will be widened to an int, and then both parameters are passed on the stack. You can find out yourself by compiling C code that performs a call into assembly code, and inspect the assembly code.
For example, given
void LCD_byte (char byte, int cmd_data);
void foo()
{
LCD_byte('a',100);
}
gcc generates on x86 Linux the code
foo:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl $100, 4(%esp)
movl $97, (%esp)
call LCD_byte
leave
ret
As you can see, both values are stored on the stack (so that 'a' is on top), then a call instruction to the target routine is made. The call itself pushes the 4-byte return address, so on entry the target routine finds the first parameter (byte) at esp+4 and the second (cmd_data) at esp+8.
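To make the C++ side of this concrete, here is a minimal sketch. The extern "C" declaration keeps the symbol name unmangled, so the assembly routine can simply be labeled LCD_byte. The function body below is only a C++ stand-in for the assembly routine (an assumption, so that the example links without a separate assembler file), and LcdCall, last_call and demo_lcd_byte are hypothetical names used to record and check what arrived.

```cpp
#include <cassert>

// Declaration as the C++ caller sees it; extern "C" disables name mangling,
// so the linker looks for a symbol called exactly LCD_byte.
extern "C" void LCD_byte(char byte, int cmd_data);

// Hypothetical record of the most recent call, for illustration only.
struct LcdCall {
    int byte_as_int;  // the char argument, widened to int
    int cmd_data;
};
static LcdCall last_call = {0, 0};

// Stand-in for the assembly routine (assumption: the real one lives in a .s file).
extern "C" void LCD_byte(char byte, int cmd_data) {
    last_call.byte_as_int = byte;  // the same widening a 32-bit x86 caller performs
    last_call.cmd_data = cmd_data;
}

// Same call as in the answer above.
void foo() {
    LCD_byte('a', 100);
}

// Small self-check: 'a' is 97 in ASCII, matching the $97 in the listing.
void demo_lcd_byte() {
    foo();
    assert(last_call.byte_as_int == 97);
    assert(last_call.cmd_data == 100);
}
```

Replacing the stand-in body with a real assembly routine only requires that the routine reads its arguments from wherever the platform's calling convention puts them.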

Well a lot depends on the calling convention which in turn, AFAIK, depends on the compiler.
But 99.9% of the time it is one of two things: either they are passed in registers, or they are pushed onto the stack and read back inside the function.

Look up the documentation for your platform. It tells you which calling convention is used for C.
The calling convention specifies how parameters are passed, which registers are caller-saves and which are callee-saves, how the return address is stored and everything else you need to correctly implement a function that can be called from C. (as well as everything you need to correctly call a C function)

Related

How to use inline assembly to write data with MOVNTI instruction to variable memory address? [duplicate]

I am trying to understand some things about inline assembler in Linux. I am using following function:
void test_func(Word32 *var){
asm( " addl %0, %%eax" : : "m"(var) );
return;
}
It generates following assembler code:
.globl test_func
.type test_func, @function
test_func:
pushl %ebp
movl %esp, %ebp
#APP
# 336 "opers.c" 1
addl 8(%ebp), %eax
# 0 "" 2
#NO_APP
popl %ebp
ret
.size test_func, .-test_func
It adds var's memory address to the eax register value instead of var's value.
Is there any way to tell the addl instruction to use var's value instead of var's memory address without copying the address to a register?
Regards
It adds var's memory address to the eax register value instead of var's value.
Yes, the syntax of gcc inline assembly is pretty arcane. Paraphrasing from the relevant section in the GCC Inline Assembly HOWTO: "m" roughly gives you the memory location of the C variable.
It's what you'd use when you just want an address you can write to or read from. Notice I said the location of the C variable, so %0 is set to the address of Word32 *var - you have a pointer to a pointer. A C translation of the inline assembly block could look like EAX += *(&var) because you can say that the "m" constraint implicitly takes the address of the C variable and gives you an address expression, that you then add to %eax.
Is there any way to tell the addl instruction to use var's value instead of var's memory address without copying the address to a register?
That depends on what you mean. You need to get var from the stack, so someone has to dereference memory (see @Bo Persson's answer), but you don't have to do it in inline assembly.
The constraint needs to be "m"(*var) (as @fazo suggested). That will give you the memory location of the value that var is pointing to, rather than a memory location pointing to it.
The generated code is now:
test_func:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
#APP
# 2 "test.c" 1
addl (%eax), %eax
# 0 "" 2
#NO_APP
popl %ebp
ret
Which is a little suspect, but that's understandable as you forgot to tell GCC that you clobbered (modified without having in the input/output list) %eax. Fixing that asm("addl %0, %%eax" : : "m"(*var) : "%eax" ) generates:
movl 8(%ebp), %edx
addl (%edx), %eax
Which isn't any better or more correct in this case, but it is always a good practice to remember. See the section on the clobber list and pay special attention to the "memory" clobber for advanced usage of inline assembly.
Even though you don't want to (explicitly) load the memory address into a register I'll briefly cover it.
Changing the constraint from "m" to "r" almost seems to work; the relevant section changes to (if we include %eax in the clobber list):
movl 8(%ebp), %edx
addl %edx, %eax
Which is almost correct, we have loaded the pointer value var into a register, but now we have to specify ourselves that we're loading from memory. Changing the code to match the constraint (usually undesirable, I'm only showing it for completeness):
asm("addl (%0), %%eax" : : "r"(var) : "%eax" );
Gives:
movl 8(%ebp), %edx
addl (%edx), %eax
The same as with "m".
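As a hedged sketch of the same idea without relying on whatever happens to be in %eax, the accumulator can be made an explicit read-write operand so that no clobber of a fixed register is needed. The names add_pointee and demo_add_pointee are hypothetical, Word32 is assumed to be a 32-bit signed integer, and the plain C++ fallback exists only so the example also builds on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int32_t Word32;  // assumption: Word32 is a 32-bit signed integer

// Returns acc + *var. On x86 with a GNU-compatible compiler the addition is
// done by inline asm; elsewhere a plain C++ fallback keeps the sketch portable.
Word32 add_pointee(Word32 acc, const Word32 *var) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    asm("addl %[input], %[acc]"
        : [acc] "+r"(acc)      // read-write register operand, picked by the compiler
        : [input] "m"(*var)    // the pointed-to Word32, as a memory operand
        : "cc");               // addl modifies the flags
#else
    acc += *var;
#endif
    return acc;
}

// Usage: the pointed-to value (5) is added, not the pointer itself.
void demo_add_pointee() {
    Word32 value = 5;
    assert(add_pointee(10, &value) == 15);
}
```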
Yes, because you pass it var, which is an address. Pass it *var instead,
like:
void test_func(Word32 *var){
asm( " addl %0, %%eax" : : "m"(*var) );
return;
}
I don't remember exactly, but perhaps you should replace "m" with "r"?
A memory operand doesn't mean that the value is loaded from that address; it's just a pointer.
No, there is no addressing mode for x86 processors that goes two levels indirect.
You have to first load the pointer from a memory address and then load indirectly from the pointer.
An "m" constraint doesn't implicitly dereference anything. It's just like an "r" constraint, except it expands to an addressing mode for a memory location holding the value of the expression, instead of a register. (In C, every object has an address, although often that can be optimized away.)
The C object that's an input (or output for "=m") for the asm is the lvalue or rvalue you specify, e.g. "m"(var) takes the value of var, not *var. So you'd be adding the pointer. (And telling the compiler that you want that input pointer value to be in memory, not a register.)
Perhaps it's confusing you that you have a pointer but you called it var, not ptr or something? A C pointer is an object whose value is an address, and can itself be stored in memory. If you were using C++, Word32 &var would make the dereference implicit whenever you write var.
In C terms, you're doing eax += ptr, but you want eax += *ptr, so you should write
void test_func(Word32 *ptr){
asm( "add %[input], %%eax"
: // no outputs. Probably you should use "+a"(add_to_this) if you want the add result, and remove the EAX clobber.
: [input] "m"(*ptr) // the pointed-to Word32 in memory
: "eax" // the instruction modifies EAX; tell the compiler about it
);
}
Compiling (Godbolt compiler explorer) results in:
# gcc -O3 -m32
test_func:
movl 4(%esp), %edx # compiler-generated load of the function arg
add (%edx), %eax # from asm template, (%edx) filled in as %[input] for *ptr
ret
Or if you'd compiled with -mregparm=3, or a 64-bit build, the arg would already be in a register. e.g. 64-bit GCC emits add (%rdi), %eax ; ret.
If you'd written return *ptr in C for a function returning Word32, with no inline asm, the asm would be similar, loading the pointer arg from the stack and then mov (%edx), %eax to load the return value. See the Godbolt link for that.
If inline asm isn't doing what you expect, look at the compiler generated asm to see how it filled in your template. That sometimes helps you figure out what the compiler thought you meant. (But only if you understand the basic design principles.)
If you write "m"(ptr), it compiles as follows:
void add_pointer(Word32 *ptr)
{
asm( "add %[input], %%eax" : : [input] "m"(ptr) : "eax" );
}
add_pointer:
add 4(%esp), %eax # ptr
ret
Very similar to if you wrote Word32 *bar(Word32 *ptr){ return ptr; }
Note that if you wanted to increment the memory location, you'd use a "+m"(*ptr) constraint to tell the compiler that the pointed-to memory is both an input and output. Or if you write-only to the memory, "=m"(*ptr) so it can potentially optimize away earlier dead stores to this memory location.
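A minimal sketch of the "+m" case, assuming Word32 is a 32-bit signed integer; add_to_memory and demo_add_to_memory are hypothetical names, and the non-x86 branch is only a portability fallback:

```cpp
#include <cassert>
#include <cstdint>

typedef std::int32_t Word32;  // assumption: Word32 is a 32-bit signed integer

// Adds amount to the Word32 that ptr points to, updating memory in place.
void add_to_memory(Word32 *ptr, Word32 amount) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    asm("addl %[amount], %[mem]"
        : [mem] "+m"(*ptr)       // pointed-to memory is both read and written
        : [amount] "r"(amount)   // value to add, in a register
        : "cc");                 // addl modifies the flags
#else
    *ptr += amount;
#endif
}

// Usage: the compiler knows *ptr changed, so the later read sees the new value.
void demo_add_to_memory() {
    Word32 value = 40;
    add_to_memory(&value, 2);
    assert(value == 42);
}
```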
See also How can I indicate that the memory *pointed* to by an inline ASM argument may be used? to handle cases where you use an "r"(ptr) input and dereference the pointer manually inside the asm, accessing memory that you didn't tell the compiler about as being an input or output operand.
Generally avoid doing "r"(ptr) and then manually doing add (%0), %%eax. It needs extra constraints to make it safe, and it forces the compiler to materialize the exact address in a register, instead of using an addressing mode to reach it relative to some other register. e.g. 4(%ecx) if after inlining it sees that you're actually passing a pointer into an array or to a struct member.
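For completeness, here is a hedged sketch of what the "extra constraints" can look like if you do take the pointer in a register: a dummy "m"(*ptr) input that the template never references tells the compiler that the asm reads the pointed-to memory. The names add_via_register_pointer and demo_add_via_register_pointer are hypothetical, and Word32 is assumed to be a 32-bit signed integer.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int32_t Word32;  // assumption: Word32 is a 32-bit signed integer

// Returns acc + *ptr, dereferencing the pointer manually inside the template.
Word32 add_via_register_pointer(Word32 acc, const Word32 *ptr) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    asm("addl (%[p]), %[acc]"
        : [acc] "+r"(acc)
        : [p] "r"(ptr),   // the address itself, forced into a register
          "m"(*ptr)       // dummy input: declares that the asm reads *ptr
        : "cc");          // addl modifies the flags
#else
    acc += *ptr;          // portable fallback for non-x86 targets
#endif
    return acc;
}

// Usage: same result as the "m"(*ptr) form, with more for the compiler to set up.
void demo_add_via_register_pointer() {
    Word32 value = 3;
    assert(add_via_register_pointer(4, &value) == 7);
}
```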
Of course, generally avoid inline asm entirely unless you can't get the compiler to emit good enough asm without it. https://gcc.gnu.org/wiki/DontUseInlineAsm. If you do decide to use it, see https://stackoverflow.com/tags/inline-assembly/info for guides to avoid common mistakes.
Try
void test_func(Word32 *var){
asm( " mov %0, %%edx; \
addl (%%edx), %%eax" : : "m"(var) : "edx", "eax", "memory" );
return;
}

Why more space on the stack frame is reserved than is needed in x86

With reference to the following code
#include <iostream>
using namespace std;
void do_something(int* ptr) {
cout << "Got address " << reinterpret_cast<void*>(ptr) << endl;
}
void func() {
int a;
do_something(&a);
}
int main() {
func();
}
When I disassemble the func function the x86 (I am not sure whether it is x86 or x86_64) code is
-> 0x100001140 <+0>: pushq %rbp
0x100001141 <+1>: movq %rsp, %rbp
0x100001144 <+4>: subq $0x10, %rsp
0x100001148 <+8>: leaq -0x4(%rbp), %rdi
0x10000114c <+12>: callq 0x100000f90 ; do_something(int*)
0x100001151 <+17>: addq $0x10, %rsp
0x100001155 <+21>: popq %rbp
0x100001156 <+22>: retq
0x100001157 <+23>: nopw (%rax,%rax)
I understand that the first push statement is pushing the base pointer to the previous function call on the stack, and then the stack pointer value is copied over to the base pointer. But then why are 16 bytes reserved for the stack?
Does this have to do with alignment somehow? The variable a needs only 4 bytes..
Also what exactly is the lea instruction doing in this function call? is it just getting the address of the integer relative to the base pointer? Which in this case seems to be 4 bytes off from the base (assuming that the return address is 4 bytes long and is the first thing on the stack)
Other architectures seem to reserve more than 16 bytes and have other things stored on the base of the stack frame..
This is x64 code; note the use of the rsp register (32-bit x86 code uses esp). The most important implementation detail of the x64 ABI here is that the stack must be aligned to 16 bytes at every call. That is not strictly necessary to run 64-bit code, but the alignment guarantee ensures that the compiler can safely emit SSE instructions. Their operands require 16-byte alignment to be fast. None are actually used in this snippet, but they might be in do_something.
Upon entry of your function, the caller's CALL instruction has pushed 8 bytes on the stack to store the return address. The first PUSH instruction aligns the stack to 16 again, no additional corrections required.
It then creates the stack frame to store the a variable. While only 4 bytes are required, adjusting rsp by only 4 isn't good enough to provide the necessary alignment. So it picks the next suitable value, 16. The extra 12 bytes are simply unused.
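The arithmetic behind "the next suitable value" can be sketched as rounding the size of the locals up to a multiple of the alignment. round_up and demo_round_up are hypothetical names, and this is only a model of the calculation, not what any particular compiler literally runs:

```cpp
#include <cassert>
#include <cstddef>

// Rounds bytes up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t round_up(std::size_t bytes, std::size_t alignment) {
    return (bytes + alignment - 1) / alignment * alignment;
}

// A 4-byte int still costs a 16-byte adjustment of rsp.
static_assert(round_up(4, 16) == 16, "4 bytes of locals reserve 16 bytes");

void demo_round_up() {
    assert(round_up(4, 16) == 16);   // the case in the question
    assert(round_up(16, 16) == 16);  // already aligned, nothing wasted
    assert(round_up(17, 16) == 32);  // one byte over spills into the next slot
}
```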
The LEA instruction is a very handy one that implements &a. LEA = Load Effective Address = "take the address of". Not a particularly involved calculation here; it gets more convoluted when you use something like &array[ix]. That can still be done by a single LEA if the array element size is 1, 2, 4 or 8 bytes, which is pretty common.
The -4 is the offset from the start of the stack frame for the a variable. 4 bytes are needed to store int, your compiler implements the LP64 data model. Keep in mind that the stack grows downwards so it isn't 0.
Then it is just making the function call, the rdi register is used to pass the 1st argument in the x64 ABI. Then it destroys the stack frame again by re-adjusting rsp and restores rbp.
Do keep in mind that you are looking at unoptimized code. Usually none of this is left after the optimizer is done with it, small functions like this almost always get inlined. So this doesn't teach you that much practical knowledge of the code that actually runs. Have a look-see at the -O2 code.
According to x86-64 ABI, the stack must be 16 byte aligned prior to a subroutine call.
leaq (mem), reg
is equivalent to the following
reg = &(*mem) = mem
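In other words, lea only performs the address arithmetic of an x86 memory operand (base + index * scale + displacement) and never touches memory. A small sketch of that calculation, with effective_address and demo_effective_address as hypothetical names and a flat address space assumed:

```cpp
#include <cassert>
#include <cstdint>

// Models the address computed by an operand of the form disp(base, index, scale).
std::uintptr_t effective_address(std::uintptr_t base, std::uintptr_t index,
                                 std::uintptr_t scale, std::intptr_t disp) {
    return base + index * scale + static_cast<std::uintptr_t>(disp);
}

void demo_effective_address() {
    // leaq -0x4(%rbp), %rdi: base only, displacement -4 (0x1000 is a made-up rbp).
    assert(effective_address(0x1000, 0, 1, -4) == 0xFFC);

    // &array[3] for an int array: base + 3 * sizeof(int).
    int array[8] = {0};
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(array);
    assert(effective_address(base, 3, sizeof(int), 0) ==
           reinterpret_cast<std::uintptr_t>(&array[3]));
}
```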

Decreasing of the stack pointer by creating local variables

To get a better understanding of binary files I've prepared a small c++ example and used gdb to disassemble and look for the machine code.
The main() function calls the function func():
int func(void)
{
int a;
int b;
int c;
int d;
d = 4;
c = 3;
b = 2;
a = 1;
return 0;
}
The project is compiled with g++ keeping the debugging information. Next gdb is used to disassemble the source code. What I got for func() looks like:
0x00000000004004cc <+0>: push %rbp
0x00000000004004cd <+1>: mov %rsp,%rbp
0x00000000004004d0 <+4>: movl $0x4,-0x10(%rbp)
0x00000000004004d7 <+11>: movl $0x3,-0xc(%rbp)
0x00000000004004de <+18>: movl $0x2,-0x8(%rbp)
0x00000000004004e5 <+25>: movl $0x1,-0x4(%rbp)
0x00000000004004ec <+32>: mov $0x0,%eax
0x00000000004004f1 <+37>: pop %rbp
0x00000000004004f2 <+38>: retq
Now my problem is that I expected the stack pointer to be moved by 16 bytes to lower addresses relative to the base pointer, since each integer value needs 4 bytes. But it looks like the values are put on the stack without moving the stack pointer.
What did I not understand correctly? Is this a problem with the compiler or did the assembler omit some lines?
Best regards,
NouGHt
There's absolutely no problem with your compiler. The compiler is free to choose how to compile your code, and it chose not to modify the stack pointer. There's no need for it to do so since your function doesn't call any other functions: on x86-64 System V targets such as Linux, a leaf function may use the 128-byte "red zone" below the stack pointer without adjusting rsp. If it did call another function, it would have to move the stack pointer first so that the callee did not stomp on the caller's locals.
As a general rule, you should avoid trying to make any assumptions on how the compiler will compile your code. For example, your compiler would be perfectly at liberty to optimize away the body of your function.

Load 64-bit integer constant via GNU extended asm constraint?

I've written this code in Clang-compatible "GNU extended asm":
namespace foreign {
extern char magic_pointer[];
}
extern "C" __attribute__((naked)) void get_address_of_x(void)
{
asm volatile("movq %[magic_pointer], %%rax\n\t"
"ret"
: : [magic_pointer] "p"(&foreign::magic_pointer));
}
I expected it to compile into the following assembly:
_get_address_of_x:
## InlineAsm Start
movq $__ZN7foreign13magic_pointerE, %rax
ret
## InlineAsm End
ret /* useless but I don't think there's any way to get rid of it */
But instead I get this "nonsense":
_get_address_of_x:
movq __ZN7foreign13magic_pointerE@GOTPCREL(%rip), %rax
movq %rax, -8(%rbp)
## InlineAsm Start
movq -8(%rbp), %rax
ret
## InlineAsm End
ret
Apparently Clang is assigning the value of &foreign::magic_pointer into %rax (which is deadly to a naked function), and then further "spilling" it onto a stack frame that doesn't even exist, all so it can pull it off again in the inline asm block.
So, how can I make Clang generate exactly the code I want, without resorting to manual name-mangling? I mean I could just write
extern "C" __attribute__((naked)) void get_address_of_x(void)
{
asm volatile("movq __ZN7foreign13magic_pointerE@GOTPCREL(%rip), %rax\n\t"
"ret");
}
but I really don't want to do that if there's any way to help it.
Before hitting on "p", I'd tried the "i" and "n" constraints; but they didn't seem to work properly with 64-bit pointer operands. Clang kept giving me error messages about not being able to allocate the operand to the %flags register, which seems like something crazy was going wrong.
For those interested in solving the "XY problem" here: I'm really trying to write a much longer assembly stub that calls off to another function foo(void *p, ...) where the argument p is set to this magic pointer value and the other arguments are set based on the original values of the CPU registers at the point this assembly stub was entered. (Hence, naked function.) Arbitrary company policy prevents just writing the damn thing in a .S file to begin with; and besides, I really would like to write foreign::magic_pointer instead of __ZN7foreign...etc.... Anyway, that should explain why spilling temporary results to stack or registers is strictly verboten in this context.
Perhaps there's some way to write
asm volatile(".long %[magic_pointer]" : : [magic_pointer] "???"(&foreign::magic_pointer));
to get Clang to insert exactly the relocation I want?
I think this is what you want:
namespace foreign {
extern char magic_pointer[];
}
extern "C" __attribute__((naked)) void get_address_of_x(void)
{
asm volatile ("ret" : : "a"(&foreign::magic_pointer));
}
In this context, "a" is a constraint that specifies that %rax must be used. Clang will then load the address of magic_pointer into %rax in preparation for executing your inline asm, which is all you need.
It's a little dodgy because it's defining constraints that are unreferenced in the asm text, and I'm not sure whether that's technically allowed/well-defined - but it does work on latest clang.
On clang 3.0-6ubuntu3 (because I'm being lazy and using gcc.godbolt.org), with -fPIC, this is the asm you get:
get_address_of_x: # @get_address_of_x
movq foreign::magic_pointer@GOTPCREL(%rip), %rax
ret
ret
And without -fPIC:
get_address_of_x: # @get_address_of_x
movl foreign::magic_pointer, %eax
ret
ret
OP here.
I ended up just writing a helper extern "C" function to return the magic value, and then calling that function from my assembly code. I still think Clang ought to support my original approach somehow, but the main problem with that approach in my real-life case was that it didn't scale to x86-32. On x86-64, loading an arbitrary address into %rdx can be done in a single instruction with a %rip-relative mov. But on x86-32, loading an arbitrary address with -fPIC turns into just a ton of code, .indirect_symbol directives, two memory accesses... I just didn't want to attempt writing all that by hand. So my final assembly code looks like
asm volatile(
"...save original register values...;"
"call _get_magic_pointer;"
"movq %rax, %rdx;"
"...set up other parameters to foo...;"
"call _foo;"
"...cleanup..."
);
Simpler and cleaner. :)
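The helper itself is not shown above; a minimal sketch of what it might look like follows. The name get_magic_pointer is inferred from the call in the asm string (platforms such as macOS add the leading underscore to C symbol names), and the array definition and its size are assumptions made only so the example links on its own.

```cpp
#include <cassert>

namespace foreign {
// Assumption: defined here so the sketch links; in the question the array is
// only declared extern and defined elsewhere.
char magic_pointer[16];
}

// Hypothetical helper: extern "C" keeps the symbol name unmangled, so the
// assembly stub can call it without knowing the C++ mangling scheme.
extern "C" void *get_magic_pointer(void) {
    return foreign::magic_pointer;
}

// Usage: the helper returns the address of the array, which the compiler
// materializes with whatever PIC sequence the target needs.
void demo_get_magic_pointer() {
    assert(get_magic_pointer() == static_cast<void *>(foreign::magic_pointer));
}
```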

The C++ implicit this, and exactly how it is pushed on the stack

I need to know whether, when a class method in C++ is called, the implicit 'this' pointer is the first argument, or the last. i.e: whether it is pushed onto the stack first or last.
In other words, I'm asking whether a class method, being called, is taken by the compiler to be:
int foo::bar(foo *const this, int arg1, int arg2);
//or:
int foo::bar(int arg1, int arg2, foo *const this);
By extension therefore, and more importantly, that would also answer whether G++ would push the this pointer last or first, respectively. I interrogated google, but I didn't find much.
And as a side note, when C++ functions are called, do they do the same thing as C functions? i.e:
push ebp
mov ebp, esp
All in all: would a class method being called look like this?
; About to call foo::bar.
push dword 0xDEADBEEF
push dword 0x2BADBABE
push dword 0x2454ABCD ; This one is the this ptr for the example.
; this code example would match up if the this ptr is the first argument.
call _ZN3foo3barEpjj
Thanks, and much obliged.
EDIT: to clarify things, I'm using GCC/G++ 4.3
This depends on the calling convention of your compiler and the target architecture.
By default, Visual C++ will not push this on the stack. For x86, the compiler will default to the "thiscall" calling convention and will pass this in the ecx register. If you specify __stdcall for your member function, it will be pushed on the stack as the first parameter.
For x64 on VC++, the first four parameters are passed in registers. this is the first parameter and is passed in the rcx register.
Raymond Chen had a series some years ago on calling conventions. Here are the x86 and x64 articles.
This will depend on your compiler and architecture, but G++ 4.1.2 on 32-bit x86 Linux with no optimization settings treats this as the first parameter, passed on the stack exactly like the explicit A* argument of the free function:
class A
{
public:
void Hello(int, int) {}
};
void Hello(A *a, int, int) {}
int main()
{
A a;
Hello(&a, 0, 0);
a.Hello(0, 0);
return 0;
}
Disassembly of main():
movl $0, 8(%esp)
movl $0, 4(%esp)
leal -5(%ebp), %eax
movl %eax, (%esp)
call _Z5HelloP1Aii
movl $0, 8(%esp)
movl $0, 4(%esp)
leal -5(%ebp), %eax
movl %eax, (%esp)
call _ZN1A5HelloEii
I just had a read of the C++ Standard (ANSI ISO IEC 14882 2003), section 9.3.2 "The this pointer", and it does not seem to specify anything about where it should occur in the list of arguments, so it's up to the individual compiler.
Try compiling some code with gcc using the '-S' flag to generate assembly code and take a look at what it's doing.
This sort of detail is not specified by the C++ standard. However, read through the C++ ABI for gcc (and other C++ compilers that follow the C++ ABI).
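At the source level, at least, the standard library also models a member function as a callable whose first argument is the object. The sketch below shows only that source-level view and says nothing about registers versus stack, which remains up to the ABI. The class body, call_bar and demo_call_bar are hypothetical illustrations.

```cpp
#include <cassert>
#include <functional>

class foo {
public:
    explicit foo(int base) : base_(base) {}
    int bar(int arg1, int arg2) { return base_ + arg1 + arg2; }
private:
    int base_;
};

// Calls foo::bar as if it were a free function taking the object pointer first.
int call_bar(foo *self, int arg1, int arg2) {
    return std::mem_fn(&foo::bar)(self, arg1, arg2);
}

// Usage: both spellings reach the same member function with the same object.
void demo_call_bar() {
    foo obj(10);
    assert(call_bar(&obj, 1, 2) == 13);
    assert(obj.bar(1, 2) == 13);
}
```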