Can anyone tell me the meaning of the following:
gdb> disas 0x080ed5af
0x080ed5ac <func1+205>: mov 0x8(%eax),%eax
0x080ed5af <func1+208>: testb $0x10,0x8(%eax)
0x080ed5b3 <func1+212>: jne 0x80ed604 <dapriv_disk_op+293>
0x080ed5b5 <func1+214>: mov %edi,(%esp)
What is the meaning of testb $0x10,0x8(%eax)?
It performs a bitwise AND of the two operands: the immediate 0x10 and the byte located at the address %eax + 0x8. Neither operand is altered; the instruction only updates the flags, most importantly ZF, which is set to 1 if the result of the AND is zero and to 0 otherwise. The following jne jumps if ZF is 0, that is, if the bit selected by the mask 0x10 is set in that byte.
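In C++ terms, the testb/jne pair is a bit test followed by a branch. A minimal sketch of the same check, where the function name `flag_0x10_is_set` and the parameter `base` are hypothetical and only stand in for whatever structure %eax points to:

```cpp
#include <cstdint>

// Hypothetical helper mirroring `testb $0x10,0x8(%eax)` followed by `jne`.
// `base` plays the role of %eax; the byte at offset 8 is tested against 0x10.
bool flag_0x10_is_set(const std::uint8_t* base) {
    // AND the byte with the mask; neither operand is modified.
    // A nonzero result corresponds to ZF == 0, which is when jne jumps.
    return (base[8] & 0x10) != 0;
}
```

When this returns true, the original code would take the jne branch; otherwise it falls through to the next instruction.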
I am trying to understand some things about inline assembler in Linux. I am using following function:
void test_func(Word32 *var){
asm( " addl %0, %%eax" : : "m"(var) );
return;
}
It generates following assembler code:
.globl test_func
.type test_func, @function
test_func:
pushl %ebp
movl %esp, %ebp
#APP
# 336 "opers.c" 1
addl 8(%ebp), %eax
# 0 "" 2
#NO_APP
popl %ebp
ret
.size test_func, .-test_func
It sums var mem address to eax register value instead var value.
Is there any way to tell addl instruction to use var value instead var mem address without copying var mem address to a register?
Regards
It sums var mem address to eax register value instead var value.
Yes, the syntax of gcc inline assembly is pretty arcane. Paraphrasing from the relevant section in the GCC Inline Assembly HOWTO, "m" roughly gives you the memory location of the C variable.
It's what you'd use when you just want an address you can write to or read from. Notice I said the location of the C variable, so %0 is set to the address of Word32 *var - you have a pointer to a pointer. A C translation of the inline assembly block could look like EAX += *(&var) because you can say that the "m" constraint implicitly takes the address of the C variable and gives you an address expression, that you then add to %eax.
Is there any way to tell addl instruction to use var value instead var mem address without copying var mem address to a register?
That depends on what you mean. You need to get var from the stack, so someone has to dereference memory (see @Bo Persson's answer), but you don't have to do it in inline assembly.
The constraint needs to be "m"(*var) (as @fazo suggested). That will give you the memory location of the value that var is pointing to, rather than a memory location pointing to it.
The generated code is now:
test_func:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
#APP
# 2 "test.c" 1
addl (%eax), %eax
# 0 "" 2
#NO_APP
popl %ebp
ret
Which is a little suspect, but that's understandable as you forgot to tell GCC that you clobbered (modified without having in the input/output list) %eax. Fixing that asm("addl %0, %%eax" : : "m"(*var) : "%eax" ) generates:
movl 8(%ebp), %edx
addl (%edx), %eax
Which isn't any better or more correct in this case, but it is always a good practice to remember. See the section on the clobber list and pay special attention to the "memory" clobber for advanced usage of inline assembly.
Even though you don't want to (explicitly) load the memory address into a register I'll briefly cover it.
Changing the constraint from "m" to "r" almost seems to work. The relevant section changes to the following (if we include %eax in the clobber list):
movl 8(%ebp), %edx
addl %edx, %eax
This is almost correct: we have loaded the pointer value var into a register, but now we have to specify ourselves that we're loading from memory. Changing the code to match the constraint (usually undesirable; I'm only showing it for completeness):
asm("addl (%0), %%eax" : : "r"(var) : "%eax" );
Gives:
movl 8(%ebp), %edx
addl (%edx), %eax
The same as with "m".
Yes, because you give it var, which is an address. Give it *var instead, like this:
void test_func(Word32 *var){
asm( " addl %0, %%eax" : : "m"(*var) );
return;
}
I don't remember exactly, but shouldn't you replace "m" with "r"?
A memory operand doesn't mean that it will take the value from that address; it's just a pointer.
No, there is no addressing mode for x86 processors that goes two levels indirect.
You have to first load the pointer from a memory address and then load indirectly from the pointer.
An "m" constraint doesn't implicitly dereference anything. It's just like an "r" constraint, except it expands to an addressing mode for a memory location holding the value of the expression, instead of a register. (In C, every object has an address, although often that can be optimized away.)
The C object that's an input (or output for "=m") for the asm is the lvalue or rvalue you specify, e.g. "m"(var) takes the value of var, not *var. So you'd be adding the pointer. (And telling the compiler that you want that input pointer value to be in memory, not a register.)
Perhaps it's confusing you that you have a pointer but you called it var, not ptr or something? A C pointer is an object whose value is an address, and can itself be stored in memory. If you were using C++, Word32 &var would make the dereference implicit whenever you write var.
In C terms, you're doing eax += ptr, but you want eax += *ptr, so you should write
void test_func(Word32 *ptr){
asm( "add %[input], %%eax"
: // no outputs. Probably you should use "+a"(add_to_this) if you want the add result, and remove the EAX clobber.
: [input] "m"(*ptr) // the pointed-to Word32 in memory
: "eax" // the instruction modifies EAX; tell the compiler about it
);
}
Compiling (Godbolt compiler explorer) results in:
# gcc -O3 -m32
test_func:
movl 4(%esp), %edx # compiler-generated load of the function arg
add (%edx), %eax # from asm template, (%edx) filled in as %[input] for *ptr
ret
Or if you'd compiled with -mregparm=3, or a 64-bit build, the arg would already be in a register. e.g. 64-bit GCC emits add (%rdi), %eax ; ret.
If you'd written return *ptr in C for a function returning Word32, with no inline asm, the asm would be similar, loading the pointer arg from the stack and then mov (%edx), %eax to load the return value. See the Godbolt link for that.
If inline asm isn't doing what you expect, look at the compiler generated asm to see how it filled in your template. That sometimes helps you figure out what the compiler thought you meant. (But only if you understand the basic design principles.)
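To actually get the sum back into C code, the accumulator can be made a read-write operand instead of a clobbered register. The following is only a sketch under stated assumptions: it assumes GCC or Clang targeting x86, uses "+r" rather than "+a" so the compiler is free to pick any register, and falls back to plain C++ elsewhere. The name `add_pointee` and the parameter `acc` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: returns acc + *ptr.
std::uint32_t add_pointee(std::uint32_t acc, const std::uint32_t* ptr) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    asm("addl %[input], %[acc]"
        : [acc] "+r"(acc)       // read-write register operand, no EAX clobber needed
        : [input] "m"(*ptr)     // the pointed-to 32-bit value, as a memory operand
        : "cc");                // add modifies the flags
#else
    acc += *ptr;                // portable fallback for other targets
#endif
    return acc;
}
```

For example, with `std::uint32_t v = 40;`, calling `add_pointee(2, &v)` returns 42 and leaves `v` unchanged.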
If you write "m"(ptr), it compiles as follows:
void add_pointer(Word32 *ptr)
{
asm( "add %[input], %%eax" : : [input] "m"(ptr) : "eax" );
}
add_pointer:
add 4(%esp), %eax # ptr
ret
This is very similar to what you'd get if you wrote Word32 *bar(Word32 *ptr){ return ptr; }
Note that if you wanted to increment the memory location, you'd use a "+m"(*ptr) constraint to tell the compiler that the pointed-to memory is both an input and output. Or if you write-only to the memory, "=m"(*ptr) so it can potentially optimize away earlier dead stores to this memory location.
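A minimal sketch of the "+m"(*ptr) case, assuming GCC or Clang on x86 and falling back to plain C++ on other targets. The name `add_to_memory` and the parameter `value` are hypothetical, chosen only for illustration:

```cpp
#include <cstdint>

// Hypothetical helper: performs *ptr += value with a memory-destination add.
void add_to_memory(std::uint32_t* ptr, std::uint32_t value) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    asm("addl %[v], %[mem]"
        : [mem] "+m"(*ptr)      // pointed-to memory is both read and written
        : [v] "r"(value)        // value to add, in a register
        : "cc");                // add modifies the flags
#else
    *ptr += value;              // portable fallback for other targets
#endif
}
```

Because the memory is declared as an operand, no "memory" clobber is needed here; the compiler knows exactly which object the asm statement reads and writes.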
See also How can I indicate that the memory *pointed* to by an inline ASM argument may be used? to handle cases where you use an "r"(ptr) input and dereference the pointer manually inside the asm, accessing memory that you didn't tell the compiler about as being an input or output operand.
Generally avoid doing "r"(ptr) and then manually doing add (%0), %%eax. It needs extra constraints to make it safe, and it forces the compiler to materialize the exact address in a register, instead of using an addressing mode to reach it relative to some other register. e.g. 4(%ecx) if after inlining it sees that you're actually passing a pointer into an array or to a struct member.
Of course, generally avoid inline asm entirely unless you can't get the compiler to emit good enough asm without it. https://gcc.gnu.org/wiki/DontUseInlineAsm. If you do decide to use it, see https://stackoverflow.com/tags/inline-assembly/info for guides to avoid common mistakes.
Try
void test_func(Word32 *var){
asm( " mov %0, %%edx; \
addl (%%edx), %%eax" : : "m"(var) : "edx", "eax" );
return;
}
Here is the function definition
const int& test_const_ref(const int& a) {
return a;
}
and calling it from main
int main() {
auto& x = test_const_ref(1);
printf("%d, %p\n", x, &x);
}
output as following
./debug/main
>>> 1, 0x7ffee237285c
and here is the disassembly code of test_const_ref
test_const_ref(int const&):
pushq %rbp
movq %rsp, %rbp
movq %rdi, -0x8(%rbp)
movq -0x8(%rbp), %rax
popq %rbp
retq
The question is: where does the variable x alias or where is the number 1 I passed to function test_const_ref stored ?
The code exhibits undefined behavior - the function test_const_ref returns a reference to a temporary, which lives until the end of the full-expression (the ;), and any dereference of it afterwards accesses a dangling reference.
Appearing to work is a common manifestation of UB. The program is still wrong. With optimization on, for example, Clang 12 -O2 prints: 0.
Note - there's no error in the function test_const_ref itself (apart from a design error). The UB is in main, where the dereference of the dangling int& happens during a call to printf.
Where exactly the temporary int is stored is an implementation detail, but in many cases (in a Debug build, when a function isn't inlined), it would be stored on the stack:
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 12], 1 # Here the 1 is stored in the stack frame
lea rdi, [rbp - 12]
call test_const_ref(int const&)
mov qword ptr [rbp - 8], rax
mov rax, qword ptr [rbp - 8]
mov esi, dword ptr [rax]
mov rdx, qword ptr [rbp - 8]
movabs rdi, offset .L.str
mov al, 0
call printf
So any subsequent use of the returned reference will access memory at [rbp - 12], that may already have been re-used for other purposes.
Note also that the compiler doesn't actually generate assembly from C++ code; it merely uses the C++ code to understand the intent, and generates another program that produces the intended output. This is known as the as-if rule. In the presence of undefined behavior, the compiler becomes free from this restriction, and may generate any output, rendering the program meaningless.
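For comparison, here is a sketch of two ways to avoid the dangling reference. The names `pass_through` and `pass_by_value` are hypothetical; `pass_through` has the same shape as test_const_ref from the question.

```cpp
// Same shape as test_const_ref: returns a reference to whatever it was given.
const int& pass_through(const int& a) {
    return a;
}

// Hypothetical alternative: returning by value copies the int,
// so the result never refers to a dead temporary.
int pass_by_value(const int& a) {
    return a;
}
```

Calling `pass_through` with a named variable that outlives the result is fine, e.g. `int one = 1; const int& x = pass_through(one);` makes `x` refer to `one`. Calling `pass_by_value(1)` is also fine, because the temporary is only read before the full-expression ends.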
Good answers have already been given, but wrapperm explained this topic very well in here. It's going to be stored on the stack in most implementations I'm aware of.
1. The function
The language doesn't define where arguments to functions are stored. Different ABIs, for different platforms, define this.
Typically, a function argument, before any optimization, is stored on the stack. A reference is no different in this respect. What's actually stored would be a pointer to the referred-to object. Think of it this way:
const int* test_const_ref(const int* a) {
return a;
}
2. The temporary
If you were to declare a variable int foo; and call test_const_ref(foo), you know that the result would refer to foo. Since you're calling it with a temporary, all bets are off: As @fabian notes in a comment, the language only guarantees that the value exists until the end of the assignment statement. Afterwards, the reference dangles.
In practice, and in your case: A compiler which allocates stack space for the temporary integer 1 will have x refer to that place, and will not use it for something else before x is defined. But if your compiler optimizes that stack allocation away - e.g. passes 1 via a register - then x has nothing to refer to and may hold junk. It might even be undefined behavior (not quite sure about that). If you're lucky, you'll get a compiler warning about it (GodBolt.org).
I created a minimal C++ program:
int main() {
return 1234;
}
and compiled it with clang++5.0 with optimization disabled (the default -O0). The resulting assembly code is:
pushq %rbp
movq %rsp, %rbp
movl $1234, %eax # imm = 0x4D2
movl $0, -4(%rbp)
popq %rbp
retq
I understand most of the lines, but I do not understand the "movl $0, -4(%rbp)". It seems the program initializes some local variable to 0. Why?
What compiler-internal detail leads to this store that doesn't correspond to anything in the source?
TL;DR: In unoptimized code, your clang++ set aside 4 bytes for the return value of main and set it to zero, as required by the C++ standards (including C++11). It generated this code for a main function that didn't need it. This is a side effect of not optimizing: an unoptimized build often emits code it might need, ends up not needing it, and nothing cleans it up.
Because you are compiling with -O0, only a bare minimum of optimization is done (-O0 may still remove dead code, etc.). Trying to understand artifacts in unoptimized code is usually a wasted exercise: unoptimized code is full of extra loads and stores and other artifacts of raw code generation.
In this case main is special because in C99/C11 and C++ the standards effectively say that when reaching the outer block of main the default return value is 0. The C11 standard says:
5.1.2.2.3 Program termination
1 If the return type of the main function is a type compatible with int, a return from the
initial call to the main function is equivalent to calling the exit function with the value
returned by the main function as its argument; reaching the } that terminates the
main function returns a value of 0. If the return type is not compatible with int, the
termination status returned to the host environment is unspecified.
The C++11 standard says:
3.6.1 Main function
5) A return statement in main has the effect of leaving the main function (destroying any objects with automatic
storage duration) and calling std::exit with the return value as the argument. If control reaches the end
of main without encountering a return statement, the effect is that of executing
return 0;
In the version of clang++ you are using, the unoptimized 64-bit code by default places the return value of 0 at dword ptr [rbp-4].
The problem is that your test code is a bit too trivial to see how this default return value comes in to play. Here is an example that should be a better demonstration:
int main() {
int a = 3;
if (a > 3) return 5678;
else if (a == 3) return 42;
}
This code has two explicit exit points, return 5678 and return 42, but there isn't a final return at the end of the function. If } is reached, the default is to return 0. If we examine the godbolt output we see this:
main: # @main
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], 0 # Default return value of 0
mov dword ptr [rbp - 8], 3
cmp dword ptr [rbp - 8], 3 # Is a > 3?
jle .LBB0_2
mov dword ptr [rbp - 4], 5678 # Set return value to 5678
jmp .LBB0_5 # Go to common exit point .LBB0_5
.LBB0_2:
cmp dword ptr [rbp - 8], 3 # Is a == 3?
jne .LBB0_4
mov dword ptr [rbp - 4], 42 # Set return value to 42
jmp .LBB0_5 # Go to common exit point .LBB0_5
.LBB0_4:
jmp .LBB0_5 # Extraneous unoptimized jump artifact
# This is common exit point of all the returns from `main`
.LBB0_5:
mov eax, dword ptr [rbp - 4] # Use return value from memory
pop rbp
ret
As one can see, the compiler has generated a common exit point that loads the return value (EAX) from the stack address dword ptr [rbp-4]. At the beginning of the code, dword ptr [rbp-4] is explicitly set to 0. In the simpler case, the unoptimized code still generates that store, but it goes unused.
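The shape of that unoptimized code can be written back as C++. This is only an illustrative sketch of the pattern, not what the compiler does internally; the function name `return_slot_demo` and the local `retval` are hypothetical, with `retval` standing in for the stack slot at [rbp - 4].

```cpp
// Hypothetical C++ rendering of the -O0 control flow for the example main.
int return_slot_demo(int a) {
    int retval = 0;          // default return value, stored up front
    if (a > 3) {
        retval = 5678;       // first explicit return
    } else if (a == 3) {
        retval = 42;         // second explicit return
    }
    return retval;           // single common exit point reads the slot
}
```

With `a == 3` this yields 42, with `a > 3` it yields 5678, and for any other value the initial 0 survives to the exit point, which is the default return value of main.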
If you build the code with the option -ffreestanding you should see the default return value for main no longer set to 0. This is because the requirement for a default return value of 0 from main applies to a hosted environment and not a freestanding one.
I have this simple C++ code:
int testFunction(int* input, long length) {
int sum = 0;
for (long i = 0; i < length; ++i) {
sum += input[i];
}
return sum;
}
#include <stdlib.h>
#include <iostream>
using namespace std;
int main()
{
union{
int* input;
char* cinput;
};
size_t length = 1024;
input = new int[length];
//cinput++;
cout<<testFunction(input, length-1);
}
If I compile it with g++ 4.9.2 with -O3, it runs fine. I expected that if I uncomment the penultimate line it would run slower, however it outright crashes with SIGSEGV.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400754 in main ()
(gdb) disassemble
Dump of assembler code for function main:
0x00000000004006e0 <+0>: sub $0x8,%rsp
0x00000000004006e4 <+4>: movabs $0x100000000,%rdi
0x00000000004006ee <+14>: callq 0x400690 <_Znam@plt>
0x00000000004006f3 <+19>: lea 0x1(%rax),%rdx
0x00000000004006f7 <+23>: and $0xf,%edx
0x00000000004006fa <+26>: shr $0x2,%rdx
0x00000000004006fe <+30>: neg %rdx
0x0000000000400701 <+33>: and $0x3,%edx
0x0000000000400704 <+36>: je 0x4007cc <main+236>
0x000000000040070a <+42>: cmp $0x1,%rdx
0x000000000040070e <+46>: mov 0x1(%rax),%esi
0x0000000000400711 <+49>: je 0x4007f1 <main+273>
0x0000000000400717 <+55>: add 0x5(%rax),%esi
0x000000000040071a <+58>: cmp $0x3,%rdx
0x000000000040071e <+62>: jne 0x4007e1 <main+257>
0x0000000000400724 <+68>: add 0x9(%rax),%esi
0x0000000000400727 <+71>: mov $0x3ffffffc,%r9d
0x000000000040072d <+77>: mov $0x3,%edi
0x0000000000400732 <+82>: mov $0x3fffffff,%r8d
0x0000000000400738 <+88>: sub %rdx,%r8
0x000000000040073b <+91>: pxor %xmm0,%xmm0
0x000000000040073f <+95>: lea 0x1(%rax,%rdx,4),%rcx
0x0000000000400744 <+100>: xor %edx,%edx
0x0000000000400746 <+102>: nopw %cs:0x0(%rax,%rax,1)
0x0000000000400750 <+112>: add $0x1,%rdx
=> 0x0000000000400754 <+116>: paddd (%rcx),%xmm0
0x0000000000400758 <+120>: add $0x10,%rcx
0x000000000040075c <+124>: cmp $0xffffffe,%rdx
0x0000000000400763 <+131>: jbe 0x400750 <main+112>
0x0000000000400765 <+133>: movdqa %xmm0,%xmm1
0x0000000000400769 <+137>: lea -0x3ffffffc(%r9),%rcx
---Type <return> to continue, or q <return> to quit---
Why does it crash? Is it a compiler bug? Am I causing some undefined behavior? Does the compiler expect that ints are always 4-byte-aligned?
I also tested it on clang and there's no crash.
Here's g++'s assembly output: http://pastebin.com/CJdCDCs4
The code input = new int[length]; cinput++; causes undefined behaviour because the second statement is reading from a union member that is not active.
Even ignoring that, testFunction(input, length-1) would again have undefined behaviour for the same reason.
Even ignoring that, the sum loop accesses an object through a glvalue of the wrong type, which has undefined behaviour.
Even ignoring that, reading from an uninitialized object, as your sum loop does, would again have undefined behaviour.
gcc has vectorized the loop with SSE instructions. paddd with a memory operand (like most legacy SSE instructions) requires 16-byte alignment. I haven't looked at the code before the paddd in detail, but I expect that it assumes 4-byte alignment initially, iterates with scalar code (where misalignment only incurs a performance penalty, not a crash) until it can assume 16-byte alignment, then enters the SIMD loop, processing 4 ints at a time. By adding an offset of 1 byte you break the precondition of 4-byte alignment for the array of ints, and after that all bets are off. If you're going to be doing nasty stuff with misaligned data (and I highly recommend you don't), then you should disable automatic vectorization (gcc -fno-tree-vectorize).
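If the data really can start at an arbitrary byte offset, one well-defined way to read it is to copy each int out with std::memcpy instead of dereferencing a misaligned int*. A minimal sketch, where the name `sum_unaligned_ints` and its parameters are hypothetical; compilers typically turn the small fixed-size memcpy into an ordinary unaligned load, though that is an optimization, not a guarantee:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: sums `count` ints stored back to back starting at
// `bytes`, which may have any alignment.
int sum_unaligned_ints(const unsigned char* bytes, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        int value;
        // memcpy has no alignment requirement on its source pointer.
        std::memcpy(&value, bytes + i * sizeof(int), sizeof(int));
        sum += value;
    }
    return sum;
}
```

The compiler can no longer assume that the source is 4-byte aligned, so any vectorized code it emits has to use loads that tolerate misalignment.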
The instruction that crashed is paddd (you highlighted it). The name is short for "packed add doubleword" (see e.g. here) - it is a part of the SSE instruction set. These instructions require aligned pointers; for example, the link above has a description of exceptions that paddd may cause:
GP(0)
...(128-bit operations only)
If a memory operand is not aligned on a 16-byte boundary, regardless of segment.
This is exactly your case. The compiler arranged the code in such a way that it could use these fast 128-bit operations like paddd, and you subverted it with your union trick.
I would guess that the code generated by clang doesn't use SSE, so it's not sensitive to alignment. If so, it's also probably much slower (but you won't notice it with just 1024 iterations).