function call seems not working properly in disassembled code [duplicate] - c++

This question already has an answer here:
What are these seemingly-useless callq instructions in my x86 object files for?
(1 answer)
Closed 1 year ago.
I wrote a simple program and then compiled and assembled it.
tfc.cpp
int i = 0;
void f(int a)
{
i += a;
};
int main()
{
f(9);
return 0;
};
I got the tfc.o by running
$ g++ -c -O1 tfc.cpp
Then I use gobjdump (objdump) to disassemble the binary file.
$ gobjdump -d tfc.o
Then I got
0000000000000000 <__Z1fi>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 01 3d 00 00 00 00 add %edi,0x0(%rip) # a <__Z1fi+0xa>
a: 5d pop %rbp
b: c3 retq
c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000010 <_main>:
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: bf 09 00 00 00 mov $0x9,%edi
19: e8 00 00 00 00 callq 1e <_main+0xe>
1e: 31 c0 xor %eax,%eax
20: 5d pop %rbp
21: c3 retq
The weird thing happened, the callq instruction is followed by 1e <_main+0xe>. Shouldn't it be the address of <__Z1fi>? If not, how does the main function call the f function?
EDIT
FYI:
$ g++ -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin13.1.0
Thread model: posix

It calls address 0, which is the address of the f function.
e8 is the call instruction in x86 according to this:
http://www.cs.cmu.edu/~fp/courses/15213-s07/misc/asm64-handout.pdf
call uses the displacement relative to the next instruction, at memory location 1e. That becomes memory location 0. So it's callq 1e when in reality it's calling address 0.

Related

objdump gives the same output for object files generated with and without -fPIC

I have two files, a.h and a.cpp:
// a.h
extern "C" void a();
// a.cpp
#include "a.h"
#include <stdio.h>
void a()
{
printf("a\n");
}
I compiled this both with and without -fPIC, and then objdumped both.
Weirdly, I got the same output for both files. For a(), I get this in both cases:
callq 15 <a+0x15>
I also tried to compile object files with -no-pie, still no luck.
Compile your code (or anything) in verbose mode (-v), inspect the output,
and you will find:
Configured with: ... --enable-default-pie ...
which, since GCC 6, means the toolchain is built to compile PIC code and link
PIE executables by default.
To insist on a non-PIC compilation, run e.g.
g++ -Wall -c -fno-PIC -o anopic.o a.cpp
And to insist on a PIC compilation, run e.g.
g++ -Wall -c -fPIC -o apic.o a.cpp
Then run:
$ objdump -d anopic.o
anopic.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
9: e8 00 00 00 00 callq e <a+0xe>
e: 90 nop
f: 5d pop %rbp
10: c3 retq
and:
$ objdump -d apic.o
apic.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <a+0xb>
b: e8 00 00 00 00 callq 10 <a+0x10>
10: 90 nop
11: 5d pop %rbp
12: c3 retq
and you will see the difference.
You can interleave the relocations with the assembly by:
$ objdump --reloc -d anopic.o
anopic.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: bf 00 00 00 00 mov $0x0,%edi
5: R_X86_64_32 .rodata
9: e8 00 00 00 00 callq e <a+0xe>
a: R_X86_64_PC32 puts-0x4
e: 90 nop
f: 5d pop %rbp
10: c3 retq
and:
$ objdump --reloc -d apic.o
apic.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <a+0xb>
7: R_X86_64_PC32 .rodata-0x4
b: e8 00 00 00 00 callq 10 <a+0x10>
c: R_X86_64_PLT32 puts-0x4
10: 90 nop
11: 5d pop %rbp
12: c3 retq
By default, objdump does not perform relocation processing. Try objdump --reloc instead.
In your case, the compiler and assembler produce a R_X86_64_PLT32 relocation. This is an position-independent relocation. It seems that your compiler defaults to generating PIE binaries. -no-pie is a linker flag, you need to use -fno-pie to change the compiler output. (In this particular case, it does not matter because the final result will be identical after the link editor has run.)

strange behavior when trying to compile a source with tcc against gcc generated .o file

I am trying to compile a source with tcc (ver 0.9.26) against a gcc-generated .o file, but it has strange behavior. The gcc (ver 5.3.0)is from MinGW 64 bit.
More specifically, I have the following two files (te1.c te2.c). I did the following commands on windows7 box
c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O elf64-x86-64 te1.o #this is needed because te1.o from previous step is in COFF format, tcc only understand ELF format
c:\tcc> tcc te2.c te1.o
c:\tcc> te2.exe
567in dummy!!!
Note that it cut off 4 bytes from the string 1234567in dummy!!!\n. Wonder if what could have gone wrong.
Thanks
Jin
========file te1.c===========
#include <stdio.h>
void dummy () {
printf1("1234567in dummy!!!\n");
}
========file te2.c===========
#include <stdio.h>
void printf1(char *p) {
printf("%s\n",p);
}
extern void dummy();
int main(int argc, char *argv[]) {
dummy();
return 0;
}
Update 1
Saw a difference in assembly between te1.o (te1.c compiled by tcc) and te1_gcc.o (te1.c compiled by gcc). In the tcc compiled, I saw lea -0x4(%rip),%rcx, on the gcc compiled, I saw lea 0x0(%rip),%rcx.
Not sure why.
C:\temp>objdump -d te1.o
te1.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <dummy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 81 ec 20 00 00 00 sub $0x20,%rsp
b: 48 8d 0d fc ff ff ff lea -0x4(%rip),%rcx # e <dummy+0xe>
12: e8 fc ff ff ff callq 13 <dummy+0x13>
17: c9 leaveq
18: c3 retq
19: 00 00 add %al,(%rax)
1b: 00 01 add %al,(%rcx)
1d: 04 02 add $0x2,%al
1f: 05 04 03 01 50 add $0x50010304,%eax
C:\temp>objdump -d te1_gcc.o
te1_gcc.o: file format pe-x86-64
Disassembly of section .text:
0000000000000000 <dummy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 20 sub $0x20,%rsp
8: 48 8d 0d 00 00 00 00 lea 0x0(%rip),%rcx # f <dummy+0xf>
f: e8 00 00 00 00 callq 14 <dummy+0x14>
14: 90 nop
15: 48 83 c4 20 add $0x20,%rsp
19: 5d pop %rbp
1a: c3 retq
1b: 90 nop
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop
Update2
Using a binary editor, I changed the machine code in te1.o (produced by gcc) and changed lea 0(%rip),%rcx to lea -0x4(%rip),%rcx and using the tcc to link it, the resulted exe works fine.
More precisely, I did
c:\tcc> gcc -c te1.c
c:\tcc> objcopy -O elf64-x86-64 te1.o
c:\tcc> use a binary editor to the change the bytes from (48 8d 0d 00 00 00 00) to (48 8d 0d fc ff ff ff)
c:\tcc> tcc te2.c te1.o
c:\tcc> te2
1234567in dummy!!!
Update 3
As requested, here is the output of objdump -r te1.o
C:\temp>gcc -c te1.c
C:\temp>objdump -r te1.o
te1.o: file format pe-x86-64
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000000b R_X86_64_PC32 .rdata
0000000000000010 R_X86_64_PC32 printf1
RELOCATION RECORDS FOR [.pdata]:
OFFSET TYPE VALUE
0000000000000000 rva32 .text
0000000000000004 rva32 .text
0000000000000008 rva32 .xdata
C:\temp>objdump -d te1.o
te1.o: file format pe-x86-64
Disassembly of section .text:
0000000000000000 <dummy>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 20 sub $0x20,%rsp
8: 48 8d 0d 00 00 00 00 lea 0x0(%rip),%rcx # f <dummy+0xf>
f: e8 00 00 00 00 callq 14 <dummy+0x14>
14: 90 nop
15: 48 83 c4 20 add $0x20,%rsp
19: 5d pop %rbp
1a: c3 retq
1b: 90 nop
1c: 90 nop
1d: 90 nop
1e: 90 nop
1f: 90 nop
Has nothing to do with tcc or calling conventions. It has to do with different linker conventions for elf64-x86-64 and pe-x86-64 formats.
With PE, the linker will subtract 4 implicitly to calculate the final offset.
With ELF, it does not do this. Because of this, 0 is the correct initial value for PE, and -4 is correct for ELF.
Unfortunately, objcopy does not convert this -> bug in objcopy.
add
extern void printf1(char *p);
to your te1.c file
Or: the compiler will assume argument 32 bit integer since there's no prototype, and pointers are 64-bit long.
Edit: this is still not working. I found out that the function never returns (since calling the printf1 a second time does nothing!). Seems that the 4 first bytes are consumed as return address or something like that. In gcc 32-bit mode it works fine.
Sounds like a calling convention problem to me but still cannot figure it out.
Another clue: calling printf from te1.c side (gcc, using tcc stdlib bindings) crashes with segv.
I disassembled the executable. First part is repeated call from tcc side
40104f: 48 8d 05 b3 0f 00 00 lea 0xfb3(%rip),%rax # 0x402009
401056: 48 89 45 f8 mov %rax,-0x8(%rbp)
40105a: 48 8b 4d f8 mov -0x8(%rbp),%rcx
40105e: e8 9d ff ff ff callq 0x401000
401063: 48 8b 4d f8 mov -0x8(%rbp),%rcx
401067: e8 94 ff ff ff callq 0x401000
40106c: 48 8b 4d f8 mov -0x8(%rbp),%rcx
401070: e8 8b ff ff ff callq 0x401000
401075: 48 8b 4d f8 mov -0x8(%rbp),%rcx
401079: e8 82 ff ff ff callq 0x401000
40107e: e8 0d 00 00 00 callq 0x401090
401083: b8 00 00 00 00 mov $0x0,%eax
401088: e9 00 00 00 00 jmpq 0x40108d
40108d: c9 leaveq
40108e: c3 retq
Second part is repeated (6 times) call to the same function. As you can see the address is different (shifted by 4 bytes, like your data) !!! It kind of works just once because the 4 first instructions are the following:
401000: 55 push %rbp
401001: 48 89 e5 mov %rsp,%rbp
so stack is destroyed if those are skipped!!
40109f: 48 89 45 f8 mov %rax,-0x8(%rbp)
4010a3: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010a7: 48 89 c1 mov %rax,%rcx
4010aa: e8 55 ff ff ff callq 0x401004
4010af: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010b3: 48 89 c1 mov %rax,%rcx
4010b6: e8 49 ff ff ff callq 0x401004
4010bb: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010bf: 48 89 c1 mov %rax,%rcx
4010c2: e8 3d ff ff ff callq 0x401004
4010c7: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010cb: 48 89 c1 mov %rax,%rcx
4010ce: e8 31 ff ff ff callq 0x401004
4010d3: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010d7: 48 89 c1 mov %rax,%rcx
4010da: e8 25 ff ff ff callq 0x401004
4010df: 48 8b 45 f8 mov -0x8(%rbp),%rax
4010e3: 48 89 c1 mov %rax,%rcx
4010e6: e8 19 ff ff ff callq 0x401004
4010eb: 90 nop

odd compiled code

I've compiled some Qt code with google's nacl compiler, but the ncval validator does not grok it. One example among many:
src/corelib/animation/qabstractanimation.cpp:165
Here's the relevant code:
#define Q_GLOBAL_STATIC(TYPE, NAME) \
static TYPE *NAME() \
{ \
static TYPE thisVariable; \
static QGlobalStatic<TYPE > thisGlobalStatic(&thisVariable); \
return thisGlobalStatic.pointer; \
}
#ifndef QT_NO_THREAD
Q_GLOBAL_STATIC(QThreadStorage<QUnifiedTimer *>, unifiedTimer)
#endif
which compiles to:
00000480 <_ZL12unifiedTimerv>:
480: 55 push %ebp
481: 89 e5 mov %esp,%ebp
483: 57 push %edi
484: 56 push %esi
485: 53 push %ebx
486: 83 ec 2c sub $0x2c,%esp
489: c7 04 24 28 00 2e 10 movl $0x102e0028,(%esp)
490: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
494: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
49b: e8 fc ff ff ff call 49c <_ZL12unifiedTimerv+0x1c>
4a0: 84 c0 test %al,%al
4a2: 74 1c je 4c0 <_ZL12unifiedTimerv+0x40>
4a4: 0f b6 05 2c 00 2e 10 movzbl 0x102e002c,%eax
4ab: 83 f0 01 xor $0x1,%eax
4ae: 84 c0 test %al,%al
4b0: 74 0e je 4c0 <_ZL12unifiedTimerv+0x40>
4b2: b8 01 00 00 00 mov $0x1,%eax
4b7: eb 27 jmp 4e0 <_ZL12unifiedTimerv+0x60>
4b9: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
4c0: b8 00 00 00 00 mov $0x0,%eax
4c5: eb 19 jmp 4e0 <_ZL12unifiedTimerv+0x60>
4c7: 90 nop
4c8: 90 nop
4c9: 90 nop
4ca: 90 nop
4cb: 90 nop
Check the call instruction at 49b: it is what the validator cannot grok. What on earth could induce the compiler to issue an instruction that calls into the middle of itself? Is there a way around this? I've compiled with -g -O0 -fno-inline. Compiler bug?
Presumably it's really a call to an external symbol, which will get filled in at link time. Actually what will get called is externalSymbol-4, which is a bit strange -- perhaps this is what is throwing the ncval validator off the scent.
Is this a dynamic library or a static object that is not linked to an executable yet?
In a dynamic library this likely came out because the code was built as position-dependent and linked into a dynamic library. Try "objdump -d -r -R" on it, if you see TEXTREL, that is the case. TEXTREL is not supported in NaCl dynamic linking stories. (solved by having -fPIC flag during compilation of the code)
With a static object try to validate after it was linked into a static executable.

G++ 4.6 -std=gnu++0x: Static Local Variable Constructor Call Timing and Thread Safety

void a() { ... }
void b() { ... }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
...
}
Assume f is called multiple times from various threads (potentially contended) after the entry of main. (and of course that the only calls to a and b are those seen above)
When the above code is compiled with gcc g++ 4.6 in -std=gnu++0x mode:
Q1. Is it guaranteed that a() will be called at least once and return before b() is called? That is to ask, on the first call to f(), is the constructor of x called at the same time an automatic duration local variable (non-static) would be (and not at global static initialization time for example)?
Q2. Is it guaranteed that b() will be called exactly once? Even if two threads execute f for the first time at the same time on different cores? If yes, by which specific mechanism does the GCC generated code provide synchronization? Edit: Additionally could one of the threads calling f() obtain access to x before the constructor of X returns?
Update: I am trying to compile an example and decompile to investigate mechanism...
test.cpp:
struct X;
void ext1(int x);
void ext2(X& x);
void a() { ext1(1); }
void b() { ext1(2); }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
ext2(x);
}
Then:
$ g++ -std=gnu++0x -c -o test.o ./test.cpp
$ objdump -d test.o -M intel > test.dump
test.dump:
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z1av>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: bf 01 00 00 00 mov edi,0x1
9: e8 00 00 00 00 call e <_Z1av+0xe>
e: 5d pop rbp
f: c3 ret
0000000000000010 <_Z1bv>:
10: 55 push rbp
11: 48 89 e5 mov rbp,rsp
14: bf 02 00 00 00 mov edi,0x2
19: e8 00 00 00 00 call 1e <_Z1bv+0xe>
1e: 5d pop rbp
1f: c3 ret
0000000000000020 <_Z1fv>:
20: 55 push rbp
21: 48 89 e5 mov rbp,rsp
24: 41 54 push r12
26: 53 push rbx
27: e8 00 00 00 00 call 2c <_Z1fv+0xc>
2c: b8 00 00 00 00 mov eax,0x0
31: 0f b6 00 movzx eax,BYTE PTR [rax]
34: 84 c0 test al,al
36: 75 2d jne 65 <_Z1fv+0x45>
38: bf 00 00 00 00 mov edi,0x0
3d: e8 00 00 00 00 call 42 <_Z1fv+0x22>
42: 85 c0 test eax,eax
44: 0f 95 c0 setne al
47: 84 c0 test al,al
49: 74 1a je 65 <_Z1fv+0x45>
4b: 41 bc 00 00 00 00 mov r12d,0x0
51: bf 00 00 00 00 mov edi,0x0
56: e8 00 00 00 00 call 5b <_Z1fv+0x3b>
5b: bf 00 00 00 00 mov edi,0x0
60: e8 00 00 00 00 call 65 <_Z1fv+0x45>
65: bf 00 00 00 00 mov edi,0x0
6a: e8 00 00 00 00 call 6f <_Z1fv+0x4f>
6f: 5b pop rbx
70: 41 5c pop r12
72: 5d pop rbp
73: c3 ret
74: 48 89 c3 mov rbx,rax
77: 45 84 e4 test r12b,r12b
7a: 75 0a jne 86 <_Z1fv+0x66>
7c: bf 00 00 00 00 mov edi,0x0
81: e8 00 00 00 00 call 86 <_Z1fv+0x66>
86: 48 89 d8 mov rax,rbx
89: 48 89 c7 mov rdi,rax
8c: e8 00 00 00 00 call 91 <_Z1fv+0x71>
Disassembly of section .text._ZN1XC2Ev:
0000000000000000 <_ZN1XC1Ev>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 48 83 ec 10 sub rsp,0x10
8: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi
c: e8 00 00 00 00 call 11 <_ZN1XC1Ev+0x11>
11: c9 leave
12: c3 ret
I don't see the synchronization mechanism? Or is it added at linktime?
Update2: Ok when I link it I can see it...
400973: 84 c0 test %al,%al
400975: 75 2d jne 4009a4 <_Z1fv+0x45>
400977: bf 98 20 40 00 mov $0x402098,%edi
40097c: e8 1f fe ff ff callq 4007a0 <__cxa_guard_acquire#plt>
400981: 85 c0 test %eax,%eax
400983: 0f 95 c0 setne %al
400986: 84 c0 test %al,%al
400988: 74 1a je 4009a4 <_Z1fv+0x45>
40098a: 41 bc 00 00 00 00 mov $0x0,%r12d
400990: bf a0 20 40 00 mov $0x4020a0,%edi
400995: e8 a6 00 00 00 callq 400a40 <_ZN1XC1Ev>
40099a: bf 98 20 40 00 mov $0x402098,%edi
40099f: e8 0c fe ff ff callq 4007b0 <__cxa_guard_release#plt>
4009a4: bf a0 20 40 00 mov $0x4020a0,%edi
4009a9: e8 72 ff ff ff callq 400920 <_Z4ext2R1X>
4009ae: 5b pop %rbx
4009af: 41 5c pop %r12
4009b1: 5d pop %rbp
It surrounds it with __cxa_guard_acquire and __cxa_guard_release, whatever they do.
Q1. Yes. According to C++11, 6.7/4:
such a variable is initialized the first time control passes through its declaration
so it will be initialised after the first call to a().
Q2. Under GCC, and any compiler that supports the C++11 thread model: yes, initialisation of local static variables is thread safe. Other compilers might not give that guarantee. The exact mechanism is an implementation detail. I believe GCC uses an atomic flag to indicate whether it's initialised, and a mutex to protect initialisation when the flag is not set, but I could be wrong. Certainly, this thread implies that it was originally implemented like that.
UPDATE: your code does indeed contain the initialisation code. You can see it more clearly if you link it, and then disassemble the program, so that you can see which functions are being called. I also used objdump -SC to interleave the source and demangle C++ names. It uses internal locking functions __cxa_guard_acquire and __cxa_guard_release, to make sure only one thread executes the initialisation code.
#void f()
#{
400724: push rbp
400725: mov rbp,rsp
400728: push r13
40072a: push r12
40072c: push rbx
40072d: sub rsp,0x8
# a();
400731: call 400704 <a()>
# static X x;
# if (!guard) {
400736: mov eax,0x601050
40073b: movzx eax,BYTE PTR [rax]
40073e: test al,al
400740: jne 400792 <f()+0x6e>
# if (__cxa_guard_acquire(&guard)) {
400742: mov edi,0x601050
400747: call 4005c0 <__cxa_guard_acquire#plt>
40074c: test eax,eax
40074e: setne al
400751: test al,al
400753: je 400792 <f()+0x6e>
# // initialise x
400755: mov ebx,0x0
40075a: mov edi,0x601058
40075f: call 4007b2 <X::X()>
# __cxa_guard_release(&guard);
400764: mov edi,0x601050
400769: call 4005e0 <__cxa_guard_release#plt>
# } else {
40076e: jmp 400792 <f()+0x6e>
# // already initialised
400770: mov r12d,edx
400773: mov r13,rax
400776: test bl,bl
400778: jne 400784 <f()+0x60>
40077a: mov edi,0x601050
40077f: call 4005f0 <__cxa_guard_abort#plt>
400784: mov rax,r13
400787: movsxd rdx,r12d
40078a: mov rdi,rax
40078d: 400610 <_Unwind_Resume#plt>
# }
# }
# ext2(x);
400792: mov edi,0x601058
400797: call 4007d1 <_Z4ext2R1X>
#}
As far as I know it is guaranteed that b is only called once. However, it is not guaranteed that the initialisation is performed thread safe, which means another thread could potentially work with a half/not initialized x. (That's kind of funny because static mutexes are basicly useless this way.)

How to enable optimization in G++ with #pragma

I want to enable optimization in g++ without command line parameter.
I know GCC can do it by writing #pragma GCC optimize (2) in my code.
But it seems won't work in G++.
This page may help: http://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html
My compiler version:
$ g++ --version
g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
<suppressed copyright message>
$ gcc --version
gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1
<suppressed copyright message>
I worte some code like this:
#pragma GCC optimize (2)
int main(){
long x;
x=11;
x+=12;
x*=13;
x/=14;
return 0;
}
And compiled it with GCC Not G++. Then I used objdump, which output
08048300 <main>:
8048300: 55 push %ebp
8048301: 31 c0 xor %eax,%eax
8048303: 89 e5 mov %esp,%ebp
8048305: 5d pop %ebp
8048306: c3 ret
8048307: 90 nop
When I removed #param GCC optimize(2) . objdump output:
080483b4 <main>:
80483b4: 55 push %ebp
80483b5: 89 e5 mov %esp,%ebp
80483b7: 83 ec 10 sub $0x10,%esp
80483ba: c7 45 fc 0b 00 00 00 movl $0xb,-0x4(%ebp)
80483c1: 83 45 fc 0c addl $0xc,-0x4(%ebp)
80483c5: 8b 55 fc mov -0x4(%ebp),%edx
80483c8: 89 d0 mov %edx,%eax
80483ca: 01 c0 add %eax,%eax
80483cc: 01 d0 add %edx,%eax
80483ce: c1 e0 02 shl $0x2,%eax
80483d1: 01 d0 add %edx,%eax
80483d3: 89 45 fc mov %eax,-0x4(%ebp)
80483d6: 8b 4d fc mov -0x4(%ebp),%ecx
80483d9: ba 93 24 49 92 mov $0x92492493,%edx
80483de: 89 c8 mov %ecx,%eax
80483e0: f7 ea imul %edx
80483e2: 8d 04 0a lea (%edx,%ecx,1),%eax
80483e5: 89 c2 mov %eax,%edx
80483e7: c1 fa 03 sar $0x3,%edx
80483ea: 89 c8 mov %ecx,%eax
80483ec: c1 f8 1f sar $0x1f,%eax
80483ef: 89 d1 mov %edx,%ecx
80483f1: 29 c1 sub %eax,%ecx
80483f3: 89 c8 mov %ecx,%eax
80483f5: 89 45 fc mov %eax,-0x4(%ebp)
80483f8: b8 00 00 00 00 mov $0x0,%eax
80483fd: c9 leave
80483fe: c3 ret
80483ff: 90 nop
However, it won't work with G++!
This appears to be a bug in g++ (Bug 48026, references another related issue.)
As a workaround, you can mark each function with __attribute__((optimize("whatever"))). Not great.
int main() __attribute__((optimize("-O2")));
int main()
{
long x;
x=11;
x+=12;
x*=13;
x/=14;
return 0;
}
$ g++ -Wall -c t.c
$ objdump -d t.o
t.o: file format elf64-x86-64
Disassembly of section .text.startup:
0000000000000000 <main>:
0: 55 push %rbp
1: 31 c0 xor %eax,%eax
3: 48 89 e5 mov %rsp,%rbp
6: 5d pop %rbp
7: c3 retq