(gdb) n
134 a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
(gdb) n
(gdb) p a
$30 = <value optimized out>
(gdb) p b
$31 = <value optimized out>
(gdb) p c
$32 = 3735928563
How can gdb optimize out my value??
It means you compiled with e.g. gcc -O3 and the gcc optimiser found that some of your variables were redundant in some way that allowed them to be optimised away. In this particular case you appear to have three variables a, b, c with the same value, and presumably they can all be aliased to a single variable. Compile with optimisation disabled, e.g. gcc -O0, if you want to see such variables (this is generally a good idea for debug builds in any case).
Minimal runnable example with disassembly analysis
As usual, I like to see some disassembly to get a better understanding of what is going on.
In this case, the insight we obtain is that if a variable is optimized to be stored only in a register rather than the stack, and then the register it was in gets overwritten, then it shows as <optimized out> as mentioned by R..
Of course, this can only happen if the variable in question is not needed anymore, otherwise the program would lose its value. Therefore it tends to happen that at the start of the function you can see the variable value, but then at the end it becomes <optimized out>.
One typical case we are often interested in is that of the function arguments themselves, since these:
are always defined at the start of the function
may not get used towards the end of the function, as more intermediate values are calculated
tend to get overwritten by further function subcalls, which must set up the exact same registers to satisfy the calling convention
This understanding actually has a concrete application: when using reverse debugging, you might be able to recover the value of variables of interest simply by stepping back to their last point of usage: How do I view the value of an <optimized out> variable in C++?
main.c
#include <stdio.h>
int __attribute__((noinline)) f3(int i) {
return i + 1;
}
int __attribute__((noinline)) f2(int i) {
return f3(i) + 1;
}
int __attribute__((noinline)) f1(int i) {
int j = 1, k = 2, l = 3;
i += 1;
j += f2(i);
k += f2(j);
l += f2(k);
return l;
}
int main(int argc, char *argv[]) {
printf("%d\n", f1(argc));
return 0;
}
Compile and run:
gcc -ggdb3 -O3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
gdb -q -nh main.out
Then inside GDB, we have the following session:
Breakpoint 1, f1 (i=1) at main.c:13
13 i += 1;
(gdb) disas
Dump of assembler code for function f1:
=> 0x00005555555546c0 <+0>: add $0x1,%edi
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$1 = 1
(gdb) p j
$2 = 1
(gdb) n
14 j += f2(i);
(gdb) disas
Dump of assembler code for function f1:
0x00005555555546c0 <+0>: add $0x1,%edi
=> 0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$3 = 2
(gdb) p j
$4 = 1
(gdb) n
15 k += f2(j);
(gdb) disas
Dump of assembler code for function f1:
0x00005555555546c0 <+0>: add $0x1,%edi
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
=> 0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$5 = <optimized out>
(gdb) p j
$6 = 5
(gdb) n
16 l += f2(k);
(gdb) disas
Dump of assembler code for function f1:
0x00005555555546c0 <+0>: add $0x1,%edi
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
=> 0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$7 = <optimized out>
(gdb) p j
$8 = <optimized out>
To understand what is going on, recall the x86-64 Linux calling convention (see: What are the calling conventions for UNIX & Linux system calls on i386 and x86-64), which tells us that:
RDI contains the first argument
RDI can get destroyed in function calls
RAX contains the return value
From this we deduce that:
add $0x1,%edi
corresponds to the:
i += 1;
since i is the first argument of f1, and therefore stored in RDI.
Now, while we were at both:
i += 1;
j += f2(i);
the value of RDI hadn't been modified, and therefore GDB could just query it at any time in those lines.
However, as soon as the f2 call is made:
the value of i is not needed anymore in the program
lea 0x1(%rax),%edi does EDI = RAX + 1, i.e. f2(i) + j with the initial value j = 1 folded into the constant, which both:
accounts for the initialization j = 1
sets up the first argument of the next f2 call to RDI = j
Therefore, when the following line is reached:
k += f2(j);
both of the following instructions have/may have modified RDI, which is the only place i was being stored (f2 may use it as a scratch register, and lea definitely set it to RAX + 1):
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
and so RDI does not contain the value of i anymore. In fact, the value of i was completely lost! Therefore the only possible outcome is:
$5 = <optimized out>
A similar thing happens to the value of j, although j only becomes unnecessary one line later, after the call in k += f2(j);.
Thinking about j also gives us some insight on how smart GDB is. Notably, at i += 1;, the value of j had not yet materialized in any register or memory address, and GDB must have known it based solely on debug information metadata.
-O0 analysis
If we use -O0 instead of -O3 for compilation:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
then the disassembly would look like:
11 int __attribute__((noinline)) f1(int i) {
=> 0x0000555555554673 <+0>: 55 push %rbp
0x0000555555554674 <+1>: 48 89 e5 mov %rsp,%rbp
0x0000555555554677 <+4>: 48 83 ec 18 sub $0x18,%rsp
0x000055555555467b <+8>: 89 7d ec mov %edi,-0x14(%rbp)
12 int j = 1, k = 2, l = 3;
0x000055555555467e <+11>: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%rbp)
0x0000555555554685 <+18>: c7 45 f8 02 00 00 00 movl $0x2,-0x8(%rbp)
0x000055555555468c <+25>: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
13 i += 1;
0x0000555555554693 <+32>: 83 45 ec 01 addl $0x1,-0x14(%rbp)
14 j += f2(i);
0x0000555555554697 <+36>: 8b 45 ec mov -0x14(%rbp),%eax
0x000055555555469a <+39>: 89 c7 mov %eax,%edi
0x000055555555469c <+41>: e8 b8 ff ff ff callq 0x555555554659 <f2>
0x00005555555546a1 <+46>: 01 45 f4 add %eax,-0xc(%rbp)
15 k += f2(j);
0x00005555555546a4 <+49>: 8b 45 f4 mov -0xc(%rbp),%eax
0x00005555555546a7 <+52>: 89 c7 mov %eax,%edi
0x00005555555546a9 <+54>: e8 ab ff ff ff callq 0x555555554659 <f2>
0x00005555555546ae <+59>: 01 45 f8 add %eax,-0x8(%rbp)
16 l += f2(k);
0x00005555555546b1 <+62>: 8b 45 f8 mov -0x8(%rbp),%eax
0x00005555555546b4 <+65>: 89 c7 mov %eax,%edi
0x00005555555546b6 <+67>: e8 9e ff ff ff callq 0x555555554659 <f2>
0x00005555555546bb <+72>: 01 45 fc add %eax,-0x4(%rbp)
17 return l;
0x00005555555546be <+75>: 8b 45 fc mov -0x4(%rbp),%eax
18 }
0x00005555555546c1 <+78>: c9 leaveq
0x00005555555546c2 <+79>: c3 retq
From this horrendous disassembly, we see that the value of RDI is moved to the stack at the very start of the function's execution at:
mov %edi,-0x14(%rbp)
and it then gets retrieved from memory into registers whenever needed, e.g. at:
14 j += f2(i);
0x0000555555554697 <+36>: 8b 45 ec mov -0x14(%rbp),%eax
0x000055555555469a <+39>: 89 c7 mov %eax,%edi
0x000055555555469c <+41>: e8 b8 ff ff ff callq 0x555555554659 <f2>
0x00005555555546a1 <+46>: 01 45 f4 add %eax,-0xc(%rbp)
The same basically happens to j, which gets immediately stored on the stack when it is initialized:
0x000055555555467e <+11>: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%rbp)
Therefore, it is easy for GDB to find the values of those variables at any time: they are always present in memory!
This also gives us some insight on why it is not possible to avoid <optimized out> in optimized code: since the number of registers is limited, the only way to do that would be to actually push unneeded registers to memory, which would partly defeat the benefit of -O3.
Extend the lifetime of i
If we edited f1 to return l + i as in:
int __attribute__((noinline)) f1(int i) {
int j = 1, k = 2, l = 3;
i += 1;
j += f2(i);
k += f2(j);
l += f2(k);
return l + i;
}
then we observe that this effectively extends the visibility of i until the end of the function.
This is because with this we force GCC to use an extra register to keep i around until the end:
0x00005555555546c0 <+0>: lea 0x1(%rdi),%edx
0x00005555555546c3 <+3>: mov %edx,%edi
0x00005555555546c5 <+5>: callq 0x5555555546b0 <f2>
0x00005555555546ca <+10>: lea 0x1(%rax),%edi
0x00005555555546cd <+13>: callq 0x5555555546b0 <f2>
0x00005555555546d2 <+18>: lea 0x2(%rax),%edi
0x00005555555546d5 <+21>: callq 0x5555555546b0 <f2>
0x00005555555546da <+26>: lea 0x3(%rdx,%rax,1),%eax
0x00005555555546de <+30>: retq
which the compiler does by storing the result of i += 1 in RDX at the very first instruction.
Tested in Ubuntu 18.04, GCC 7.4.0, GDB 8.1.0.
It didn't. Your compiler did, but there's still a debug symbol for the original variable name.
From https://idlebox.net/2010/apidocs/gdb-7.0.zip/gdb_9.html
The values of arguments that were not saved in their stack frames are shown as `value optimized out'.
I'm guessing you compiled with -O(somevalue) and are accessing variables a,b,c in a function where optimization has occurred.
You need to turn off the compiler optimisation.
If you are interested in a particular variable in gdb, you can declare the variable as "volatile" and recompile the code. This forces every access to the variable to go through memory, so the compiler cannot optimize it away.
volatile int quantity = 0;
Just run "export COPTS='-g -O0';" and rebuild your code (this assumes your build system reads COPTS). After the rebuild, debug it with gdb; you should no longer see this message.
This question/answer on SO shows how to use GDB to change a value in memory, but in the example given, the value is set at an address that doesn't appear to be used anywhere.
For example, to change the return value to 22, the author does
set {unsigned char}0x00000000004004b9 = 22
However, why would this address 0x00000000004004b9 be the address to change? If you look at the output of disas/r the address 0x00000000004004b9 isn't being used, so why use this one to set to 22? I'm trying to understand how to know which address needs to be changed to (in this example) change the return value, if the output of disas/r doesn't show it.
code
$ cat t.c
int main()
{
return 42;
}
$ gcc t.c && ./a.out; echo $?
42
$ gdb --write -q ./a.out
(gdb) disas/r main
Dump of assembler code for function main:
0x00000000004004b4 <+0>: 55 push %rbp
0x00000000004004b5 <+1>: 48 89 e5 mov %rsp,%rbp
0x00000000004004b8 <+4>: b8 2a 00 00 00 mov $0x2a,%eax
0x00000000004004bd <+9>: 5d pop %rbp
0x00000000004004be <+10>: c3 retq
End of assembler dump.
(gdb) set {unsigned char}0x00000000004004b9 = 22
(gdb) disas/r main
Dump of assembler code for function main:
0x00000000004004b4 <+0>: 55 push %rbp
0x00000000004004b5 <+1>: 48 89 e5 mov %rsp,%rbp
0x00000000004004b8 <+4>: b8 16 00 00 00 mov $0x16,%eax <<< ---changed
0x00000000004004bd <+9>: 5d pop %rbp
0x00000000004004be <+10>: c3 retq
End of assembler dump.
(gdb) q
$ ./a.out; echo $?
22 <<<--- Just as desired
I'm trying to understand how to know which address needs to be changed to (in this example) change the return value, if the output of disas/r doesn't show it.
To understand this, you need to understand instruction encoding. The instruction here is "move immediate 32-bit constant to register". The constant is part of the instruction (that's what "immediate" means). It may be helpful to compile this instead:
int foo() { return 0x41424344; }
int bar() { return 0x45464748; }
int main() { return foo() + bar(); }
When you do compile it, you should see something similar to:
(gdb) disas/r foo
Dump of assembler code for function foo:
0x00000000004004ed <+0>: 55 push %rbp
0x00000000004004ee <+1>: 48 89 e5 mov %rsp,%rbp
0x00000000004004f1 <+4>: b8 44 43 42 41 mov $0x41424344,%eax
0x00000000004004f6 <+9>: 5d pop %rbp
0x00000000004004f7 <+10>: c3 retq
End of assembler dump.
(gdb) disas/r bar
Dump of assembler code for function bar:
0x00000000004004f8 <+0>: 55 push %rbp
0x00000000004004f9 <+1>: 48 89 e5 mov %rsp,%rbp
0x00000000004004fc <+4>: b8 48 47 46 45 mov $0x45464748,%eax
0x0000000000400501 <+9>: 5d pop %rbp
0x0000000000400502 <+10>: c3 retq
End of assembler dump.
Now you can clearly see where in the instruction stream each byte of the immediate constant resides (and also that x86 uses little-endian encoding for them).
The standard reference on instruction encoding for x86 is Intel instruction set reference. You can find 0xB8 instruction on page 3-528.
Background:
I am new to assembly. When I was learning programming, I made a program that implements multiplication tables up to 1000 * 1000. The tables are formatted so that each answer is on the line factor1 << 10 | factor2 (I know, I know, it isn't pretty). These tables are then loaded into an array: int* tables. Empty lines are filled with 0. Here is a link to the file for the tables (7.3 MB). I know using assembly won't speed this up by much, but I just wanted to do it for fun (and a bit of practice).
Question:
I'm trying to convert this code into inline assembly (tables is a global):
int answer;
// ...
answer = tables [factor1 << 10 | factor2];
This is what I came up with:
asm volatile ( "shll $10, %1;"
"orl %1, %2;"
"movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );
My C++ code works fine, but my assembly fails. What is wrong with my assembly (especially the movl _tables(,%2,4), %0; part), compared to my C++?
What I have done to solve it:
I used some random numbers: 89 and 786 as factor1 and factor2. I know that there is an element at 89 << 10 | 786 (which is 91922); I verified this with C++. When I run it with gdb, I get a SIGSEGV:
Program received signal SIGSEGV, Segmentation fault.
at this line:
"movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );
I added two methods around my asm, which is how I know where the asm block is in the disassembly.
Disassembly of my asm block:
The disassembly from objdump -M att -d looks fine (although I'm not sure, I'm new to assembly, as I said):
402096: 8b 45 08 mov 0x8(%ebp),%eax
402099: 8b 55 0c mov 0xc(%ebp),%edx
40209c: c1 e0 0a shl $0xa,%eax
40209f: 09 c2 or %eax,%edx
4020a1: 8b 04 95 18 e0 47 00 mov 0x47e018(,%edx,4),%eax
4020a8: 89 45 ec mov %eax,-0x14(%ebp)
The disassembly from objdump -M intel -d:
402096: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
402099: 8b 55 0c mov edx,DWORD PTR [ebp+0xc]
40209c: c1 e0 0a shl eax,0xa
40209f: 09 c2 or edx,eax
4020a1: 8b 04 95 18 e0 47 00 mov eax,DWORD PTR [edx*4+0x47e018]
4020a8: 89 45 ec mov DWORD PTR [ebp-0x14],eax
From what I understand, it's moving the first parameter of my void calc ( int factor1, int factor2 ) function into eax. Then it's moving the second parameter into edx. Then it shifts eax to the left by 10 and ors it with edx. A 32-bit integer is 4 bytes, so [edx*4+base_address]. Move result to eax and then put eax into int answer (which, I'm guessing is on -0x14 of the stack). I don't really see much of a problem.
Disassembly of the compiler's .exe:
When I replace the asm block with plain C++ (answer = tables [factor1 << 10 | factor2];) and disassemble it this is what I get in Intel syntax:
402096: a1 18 e0 47 00 mov eax,ds:0x47e018
40209b: 8b 55 08 mov edx,DWORD PTR [ebp+0x8]
40209e: c1 e2 0a shl edx,0xa
4020a1: 0b 55 0c or edx,DWORD PTR [ebp+0xc]
4020a4: c1 e2 02 shl edx,0x2
4020a7: 01 d0 add eax,edx
4020a9: 8b 00 mov eax,DWORD PTR [eax]
4020ab: 89 45 ec mov DWORD PTR [ebp-0x14],eax
AT&T syntax:
402096: a1 18 e0 47 00 mov 0x47e018,%eax
40209b: 8b 55 08 mov 0x8(%ebp),%edx
40209e: c1 e2 0a shl $0xa,%edx
4020a1: 0b 55 0c or 0xc(%ebp),%edx
4020a4: c1 e2 02 shl $0x2,%edx
4020a7: 01 d0 add %edx,%eax
4020a9: 8b 00 mov (%eax),%eax
4020ab: 89 45 ec mov %eax,-0x14(%ebp)
I am not really familiar with the Intel syntax, so I am just going to try and understand the AT&T syntax:
It first moves the base address of the tables array into %eax. Then, it moves the first parameter into %edx. It shifts %edx to the left by 10, then ors it with the second parameter. Then, by shifting %edx to the left by two, it actually multiplies %edx by 4. Then, it adds that to %eax (the base address of the array). So, basically it just did this: [edx*4+0x47e018] (Intel syntax) or 0x47e018(,%edx,4) (AT&T syntax). It moves the value of the element it got into %eax and puts it into int answer. This method is more "expanded", but it does the same thing as my hand-written assembly! So why is mine giving a SIGSEGV while the compiler's works fine?
I bet (from the disassembly) that tables is a pointer to an array, not the array itself.
So you need:
asm volatile ( "shll $10, %1;"
"movl _tables, %%eax;" /* load the pointer stored in tables */
"orl %1, %2;"
"movl (%%eax,%2,4), %0;" /* index through the loaded pointer */
: "=r" (answer) : "r" (factor1), "r" (factor2) : "eax" );
(Don't forget the extra clobber in the last line).
There are of course variations, this may be more efficient if the code is in a loop:
asm volatile ( "shll $10, %1;"
"orl %1, %2;"
"movl (%3,%2,4), %0;" /* %3 already holds the pointer value */
: "=r" (answer) : "r" (factor1), "r" (factor2), "r" (tables) );
This is intended to be an addition to Mats Petersson's answer - I wrote it simply because it wasn't immediately clear to me why OP's analysis of the disassembly (that his assembly and the compiler-generated one were equivalent) was incorrect.
As Mats Petersson explains, the problem is that tables is actually a pointer to an array, so to access an element, you have to dereference twice. Now to me, it wasn't immediately clear where this happens in the compiler-generated code. The culprit is this innocent-looking line:
a1 18 e0 47 00 mov 0x47e018,%eax
To the untrained eye (that includes mine), this might look like the value 0x47e018 is moved to eax, but it's actually not. The Intel-syntax representation of the same opcodes gives us a clue:
a1 18 e0 47 00 mov eax,ds:0x47e018
Ah - ds: - so it's not actually a value, but an address!
For anyone who is wondering now, the following would be the opcodes and AT&T syntax assembly for moving the value 0x47e018 to eax:
b8 18 e0 47 00 mov $0x47e018,%eax
I have a spin lock with the xchg instruction. The C++ function takes in the resource to be locked.
Following is the code
void SpinLock::lock( u32& resource )
{
__asm__ __volatile__
(
"mov ebx, %0\n\t"
"InUseLoop:\n\t"
"mov eax, 0x01\n\t" /* 1=In Use*/
"xchg eax, [ebx]\n\t"
"cmp eax, 0x01\n\t"
"je InUseLoop\n\t"
:"=r"(resource)
:"r"(resource)
:"eax","ebx"
);
}
void SpinLock::unlock(u32& resource )
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"mov ebx, %0\n\t"
"mov DWORD PTR [ebx], 0x00\n\t"
:"=r"(resource)
:"r"(resource)
: "ebx"
);
}
This code is compiled with gcc 4.5.2 -masm=intel on a 64-bit Intel machine.
objdump produces the following assembly for the above functions.
0000000000490968 <_ZN8SpinLock4lockERj>:
490968: 55 push %rbp
490969: 48 89 e5 mov %rsp,%rbp
49096c: 53 push %rbx
49096d: 48 89 7d f0 mov %rdi,-0x10(%rbp)
490971: 48 8b 45 f0 mov -0x10(%rbp),%rax
490975: 8b 10 mov (%rax),%edx
490977: 89 d3 mov %edx,%ebx
0000000000490979 <InUseLoop>:
490979: b8 01 00 00 00 mov $0x1,%eax
49097e: 67 87 03 addr32 xchg %eax,(%ebx)
490981: 83 f8 01 cmp $0x1,%eax
490984: 74 f3 je 490979 <InUseLoop>
490986: 48 8b 45 f0 mov -0x10(%rbp),%rax
49098a: 89 10 mov %edx,(%rax)
49098c: 5b pop %rbx
49098d: c9 leaveq
49098e: c3 retq
49098f: 90 nop
0000000000490990 <_ZN8SpinLock6unlockERj>:
490990: 55 push %rbp
490991: 48 89 e5 mov %rsp,%rbp
490994: 53 push %rbx
490995: 48 89 7d f0 mov %rdi,-0x10(%rbp)
490999: 48 8b 45 f0 mov -0x10(%rbp),%rax
49099d: 8b 00 mov (%rax),%eax
49099f: 89 d3 mov %edx,%ebx
4909a1: 67 c7 03 00 00 00 00 addr32 movl $0x0,(%ebx)
4909a8: 48 8b 45 f0 mov -0x10(%rbp),%rax
4909ac: 89 10 mov %edx,(%rax)
4909ae: 5b pop %rbx
4909af: c9 leaveq
4909b0: c3 retq
4909b1: 90 nop
The code dumps core when executing the locking operation.
Is there something grossly wrong here ?
First, why are you using truncated 32-bit addresses in your assembly code whereas the rest of the program is compiled to execute in 64-bit mode and operate with 64-bit addresses/pointers? I'm referring to ebx. Why is it not rbx?
Second, why are you trying to return a value from the assembly code with "=r"(resource)? Your functions change the in-memory value with xchg eax, [ebx] and mov DWORD PTR [ebx], 0x00 and return void. Remove "=r"(resource).
Lastly, if you look closely at the disassembly of SpinLock::lock(), can't you see something odd about ebx?:
mov %rdi,-0x10(%rbp)
mov -0x10(%rbp),%rax
mov (%rax),%edx
mov %edx,%ebx
<InUseLoop>:
mov $0x1,%eax
addr32 xchg %eax,(%ebx)
In this code, the ebx value, which is an address/pointer, does not come directly from the function's parameter (rdi). The parameter first gets dereferenced with mov (%rax),%edx, but why? If you throw away all the confusing C++ reference stuff, technically, the function receives a pointer to u32, not a pointer to a pointer to u32, and thus needs no extra dereference anywhere.
The problem is here: "r"(resource). It must be "r"(&resource).
A small 32-bit test app demonstrates this problem:
#include <iostream>
using namespace std;
void unlock1(unsigned& resource)
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"movl %0, %%ebx\n\t"
"movl $0, (%%ebx)\n\t"
:
:"r"(resource)
:"ebx"
);
}
void unlock2(unsigned& resource)
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"movl %0, %%ebx\n\t"
"movl $0, (%%ebx)\n\t"
:
:"r"(&resource)
:"ebx"
);
}
unsigned blah;
int main(void)
{
blah = 3456789012u;
cout << "before unlock2() blah=" << blah << endl;
unlock2(blah);
cout << "after unlock2() blah=" << blah << endl;
blah = 3456789012u;
cout << "before unlock1() blah=" << blah << endl;
unlock1(blah); // may crash here, but if it doesn't, it won't change blah
cout << "after unlock1() blah=" << blah << endl;
return 0;
}
Output:
before unlock2() blah=3456789012
after unlock2() blah=0
before unlock1() blah=3456789012
Exiting due to signal SIGSEGV
General Protection Fault at eip=000015eb
eax=ce0a6a14 ...
Is there any benefit to using short instead of int in a for loop?
i.e.
for(short j = 0; j < 5; j++) {
99% of my loops involve numbers below 3000, so I was thinking ints would be a waste of bytes. Thanks!
No, there is no benefit. The short will probably end up taking a full register (which is 32 bits, an int) anyway.
You will lose hours typing the extra two letters in the IDE, too. (That was a joke).
No. The loop variable will likely be allocated to a register, so it will end up taking up the same amount of space regardless.
Look at the generated assembler code and you would probably see that using int generates cleaner code.
c-code:
#include <stdio.h>
int main(void) {
int j;
for(j = 0; j < 5; j++) {
printf("%d", j);
}
}
using short:
080483c4 <main>:
80483c4: 55 push %ebp
80483c5: 89 e5 mov %esp,%ebp
80483c7: 83 e4 f0 and $0xfffffff0,%esp
80483ca: 83 ec 20 sub $0x20,%esp
80483cd: 66 c7 44 24 1e 00 00 movw $0x0,0x1e(%esp)
80483d4: eb 1c jmp 80483f2 <main+0x2e>
80483d6: 0f bf 54 24 1e movswl 0x1e(%esp),%edx
80483db: b8 c0 84 04 08 mov $0x80484c0,%eax
80483e0: 89 54 24 04 mov %edx,0x4(%esp)
80483e4: 89 04 24 mov %eax,(%esp)
80483e7: e8 08 ff ff ff call 80482f4 <printf@plt>
80483ec: 66 83 44 24 1e 01 addw $0x1,0x1e(%esp)
80483f2: 66 83 7c 24 1e 04 cmpw $0x4,0x1e(%esp)
80483f8: 7e dc jle 80483d6 <main+0x12>
80483fa: c9 leave
80483fb: c3 ret
using int:
080483c4 <main>:
80483c4: 55 push %ebp
80483c5: 89 e5 mov %esp,%ebp
80483c7: 83 e4 f0 and $0xfffffff0,%esp
80483ca: 83 ec 20 sub $0x20,%esp
80483cd: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
80483d4: 00
80483d5: eb 1a jmp 80483f1 <main+0x2d>
80483d7: b8 c0 84 04 08 mov $0x80484c0,%eax
80483dc: 8b 54 24 1c mov 0x1c(%esp),%edx
80483e0: 89 54 24 04 mov %edx,0x4(%esp)
80483e4: 89 04 24 mov %eax,(%esp)
80483e7: e8 08 ff ff ff call 80482f4 <printf@plt>
80483ec: 83 44 24 1c 01 addl $0x1,0x1c(%esp)
80483f1: 83 7c 24 1c 04 cmpl $0x4,0x1c(%esp)
80483f6: 7e df jle 80483d7 <main+0x13>
80483f8: c9 leave
80483f9: c3 ret
More often than not, trying to optimize for this will just exacerbate bugs when someone doesn't notice (or forgets) that it's a narrow data type. For instance, check out this bcrypt problem I looked into...pretty typical:
BCrypt says long, similar passwords are equivalent - problem with me, the gem, or the field of cryptography?
Yet the problem is still there as int is a finite size as well. Better to spend your time making sure your program is correct and not creating hazards or security problems from numeric underflows and overflows.
Some of what I talk about w/numeric_limits here might be informative or interesting, if you haven't encountered that yet:
http://hostilefork.com/2009/03/31/modern_cpp_or_modern_art/
Nope. Chances are your counter will end up in a register anyway, and registers are typically at least the same size as int.
I think there isn't much difference. Your compiler will probably use an entire 32-bit register for the counter variable (in 32-bit mode). You'll waste just two bytes of stack, at most, in the worst case (when a register is not used).
One potential improvement over int as loop counter is unsigned int (or std::size_t where applicable) if the loop index is never going to be negative. Using short instead of int makes no difference in most compilers; here are the ones I have.
Code:
volatile int n;
int main()
{
for(short j = 0; j < 50; j++) // replaced with int in test2
n = j;
}
g++ 4.5.2 -march=native -O3 on x86_64 linux
// using short j             // using int j
.L2:                         .L2:
    movl %eax, n(%rip)           movl %eax, n(%rip)
    incl %eax                    incl %eax
    cmpl $50, %eax               cmpl $50, %eax
    jne .L2                      jne .L2
clang++ 2.9 -march=native -O3 on x86_64 linux
// using short j             // using int j
.LBB0_1:                     .LBB0_1:
    movl %eax, n(%rip)           movl %eax, n(%rip)
    incl %eax                    incl %eax
    cmpl $50, %eax               cmpl $50, %eax
    jne .LBB0_1                  jne .LBB0_1
Intel C++ 11.1 -fast on x86_64 linux
// using short j             // using int j
..B1.2:                      ..B1.2:
    movl %eax, n(%rip)           movl %eax, n(%rip)
    incl %edx                    incl %eax
    movswq %dx, %rax             cmpl $50, %eax
    cmpl $50, %eax               jl ..B1.2
    jl ..B1.2
Sun C++ 5.8 -xO5 on sparc
// using short j                 // using int j
.L900000105:                     .L900000105:
    st %o4,[%o5+%lo(n)]              st %o4,[%o5+%lo(n)]
    add %o4,1,%o4                    add %o4,1,%o4
    cmp %o4,49                       cmp %o4,49
    ble,pt %icc,.L900000105          ble,pt %icc,.L900000105
So of the four compilers I have, only one showed any difference in the result, and it actually used fewer instructions in the int case.
As most others have said, computationally there is no advantage and might be worse. However, if the loop variable is used in a computation requiring a short, then it might be justified:
for(short j = 0; j < 5; j++)
{
// void myfunc(short arg1);
myfunc(j);
}
All this really does is prevent a warning message as the value passed would be promoted to an int (depending on compiler, platform, and C++ dialect). But it looks cleaner, IMHO.
Certainly not worth obsessing over. If you are looking to optimize, remember the rules (forget who came up with these):
1. Don't
2. Failing Step 1, first Measure
3. Make a change
4. If bored, exit, else go to Step 2.
(gdb) n
253 conf.log = log;
As shown above, the next statement is conf.log = log;. How can I disassemble just that?
I tried a plain disas, but gdb disassembles the whole current function (I don't need that much)...
(gdb) disas
Dump of assembler code for function ngx_init_cycle:
0x0000000000417c7c <ngx_init_cycle+0>: push %rbp
0x0000000000417c7d <ngx_init_cycle+1>: mov %rsp,%rbp
0x0000000000417c80 <ngx_init_cycle+4>: push %rbx
0x0000000000417c81 <ngx_init_cycle+5>: sub $0x258,%rsp
0x0000000000417c88 <ngx_init_cycle+12>: mov %rdi,-0x228(%rbp)
0x0000000000417c8f <ngx_init_cycle+19>: callq 0x42b2fc <ngx_timezone_update>
0x0000000000417c94 <ngx_init_cycle+24>: mov 0x2b00e5(%rip),%rax # 0x6c7d80 <ngx_cached_time>
0x0000000000417c9b <ngx_init_cycle+31>: mov %rax,-0x88(%rbp)
0x0000000000417ca2 <ngx_init_cycle+38>: mov -0x88(%rbp),%rax
0x0000000000417ca9 <ngx_init_cycle+45>: movq $0x0,(%rax)
0x0000000000417cb0 <ngx_init_cycle+52>: callq 0x4149e7 <ngx_time_update>
0x0000000000417cb5 <ngx_init_cycle+57>: mov -0x228(%rbp),%rax
0x0000000000417cbc <ngx_init_cycle+64>: mov 0x10(%rax),%rax
0x0000000000417cc0 <ngx_init_cycle+68>: mov %rax,-0x90(%rbp)
0x0000000000417cc7 <ngx_init_cycle+75>: mov -0x90(%rbp),%rsi
0x0000000000417cce <ngx_init_cycle+82>: mov $0x4000,%edi
0x0000000000417cd3 <ngx_init_cycle+87>: callq 0x405c6c <ngx_create_pool>
0x0000000000417cd8 <ngx_init_cycle+92>: mov %rax,-0x80(%rbp)
0x0000000000417cdc <ngx_init_cycle+96>: cmpq $0x0,-0x80(%rbp)
---Type <return> to continue, or q <return> to quit---q
UPDATE
(gdb) info line 98
Line 98 of "src/os/unix/ngx_process_cycle.c" starts at address 0x42f6f3 <ngx_master_process_cycle+31>
and ends at 0x42f704 <ngx_master_process_cycle+48>.
(gdb) disas 0x42f6f3,0x42f704
Dump of assembler code for function ngx_master_process_cycle:
0x000000000042f6d4 <ngx_master_process_cycle+0>: push %rbp
0x000000000042f6d5 <ngx_master_process_cycle+1>: mov %rsp,%rbp
0x000000000042f6d8 <ngx_master_process_cycle+4>: push %rbx
0x000000000042f6d9 <ngx_master_process_cycle+5>: sub $0x128,%rsp
0x000000000042f6e0 <ngx_master_process_cycle+12>: mov %rdi,-0x108(%rbp)
0x000000000042f6e7 <ngx_master_process_cycle+19>: lea -0xe0(%rbp),%rdi
0x000000000042f6ee <ngx_master_process_cycle+26>: callq 0x402988 <sigemptyset@plt>
0x000000000042f6f3 <ngx_master_process_cycle+31>: lea -0xe0(%rbp),%rdi
0x000000000042f6fa <ngx_master_process_cycle+38>: mov $0x11,%esi
0x000000000042f6ff <ngx_master_process_cycle+43>: callq 0x402878 <sigaddset@plt>
0x000000000042f704 <ngx_master_process_cycle+48>: lea -0xe0(%rbp),%rdi
0x000000000042f70b <ngx_master_process_cycle+55>: mov $0xe,%esi
0x000000000042f710 <ngx_master_process_cycle+60>: callq 0x402878 <sigaddset@plt>
0x000000000042f715 <ngx_master_process_cycle+65>: lea -0xe0(%rbp),%rdi
0x000000000042f71c <ngx_master_process_cycle+72>: mov $0x1d,%esi
0x000000000042f721 <ngx_master_process_cycle+77>: callq 0x402878 <sigaddset@plt>
0x000000000042f726 <ngx_master_process_cycle+82>: lea -0xe0(%rbp),%rdi
0x000000000042f72d <ngx_master_process_cycle+89>: mov $0x2,%esi
0x000000000042f732 <ngx_master_process_cycle+94>: callq 0x402878 <sigaddset@plt>
---Type <return> to continue, or q <return> to quit---
Try something to the effect of:
(gdb) info line 12
Line 12 of "test.c" starts at address 0x4004f4 <main+24>
and ends at 0x4004fe <main+34>.
(gdb) disas 0x4004f4,0x4004fe
Dump of assembler code from 0x4004f4 to 0x4004fe:
0x00000000004004f4 <main+24>: mov $0x0,%eax
0x00000000004004f9 <main+29>: callq 0x4004d0 <bp3>
End of assembler dump.
Or:
(gdb) disas main+24,main+34
Dump of assembler code from 0x4004f4 to 0x4004fe:
0x00000000004004f4 <main+24>: mov $0x0,%eax
0x00000000004004f9 <main+29>: callq 0x4004d0 <bp3>
End of assembler dump.
I'm not sure of a more automatic way offhand.