Benefit of using short instead of int in for... loop

Benefit of using short instead of int in for... loop - c++

is there any benefit to using short instead of int in a for loop?
i.e.
for(short j = 0; j < 5; j++) {
99% of my loops involve numbers below 3000, so I was thinking ints would be a waste of bytes. Thanks!

No, there is no benefit. The short will probably end up taking a full register (which is 32 bits, an int) anyway.
You will lose hours typing the extra two letters in the IDE, too. (That was a joke).

No. The loop variable will likely be allocated to a register, so it will end up taking up the same amount of space regardless.

Look at the generated assembler code and you would probably see that using int generates cleaner code.
c-code:
#include <stdio.h>
int main(void) {
int j;
for(j = 0; j < 5; j++) {
printf("%d", j);
}
}
using short:
080483c4 <main>:
80483c4: 55 push %ebp
80483c5: 89 e5 mov %esp,%ebp
80483c7: 83 e4 f0 and $0xfffffff0,%esp
80483ca: 83 ec 20 sub $0x20,%esp
80483cd: 66 c7 44 24 1e 00 00 movw $0x0,0x1e(%esp)
80483d4: eb 1c jmp 80483f2 <main+0x2e>
80483d6: 0f bf 54 24 1e movswl 0x1e(%esp),%edx
80483db: b8 c0 84 04 08 mov $0x80484c0,%eax
80483e0: 89 54 24 04 mov %edx,0x4(%esp)
80483e4: 89 04 24 mov %eax,(%esp)
80483e7: e8 08 ff ff ff call 80482f4 <printf#plt>
80483ec: 66 83 44 24 1e 01 addw $0x1,0x1e(%esp)
80483f2: 66 83 7c 24 1e 04 cmpw $0x4,0x1e(%esp)
80483f8: 7e dc jle 80483d6 <main+0x12>
80483fa: c9 leave
80483fb: c3 ret
using int:
080483c4 <main>:
80483c4: 55 push %ebp
80483c5: 89 e5 mov %esp,%ebp
80483c7: 83 e4 f0 and $0xfffffff0,%esp
80483ca: 83 ec 20 sub $0x20,%esp
80483cd: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
80483d4: 00
80483d5: eb 1a jmp 80483f1 <main+0x2d>
80483d7: b8 c0 84 04 08 mov $0x80484c0,%eax
80483dc: 8b 54 24 1c mov 0x1c(%esp),%edx
80483e0: 89 54 24 04 mov %edx,0x4(%esp)
80483e4: 89 04 24 mov %eax,(%esp)
80483e7: e8 08 ff ff ff call 80482f4 <printf#plt>
80483ec: 83 44 24 1c 01 addl $0x1,0x1c(%esp)
80483f1: 83 7c 24 1c 04 cmpl $0x4,0x1c(%esp)
80483f6: 7e df jle 80483d7 <main+0x13>
80483f8: c9 leave
80483f9: c3 ret

More often than not, trying to optimize for this will just exacerbate bugs when someone doesn't notice (or forgets) that it's a narrow data type. For instance, check out this bcrypt problem I looked into...pretty typical:
BCrypt says long, similar passwords are equivalent - problem with me, the gem, or the field of cryptography?
Yet the problem is still there as int is a finite size as well. Better to spend your time making sure your program is correct and not creating hazards or security problems from numeric underflows and overflows.
Some of what I talk about w/numeric_limits here might be informative or interesting, if you haven't encountered that yet:
http://hostilefork.com/2009/03/31/modern_cpp_or_modern_art/

Nope. Chances are your counter will end up in a register anyway, and they are typically at least the same size as int

I think there isn't much difference. Your compiler will probably use an entire 32-bit register for the counter variable (in 32-bit mode). You'll waste just two bytes from the stack, at most, in the worst case (when not used a register)

One potential improvement over int as loop counter is unsigned int (or std::size_t where applicable) if the loop index is never going to be negative. Using short instead of int makes no difference in most compilers, here's the ones I have.
Code:
volatile int n;
int main()
{
for(short j = 0; j < 50; j++) // replaced with int in test2
n = j;
}
g++ 4.5.2 -march=native -O3 on x86_64 linux
// using short j // using int j
.L2: .L2:
movl %eax, n(%rip) movl %eax, n(%rip)
incl %eax incl %eax
cmpl $50, %eax cmpl $50, %eax
jne .L2 jne .L2
clang++ 2.9 -march=native -O3 on x86_64 linux
// using short j // using int j
.LBB0_1: .LBB0_1:
movl %eax, n(%rip) movl %eax, n(%rip)
incl %eax incl %eax
cmpl $50, %eax cmpl $50, %eax
jne .LBB0_1 jne .LBB0_1
Intel C++ 11.1 -fast on x86_64 linux
// using short j // using int j
..B1.2: ..B1.2:
movl %eax, n(%rip) movl %eax, n(%rip)
incl %edx incl %eax
movswq %dx, %rax cmpl $50, %eax
cmpl $50, %eax jl ..B1.2
jl ..B1.2
Sun C++ 5.8 -xO5 on sparc
// using short j // using int j
.L900000105: .L900000105:
st %o4,[%o5+%lo(n)] st %o4,[%o5+%lo(n)]
add %o4,1,%o4 add %o4,1,%o4
cmp %o4,49 cmp %o4,49
ble,pt %icc,.L900000105 ble,pt %icc,.L900000105
So of the four compilers I have, only one even had any difference in the result, and, it actually used less bytes in case of int.

As most others have said, computationally there is no advantage and might be worse. However, if the loop variable is used in a computation requiring a short, then it might be justified:
for(short j = 0; j < 5; j++)
{
// void myfunc(short arg1);
myfunc(j);
}
All this really does is prevent a warning message as the value passed would be promoted to an int (depending on compiler, platform, and C++ dialect). But it looks cleaner, IMHO.
Certainly not worth obsessing over. If you are looking to optimize, remember the rules (forget who came up with these):
Don't
Failing Step 1, first Measure
Make a change
If bored, exit, else go to Step 2.

Related

Why can't my template function be inlined

I wrote a class that can store lambda function in it to make sure all resources released before a function exits.
I test my code in MSVC2015, Release mode with /O2.
However, I find that GenerateScopeGuard cannot be inlined and a small function is generated.
int main()
{
01031C00 55 push ebp
01031C01 8B EC mov ebp,esp
01031C03 51 push ecx
auto func = GenerateScopeGuard([] {printf("hello\n"); });
01031C04 8D 4D FC lea ecx,[func]
01031C07 E8 24 00 00 00 call GenerateScopeGuard<<lambda_8b2f3596146f3fc3f8311b4d76487aed> > (01031C30h)
return 0;
01031C0C 80 7D FD 00 cmp byte ptr [ebp-3],0
01031C10 75 0D jne main+1Fh (01031C1Fh)
01031C12 68 78 61 03 01 push offset string "hello\n" (01036178h)
01031C17 E8 24 00 00 00 call printf (01031C40h)
01031C1C 83 C4 04 add esp,4
01031C1F 33 C0 xor eax,eax
}
01031C21 8B E5 mov esp,ebp
01031C23 5D pop ebp
01031C24 C3 ret
return ScopeGuard<T>(std::forward<T>(func));
01031C30 C6 41 01 00 mov byte ptr [ecx+1],0
01031C34 8B C1 mov eax,ecx
}
01031C36 C3 ret
Looks like exception handling related. Indeed if I disable C++ exception the function is inlined, but don't work with /EHsc. Why?
Here is my code.
template<typename T>
class ScopeGuard
{
public:
ScopeGuard(const ScopeGuard&) = delete;
ScopeGuard& operator=(const ScopeGuard&) = delete;
explicit ScopeGuard(T&& func) :
func_(std::forward<T>(func))
{}
ScopeGuard(ScopeGuard&& right) :
func_(std::move(right.func_))
{}
~ScopeGuard()
{
if (!dismissed_) func_();
}
void Dismiss()
{
dismissed_ = true;
}
private:
T func_;
bool dismissed_ = false;
};
template<typename T>
__forceinline ScopeGuard<T> GenerateScopeGuard(T&& func) noexcept
{
return ScopeGuard<T>(std::forward<T>(func));
}
int main()
{
auto func = GenerateScopeGuard([] {printf("hello\n"); });
return 0;
}

It seems MSVC compiler is getting confused by recently introduced non static member initialization feature. Constructor with initialization seems to fix the problem:
ScopeGuard()
: dismissed_(false)
{
}

Compiling with -O3 -std=c++14:
output from gcc 5.4:
.LC0:
.string "hello"
main:
subq $8, %rsp
movl $.LC0, %edi
call puts
xorl %eax, %eax
addq $8, %rsp
ret
output from clang 3.8:
main: # #main
pushq %rax
movl $.Lstr, %edi
callq puts
xorl %eax, %eax
popq %rcx
retq
.Lstr:
.asciz "hello"
This is without the nonstandard __forceinline attribute.
Are you sure you've enabled optimisation?
Is it that this function is generated but not called by main?

gcc '-m32' option changes floating-point rounding when not running valgrind

I am getting different floating-point rounding under different build/execute scenarios. Notice the 2498 in the second run below...
#include <iostream>
#include <cassert>
#include <typeinfo>
using std::cerr;
void domath( int n, double c, double & q1, double & q2 )
{
q1=n*c;
q2=int(n*c);
}
int main()
{
int n=2550;
double c=0.98, q1, q2;
domath( n, c, q1, q2 );
cerr<<"sizeof(int)="<<sizeof(int)<<", sizeof(double)="<<sizeof(double)<<", sizeof(n*c)="<<sizeof(n*c)<<"\n";
cerr<<"n="<<n<<", int(q1)="<<int(q1)<<", int(q2)="<<int(q2)<<"\n";
assert( typeid(q1) == typeid(n*c) );
}
Running as a 64-bit executable...
$ g++ -m64 -Wall rounding_test.cpp -o rounding_test && ./rounding_test
sizeof(int)=4, sizeof(double)=8, sizeof(n*c)=8
n=2550, int(q1)=2499, int(q2)=2499
Running as a 32-bit executable...
$ g++ -m32 -Wall rounding_test.cpp -o rounding_test && ./rounding_test
sizeof(int)=4, sizeof(double)=8, sizeof(n*c)=8
n=2550, int(q1)=2499, int(q2)=2498
Running as a 32-bit executable under valgrind...
$ g++ -m32 -Wall rounding_test.cpp -o rounding_test && valgrind --quiet ./rounding_test
sizeof(int)=4, sizeof(double)=8, sizeof(n*c)=8
n=2550, int(q1)=2499, int(q2)=2499
Why am I seeing different results when compiling with -m32, and why are the results different again when running valgrind?
My system is Ubuntu 14.04.1 LTS x86_64, and my gcc is version 4.8.2.
EDIT:
In response to the request for disassembly, I have refactored the code a bit so that I could isolate the relevant portion. The approach taken between -m64 and -m32 is clearly much different. I'm not too concerned about why these give a different rounding result since I can fix that by applying the round() function. The most interesting question is: why does valgrind change the result?
rounding_test: file format elf64-x86-64
<
000000000040090d <_Z6domathidRdS_>: <
40090d: 55 push %rbp <
40090e: 48 89 e5 mov %rsp,%rbp <
400911: 89 7d fc mov %edi,-0x4(%rbp <
400914: f2 0f 11 45 f0 movsd %xmm0,-0x10(%r <
400919: 48 89 75 e8 mov %rsi,-0x18(%rb <
40091d: 48 89 55 e0 mov %rdx,-0x20(%rb <
400921: f2 0f 2a 45 fc cvtsi2sdl -0x4(%rbp), <
400926: f2 0f 59 45 f0 mulsd -0x10(%rbp),%x <
40092b: 48 8b 45 e8 mov -0x18(%rbp),%r <
40092f: f2 0f 11 00 movsd %xmm0,(%rax) <
400933: f2 0f 2a 45 fc cvtsi2sdl -0x4(%rbp), <
400938: f2 0f 59 45 f0 mulsd -0x10(%rbp),%x <
40093d: f2 0f 2c c0 cvttsd2si %xmm0,%eax <
400941: f2 0f 2a c0 cvtsi2sd %eax,%xmm0 <
400945: 48 8b 45 e0 mov -0x20(%rbp),%r <
400949: f2 0f 11 00 movsd %xmm0,(%rax) <
40094d: 5d pop %rbp <
40094e: c3 retq <
| rounding_test: file format elf32-i386
> 0804871d <_Z6domathidRdS_>:
> 804871d: 55 push %ebp
> 804871e: 89 e5 mov %esp,%ebp
> 8048720: 83 ec 10 sub $0x10,%esp
> 8048723: 8b 45 0c mov 0xc(%ebp),%eax
> 8048726: 89 45 f8 mov %eax,-0x8(%ebp
> 8048729: 8b 45 10 mov 0x10(%ebp),%ea
> 804872c: 89 45 fc mov %eax,-0x4(%ebp
> 804872f: db 45 08 fildl 0x8(%ebp)
> 8048732: dc 4d f8 fmull -0x8(%ebp)
> 8048735: 8b 45 14 mov 0x14(%ebp),%ea
> 8048738: dd 18 fstpl (%eax)
> 804873a: db 45 08 fildl 0x8(%ebp)
> 804873d: dc 4d f8 fmull -0x8(%ebp)
> 8048740: d9 7d f6 fnstcw -0xa(%ebp)
> 8048743: 0f b7 45 f6 movzwl -0xa(%ebp),%ea
> 8048747: b4 0c mov $0xc,%ah
> 8048749: 66 89 45 f4 mov %ax,-0xc(%ebp)
> 804874d: d9 6d f4 fldcw -0xc(%ebp)
> 8048750: db 5d f0 fistpl -0x10(%ebp)
> 8048753: d9 6d f6 fldcw -0xa(%ebp)
> 8048756: 8b 45 f0 mov -0x10(%ebp),%e
> 8048759: 89 45 f0 mov %eax,-0x10(%eb
> 804875c: db 45 f0 fildl -0x10(%ebp)
> 804875f: 8b 45 18 mov 0x18(%ebp),%ea
> 8048762: dd 18 fstpl (%eax)
> 8048764: c9 leave
> 8048765: c3 ret

Edit: It would seem that, at least a long time back, valgrind's floating point calculations wheren't quite as accurate as the "real" calculations. In other words, this MAY explain why you get different results. See this question and answer on the valgrind mailing list.
Edit2: And the current valgrind.org documentation has it in it's "core limitations" section here - so I would expect that it is indeed "still valid". In other words the documentation for valgrind says to expect differences between valgrind and x87 FPU calculations. "You have been warned!" (And as we can see, using sse instructions to do the same math produces the same result as valgrind, confirming that it's a "rounding from 80 bits to 64 bits" difference)
Floating point calculations WILL differ slightly depending on exactly how the calculation is performed. I'm not sure exactly what you want to have an answer to, so here's a long rambling "answer of a sort".
Valgrind DOES indeed change the exact behaviour of your program in various ways (it emulates certain instructions, rather than actually executing the real instructions - which may include saving the intermediate results of calculations). Also, floating point calculations are well known to "not be precise" - it's just a matter of luck/bad luck if the calculation comes out precise or not. 0.98 is one of many, many numbers that can't be described precisely in floating point format [at least not the common IEEE formats].
By adding:
cerr<<"c="<<std::setprecision(30)<<c <<"\n";
we see that the output is c=0.979999999999999982236431605997 (yes, the actual value is 0.979999...99982 or some such, remaining digits is just the residual value, since it's not an "even" binary number, there's always going to be something left.
This is the n = 2550;, c = 0.98 and q = n * c part of the code as generated by gcc:
movl $2550, -28(%ebp) ; n
fldl .LC0
fstpl -40(%ebp) ; c
fildl -28(%ebp)
fmull -40(%ebp)
fstpl -48(%ebp) ; q - note that this is stored as a rouned 64-bit value.
This is the int(q) and int(n*c) part of the code:
fildl -28(%ebp) ; n
fmull -40(%ebp) ; c
fnstcw -58(%ebp) ; Save control word
movzwl -58(%ebp), %eax
movb $12, %ah
movw %ax, -60(%ebp) ; Save float control word.
fldcw -60(%ebp)
fistpl -64(%ebp) ; Store as integer (directly from 80-bit result)
fldcw -58(%ebp) ; restore float control word.
movl -64(%ebp), %ebx ; result of int(n * c)
fldl -48(%ebp) ; q
fldcw -60(%ebp) ; Load float control word as saved above.
fistpl -64(%ebp) ; Store as integer.
fldcw -58(%ebp) ; Restore control word.
movl -64(%ebp), %esi ; result of int(q)
Now, if the intermediate result is stored (and thus rounded) from the internal 80-bit precision in the middle of one of those calculations, the result may be subtly different from the result if the calculation happens without saving intermediate values.
I get identical results from both g++ 4.9.2 and clang++ -mno-sse - but if I enable sse in the clang case, it gives the same result as 64-bit build. Using gcc -msse2 -m32 gives the 2499 answer everywhere. This indicates that the error is caused by "storing intermediate results" in some way or another.
Likewise, optimising in gcc to -O1 will give the 2499 in all places - but this is a coincidence, not a result of some "clever thinking". If you want correctly rounded integer values of your calculations, you're much better off rounding yourself, because sooner or later int(someDoubleValue) will come up "one short".
Edit3: And finally, using g++ -mno-sse -m64 will also produce the same 2498 answer, where using valgrind on the same binary produces the 2499 answer.

The 32-bit version uses X87 floating point instructions. X87 internally uses 80-bit floating point numbers, which will cause trouble when numbers are converted to and from other precisions. In your case the 64-bit precision approximation for 0.98 is slightly less than the true value. When the CPU converts it to an 80-bit value you get the exact same numerical value, which is an equally bad approximation - having more bits doesn't get you a better approximation. The FPU then multiplies that number by 2550, and gets a figure that's slightly less than 2499. If the CPU used 64-bit numbers all the way it should compute exactly 2499, like it does in the 64-bit version.

SIGSEGV When accessing array element using assembly

Background:
I am new to assembly. When I was learning programming, I made a program that implements multiplication tables up to 1000 * 1000. The tables are formatted so that each answer is on the line factor1 << 10 | factor2 (I know, I know, it's isn't pretty). These tables are then loaded into an array: int* tables. Empty lines are filled with 0. Here is a link to the file for the tables (7.3 MB). I know using assembly won't speed up this by much, but I just wanted to do it for fun (and a bit of practice).
Question:
I'm trying to convert this code into inline assembly (tables is a global):
int answer;
// ...
answer = tables [factor1 << 10 | factor2];
This is what I came up with:
asm volatile ( "shll $10, %1;"
"orl %1, %2;"
"movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );
My C++ code works fine, but my assembly fails. What is wrong with my assembly (especially the movl _tables(,%2,4), %0; part), compared to my C++
What I have done to solve it:
I used some random numbers: 89 796 as factor1 and factor2. I know that there is an element at 89 << 10 | 786 (which is 91922) – verified this with C++. When I run it with gdb, I get a SIGSEGV:
Program received signal SIGSEGV, Segmentation fault.
at this line:
"movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );
I added two methods around my asm, which is how I know where the asm block is in the disassembly.
Disassembly of my asm block:
The disassembly from objdump -M att -d looks fine (although I'm not sure, I'm new to assembly, as I said):
402096: 8b 45 08 mov 0x8(%ebp),%eax
402099: 8b 55 0c mov 0xc(%ebp),%edx
40209c: c1 e0 0a shl $0xa,%eax
40209f: 09 c2 or %eax,%edx
4020a1: 8b 04 95 18 e0 47 00 mov 0x47e018(,%edx,4),%eax
4020a8: 89 45 ec mov %eax,-0x14(%ebp)
The disassembly from objdump -M intel -d:
402096: 8b 45 08 mov eax,DWORD PTR [ebp+0x8]
402099: 8b 55 0c mov edx,DWORD PTR [ebp+0xc]
40209c: c1 e0 0a shl eax,0xa
40209f: 09 c2 or edx,eax
4020a1: 8b 04 95 18 e0 47 00 mov eax,DWORD PTR [edx*4+0x47e018]
4020a8: 89 45 ec mov DWORD PTR [ebp-0x14],eax
From what I understand, it's moving the first parameter of my void calc ( int factor1, int factor2 ) function into eax. Then it's moving the second parameter into edx. Then it shifts eax to the left by 10 and ors it with edx. A 32-bit integer is 4 bytes, so [edx*4+base_address]. Move result to eax and then put eax into int answer (which, I'm guessing is on -0x14 of the stack). I don't really see much of a problem.
Disassembly of the compiler's .exe:
When I replace the asm block with plain C++ (answer = tables [factor1 << 10 | factor2];) and disassemble it this is what I get in Intel syntax:
402096: a1 18 e0 47 00 mov eax,ds:0x47e018
40209b: 8b 55 08 mov edx,DWORD PTR [ebp+0x8]
40209e: c1 e2 0a shl edx,0xa
4020a1: 0b 55 0c or edx,DWORD PTR [ebp+0xc]
4020a4: c1 e2 02 shl edx,0x2
4020a7: 01 d0 add eax,edx
4020a9: 8b 00 mov eax,DWORD PTR [eax]
4020ab: 89 45 ec mov DWORD PTR [ebp-0x14],eax
AT&T syntax:
402096: a1 18 e0 47 00 mov 0x47e018,%eax
40209b: 8b 55 08 mov 0x8(%ebp),%edx
40209e: c1 e2 0a shl $0xa,%edx
4020a1: 0b 55 0c or 0xc(%ebp),%edx
4020a4: c1 e2 02 shl $0x2,%edx
4020a7: 01 d0 add %edx,%eax
4020a9: 8b 00 mov (%eax),%eax
4020ab: 89 45 ec mov %eax,-0x14(%ebp)
I am not really familiar with the Intel syntax, so I am just going to try and understand the AT&T syntax:
It first moves the base address of the tables array into %eax. Then, is moves the first parameter into %edx. It shifts %edx to the left by 10 then ors it with the second parameter. Then, by shifting %edx to the left by two, it actually multiplies %edx by 4. Then, it adds that to %eax (the base address of the array). So, basically it just did this: [edx*4+0x47e018] (Intel syntax) or 0x47e018(,%edx,4) AT&T. It moves the value of the element it got into %eax and puts it into int answer. This method is more "expanded", but it does the same thing as my hand-written assembly! So why is mine giving a SIGSEGV while the compiler's working fine?

I bet (from the disassembly) that tables is a pointer to an array, not the array itself.
So you need:
asm volatile ( "shll $10, %1;"
movl _tables,%%eax
"orl %1, %2;"
"movl (%%eax,%2,4)",
: "=r" (answer) : "r" (factor1), "r" (factor2) : "eax" )
(Don't forget the extra clobber in the last line).
There are of course variations, this may be more efficient if the code is in a loop:
asm volatile ( "shll $10, %1;"
"orl %1, %2;"
"movl (%3,%2,4)",
: "=r" (answer) : "r" (factor1), "r" (factor2), "r"(tables) )

This is intended to be an addition to Mats Petersson's answer - I wrote it simply because it wasn't immediately clear to me why OP's analysis of the disassembly (that his assembly and the compiler-generated one were equivalent) was incorrect.
As Mats Petersson explains, the problem is that tables is actually a pointer to an array, so to access an element, you have to dereference twice. Now to me, it wasn't immediately clear where this happens in the compiler-generated code. The culprit is this innocent-looking line:
a1 18 e0 47 00 mov 0x47e018,%eax
To the untrained eye (that includes mine), this might look like the value 0x47e018 is moved to eax, but it's actually not. The Intel-syntax representation of the same opcodes gives us a clue:
a1 18 e0 47 00 mov eax,ds:0x47e018
Ah - ds: - so it's not actually a value, but an address!
For anyone who is wondering now, the following would be the opcodes and ATT syntax assembly for moving the value 0x47e018 to eax:
b8 18 e0 47 00 mov $0x47e018,%eax

Break executed when it shouldn't

int main(int argc, char *argv[]) {
int i = 0;
for (i = 0; i < 50; i++)
if (false) break;
}
Compiled and executed with VS 2010 (same issue in VS 2008). I put a breakpoint at the last line (closing bracket) and look via debugger into variable i. This code leaves i at 0. Why?
int main(int argc, char *argv[]) {
int i = 0;
for (i = 0; i < 50; i++)
if (false) break;;
}
After this - please notice the second semicolon after break - i is 50 as expected.
Can someone please explain this strange behaviour to me?

Looking at the generated assembly code with objdump -d -S we can see a possible reason that GDB jumps over the loop:
0000000000400584 <main>:
int main()
{
400584: 55 push %rbp
400585: 48 89 e5 mov %rsp,%rbp
volatile int i;
for (i = 0; i < 5; i++)
400588: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
40058f: eb 09 jmp 40059a <main+0x16>
400591: 8b 45 fc mov -0x4(%rbp),%eax
400594: 83 c0 01 add $0x1,%eax
400597: 89 45 fc mov %eax,-0x4(%rbp)
40059a: 8b 45 fc mov -0x4(%rbp),%eax
40059d: 83 f8 04 cmp $0x4,%eax
4005a0: 0f 9e c0 setle %al
4005a3: 84 c0 test %al,%al
4005a5: 75 ea jne 400591 <main+0xd>
4005a7: b8 00 00 00 00 mov $0x0,%eax
if (false)
break;
}
4005ac: c9 leaveq
4005ad: c3 retq
4005ae: 90 nop
4005af: 90 nop
Even when though compiled with optimizations turned off (-O0 flag to g++) no code is actually generated for the loop body. This might means that GDB will see the loop as a single statement, and will not step through the loop properly.
I used GCC version 4.4.5, and GDB version 7.0.1.

This code doesn’t compile on a conforming compiler, as it’s invalid C++ (void main).
That said, the resulting value of i is irrelevant: the compiler can do whatever it wants.
The reason is that i is never read outside the loop (which itself provably has no effect) and not declared volatile so the compiler is trivially able to prove that there is no observable side-effect, no matter the value of i.

In MSVC10, this is reproducible. I checked the disassembly. It seems a problem in the pdb file generation. The jump instruction to go back to the beginning of the loop is mixed with the next source line, that's it.
If you press step to next line, it will go back to the beginning of the for loop from the return statement and execute the loop 50 times as expected.
.

What does <value optimized out> mean in gdb?

(gdb) n
134 a = b = c = 0xdeadbeef + ((uint32_t)length) + initval;
(gdb) n
(gdb) p a
$30 = <value optimized out>
(gdb) p b
$31 = <value optimized out>
(gdb) p c
$32 = 3735928563
How can gdb optimize out my value??

It means you compiled with e.g. gcc -O3 and the gcc optimiser found that some of your variables were redundant in some way that allowed them to be optimised away. In this particular case you appear to have three variables a, b, c with the same value and presumably they can all be aliassed to a single variable. Compile with optimisation disabled, e.g. gcc -O0, if you want to see such variables (this is generally a good idea for debug builds in any case).

Minimal runnable example with disassembly analysis
As usual, I like to see some disassembly to get a better understanding of what is going on.
In this case, the insight we obtain is that if a variable is optimized to be stored only in a register rather than the stack, and then the register it was in gets overwritten, then it shows as <optimized out> as mentioned by R..
Of course, this can only happen if the variable in question is not needed anymore, otherwise the program would lose its value. Therefore it tends to happen that at the start of the function you can see the variable value, but then at the end it becomes <optimized out>.
One typical case which we often are interested in of this is that of the function arguments themselves, since these are:
always defined at the start of the function
may not get used towards the end of the function as more intermediate values are calculated.
tend to get overwritten by further function subcalls which must setup the exact same registers to satisfy the calling convention
This understanding actually has a concrete application: when using reverse debugging, you might be able to recover the value of variables of interest simply by stepping back to their last point of usage: How do I view the value of an <optimized out> variable in C++?
main.c
#include <stdio.h>
int __attribute__((noinline)) f3(int i) {
return i + 1;
}
int __attribute__((noinline)) f2(int i) {
return f3(i) + 1;
}
int __attribute__((noinline)) f1(int i) {
int j = 1, k = 2, l = 3;
i += 1;
j += f2(i);
k += f2(j);
l += f2(k);
return l;
}
int main(int argc, char *argv[]) {
printf("%d\n", f1(argc));
return 0;
}
Compile and run:
gcc -ggdb3 -O3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
gdb -q -nh main.out
Then inside GDB, we have the following session:
Breakpoint 1, f1 (i=1) at main.c:13
13 i += 1;
(gdb) disas
Dump of assembler code for function f1:
=> 0x00005555555546c0 <+0>: add $0x1,%edi
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$1 = 1
(gdb) p j
$2 = 1
(gdb) n
14 j += f2(i);
(gdb) disas
Dump of assembler code for function f1:
0x00005555555546c0 <+0>: add $0x1,%edi
=> 0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$3 = 2
(gdb) p j
$4 = 1
(gdb) n
15 k += f2(j);
(gdb) disas
Dump of assembler code for function f1:
0x00005555555546c0 <+0>: add $0x1,%edi
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
=> 0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$5 = <optimized out>
(gdb) p j
$6 = 5
(gdb) n
16 l += f2(k);
(gdb) disas
Dump of assembler code for function f1:
0x00005555555546c0 <+0>: add $0x1,%edi
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
0x00005555555546cb <+11>: callq 0x5555555546b0 <f2>
0x00005555555546d0 <+16>: lea 0x2(%rax),%edi
=> 0x00005555555546d3 <+19>: callq 0x5555555546b0 <f2>
0x00005555555546d8 <+24>: add $0x3,%eax
0x00005555555546db <+27>: retq
End of assembler dump.
(gdb) p i
$7 = <optimized out>
(gdb) p j
$8 = <optimized out>
To understand what is going on, remember from the x86 Linux calling convention: What are the calling conventions for UNIX & Linux system calls on i386 and x86-64 you should know that:
RDI contains the first argument
RDI can get destroyed in function calls
RAX contains the return value
From this we deduce that:
add $0x1,%edi
corresponds to the:
i += 1;
since i is the first argument of f1, and therefore stored in RDI.
Now, while we were at both:
i += 1;
j += f2(i);
the value of RDI hadn't been modified, and therefore GDB could just query it at anytime in those lines.
However, as soon as the f2 call is made:
the value of i is not needed anymore in the program
lea 0x1(%rax),%edi does EDI = j + RAX + 1, which both:
initializes j = 1
sets up the first argument of the next f2 call to RDI = j
Therefore, when the following line is reached:
k += f2(j);
both of the following instructions have/may have modified RDI, which is the only place i was being stored (f2 may use it as a scratch register, and lea definitely set it to RAX + 1):
0x00005555555546c3 <+3>: callq 0x5555555546b0 <f2>
0x00005555555546c8 <+8>: lea 0x1(%rax),%edi
and so RDI does not contain the value of i anymore. In fact, the value of i was completely lost! Therefore the only possible outcome is:
$3 = <optimized out>
A similar thing happens to the value of j, although j only becomes unnecessary one line later afer the call to k += f2(j);.
Thinking about j also gives us some insight on how smart GDB is. Notably, at i += 1;, the value of j had not yet materialized in any register or memory address, and GDB must have known it based solely on debug information metadata.
-O0 analysis
If we use -O0 instead of -O3 for compilation:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
then the disassembly would look like:
11 int __attribute__((noinline)) f1(int i) {
=> 0x0000555555554673 <+0>: 55 push %rbp
0x0000555555554674 <+1>: 48 89 e5 mov %rsp,%rbp
0x0000555555554677 <+4>: 48 83 ec 18 sub $0x18,%rsp
0x000055555555467b <+8>: 89 7d ec mov %edi,-0x14(%rbp)
12 int j = 1, k = 2, l = 3;
0x000055555555467e <+11>: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%rbp)
0x0000555555554685 <+18>: c7 45 f8 02 00 00 00 movl $0x2,-0x8(%rbp)
0x000055555555468c <+25>: c7 45 fc 03 00 00 00 movl $0x3,-0x4(%rbp)
13 i += 1;
0x0000555555554693 <+32>: 83 45 ec 01 addl $0x1,-0x14(%rbp)
14 j += f2(i);
0x0000555555554697 <+36>: 8b 45 ec mov -0x14(%rbp),%eax
0x000055555555469a <+39>: 89 c7 mov %eax,%edi
0x000055555555469c <+41>: e8 b8 ff ff ff callq 0x555555554659 <f2>
0x00005555555546a1 <+46>: 01 45 f4 add %eax,-0xc(%rbp)
15 k += f2(j);
0x00005555555546a4 <+49>: 8b 45 f4 mov -0xc(%rbp),%eax
0x00005555555546a7 <+52>: 89 c7 mov %eax,%edi
0x00005555555546a9 <+54>: e8 ab ff ff ff callq 0x555555554659 <f2>
0x00005555555546ae <+59>: 01 45 f8 add %eax,-0x8(%rbp)
16 l += f2(k);
0x00005555555546b1 <+62>: 8b 45 f8 mov -0x8(%rbp),%eax
0x00005555555546b4 <+65>: 89 c7 mov %eax,%edi
0x00005555555546b6 <+67>: e8 9e ff ff ff callq 0x555555554659 <f2>
0x00005555555546bb <+72>: 01 45 fc add %eax,-0x4(%rbp)
17 return l;
0x00005555555546be <+75>: 8b 45 fc mov -0x4(%rbp),%eax
18 }
0x00005555555546c1 <+78>: c9 leaveq
0x00005555555546c2 <+79>: c3 retq
From this horrendous disassembly, we see that the value of RDI is moved to the stack at the very start of program execution at:
mov %edi,-0x14(%rbp)
and it then gets retrieved from memory into registers whenever needed, e.g. at:
14 j += f2(i);
0x0000555555554697 <+36>: 8b 45 ec mov -0x14(%rbp),%eax
0x000055555555469a <+39>: 89 c7 mov %eax,%edi
0x000055555555469c <+41>: e8 b8 ff ff ff callq 0x555555554659 <f2>
0x00005555555546a1 <+46>: 01 45 f4 add %eax,-0xc(%rbp)
The same basically happens to j which gets immediately pushed to the stack when when it is initialized:
0x000055555555467e <+11>: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%rbp)
Therefore, it is easy for GDB to find the values of those variables at any time: they are always present in memory!
This also gives us some insight on why it is not possible to avoid <optimized out> in optimized code: since the number of registers is limited, the only way to do that would be to actually push unneeded registers to memory, which would partly defeat the benefit of -O3.
Extend the lifetime of i
If we edited f1 to return l + i as in:
int __attribute__((noinline)) f1(int i) {
int j = 1, k = 2, l = 3;
i += 1;
j += f2(i);
k += f2(j);
l += f2(k);
return l + i;
}
then we observe that this effectively extends the visibility of i until the end of the function.
This is because with this we force GCC to use an extra variable to keep i around until the end:
0x00005555555546c0 <+0>: lea 0x1(%rdi),%edx
0x00005555555546c3 <+3>: mov %edx,%edi
0x00005555555546c5 <+5>: callq 0x5555555546b0 <f2>
0x00005555555546ca <+10>: lea 0x1(%rax),%edi
0x00005555555546cd <+13>: callq 0x5555555546b0 <f2>
0x00005555555546d2 <+18>: lea 0x2(%rax),%edi
0x00005555555546d5 <+21>: callq 0x5555555546b0 <f2>
0x00005555555546da <+26>: lea 0x3(%rdx,%rax,1),%eax
0x00005555555546de <+30>: retq
which the compiler does by storing i += i in RDX at the very first instruction.
Tested in Ubuntu 18.04, GCC 7.4.0, GDB 8.1.0.

It didn't. Your compiler did, but there's still a debug symbol for the original variable name.

From https://idlebox.net/2010/apidocs/gdb-7.0.zip/gdb_9.html
The values of arguments that were not saved in their stack frames are shown as `value optimized out'.
I'm guessing you compiled with -O(somevalue) and are accessing variables a,b,c in a function where optimization has occurred.

You need to turn off the compiler optimisation.
If you are interested in a particular variable in gdb, you can delare the variable as "volatile" and recompile the code. This will make the compiler turn off compiler optimization for that variable.
volatile int quantity = 0;

Just run "export COPTS='-g -O0';" and rebuild your code. After rebuild, debug it using gdb. You'll not see such error. Thanks.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Benefit of using short instead of int in for... loop - c++

is there any benefit to using short instead of int in a for loop? i.e. for(short j = 0; j < 5; j++) { 99% of my loops involve numbers below 3000, so I was thinking ints would be a waste of bytes. Thanks!

No, there is no benefit. The short will probably end up taking a full register (which is 32 bits, an int) anyway. You will lose hours typing the extra two letters in the IDE, too. (That was a joke).

No. The loop variable will likely be allocated to a register, so it will end up taking up the same amount of space regardless.

Nope. Chances are your counter will end up in a register anyway, and they are typically at least the same size as int

I think there isn't much difference. Your compiler will probably use an entire 32-bit register for the counter variable (in 32-bit mode). You'll waste just two bytes from the stack, at most, in the worst case (when not used a register)

Related

Why can't my template function be inlined

gcc '-m32' option changes floating-point rounding when not running valgrind

SIGSEGV When accessing array element using assembly

Break executed when it shouldn't

What does <value optimized out> mean in gdb?

Categories

Resources