Why can't my template function be inlined

Why can't my template function be inlined - c++

I wrote a class that can store lambda function in it to make sure all resources released before a function exits.
I test my code in MSVC2015, Release mode with /O2.
However, I find that GenerateScopeGuard cannot be inlined and a small function is generated.
int main()
{
01031C00 55 push ebp
01031C01 8B EC mov ebp,esp
01031C03 51 push ecx
auto func = GenerateScopeGuard([] {printf("hello\n"); });
01031C04 8D 4D FC lea ecx,[func]
01031C07 E8 24 00 00 00 call GenerateScopeGuard<<lambda_8b2f3596146f3fc3f8311b4d76487aed> > (01031C30h)
return 0;
01031C0C 80 7D FD 00 cmp byte ptr [ebp-3],0
01031C10 75 0D jne main+1Fh (01031C1Fh)
01031C12 68 78 61 03 01 push offset string "hello\n" (01036178h)
01031C17 E8 24 00 00 00 call printf (01031C40h)
01031C1C 83 C4 04 add esp,4
01031C1F 33 C0 xor eax,eax
}
01031C21 8B E5 mov esp,ebp
01031C23 5D pop ebp
01031C24 C3 ret
return ScopeGuard<T>(std::forward<T>(func));
01031C30 C6 41 01 00 mov byte ptr [ecx+1],0
01031C34 8B C1 mov eax,ecx
}
01031C36 C3 ret
Looks like exception handling related. Indeed if I disable C++ exception the function is inlined, but don't work with /EHsc. Why?
Here is my code.
template<typename T>
class ScopeGuard
{
public:
ScopeGuard(const ScopeGuard&) = delete;
ScopeGuard& operator=(const ScopeGuard&) = delete;
explicit ScopeGuard(T&& func) :
func_(std::forward<T>(func))
{}
ScopeGuard(ScopeGuard&& right) :
func_(std::move(right.func_))
{}
~ScopeGuard()
{
if (!dismissed_) func_();
}
void Dismiss()
{
dismissed_ = true;
}
private:
T func_;
bool dismissed_ = false;
};
template<typename T>
__forceinline ScopeGuard<T> GenerateScopeGuard(T&& func) noexcept
{
return ScopeGuard<T>(std::forward<T>(func));
}
int main()
{
auto func = GenerateScopeGuard([] {printf("hello\n"); });
return 0;
}

It seems MSVC compiler is getting confused by recently introduced non static member initialization feature. Constructor with initialization seems to fix the problem:
ScopeGuard()
: dismissed_(false)
{
}

Compiling with -O3 -std=c++14:
output from gcc 5.4:
.LC0:
.string "hello"
main:
subq $8, %rsp
movl $.LC0, %edi
call puts
xorl %eax, %eax
addq $8, %rsp
ret
output from clang 3.8:
main: # #main
pushq %rax
movl $.Lstr, %edi
callq puts
xorl %eax, %eax
popq %rcx
retq
.Lstr:
.asciz "hello"
This is without the nonstandard __forceinline attribute.
Are you sure you've enabled optimisation?
Is it that this function is generated but not called by main?

Related

How to find VPTR in C++ assembly code?

class Base {
public:
Base() {}
virtual void Get() { }
};
class Derivered : public Base {
public:
virtual void Get() { }
};
int main() {
Base* base = new Derivered();
base->Get();
return 0;
}
I use gcc 5.4.0 to compile the code, and use objdump -S a.out to disassemble binary file. I want to find Base's vptr, but only display an unknown address 0x80487d4. The max address number is 0x80487b7, I cann't understand.
command list: g++ test.cpp -O0; objdump -S a.out
080486fe <_ZN4BaseC1Ev>:
80486fe: 55 push %ebp
80486ff: 89 e5 mov %esp,%ebp
8048701: ba d4 87 04 08 mov $0x80487d4,%edx
8048706: 8b 45 08 mov 0x8(%ebp),%eax
8048709: 89 10 mov %edx,(%eax)

080486fe <_ZN4BaseC1Ev>:
80486fe: 55 push %ebp
80486ff: 89 e5 mov %esp,%ebp
8048701: ba d4 87 04 08 mov $0x80487d4,%edx
8048706: 8b 45 08 mov 0x8(%ebp),%eax
8048709: 89 10 mov %edx,(%eax)
Is...
push %ebp ;- save frame pointer
mov %esp, %ebp ;- mov esp-> ebp -ebp is frame pointer
mov $0x80487d4, %edx ; load vptr address into edx
mov 0x8(%ebp), %eax ; ld eax with address of this
mov %edx,(%eax) ; store vptr in this byte 0

Compile-time or Run-time evaluation of expression in constructor

I have the following class:
template<ItType I, LockType L>
class ArcItBase;
with a (one of them) constructor:
ArcItBase ( StableRootedDigraph& g_, Node const n_ ) noexcept :
srd ( g_ ),
arc ( I == ItType::in
? srd.nodes [ n_ ].head_in
: srd.nodes [ n_ ].head_out ) { }
The question is (which I don't see how to test) whether the value of the expression for the constructor of arc will be determined at compile-time or at run-time (Release, full optimization, clang-cl and VC14), given that I == ItType::in can be evaluated (is known, I is either ItType::in or ItType::out) at compile-time to either true or false?

It is not possible to have your code compiling without knowing the ItType at compile time.
The template parameter is evaluated at compile time and the conditional is a core constant expression, standard reference is C++11 5.19/2.
In the contrasting case the compiler would have to generate code that is equivalent to
arc(true ? : )
Which if you would actually write it would be optimized. However the rest of the conditional will not be optimized since you are accessing a what seems to be a non static member and cannot be evaluated as a core constant expression.
However, compilers may not always work as we expect so if you would actually want to test this you should dump the disassembled object file
objdump -DS file.o
and then you can better navigate the output.
Another option would be to launch the debugger and inspect the code.
Don't forget that you can always have your symbols even in case of optimizing, e.g.
g++ -O3 -g -c foo.cpp
Below you will find a toy implementation . In the first case values are given to the constructor of arcbase is called as:
arcbase<true> a(10,9);
Whereas in the second it is given non const random values that cannot be known at compile time.
After compiling with g++ --stc=c++11 -c -O3 -g the first case creates:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 89 75 f4 mov %esi,-0xc(%rbp)
f: 89 55 f0 mov %edx,-0x10(%rbp)
12: 48 8b 45 f8 mov -0x8(%rbp),%rax
16: 8b 55 f0 mov -0x10(%rbp),%edx
19: 8b 4d f4 mov -0xc(%rbp),%ecx
1c: 89 ce mov %ecx,%esi
1e: 48 89 c7 mov %rax,%rdi
21: e8 00 00 00 00 callq 26 <arcbase<true>::arcbase(int, int)+0x26>
26: 48 8b 45 f8 mov -0x8(%rbp),%rax
2a: 8b 00 mov (%rax),%eax
2c: 48 8b 55 f8 mov -0x8(%rbp),%rdx
30: 48 83 c2 08 add $0x8,%rdx
34: 89 c6 mov %eax,%esi
36: 48 89 d7 mov %rdx,%rdi
39: e8 00 00 00 00 callq 3e <arcbase<true>::arcbase(int, int)+0x3e>
3e: c9 leaveq
3f: c3 retq
Whereas the second case:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 53 push %rbx
1: 48 89 fb mov %rdi,%rbx
4: e8 00 00 00 00 callq 9 <arcbase<true>::arcbase(int, int)+0x9>
9: 48 8d 7b 08 lea 0x8(%rbx),%rdi
d: 8b 33 mov (%rbx),%esi
f: 5b pop %rbx
10: e9 00 00 00 00 jmpq 15 <arcbase<true>::arcbase(int, int)+0x15>
Looking at the dissasembly you should notice that even in the first case the value of 10 is not directly passed as is to the constructor, but instead only placed in the register from where is is retrieved.
Here is the output from gdb :
0x400910 <_ZN3arcC2Ei> mov %esi,(%rdi)
0x400912 <_ZN3arcC2Ei+2> retq
0x400913 nop
0x400914 nop
0x400915 nop
0x400916 nop
0x400917 nop
0x400918 nop
0x400919 nop
0x40091a nop
0x40091b nop
0x40091c nop
0x40091d nop
0x40091e nop
0x40091f nop
0x400920 <_ZN7arcbaseILb1EEC2Eii> push %rbx
0x400921 <_ZN7arcbaseILb1EEC2Eii+1> mov %rdi,%rbx
0x400924 <_ZN7arcbaseILb1EEC2Eii+4> callq 0x400900 <_ZN3srdC2Eii>
0x400929 <_ZN7arcbaseILb1EEC2Eii+9> lea 0x8(%rbx),%rdi
0x40092d <_ZN7arcbaseILb1EEC2Eii+13> mov (%rbx),%esi
0x40092f <_ZN7arcbaseILb1EEC2Eii+15> pop %rbx
0x400930 <_ZN7arcbaseILb1EEC2Eii+16> jmpq 0x400910 <_ZN3arcC2Ei>
The code for the second case is :
struct llist
{
int head_in;
int head_out;
llist(int a , int b ) : head_in(a), head_out(b) {}
};
struct srd
{
llist nodes;
srd(int a, int b) : nodes(a,b) {}
};
struct arc
{
int y;
arc( int x):y(x) {}
};
template< bool I > class arcbase
{
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
void print()
{
std::cout << iarc.y << std::endl;
}
};
int main(void)
{
std::srand(time(0));
volatile int a_ = std::rand()%100;
volatile int b_ = std::rand()%4;
arcbase<true> a(a_,b_);
a.print();
return 0;
}

Pros and Cons...Constants as a value or as a reference? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
What are the pros and cons of both approaches:
Case#1: const int& x = 5;
vs
Case#2: const int x = 5;
I know that in Case#1, x refers to an location in the executable's text, and I believe so does x in Case#2.
Why would you use one and not the other?

Using gcc explorer:
Case 1 (-O3):
main: # #main
xorl %eax, %eax
ret
global_constant:
.quad reference temporary for global_constant
reference temporary for global_constant:
.long 2048 # 0x800
Case 2 (-O3):
main: # #main
xorl %eax, %eax
ret
Using Mingw 4.8.1 g++ Windows 8 (it is the same result just looks more messy)..
Case 1 (-O3):
.file "TestConstantsAsm.cpp"
.def __main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 4,,15
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
xorl %eax, %eax
addq $40, %rsp
ret
.seh_endproc
.p2align 4,,15
.def _GLOBAL__sub_I_global_constant; .scl 3; .type 32; .endef
.seh_proc _GLOBAL__sub_I_global_constant
_GLOBAL__sub_I_global_constant:
.seh_endprologue
leaq _ZGR15global_constant0(%rip), %rax
movq %rax, global_constant(%rip)
ret
.seh_endproc
.section .ctors,"w"
.align 8
.quad _GLOBAL__sub_I_global_constant
.globl global_constant
.bss
.align 16
global_constant:
.space 8
.data
.align 4
_ZGR15global_constant0:
.long 2048
.ident "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"
Case 2 (-O3):
.file "TestConstantsAsm.cpp"
.def __main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 4,,15
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
subq $40, %rsp
.seh_stackalloc 40
.seh_endprologue
call __main
xorl %eax, %eax
addq $40, %rsp
ret
.seh_endproc
.ident "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"
Without (-O3), the 2nd case is still better than the first case. It always is..
The commands I used were:
g++ -S -O3 TestConstantsAsm.cpp
g++ -S -O2 TestConstantsAsm.cpp
g++ -S -O TestConstantsAsm.cpp
and the program was:
const int& global_constant = 2048;
int main()
{
}
and
const int global_constant = 2048;
int main()
{
}
Even if you std::cout<<global_constant<<"\n"; within main (in both cases), the assembly for case 2 is still cleaner than case 1.

Using Visual C++ 2013, compiled in debug mode.
Case 1
The following code:
const int& VALUE = 2048;
int main() {
int A = VALUE;
return 0;
}
...produces this assembly output:
1: const int& VALUE = 2048;
013F1380 55 push ebp
013F1381 8B EC mov ebp,esp
013F1383 81 EC C0 00 00 00 sub esp,0C0h
013F1389 53 push ebx
013F138A 56 push esi
013F138B 57 push edi
013F138C 8D BD 40 FF FF FF lea edi,[ebp-0C0h]
013F1392 B9 30 00 00 00 mov ecx,30h
013F1397 B8 CC CC CC CC mov eax,0CCCCCCCCh
013F139C F3 AB rep stos dword ptr es:[edi]
013F139E C7 05 34 91 3F 01 00 08 00 00 mov dword ptr ds:[13F9134h],800h
013F13A8 C7 05 30 91 3F 01 34 91 3F 01 mov dword ptr ds:[13F9130h],13F9134h
013F13B2 5F pop edi
013F13B3 5E pop esi
013F13B4 5B pop ebx
013F13B5 8B E5 mov esp,ebp
013F13B7 5D pop ebp
013F13B8 C3 ret
(...)
2:
3: int main() {
00C42280 55 push ebp
00C42281 8B EC mov ebp,esp
00C42283 81 EC CC 00 00 00 sub esp,0CCh
00C42289 53 push ebx
00C4228A 56 push esi
00C4228B 57 push edi
00C4228C 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
00C42292 B9 33 00 00 00 mov ecx,33h
00C42297 B8 CC CC CC CC mov eax,0CCCCCCCCh
00C4229C F3 AB rep stos dword ptr es:[edi]
4: int A = VALUE;
00C4229E A1 30 91 C4 00 mov eax,dword ptr ds:[00C49130h]
00C422A3 8B 08 mov ecx,dword ptr [eax]
00C422A5 89 4D F8 mov dword ptr [A],ecx
5: return 0;
00C422A8 33 C0 xor eax,eax
6: }
00C422AA 5F pop edi
6: }
00C422AB 5E pop esi
00C422AC 5B pop ebx
00C422AD 8B E5 mov esp,ebp
00C422AF 5D pop ebp
00C422B0 C3 ret
In case 1, the value 2048 (0x800) is stored in memory, which is accessed when the constant is referenced in the main code.
Case 2
The following code:
const int VALUE = 2048;
int main() {
int A = VALUE;
return 0;
}
...produces this assembly output:
1: const int VALUE = 2048;
2:
3: int main() {
00F02280 55 push ebp
00F02281 8B EC mov ebp,esp
00F02283 81 EC CC 00 00 00 sub esp,0CCh
00F02289 53 push ebx
00F0228A 56 push esi
00F0228B 57 push edi
00F0228C 8D BD 34 FF FF FF lea edi,[ebp-0CCh]
00F02292 B9 33 00 00 00 mov ecx,33h
00F02297 B8 CC CC CC CC mov eax,0CCCCCCCCh
00F0229C F3 AB rep stos dword ptr es:[edi]
4: int A = VALUE;
00F0229E C7 45 F8 00 08 00 00 mov dword ptr [A],800h
5: return 0;
00F022A5 33 C0 xor eax,eax
6: }
00F022A7 5F pop edi
00F022A8 5E pop esi
00F022A9 5B pop ebx
00F022AA 8B E5 mov esp,ebp
00F022AC 5D pop ebp
00F022AD C3 ret
In case 2, no code is generated for the global constant declaration itself. When the constant is used, it is simply a "move immediate" instruction which inserts the value 2048 (0x800) directly into the variable. If the constant is not used, it will not appear in the assembly at all.

Using a C++ reference in inline assembly with GCC

I have a spin lock with the xchg instruction. The C++ function takes in the resource to be locked.
Following is the code
void SpinLock::lock( u32& resource )
{
__asm__ __volatile__
(
"mov ebx, %0\n\t"
"InUseLoop:\n\t"
"mov eax, 0x01\n\t" /* 1=In Use*/
"xchg eax, [ebx]\n\t"
"cmp eax, 0x01\n\t"
"je InUseLoop\n\t"
:"=r"(resource)
:"r"(resource)
:"eax","ebx"
);
}
void SpinLock::unlock(u32& resource )
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"mov ebx, %0\n\t"
"mov DWORD PTR [ebx], 0x00\n\t"
:"=r"(resource)
:"r"(resource)
: "ebx"
);
}
This code is compiled with gcc 4.5.2 -masm=intel on a 64 bit intel machine.
The objdump produces following assembly for the above functions .
0000000000490968 <_ZN8SpinLock4lockERj>:
490968: 55 push %rbp
490969: 48 89 e5 mov %rsp,%rbp
49096c: 53 push %rbx
49096d: 48 89 7d f0 mov %rdi,-0x10(%rbp)
490971: 48 8b 45 f0 mov -0x10(%rbp),%rax
490975: 8b 10 mov (%rax),%edx
490977: 89 d3 mov %edx,%ebx
0000000000490979 <InUseLoop>:
490979: b8 01 00 00 00 mov $0x1,%eax
49097e: 67 87 03 addr32 xchg %eax,(%ebx)
490981: 83 f8 01 cmp $0x1,%eax
490984: 74 f3 je 490979 <InUseLoop>
490986: 48 8b 45 f0 mov -0x10(%rbp),%rax
49098a: 89 10 mov %edx,(%rax)
49098c: 5b pop %rbx
49098d: c9 leaveq
49098e: c3 retq
49098f: 90 nop
0000000000490990 <_ZN8SpinLock6unlockERj>:
490990: 55 push %rbp
490991: 48 89 e5 mov %rsp,%rbp
490994: 53 push %rbx
490995: 48 89 7d f0 mov %rdi,-0x10(%rbp)
490999: 48 8b 45 f0 mov -0x10(%rbp),%rax
49099d: 8b 00 mov (%rax),%eax
49099f: 89 d3 mov %edx,%ebx
4909a1: 67 c7 03 00 00 00 00 addr32 movl $0x0,(%ebx)
4909a8: 48 8b 45 f0 mov -0x10(%rbp),%rax
4909ac: 89 10 mov %edx,(%rax)
4909ae: 5b pop %rbx
4909af: c9 leaveq
4909b0: c3 retq
4909b1: 90 nop
The code dumps core when executing the locking operation.
Is there something grossly wrong here ?
Regards,
-J

First, why are you using truncated 32-bit addresses in your assembly code whereas the rest of the program is compiled to execute in 64-bit mode and operate with 64-bit addresses/pointers? I'm referring to ebx. Why is it not rbx?
Second, why are you trying to return a value from the assembly code with "=r"(resource)? Your functions change the in-memory value with xchg eax, [ebx] and mov DWORD PTR [ebx], 0x00 and return void. Remove "=r"(resource).
Lastly, if you look closely at the disassembly of SpinLock::lock(), can't you see something odd about ebx?:
mov %rdi,-0x10(%rbp)
mov -0x10(%rbp),%rax
mov (%rax),%edx
mov %edx,%ebx
<InUseLoop>:
mov $0x1,%eax
addr32 xchg %eax,(%ebx)
In this code, the ebx value, which is an address/pointer, does not come directly from the function's parameter (rdi), the parameter first gets dereferenced with mov (%rax),%edx, but why? If you throw away all the confusing C++ reference stuff, technically, the function receives a pointer to u32, not a pointer to a pointer to u32, and thus needs no extra dereference anywhere.
The problem is here: "r"(resource). It must be "r"(&resource).
A small 32-bit test app demonstrates this problem:
#include <iostream>
using namespace std;
void unlock1(unsigned& resource)
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"movl %0, %%ebx\n\t"
"movl $0, (%%ebx)\n\t"
:
:"r"(resource)
:"ebx"
);
}
void unlock2(unsigned& resource)
{
__asm__ __volatile__
(
/* "mov DWORD PTR ds:[%0],0x00\n\t" */
"movl %0, %%ebx\n\t"
"movl $0, (%%ebx)\n\t"
:
:"r"(&resource)
:"ebx"
);
}
unsigned blah;
int main(void)
{
blah = 3456789012u;
cout << "before unlock2() blah=" << blah << endl;
unlock2(blah);
cout << "after unlock2() blah=" << blah << endl;
blah = 3456789012u;
cout << "before unlock1() blah=" << blah << endl;
unlock1(blah); // may crash here, but if it doesn't, it won't change blah
cout << "after unlock1() blah=" << blah << endl;
return 0;
}
Output:
before unlock2() blah=3456789012
after unlock2() blah=0
before unlock1() blah=3456789012
Exiting due to signal SIGSEGV
General Protection Fault at eip=000015eb
eax=ce0a6a14 ...

Benefit of using short instead of int in for... loop

is there any benefit to using short instead of int in a for loop?
i.e.
for(short j = 0; j < 5; j++) {
99% of my loops involve numbers below 3000, so I was thinking ints would be a waste of bytes. Thanks!

No, there is no benefit. The short will probably end up taking a full register (which is 32 bits, an int) anyway.
You will lose hours typing the extra two letters in the IDE, too. (That was a joke).

No. The loop variable will likely be allocated to a register, so it will end up taking up the same amount of space regardless.

Look at the generated assembler code and you would probably see that using int generates cleaner code.
c-code:
#include <stdio.h>
int main(void) {
int j;
for(j = 0; j < 5; j++) {
printf("%d", j);
}
}
using short:
080483c4 <main>:
80483c4: 55 push %ebp
80483c5: 89 e5 mov %esp,%ebp
80483c7: 83 e4 f0 and $0xfffffff0,%esp
80483ca: 83 ec 20 sub $0x20,%esp
80483cd: 66 c7 44 24 1e 00 00 movw $0x0,0x1e(%esp)
80483d4: eb 1c jmp 80483f2 <main+0x2e>
80483d6: 0f bf 54 24 1e movswl 0x1e(%esp),%edx
80483db: b8 c0 84 04 08 mov $0x80484c0,%eax
80483e0: 89 54 24 04 mov %edx,0x4(%esp)
80483e4: 89 04 24 mov %eax,(%esp)
80483e7: e8 08 ff ff ff call 80482f4 <printf#plt>
80483ec: 66 83 44 24 1e 01 addw $0x1,0x1e(%esp)
80483f2: 66 83 7c 24 1e 04 cmpw $0x4,0x1e(%esp)
80483f8: 7e dc jle 80483d6 <main+0x12>
80483fa: c9 leave
80483fb: c3 ret
using int:
080483c4 <main>:
80483c4: 55 push %ebp
80483c5: 89 e5 mov %esp,%ebp
80483c7: 83 e4 f0 and $0xfffffff0,%esp
80483ca: 83 ec 20 sub $0x20,%esp
80483cd: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%esp)
80483d4: 00
80483d5: eb 1a jmp 80483f1 <main+0x2d>
80483d7: b8 c0 84 04 08 mov $0x80484c0,%eax
80483dc: 8b 54 24 1c mov 0x1c(%esp),%edx
80483e0: 89 54 24 04 mov %edx,0x4(%esp)
80483e4: 89 04 24 mov %eax,(%esp)
80483e7: e8 08 ff ff ff call 80482f4 <printf#plt>
80483ec: 83 44 24 1c 01 addl $0x1,0x1c(%esp)
80483f1: 83 7c 24 1c 04 cmpl $0x4,0x1c(%esp)
80483f6: 7e df jle 80483d7 <main+0x13>
80483f8: c9 leave
80483f9: c3 ret

More often than not, trying to optimize for this will just exacerbate bugs when someone doesn't notice (or forgets) that it's a narrow data type. For instance, check out this bcrypt problem I looked into...pretty typical:
BCrypt says long, similar passwords are equivalent - problem with me, the gem, or the field of cryptography?
Yet the problem is still there as int is a finite size as well. Better to spend your time making sure your program is correct and not creating hazards or security problems from numeric underflows and overflows.
Some of what I talk about w/numeric_limits here might be informative or interesting, if you haven't encountered that yet:
http://hostilefork.com/2009/03/31/modern_cpp_or_modern_art/

Nope. Chances are your counter will end up in a register anyway, and they are typically at least the same size as int

I think there isn't much difference. Your compiler will probably use an entire 32-bit register for the counter variable (in 32-bit mode). You'll waste just two bytes from the stack, at most, in the worst case (when not used a register)

One potential improvement over int as loop counter is unsigned int (or std::size_t where applicable) if the loop index is never going to be negative. Using short instead of int makes no difference in most compilers, here's the ones I have.
Code:
volatile int n;
int main()
{
for(short j = 0; j < 50; j++) // replaced with int in test2
n = j;
}
g++ 4.5.2 -march=native -O3 on x86_64 linux
// using short j // using int j
.L2: .L2:
movl %eax, n(%rip) movl %eax, n(%rip)
incl %eax incl %eax
cmpl $50, %eax cmpl $50, %eax
jne .L2 jne .L2
clang++ 2.9 -march=native -O3 on x86_64 linux
// using short j // using int j
.LBB0_1: .LBB0_1:
movl %eax, n(%rip) movl %eax, n(%rip)
incl %eax incl %eax
cmpl $50, %eax cmpl $50, %eax
jne .LBB0_1 jne .LBB0_1
Intel C++ 11.1 -fast on x86_64 linux
// using short j // using int j
..B1.2: ..B1.2:
movl %eax, n(%rip) movl %eax, n(%rip)
incl %edx incl %eax
movswq %dx, %rax cmpl $50, %eax
cmpl $50, %eax jl ..B1.2
jl ..B1.2
Sun C++ 5.8 -xO5 on sparc
// using short j // using int j
.L900000105: .L900000105:
st %o4,[%o5+%lo(n)] st %o4,[%o5+%lo(n)]
add %o4,1,%o4 add %o4,1,%o4
cmp %o4,49 cmp %o4,49
ble,pt %icc,.L900000105 ble,pt %icc,.L900000105
So of the four compilers I have, only one even had any difference in the result, and, it actually used less bytes in case of int.

As most others have said, computationally there is no advantage and might be worse. However, if the loop variable is used in a computation requiring a short, then it might be justified:
for(short j = 0; j < 5; j++)
{
// void myfunc(short arg1);
myfunc(j);
}
All this really does is prevent a warning message as the value passed would be promoted to an int (depending on compiler, platform, and C++ dialect). But it looks cleaner, IMHO.
Certainly not worth obsessing over. If you are looking to optimize, remember the rules (forget who came up with these):
Don't
Failing Step 1, first Measure
Make a change
If bored, exit, else go to Step 2.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Why can't my template function be inlined - c++

It seems MSVC compiler is getting confused by recently introduced non static member initialization feature. Constructor with initialization seems to fix the problem: ScopeGuard() : dismissed_(false) { }

Related

How to find VPTR in C++ assembly code?

Compile-time or Run-time evaluation of expression in constructor

Pros and Cons...Constants as a value or as a reference? [closed]

Using a C++ reference in inline assembly with GCC

Benefit of using short instead of int in for... loop

Categories

Resources