Does empty "else" cost any time - c++

Is there any difference in computational cost of
if(something){
return something;
}else{
return somethingElse;
}
and
if(something){
return something;
}
//else (put in comments for readability purposes)
return somethingElse;
In theory there is an extra keyword (else), but it doesn't seem like it should make an actual difference.
Edit:
After running the code for different loop sizes, I found that there actually is a difference: the code without else appears to be about 1.5% more efficient. But it most likely depends on the compiler, as stated by many people below. The code I tested it on:
#include <ctime>
#include <iostream>
int withoutElse(bool a){
if(a)
return 0;
return 1;
}
int withElse(bool a){
if(a)
return 0;
else
return 1;
}
int main(){
using namespace std;
bool a=true;
clock_t begin,end;
begin= clock();
for(__int64 i=0;i<1000000000;i++){
a=!a;
withElse(a);
}
end = clock();
cout<<end-begin<<endl;
begin= clock();
for(__int64 i=0;i<1000000000;i++){
a=!a;
withoutElse(a);
}
end = clock();
cout<<end-begin<<endl;
return 0;
}
I checked loop counts from 1 000 000 to 1 000 000 000, and the results were consistently different.
Edit 2:
The assembly code (once again, generated using Visual Studio 2010) also shows a small difference (apparently; I'm no good with assembly :( )
?withElse@@YAH_N@Z PROC ; withElse, COMDAT
; Line 12
push ebp
mov ebp, esp
sub esp, 192 ; 000000c0H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-192]
mov ecx, 48 ; 00000030H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 13
movzx eax, BYTE PTR _a$[ebp]
test eax, eax
je SHORT $LN2@withElse
; Line 14
xor eax, eax
jmp SHORT $LN3@withElse
; Line 15
jmp SHORT $LN3@withElse
$LN2@withElse:
; Line 16
mov eax, 1
$LN3@withElse:
; Line 17
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?withElse@@YAH_N@Z ENDP ; withElse
and
?withoutElse@@YAH_N@Z PROC ; withoutElse, COMDAT
; Line 4
push ebp
mov ebp, esp
sub esp, 192 ; 000000c0H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-192]
mov ecx, 48 ; 00000030H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 5
movzx eax, BYTE PTR _a$[ebp]
test eax, eax
je SHORT $LN1@withoutEls
; Line 6
xor eax, eax
jmp SHORT $LN2@withoutEls
$LN1@withoutEls:
; Line 7
mov eax, 1
$LN2@withoutEls:
; Line 9
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?withoutElse@@YAH_N@Z ENDP ; withoutElse

The two forms are different in general, but the compiler may decide to emit the same jump in both cases (in practice it almost always will).
The best way to see what a compiler does is to read the generated assembly. Assuming that you are using gcc, you can try
gcc -g -c -fverbose-asm myfile.c; objdump -d -M intel -S myfile.o > myfile.s
which produces a listing that interleaves the C source with the assembly and makes the job easier at the beginning.
For your example it gives:
CASE1
if(something){
23: 83 7d fc 00 cmp DWORD PTR [ebp-0x4],0x0
27: 74 05 je 2e <main+0x19>
return something;
29: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
2c: eb 05 jmp 33 <main+0x1e>
}else{
return 0;
2e: b8 00 00 00 00 mov eax,0x0
}
CASE2
if(something){
23: 83 7d fc 00 cmp DWORD PTR [ebp-0x4],0x0
27: 74 05 je 2e <main+0x19>
return something;
29: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
2c: eb 05 jmp 33 <main+0x1e>
return 0;
2e: b8 00 00 00 00 mov eax,0x0
As you can see, there are no differences!
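To repeat this comparison on your own compiler, a minimal sketch like the following can be compiled with optimization enabled (for example `g++ -O2 -S`) and the two function bodies compared in the output. The two functions mirror the ones in the question; `variantsAgree` is a hypothetical helper added here only to confirm that both variants compute the same result.

```cpp
#include <initializer_list>

// Same logic as the question's two test functions.
int withoutElse(bool a) {
    if (a)
        return 0;
    return 1;
}

int withElse(bool a) {
    if (a)
        return 0;
    else
        return 1;
}

// Hypothetical helper (not from the original post): returns true when
// both variants agree for every possible input.
bool variantsAgree() {
    for (bool a : {false, true}) {
        if (withElse(a) != withoutElse(a))
            return false;
    }
    return true;
}
```

Whether the generated code is byte-for-byte identical depends on the compiler and flags, so treat any timing difference measured in an unoptimized build with caution.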

It won't compile if you type `return
Keep in mind that once the code is compiled, all ifs, elses and loops are turned into gotos (jumps).
If (cond) { code A } code B
turns to
if cond is false jump to code b
code A
code B
and
If (cond) { code A } else { code B } code C
turns to
if cond is false jump to code B
code A
ALWAYS jump to code C
code B
code C
Most processors 'guess' whether they're going to jump or not before checking if they actually jump. Depending on the processor, it might affect the performance to fail a guess.
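So the answer is YES, in general: the if/else form contains an ALWAYS jump after code A that the plain if form does not have, and executing it takes about 2-3 cycles. (The exception is when code A itself already ends with an ALWAYS jump, such as a return, so no extra jump is needed.)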
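The lowering described above can be written out by hand in C++ with goto. This is only an illustrative sketch: the function names and the arithmetic standing in for code A, B and C are made up, and a real compiler is free to arrange the jumps differently.

```cpp
// Structured version: if / else followed by common code.
int structured(bool cond) {
    int result = 0;
    if (cond) {
        result += 1;   // code A
    } else {
        result += 2;   // code B
    }
    result += 10;      // code C
    return result;
}

// Hand-lowered version using only a conditional and an unconditional jump.
int lowered(bool cond) {
    int result = 0;
    if (!cond) goto code_b;   // "if cond is false jump to code B"
    result += 1;              // code A
    goto code_c;              // "ALWAYS jump to code C"
code_b:
    result += 2;              // code B
code_c:
    result += 10;             // code C
    return result;
}
```

Both functions return the same value for the same input; the second one just makes the extra unconditional jump after code A visible.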

Related

Why is GCC subtracting 1 and comparing <= 2? Is cmp faster with powers of two in assembly?

I was writing some code to clear the screen to a particular color. C++ code:
void clear_screen(unsigned int color, void *memory, int height, int width) {
unsigned int *pixel = (unsigned int *)memory;
for (auto y = 0; y < height; y++)
for (auto x = 0; x < width; x++)
*pixel++ = color;
}
I used g++ and objconv to generate the corresponding assembly. This is what I got, and I've commented what I think some of the lines do too.
renderer_clear_screen:
push r13
push r12
push rbp
push rdi
push rsi
push rbx
mov r11d, ecx ; move the color into r11d
mov ebx, r8d ; move the height into ebx
mov rcx, rdx ; 000E _ 48: 89. D1st
test r8d, r8d ;
jle _cls_return ; basically, return if width or height is 0
test r9d, r9d ; ( window minimized )
jle _cls_return ;
mov r8d, r9d ; height = width
mov esi, r9d ; esi = width
mov edi, r9d ; edi = width
xor r10d, r10d ; r10d = 0
shr esi, 2 ; esi = width / 4
movd xmm1, r11d ; move the lower 32-bits of the color into xmm1
lea r12d, [r9-1] ; r12d = width - 1
shl rsi, 4 ; 003F _ 48: C1. E6, 04
mov ebp, r8d ; 0043 _ 44: 89. C5
shl rdi, 2 ; 0046 _ 48: C1. E7, 02
pshufd xmm0, xmm1, 0 ; 004A _ 66: 0F 70. C1, 00
shl rbp, 2 ; 004F _ 48: C1. E5, 02
ALIGN 8
?_001: cmp r12d, 2
jbe ?_006 ; if (width - 1 <= 2) { ?_006 }
mov rax, rcx ; 005E _ 48: 89. C8
lea rdx, [rcx+rsi] ; 0061 _ 48: 8D. 14 31
ALIGN 8
?_002: movups oword [rax], xmm0 ; 0068 _ 0F 11. 00
add rax, 16 ; 006B _ 48: 83. C0, 10
cmp rdx, rax ; 006F _ 48: 39. C2
jnz ?_002 ; 0072 _ 75, F4
lea rdx, [rcx+rbp] ; 0074 _ 48: 8D. 14 29
mov eax, r8d ; 0078 _ 44: 89. C0
cmp r9d, r8d ; 007B _ 45: 39. C1
jz ?_004 ; 007E _ 74, 1C
?_003: lea r13d, [rax+1H] ; 0080 _ 44: 8D. 68, 01
mov dword [rdx], r11d ; 0084 _ 44: 89. 1A
cmp r13d, r9d ; 0087 _ 45: 39. CD
jge ?_004 ; 008A _ 7D, 10
add eax, 2 ; 008C _ 83. C0, 02
mov dword [rdx+4H], r11d ; 008F _ 44: 89. 5A, 04
cmp r9d, eax ; 0093 _ 41: 39. C1
jle ?_004 ; 0096 _ 7E, 04
mov dword [rdx+8H], r11d ; 0098 _ 44: 89. 5A, 08
?_004: add r10d, 1 ; 009C _ 41: 83. C2, 01
add rcx, rdi ; 00A0 _ 48: 01. F9
cmp ebx, r10d ; 00A3 _ 44: 39. D3
jnz ?_001 ; 00A6 _ 75, B0
_cls_return:
pop rbx ;
pop rsi ;
pop rdi ;
pop rbp ;
pop r12 ;
pop r13 ; pop all the saved registers
ret ;
?_006: ; Local function
mov rdx, rcx ; 00B1 _ 48: 89. CA
xor eax, eax ; 00B4 _ 31. C0
jmp ?_003 ; 00B6 _ EB, C8
Now, in ?_001, the compiler compares width - 1 to 2, which is the same thing as comparing the width to 3. My question is, with -O3, why did the compiler choose two instead of three, and waste a lea (to move width - 1 into r12d)?
The only thing which makes sense to me is that powers of two are somehow faster to compare. Or maybe it's a compiler quirk?
The usual reason for GCC tweaking compare constants is to create smaller immediates, which can help the value fit in a narrower immediate encoding. See the related questions "Understanding gcc output for if (a>=3)" and "GCC seems to prefer small immediate values in comparisons. Is there a way to avoid that?" (GCC always does this, instead of checking whether it's actually useful with this constant on the target ISA.) This heuristic works well for most ISAs, but sometimes not for AArch64 or ARM Thumb, which can encode some immediates as a bit-range / bit-pattern, so it's not always the case that a smaller-magnitude number is better.
The width-1 is not part of that. The -1 is part of a range check to skip the auto-vectorized loop (16 bytes at a time with movups) and go straight to the cleanup, 1..3 scalar stores.
It seems to be checking width >= 1 && width <= 3, i.e. cleanup needed but total size is less than a full vector width. It's not equivalent to signed or unsigned width <= 3 for width=0. Note the unsigned compare: 0 - 1 is above 2U, because -1U is UINT_MAX.
But it already excluded width <= 0 with test r9d, r9d / jle _cls_return, so it would have been better for GCC to just check width <= 3U instead of doing extra work to exclude zero from the range-check. (An lea, and save/restore of R12 which isn't otherwise used!)
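The range-check trick can be verified in plain C++. The sketch below uses made-up function names and assumes a 32-bit width, as in the question; it compares the unsigned subtract-and-compare form with the two-sided check and with the cheaper check that is enough once width <= 0 has been rejected.

```cpp
#include <cstdint>

// The check GCC emitted: (width - 1) compared as unsigned against 2.
bool needsOnlyScalarCleanup(std::int32_t width) {
    // Convert first, then subtract, so width == 0 wraps around to
    // UINT32_MAX without any signed overflow.
    return static_cast<std::uint32_t>(width) - 1u <= 2u;
}

// The two-sided check it is equivalent to.
bool rangeCheck(std::int32_t width) {
    return width >= 1 && width <= 3;
}

// The cheaper check that suffices once width <= 0 has already been rejected.
bool cheaperCheck(std::int32_t width) {
    return static_cast<std::uint32_t>(width) <= 3u;
}
```

The first two functions agree for every width, including zero and negative values. The third differs only for width == 0, which the earlier test/jle pair has already filtered out.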
(The cleanup also looks over-complicated; it could e.g. use movq [rdx], xmm0 when more than 1 uint is needed, instead of some weird branching around for various cases. And even better, if the total size is >= 4 uints, just do another movups that ends at the end of the range, possibly overlapping with previous stores.)
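The overlapping-final-store idea can be sketched in portable C++, modelling each 16-byte movups with a memcpy of four 32-bit values. The names `fillPixels` and `fillIsExact` are hypothetical, and whether a compiler turns each memcpy into a single vector store is an assumption, not something this code guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: fill n 32-bit pixels with color.
void fillPixels(std::uint32_t* dst, std::size_t n, std::uint32_t color) {
    const std::uint32_t chunk[4] = {color, color, color, color};
    if (n < 4) {
        // Fewer pixels than one 16-byte chunk: plain scalar stores.
        for (std::size_t i = 0; i < n; ++i) dst[i] = color;
        return;
    }
    // Full 16-byte chunks (each stands in for one movups).
    for (std::size_t i = 0; i + 4 <= n; i += 4)
        std::memcpy(dst + i, chunk, sizeof chunk);
    // One last chunk that ends exactly at the end of the range. It may
    // overlap pixels that were already written, which is harmless here.
    std::memcpy(dst + (n - 4), chunk, sizeof chunk);
}

// Hypothetical check: fills n pixels between two guard elements and
// verifies that exactly those n pixels were written.
bool fillIsExact(std::size_t n) {
    std::vector<std::uint32_t> buf(n + 2, 0u);
    fillPixels(buf.data() + 1, n, 0xFFAA5500u);
    if (buf.front() != 0u || buf.back() != 0u) return false;
    for (std::size_t i = 1; i <= n; ++i)
        if (buf[i] != 0xFFAA5500u) return false;
    return true;
}
```

The final store replaces the 1..3 scalar cleanup stores whenever at least four pixels are written, at the cost of rewriting up to three pixels.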
Yes, this is a missed optimization, you can report it on https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc (now that you know it's a missed optimization; it's good that you asked here first instead of filing a bug without first figuring out if the instruction could be avoided.)
You wrote: "The only thing which makes sense to me is that powers of two are somehow faster to compare."
No, it's not faster; cmp performance is not data-dependent at all. (No integer instructions are, except sometimes [i]div. And on AMD CPUs before Zen3, pext / pdep. But anyway, not simple integer add/compare/shift stuff. See https://uops.info/).
And BTW, we can reproduce your GCC asm output on Godbolt by telling it this function is __attribute__((ms_abi)), or there's a command-line option to set the calling convention default. (It's really only useful for looking at the asm; it's still using GNU/Linux headers and x86-64 System V type widths like 64-bit long. Only a proper MinGW (cross-)compiler could show you what GCC would really do when targeting Windows.)
It's GAS .intel_syntax noprefix, which is MASM-like, not NASM, but the difference would only be obvious with addressing modes involving global variables.

MSVC++ inline assembly unhandled exception 0x80000004: Single step

I am writing code using inline asm with VC++ 2019 (32-bit). I have written a function to switch coroutines; the source code is shown below.
I tested it and it works well. The argument is a uintptr_t array that contains the register values. This function exchanges all register values except ebx.
The problem is the "Unhandled exception at 0x5514704E (pevm.dll) in tool.exe: 0x80000004: Single step.".
Register values: EAX = 00000246 EBX = 0019F5A0 ECX = E2F13240 EDX = 0019F5A0 ESI = 0019F3A8 EDI = 0019F3C8 EIP = 5514704E ESP = 0019F2BC EBP = 0019F2C0 EFL = 00000202
I cannot understand why "pop eax" throws an exception.
Maybe my code destroys some "internal data structure" and the program just happened to stop here, as with a double free. Any suggestions on how to debug this?
inline __declspec(naked) void switchCoroutine(uintptr_t* vreg)
{
//discard ebx
__asm
{
push ebp
mov ebp, esp
//save
push eax
//argument
mov ebx, [ebp + 8]
//exchange eflags
pushfd
pop eax
push[ebx]
popfd
mov[ebx], eax
pop eax
//exchange eax ,ecx,edx,esi,edi
XCHG eax, [ebx + type int]
xchg ecx, [ebx + 3 * type int]
xchg edx, [ebx + 4 * type int]
xchg esi, [ebx + 5 * type int]
xchg edi, [ebx + 6 * type int]
//exchange ebp,esp
mov esp, ebp
pop ebp
xchg ebp, [ebx + 7 * type int]
xchg esp, [ebx + 8 * type int]
//go eip
ret
}
}
55147031 C2 04 00 ret 4
--- No source file -------------------------------------------------------------
55147034 CC int 3
55147035 CC int 3
55147036 CC int 3
55147037 CC int 3
55147038 CC int 3
55147039 CC int 3
5514703A CC int 3
5514703B CC int 3
5514703C CC int 3
5514703D CC int 3
5514703E CC int 3
5514703F CC int 3
--- D:\code\c++\PEVM\core\vm\vdata.h -------------------------------------------
643: //discard ebx
644: __asm
645: {
646: push ebp
55147040 55 push ebp
647: mov ebp, esp
55147041 8B EC mov ebp,esp
648: //save
649: push eax
55147043 50 push eax
650: //argument
651: mov ebx, [ebp + 8]
55147044 8B 5D 08 mov ebx,dword ptr [vreg]
652:
653: //exchange eflags
654: pushfd
55147047 9C pushfd
655: pop eax
55147048 58 pop eax
656: push[ebx]
55147049 FF 33 push dword ptr [ebx]
657: popfd
5514704B 9D popfd
658: mov[ebx], eax
5514704C 89 03 mov dword ptr [ebx],eax
659:
660: pop eax
5514704E 58 pop eax //HERE **Unhandled exception at 0x5514704E (pevm.dll) in tool.exe: 0x80000004: Single step.**
661: //exchange eax ,ecx,edx,esi,edi
662: XCHG eax, [ebx + type int]
5514704F 87 43 04 xchg eax,dword ptr [ebx+4]
663: xchg ecx, [ebx + 3 * type int]
55147052 87 4B 0C xchg ecx,dword ptr [ebx+0Ch]
664: xchg edx, [ebx + 4 * type int]
55147055 87 53 10 xchg edx,dword ptr [ebx+10h]
665: xchg esi, [ebx + 5 * type int]
55147058 87 73 14 xchg esi,dword ptr [ebx+14h]
666: xchg edi, [ebx + 6 * type int]
5514705B 87 7B 18 xchg edi,dword ptr [ebx+18h]
667:
668: //exchange ebp,esp
669: mov esp, ebp
5514705E 8B E5 mov esp,ebp
670: pop ebp
55147060 5D pop ebp
671: xchg ebp, [ebx + 7 * type int]
55147061 87 6B 1C xchg ebp,dword ptr [ebx+1Ch]
672: xchg esp, [ebx + 8 * type int]
55147064 87 63 20 xchg esp,dword ptr [ebx+20h]
673:
674: //go eip
675: ret
55147067 C3 ret
--- No source file -------------------------------------------------------------
55147068 CC int 3
55147069 CC int 3
5514706A CC int 3
5514706B CC int 3
5514706C CC int 3
5514706D CC int 3
5514706E CC int 3
5514706F CC int 3
At 0x5514704B you set EFLAGS (popfd). If the new value has the TF flag set, the CPU generates a debug exception (#DB) after the next executed instruction. The instruction after popfd is mov [ebx], eax, so the exception is generated after its execution. Since #DB is a trap, eip points to the address after the executed instruction, which is pop eax in your case.
Check whether the value pushed by push [ebx] at 0x55147049 has the TF bit (bit 8, mask 0x100) set.
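One way to do that check from C++ before the saved value is loaded into EFLAGS is sketched below. TF is bit 8 of EFLAGS; the helper names are made up for illustration, and clearing the bit in the saved flags is only one possible fix, not necessarily what the coroutine code should do.

```cpp
#include <cstdint>

// TF (trap flag) is bit 8 of EFLAGS.
constexpr std::uint32_t kTrapFlag = 0x100u;

// True when a saved EFLAGS image would re-enable single-stepping.
bool hasTrapFlag(std::uint32_t eflags) {
    return (eflags & kTrapFlag) != 0;
}

// Returns the same EFLAGS image with TF cleared.
std::uint32_t withoutTrapFlag(std::uint32_t eflags) {
    return eflags & ~kTrapFlag;
}
```

For example, the EFL value 0x00000202 from the register dump does not have TF set, while 0x00000302 would.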

Does vftable[0] store the first virtual function or the RTTI Complete Object Locator?

It is well known that C++ uses a vftable to decide dynamically which virtual function should be called. I want to find out the mechanism behind a virtual function call, so I compiled the following code to assembly.
using namespace std;
class Animal {
int age;
public:
virtual void speak() {}
virtual void wash() {}
};
class Cat : public Animal {
public:
virtual void speak() {}
virtual void wash() {}
};
void main()
{
Animal* animal = new Cat;
animal->speak();
animal->wash();
}
The assembly output is massive, and I don't quite understand the following part.
CONST SEGMENT
??_7Cat@@6B@ DD FLAT:??_R4Cat@@6B@ ; Cat::`vftable'
DD FLAT:?speak@Cat@@UAEXXZ
DD FLAT:?wash@Cat@@UAEXXZ
CONST ENDS
This part defines the vftable of Cat, but it has three entries. The first entry is the RTTI Complete Object Locator, the second is Cat::speak, and the third is Cat::wash. So I think vftable[0] should refer to the RTTI Complete Object Locator. But when I check the assembly code in main PROC and Cat::Cat PROC, the call to animal->speak() is implemented by calling through vftable[0], and the call to animal->wash() by calling through vftable[4]. Why not vftable[4] and vftable[8]?
The assembly code of main PROC and Cat::Cat PROC is shown below.
_TEXT SEGMENT
tv75 = -12 ; size = 4
$T1 = -8 ; size = 4
_animal$ = -4 ; size = 4
_main PROC
; 23 : {
push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH
; 24 : Animal* animal = new Cat;
push 8
call ??2@YAPAXI@Z ; operator new
add esp, 4
mov DWORD PTR $T1[ebp], eax
cmp DWORD PTR $T1[ebp], 0
je SHORT $LN3@main
mov ecx, DWORD PTR $T1[ebp]
call ??0Cat@@QAE@XZ
mov DWORD PTR tv75[ebp], eax
jmp SHORT $LN4@main
$LN3@main:
mov DWORD PTR tv75[ebp], 0
$LN4@main:
mov eax, DWORD PTR tv75[ebp]
mov DWORD PTR _animal$[ebp], eax
; 25 : animal->speak();
mov ecx, DWORD PTR _animal$[ebp]
mov edx, DWORD PTR [ecx]
mov ecx, DWORD PTR _animal$[ebp]
mov eax, DWORD PTR [edx]
call eax
; 26 : animal->wash();
mov ecx, DWORD PTR _animal$[ebp]
mov edx, DWORD PTR [ecx]
mov ecx, DWORD PTR _animal$[ebp]
mov eax, DWORD PTR [edx+4]
call eax
; 27 : }
xor eax, eax
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
; Function compile flags: /Odtp
; COMDAT ??0Cat@@QAE@XZ
_TEXT SEGMENT
_this$ = -4 ; size = 4
??0Cat@@QAE@XZ PROC ; Cat::Cat, COMDAT
; _this$ = ecx
push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx
mov ecx, DWORD PTR _this$[ebp]
call ??0Animal@@QAE@XZ
mov eax, DWORD PTR _this$[ebp]
mov DWORD PTR [eax], OFFSET ??_7Cat@@6B@
mov eax, DWORD PTR _this$[ebp]
mov esp, ebp
pop ebp
ret 0
??0Cat@@QAE@XZ ENDP ; Cat::Cat
_TEXT ENDS
Supplement: MSVC Compiler x86 19.00.23026
The layout of vtables is implementation dependent. In your particular case, when compiling your example code, the Microsoft C++ compiler generates a vtable for Cat where the speak virtual function is at offset 0, and the wash function is at offset 4. The RTTI information is located before these functions at offset -4.
The problem here is that Microsoft's assembly output is lying. The generated assembly code puts the RTTI information at offset 0 and the speak and wash functions at offset 4 and 8. However this is not how it's actually laid out in the object file the compiler generates. Disassembling the object file reveals this layout:
.new_section .rdata, "dr2"
0000 00 00 00 00 .long ??_R4Cat@@6B@
0004 ??_7Cat@@6B@:
0004 00 00 00 00 .long ?speak@Cat@@UAEXXZ
0008 00 00 00 00 .long ?wash@Cat@@UAEXXZ
Unfortunately the assembly output of Microsoft's C/C++ compiler is meant only to be informational. It's not an accurate and complete representation of the actual code the compiler generates. In particular it can't be reliably assembled into a working object file.
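The layout from the object file can be modelled in ordinary C++ with a table of function pointers. This is only a model with made-up names (`VTableStorage`, `ModelObject`, `callSlot`, `makeCat`); it does not read a real compiler-generated vtable, and real layouts remain implementation dependent.

```cpp
using VirtualFn = int (*)();

int catSpeak() { return 1; }   // stand-in for Cat::speak
int catWash()  { return 2; }   // stand-in for Cat::wash

// Stand-in for the storage the compiler emits for Cat's vtable.
struct VTableStorage {
    const char* rttiLocator;   // sits just before the function slots
    VirtualFn   slots[2];      // the vftable symbol refers to this part
};

const VTableStorage catVTable = {
    "Cat RTTI locator (stand-in)", {catSpeak, catWash}
};

// Stand-in for an object: its first member is the vtable pointer.
struct ModelObject {
    const VirtualFn* vptr;     // what the constructor stores at [this]
};

// The constructor stores the address of the slots, not of the RTTI entry.
ModelObject makeCat() {
    ModelObject obj = {catVTable.slots};
    return obj;
}

// A virtual call: load vptr, load the slot, call it.
int callSlot(const ModelObject& obj, int index) {
    return obj.vptr[index]();
}
```

Because the stored pointer refers to the first function slot, slot 0 is speak and slot 1 is wash, while the RTTI entry sits in front of them, matching the offsets 0, 4 and -4 described above for a 32-bit build.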

C vs C++ Static Initialization of Objects

I have a question regarding initialization of fairly large sets of static data.
See my three examples below of initializing sets of static data. I'd like to understand the program load time & memory footprint implications of the methods shown below. I really don't know how to evaluate that on my own at the moment. My build environment is still on the desktop using Visual Studio, however the embedded targets will be compiled for VxWorks using GCC.
Traditionally, I've used basic C-structs for this sort of thing, although there's good reason to move this data into C++ classes moving forward. Dynamic memory allocation is frowned upon in the embedded application and is avoided wherever possible.
As far as I know, the only way to initialize a C++ class is through its constructor, shown below in Method 2. I am wondering how that compares to Method 1. Is there any appreciable additional overhead in terms of ROM (Program footprint), RAM (Memory Footprint), or program load time? It seems to me that the compiler may be able to optimize away the rather trivial constructor, but I'm not sure if that's common behavior.
I listed Method 3, as I've considered it, although it just seems like a bad idea. Is there something else that I'm missing here? Anyone else out there initialize data in a similar manner?
/* C-Style Struct Storage */
typedef struct{
int a;
int b;
}DATA_C;
/* CPP Style Class Storage */
class DATA_CPP{
public:
int a;
int b;
DATA_CPP(int,int);
};
DATA_CPP::DATA_CPP(int aIn, int bIn){
a = aIn;
b = bIn;
}
/* METHOD 1: Direct C-Style Static Initialization */
DATA_C MyCData[5] = { {1,2},
{3,4},
{5,6},
{7,8},
{9,10}
};
/* METHOD 2: Direct CPP-Style Initialization */
DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
DATA_CPP(3,4),
DATA_CPP(5,6),
DATA_CPP(7,8),
DATA_CPP(9,10),
};
/* METHOD 3: Cast C-Struct to CPP class */
DATA_CPP* pMyCppData2 = (DATA_CPP*) MyCData;
In C++11, you can write this:
DATA_CPP obj = {1,2}; //Or simply : DATA_CPP obj {1,2}; i.e omit '='
instead of
DATA_CPP obj(1,2);
Extending this, you can write:
DATA_CPP MyCppData[5] = { {1,2},
{3,4},
{5,6},
{7,8},
{9,10},
};
instead of this:
DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
DATA_CPP(3,4),
DATA_CPP(5,6),
DATA_CPP(7,8),
DATA_CPP(9,10),
};
Read about uniform initialization.
I've done a bit of research in this, thought I'd post the results. I used Visual Studio 2008 in all cases here.
Here's the disassembly view of the code from Visual Studio in Debug Mode:
/* METHOD 1: Direct C-Style Static Initialization */
DATA_C MyCData[5] = { {1,2},
{3,4},
{5,6},
{7,8},
{9,10},
};
/* METHOD 2: Direct CPP-Style Initialization */
DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
010345C0 push ebp
010345C1 mov ebp,esp
010345C3 sub esp,0C0h
010345C9 push ebx
010345CA push esi
010345CB push edi
010345CC lea edi,[ebp-0C0h]
010345D2 mov ecx,30h
010345D7 mov eax,0CCCCCCCCh
010345DC rep stos dword ptr es:[edi]
010345DE push 2
010345E0 push 1
010345E2 mov ecx,offset MyCppData (1038184h)
010345E7 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(3,4),
010345EC push 4
010345EE push 3
010345F0 mov ecx,offset MyCppData+8 (103818Ch)
010345F5 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(5,6),
010345FA push 6
010345FC push 5
010345FE mov ecx,offset MyCppData+10h (1038194h)
01034603 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(7,8),
01034608 push 8
0103460A push 7
0103460C mov ecx,offset MyCppData+18h (103819Ch)
01034611 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(9,10),
01034616 push 0Ah
01034618 push 9
0103461A mov ecx,offset MyCppData+20h (10381A4h)
0103461F call DATA_CPP::DATA_CPP (103119Ah)
};
01034624 pop edi
01034625 pop esi
01034626 pop ebx
01034627 add esp,0C0h
0103462D cmp ebp,esp
0103462F call @ILT+325(__RTC_CheckEsp) (103114Ah)
01034634 mov esp,ebp
01034636 pop ebp
The interesting thing to note here is that there is definitely some overhead in program memory usage and load time, at least in non-optimized debug mode. Notice that Method 1 has zero assembly instructions, while Method 2 has about 44 instructions.
I also compiled the program in Release mode with optimization enabled; here is the abridged assembly output:
?MyCData@@3PAUDATA_C@@A DD 01H ; MyCData
DD 02H
DD 03H
DD 04H
DD 05H
DD 06H
DD 07H
DD 08H
DD 09H
DD 0aH
?MyCppData@@3PAVDATA_CPP@@A DD 01H ; MyCppData
DD 02H
DD 03H
DD 04H
DD 05H
DD 06H
DD 07H
DD 08H
DD 09H
DD 0aH
END
Seems like the compiler indeed optimized away the calls to the C++ constructor. I could find no evidence of the constructor ever being called anywhere in the assembly code.
I thought I'd try something a bit more. I changed the constructor to:
DATA_CPP::DATA_CPP(int aIn, int bIn){
a = aIn + bIn;
b = bIn;
}
Again, the compiler optimized this away, resulting in a static dataset:
?MyCppData@@3PAVDATA_CPP@@A DD 03H ; MyCppData
DD 02H
DD 07H
DD 04H
DD 0bH
DD 06H
DD 0fH
DD 08H
DD 013H
DD 0aH
END
Interestingly, the compiler was able to evaluate the constructor code on all the static data during compilation and create a static dataset, still not calling the constructor.
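With a C++11 or later compiler (Visual Studio 2008, used for the listings here, predates this feature), the same result can be requested explicitly instead of relying on the optimizer. The sketch below is an assumption-laden variant of the experiment: `DataCpp` and `kCppData` are renamed stand-ins for the types in the question, and the constructor is marked constexpr so the array has to be initialized at compile time.

```cpp
// Renamed stand-in for DATA_CPP with a constexpr constructor.
struct DataCpp {
    int a;
    int b;
    // C++11 constexpr constructors need an empty body, so the
    // computation goes into the member initializer list.
    constexpr DataCpp(int aIn, int bIn) : a(aIn + bIn), b(bIn) {}
};

// constexpr forces constant initialization: no constructor call at startup.
constexpr DataCpp kCppData[5] = {
    DataCpp(1, 2), DataCpp(3, 4), DataCpp(5, 6), DataCpp(7, 8), DataCpp(9, 10),
};

// Checked by the compiler, so the values exist before the program runs.
static_assert(kCppData[0].a == 3 && kCppData[4].a == 19,
              "array is evaluated at compile time");
```

If the constructor cannot be evaluated at compile time, for example because it reads a mutable global, this code fails to compile rather than silently falling back to dynamic initialization.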
I thought I'd try something still a bit more, operate on a global variable in the constructor:
int globalvar;
DATA_CPP::DATA_CPP(int aIn, int bIn){
a = aIn + globalvar;
globalvar += a;
b = bIn;
}
And in this case, the compiler now generated assembly code to call the constructor during initialization:
??__EMyCppData@@YAXXZ PROC ; `dynamic initializer for 'MyCppData'', COMDAT
; 35 : DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
00000 a1 00 00 00 00 mov eax, DWORD PTR ?globalvar@@3HA ; globalvar
00005 8d 48 01 lea ecx, DWORD PTR [eax+1]
00008 03 c1 add eax, ecx
0000a 89 0d 00 00 00
00 mov DWORD PTR ?MyCppData@@3PAVDATA_CPP@@A, ecx
; 36 : DATA_CPP(3,4),
00010 8d 48 03 lea ecx, DWORD PTR [eax+3]
00013 03 c1 add eax, ecx
00015 89 0d 08 00 00
00 mov DWORD PTR ?MyCppData@@3PAVDATA_CPP@@A+8, ecx
; 37 : DATA_CPP(5,6),
0001b 8d 48 05 lea ecx, DWORD PTR [eax+5]
0001e 03 c1 add eax, ecx
00020 89 0d 10 00 00
00 mov DWORD PTR ?MyCppData@@3PAVDATA_CPP@@A+16, ecx
; 38 : DATA_CPP(7,8),
00026 8d 48 07 lea ecx, DWORD PTR [eax+7]
00029 03 c1 add eax, ecx
0002b 89 0d 18 00 00
00 mov DWORD PTR ?MyCppData@@3PAVDATA_CPP@@A+24, ecx
; 39 : DATA_CPP(9,10),
00031 8d 48 09 lea ecx, DWORD PTR [eax+9]
00034 03 c1 add eax, ecx
00036 89 0d 20 00 00
00 mov DWORD PTR ?MyCppData@@3PAVDATA_CPP@@A+32, ecx
0003c a3 00 00 00 00 mov DWORD PTR ?globalvar@@3HA, eax ; globalvar
; 40 : };
00041 c3 ret 0
??__EMyCppData@@YAXXZ ENDP ; `dynamic initializer for 'MyCppData''
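The listing above keeps a running value of globalvar in eax and computes each a member from it. The following sketch reproduces that third experiment with renamed, hypothetical types (`DataDyn`, `dynData`) and assumes globalvar starts at zero, so the values that the startup code ends up storing can be checked.

```cpp
int globalvar = 0;   // zero-initialized before any dynamic initializer runs

// Renamed stand-in for DATA_CPP with the side-effecting constructor.
struct DataDyn {
    int a;
    int b;
    DataDyn(int aIn, int bIn) {
        a = aIn + globalvar;   // depends on state left by earlier elements
        globalvar += a;
        b = bIn;
    }
};

// Elements are initialized in order, each one seeing the updated globalvar.
DataDyn dynData[5] = {
    DataDyn(1, 2), DataDyn(3, 4), DataDyn(5, 6), DataDyn(7, 8), DataDyn(9, 10),
};
```

Because every element depends on a mutable global, a compiler will typically emit a dynamic initializer that runs at program startup, as the listing shows, rather than a table in the data section.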
FYI, I found this page helpful in setting up visual studio to output assembly:
How do I get the assembler output from a C file in VS2005

_asm swap and compiler additions

I've just written a bubble_sort of an integer array (see previous question) and decided to ignore the standard swap and implement an assembly swap, which looks like this:
int swap(int* x, int* y)
{
if(x != y)
{
_asm
{
mov eax,[x];
mov ebx, [y];
mov [y],eax;
mov [x], ebx;
}
}
return 0;
}
I was actually sure that it will be inserted into the resulting code as is and will work.
Well, my code which uses this swap does work, but I've looked into what the compiler turned it into, and my swap was changed into this:
if(x != y)
00E01A6F inc ebp
00E01A70 or byte ptr [ebx],bh
00E01A72 inc ebp
00E01A73 or al,74h
if(x != y)
00E01A75 or al,8Bh
{
_asm
{
mov eax,[x];
00E01A77 inc ebp
00E01A78 or byte ptr [ebx+45890C5Dh],cl
mov [y],eax;
00E01A7E or al,89h
mov [x], ebx;
00E01A80 pop ebp
00E01A81 or byte ptr [ebx],dh
}
}
return 0;
00E01A83 rcr byte ptr [edi+5Eh],5Bh
}
I've compiled it in MS VS 2012.
What do all those extra lines mean, and why are they there? Why can't my _asm fragment just be used?
Can you tell us how you've compiled that function and how you got the disassembly?
When I compile using
cl /FAsc -c test.c
I get the following in the assembly listing for the inline assembler part:
; 4 : {
; 5 : _asm
0000a 53 push ebx
; 6 : {
; 7 : mov eax,[x];
0000b 8b 44 24 08 mov eax, DWORD PTR _x$[esp]
; 8 : mov ebx, [y];
0000f 8b 5c 24 0c mov ebx, DWORD PTR _y$[esp]
; 9 : mov [y],eax;
00013 89 44 24 0c mov DWORD PTR _y$[esp], eax
; 10 : mov [x], ebx;
00017 89 5c 24 08 mov DWORD PTR _x$[esp], ebx
; 4 : {
; 5 : _asm
0001b 5b pop ebx
$LN4@swap:
; 11 : }
One thing to note is that you aren't swapping what you'd really like to swap: you're swapping the pointers that are passed to the function, not the items the pointers refer to. So when the function returns, the swapped data is thrown away. The function is just one big nop.
You might want to try something like:
_asm
{
mov eax,[x];
mov ebx,[y];
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
But frankly, performing the swap in C would likely result in similar (or better) code.
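For comparison, a plain C++ version of the swap could look like the sketch below. The names `swapInts` and `swapExample` are made up for illustration; the standard library's std::swap does the same job.

```cpp
// Swaps the two ints that the pointers refer to (not the pointers themselves).
void swapInts(int* x, int* y) {
    if (x != y) {
        int tmp = *x;
        *x = *y;
        *y = tmp;
    }
}

// Hypothetical usage check: returns true when the values were exchanged.
bool swapExample() {
    int a = 1;
    int b = 2;
    swapInts(&a, &b);
    return a == 2 && b == 1;
}
```

The compiler is free to pick registers for this itself, so there is no need to save and restore ebx by hand as the inline assembly version requires.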
It's missing the first and last bytes. If you look at what the code is now:
inc ebp ; 45
or byte ptr [ebx],bh ; 08 3B
inc ebp ; 45
or al,74h ; 0C 74
or al,8Bh ; 0C 8B
inc ebp ; 45
or byte ptr [ebx+45890C5Dh],cl ; 08 8B 5D 0C 89 45
or al,89h ; 0C 89
pop ebp ; 5D
or byte ptr [ebx],dh ; 08 33
rcr byte ptr [edi+5Eh],5Bh ; C0 5F 5E 5B
If you ignore the first two bytes, you get this:
cmp eax, [ebp + 12] ; 3B 45 0C
jz skip ; 74 0C
mov eax, [ebp + 8] ; 8B 45 08
mov ebx, [ebp + 12] ; 8B 5D 0C
mov [ebp + 12], eax ; 89 45 0C
mov [ebp + 8], ebx ; 89 5D 08
skip:
xor eax, eax ; 33 C0
pop edi ; 5F
pop esi ; 5E
pop ebx ; 5B
It's missing the ret at the end, and, crucially, some instruction that has eax and [ebp + 8] as arguments (a mov would make sense there). The missing first byte desynchronized the disassembly with the instruction stream.
It's also missing the prologue, of course.
You need to push at the beginning and pop at the end if you want to see the end of main() :)
_asm
{
push eax //back-up of registers
push ebx
mov eax,[x];
mov ebx, [y];
mov [y],eax;
mov [x], ebx;
pop ebx //resume the registers where they were
pop eax // so, compiler can continue working normally
}
Because the compiler uses them for other things too!
You could also have used xchg
mov eax,[x]
xchg eax, [y]
mov [x],eax
Are you on 64-bit? Then it is a single read, a single swap and a single write. You can search for it.
Good day!