Returning Strings from DLL Functions - c++

For some reason, returning a string from a DLL function crashes my program on runtime with the error Unhandled exception at 0x775dfbae in Cranberry Library Tester.exe: Microsoft C++ exception: std::out_of_range at memory location 0x001ef604...
I have verified it's not a problem with the function itself by compiling the DLL code as an .exe and doing a few simple tests in the main function.
Functions with other return types (int, double, etc.) work perfectly.
Why does this happen?
Is there a way to work around this behavior?
Source code for DLL:
// Library.h
#include <string>
std::string GetGreeting();
.
// Library.cpp
#include "Library.h"
std::string GetGreeting()
{
return "Hello, world!";
}
Source code for tester:
// Tester.cpp
#include <iostream>
#include <Library.h>
int main()
{
std::cout << GetGreeting()
}
EDIT: I'm using VS2010.
Conclusion
A workaround is to make sure the library and source are compiled using the same compiler with the same options, etc.

This occurs because you're allocating memory in one DLL (using std::string's constructor), and freeing it in another DLL. You can't do that because each DLL typically sets up it's own heap.

Since your error message indicates you're using Microsoft C++ I'll offer an MS specific answer.
As long as you compile both the EXE and the DLL with the SAME compiler, and both link the the SAME version of the runtime DYNAMICALLY then you'll be just fine. For example, using "Multi-threaded DLL" for both.
If you link against the runtime statically, or link against different versions of the runtime then you're SOL for the reasons #Billy ONeal points out (memory will be allocated in one heap and freed in another).

You may return a std::string from a class as long as you inline the function that creates the string. This is done, for example, by the Qt toolkit in their QString::toStdString method:
inline std::string QString::toStdString() const
{ const QByteArray asc = toAscii(); return std::string(asc.constData(), asc.length()); }
The inline function is compiled with the program that uses the dll, not when the dll is compiled. This avoids any problems encountered by incompatible runtimes or compiler switches.
So in fact, you should only return standard types from your dll (such as const char * and int above) and add inline functions to convert them to std::string().
Passing in a string & parameter may also be unsafe because they are often implemented as copy on write.

Update:
I have compiled your sample in both VS2008 and VS2010 and I was able to successfully compile and execute without a problem. I compiled the library both as a static and a dynamic library.
Original:
The following relates to my discussion with bdk and imaginaryboy. I did not delete it as it might be of some interest to someone.
Ok this question really bothers me because it looks like its being passed by value not by reference. There does not appear to be any objects created in the heap it looks to be entirely stack based.
I did a quick test to check how objects are passed in Visual Studios (compiled in release mode without link time optimization and optimization disabled).
The Code:
class Foo {
int i, j;
public:
Foo() {}
Foo(int i, int j) : i(i), j(j) { }
};
Foo builder(int i, int j)
{
Foo built(i, j);
return built;
}
int main()
{
int i = sizeof(Foo);
int j = sizeof(int);
Foo buildMe;
buildMe = builder(i, j);
//std::string test = GetGreeting();
//std::cout << test;
return 0;
}
The Disassembly:
int main()
{
00AD1030 push ebp
00AD1031 mov ebp,esp
00AD1033 sub esp,18h
int i = sizeof(Foo);
00AD1036 mov dword ptr [i],8
int j = sizeof(int);
00AD103D mov dword ptr [j],4
Foo buildMe;
buildMe = builder(i, j);
00AD1044 mov eax,dword ptr [j] ;param j
00AD1047 push eax
00AD1048 mov ecx,dword ptr [i] ;param i
00AD104B push ecx
00AD104C lea edx,[ebp-18h] ;buildMe
00AD104F push edx
00AD1050 call builder (0AD1000h)
00AD1055 add esp,0Ch
00AD1058 mov ecx,dword ptr [eax] ;builder i
00AD105A mov edx,dword ptr [eax+4] ;builder j
00AD105D mov dword ptr [buildMe],ecx
00AD1060 mov dword ptr [ebp-8],edx
return 0;
00AD1063 xor eax,eax
}
00AD1065 mov esp,ebp
00AD1067 pop ebp
00AD1068 ret
Foo builder(int i, int j)
{
01041000 push ebp
01041001 mov ebp,esp
01041003 sub esp,8
Foo built(i, j);
01041006 mov eax,dword ptr [i]
01041009 mov dword ptr [built],eax ;ebp-8 built i
0104100C mov ecx,dword ptr [j]
0104100F mov dword ptr [ebp-4],ecx ;ebp-4 built j
return built;
01041012 mov edx,dword ptr [ebp+8] ;buildMe
01041015 mov eax,dword ptr [built]
01041018 mov dword ptr [edx],eax ;buildMe (i)
0104101A mov ecx,dword ptr [ebp-4]
0104101D mov dword ptr [edx+4],ecx ;buildMe (j)
01041020 mov eax,dword ptr [ebp+8]
}
The Stack:
0x003DF964 08 00 00 00 ....
0x003DF968 04 00 00 00 ....
0x003DF96C 98 f9 3d 00 ˜ù=.
0x003DF970 55 10 ad 00 U.­.
0x003DF974 80 f9 3d 00 €ù=.
0x003DF978 08 00 00 00 .... ;builder i param
0x003DF97C 04 00 00 00 .... ;builder j param
0x003DF980 08 00 00 00 .... ;builder return j
0x003DF984 04 00 00 00 .... ;builder return i
0x003DF988 04 00 00 00 .... ;j
0x003DF98C 08 00 00 00 .... ;buildMe i param
0x003DF990 04 00 00 00 .... ;buildMe j param
0x003DF994 08 00 00 00 .... ;i
0x003DF998 dc f9 3d 00 Üù=. ;esp
Why it applies:
Even if the code is in a seperate DLL the string that is returned is copied by value into the callers stack. There is a hidden param that passes the object to GetGreetings(). I do not see any heap getting created. I do not see the heap having anything to do with the problem.
int main()
{
01021020 push ebp
01021021 mov ebp,esp
01021023 push 0FFFFFFFFh
01021025 push offset __ehhandler$_main (10218A9h)
0102102A mov eax,dword ptr fs:[00000000h]
01021030 push eax
01021031 sub esp,24h
01021034 mov eax,dword ptr [___security_cookie (1023004h)]
01021039 xor eax,ebp
0102103B mov dword ptr [ebp-10h],eax
0102103E push eax
0102103F lea eax,[ebp-0Ch]
01021042 mov dword ptr fs:[00000000h],eax
std::string test = GetGreeting();
01021048 lea eax,[ebp-2Ch] ;mov test string to eax
0102104B push eax
0102104C call GetGreeting (1021000h)
01021051 add esp,4
01021054 mov dword ptr [ebp-4],0
std::cout << test;
0102105B lea ecx,[ebp-2Ch]
0102105E push ecx
0102105F mov edx,dword ptr [__imp_std::cout (1022038h)]
01021065 push edx
01021066 call dword ptr [__imp_std::operator<<<char,std::char_traits<char>,std::allocator<char> > (102203Ch)]
0102106C add esp,8
return 0;
0102106F mov dword ptr [ebp-30h],0
01021076 mov dword ptr [ebp-4],0FFFFFFFFh
0102107D lea ecx,[ebp-2Ch]
01021080 call dword ptr [__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char> >::~basic_string<char,std::char_traits<char>,std::allocator<char> > (1022044h)]
01021086 mov eax,dword ptr [ebp-30h]
}
std::string GetGreeting()
{
01021000 push ecx ;ret + 4
01021001 push esi ;ret + 4 + 4 = 0C
01021002 mov esi,dword ptr [esp+0Ch] ; this is test string
std::string greet("Hello, world!");
01021006 push offset string "Hello, world!" (1022124h)
0102100B mov ecx,esi
0102100D mov dword ptr [esp+8],0
01021015 call dword ptr [__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char> >::basic_string<char,std::char_traits<char>,std::allocator<char> > (1022040h)]
return greet;
0102101B mov eax,esi ;put back test
0102101D pop esi
//return "Hello, world!";
}

I had such problem that has been solved by:
using runtime library as DLL for both the application and the DLL (so that allocation is handle in a unique place)
making sure all project settings (compiler, linker) are the same on both project

Just like snmacdonald I was unable to repro your crash, under normal circumstances. Others already mentioned:
Configuration Properties -> Code Generation -> Runtime Library must be exactly the same
If I change one of them I get your crash. I have both the DLL and EXE set to Multi-Threaded DLL.

Just like user295190 and Adam said that it would work fine if with the exactly same compilation settings.
For example, in Qt QString::toStdString() would return a std::string and you could use it in your EXE from QtCore.dll.
It would crash if DLL and EXE had different compiling settings and linking settings. For example, DLL linked to MD and EXE linked to MT CRT library.

Related

Is vftable[0] stores the first virtual function or RTTI Complete Object Locator?

It's known to us that C++ use a vftable to dynamicly decide which virtual function should be called. And I want to find out the mechanism behind it when we call virtual function. I have compiled the following code to assembly.
using namespace std;
class Animal {
int age;
public:
virtual void speak() {}
virtual void wash() {}
};
class Cat : public Animal {
public:
virtual void speak() {}
virtual void wash() {}
};
void main()
{
Animal* animal = new Cat;
animal->speak();
animal->wash();
}
The assembly code is massive. I don't quite understand the following part.
CONST SEGMENT
??_7Cat##6B# DD FLAT:??_R4Cat##6B# ; Cat::`vftable'
DD FLAT:?speak#Cat##UAEXXZ
DD FLAT:?wash#Cat##UAEXXZ
CONST ENDS
This part defines the vftable of Cat. But it have three entries. The first entry is RTTI Complete Object Locator. The second is Cat::speak. The third is Cat::wash. So I think vftable[0] should imply RTTI Complete Object Locator. But when I checking the assembly code in main PROC and Cat::Cat PROC, the invoke to animal->speak() is implemented by calling vftable[0], and the invoke to animal->wash() is implemented by calling vftable[4]. Why not vftable[4] and vftable[8]?
The assembly code of PROC main and Cat::Cat shows below.
_TEXT SEGMENT
tv75 = -12 ; size = 4
$T1 = -8 ; size = 4
_animal$ = -4 ; size = 4
_main PROC
; 23 : {
push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH
; 24 : Animal* animal = new Cat;
push 8
call ??2#YAPAXI#Z ; operator new
add esp, 4
mov DWORD PTR $T1[ebp], eax
cmp DWORD PTR $T1[ebp], 0
je SHORT $LN3#main
mov ecx, DWORD PTR $T1[ebp]
call ??0Cat##QAE#XZ
mov DWORD PTR tv75[ebp], eax
jmp SHORT $LN4#main
$LN3#main:
mov DWORD PTR tv75[ebp], 0
$LN4#main:
mov eax, DWORD PTR tv75[ebp]
mov DWORD PTR _animal$[ebp], eax
; 25 : animal->speak();
mov ecx, DWORD PTR _animal$[ebp]
mov edx, DWORD PTR [ecx]
mov ecx, DWORD PTR _animal$[ebp]
mov eax, DWORD PTR [edx]
call eax
; 26 : animal->wash();
mov ecx, DWORD PTR _animal$[ebp]
mov edx, DWORD PTR [ecx]
mov ecx, DWORD PTR _animal$[ebp]
mov eax, DWORD PTR [edx+4]
call eax
; 27 : }
xor eax, eax
mov esp, ebp
pop ebp
ret 0
_main ENDP
_TEXT ENDS
; Function compile flags: /Odtp
; COMDAT ??0Cat##QAE#XZ
_TEXT SEGMENT
_this$ = -4 ; size = 4
??0Cat##QAE#XZ PROC ; Cat::Cat, COMDAT
; _this$ = ecx
push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx
mov ecx, DWORD PTR _this$[ebp]
call ??0Animal##QAE#XZ
mov eax, DWORD PTR _this$[ebp]
mov DWORD PTR [eax], OFFSET ??_7Cat##6B#
mov eax, DWORD PTR _this$[ebp]
mov esp, ebp
pop ebp
ret 0
??0Cat##QAE#XZ ENDP ; Cat::Cat
_TEXT ENDS
Supplement: MSVC Compiler x86 19.00.23026
The layout of vtables is implementation dependent. In your particular case, when compiling your example code, the Microsoft C++ compiler generates a vtable for Cat where the speak virtual function is at offset 0, and the wash function is at offset 4. The RTTI information is located before these functions at offset -4.
The problem here is that Microsoft's assembly output is lying. The generated assembly code puts the RTTI information at offset 0 and the speak and wash functions at offset 4 and 8. However this is not how it's actually laid out in the object file the compiler generates. Disassembling the object file reveals this layout:
.new_section .rdata, "dr2"
0000 00 00 00 00 .long ??_R4Cat##6B#
0004 ??_7Cat##6B#:
0004 00 00 00 00 .long ?speak#Cat##UAEXXZ
0008 00 00 00 00 .long ?wash#Cat##UAEXXZ
Unfortunately the assembly output of Microsoft's C/C++ compiler is meant only to be informational. It's not an accurate and complete representation of the actual code the compiler generates. In particular it can't be reliably assembled into a working object file.

Dumping memory of a function

I would like to dump a functions memory (void) to a byte array (unsinged char[]). Aftwerwards, a dummy function shall be pointed to the byte array and the dummy function shall be executed.
The function I want to dump:
void CallMessageBoxExA()
{
message = "ManualMessageBoxExA";
caption = "Caption";
pAddr = GetProcAddress(GetModuleHandle(L"User32.dll"), "MessageBoxExA");
__asm // Call MessageBoxA
{
push dword ptr 0 //--- push languageID: 0
push dword ptr 0 //--- push style: 0
push dword ptr caption //--- push DWORD parameter (caption)
push dword ptr message //--- push DWORD parameter (message)
push dword ptr 0 //--- push hOwner: 0
mov eax, pAddr
call eax //-- call address of the function, which is currently in EAX
}
}
Dumping the memory:
string DumpMemory(void *pAddress, int maxLength)
{
string result = "";
const unsigned char * p = reinterpret_cast< const unsigned char *>(pAddress);
cout << "Memory location: 0x" << hex << (unsigned int)p << endl;
for (unsigned int i = 0; i < maxLength; i++) {
string code = "";
stringstream ss;
ss << hex << int(p[i]);
ss >> code;
result += code;
}
return result;
}
When looking at the memory location DumpMemory prints to the console, ollydbg shows a JMP instruction at this location:
CPU Disasm
Address Hex dump Command Comments
00281627 $-/E9 044C0000 JMP CallMessageBoxExA
Is this the correct memory-location, or do I have to follow the JMP?
Memory location jump leads to:
CPU Disasm
Address Hex dump Command Comments
00286230 /$ 55 PUSH EBP ; ASM.CallMessageBoxExA(void)
00286231 |. 8BEC MOV EBP,ESP
00286233 |. 81EC C0000000 SUB ESP,0C0
00286239 |. 53 PUSH EBX
0028623A |. 56 PUSH ESI
0028623B |. 57 PUSH EDI
0028623C |. 8DBD 40FFFFFF LEA EDI,[EBP-0C0]
00286242 |. B9 30000000 MOV ECX,30
00286247 |. B8 CCCCCCCC MOV EAX,CCCCCCCC
0028624C |. F3:AB REP STOS DWORD PTR ES:[EDI]
0028624E |. C705 A8562900 MOV DWORD PTR DS:[message],OFFSET 00291D ; ASCII "ManualMessageBoxExA"
00286258 |. C705 AC562900 MOV DWORD PTR DS:[caption],OFFSET 00291D ; ASCII "Caption"
00286262 |. 8BF4 MOV ESI,ESP
00286264 |. 68 041E2900 PUSH OFFSET 00291E04 ; ASCII "MessageBoxExA"
00286269 |. 8BFC MOV EDI,ESP
0028626B |. 68 D01D2900 PUSH OFFSET 00291DD0 ; /ModuleName = "User32.dll"
00286270 |. FF15 00602900 CALL DWORD PTR DS:[<&KERNEL32.GetModuleH ; \KERNEL32.GetModuleHandleW
00286276 |. 3BFC CMP EDI,ESP
00286278 |. E8 5BB2FFFF CALL 002814D8 ; [_RTC_CheckEsp
0028627D |. 50 PUSH EAX ; |hModule
0028627E |. FF15 04602900 CALL DWORD PTR DS:[<&KERNEL32.GetProcAdd ; \KERNEL32.GetProcAddress
00286284 |. 3BF4 CMP ESI,ESP
00286286 |. E8 4DB2FFFF CALL 002814D8 ; [_RTC_CheckEsp
0028628B |. A3 B0562900 MOV DWORD PTR DS:[pAddr],EAX
00286290 |. 6A 00 PUSH 0
00286292 |. 6A 00 PUSH 0
00286294 |. FF35 AC562900 PUSH DWORD PTR DS:[caption]
0028629A |. FF35 A8562900 PUSH DWORD PTR DS:[message]
002862A0 |. 6A 00 PUSH 0
002862A2 |. A1 B0562900 MOV EAX,DWORD PTR DS:[pAddr]
002862A7 |. FFD0 CALL EAX
002862A9 |. 5F POP EDI
002862AA |. 5E POP ESI
002862AB |. 5B POP EBX
002862AC |. 81C4 C0000000 ADD ESP,0C0
002862B2 |. 3BEC CMP EBP,ESP
002862B4 |. E8 1FB2FFFF CALL 002814D8 ; [_RTC_CheckEsp
002862B9 |. 8BE5 MOV ESP,EBP
002862BB |. 5D POP EBP
002862BC \. C3 RETN
Pointing the dummy function at the byte array:
void(*func_ptr)();
func_ptr = (void(*)()) &foo[0]; // make function point to foo[]
(*func_ptr)(); // Call the function
Is this the correct way to make the dummy function point to the byte array?
At which point is the function's end reached? Should I simply check for the different return opcodes (C3 -> return near to caller, CB -> return far to caller, ...)?
PS:
A simple (e.g. not very elaborate) solution is preferred as I am new to C++.
Edit: I want to achieve this in a Windows environment.
You need to store the "copied" function on a block of memory allocated using VirtualAllocEx. On modern OSs, there is a bit on each page which declares whether its contents are executable or not. This is used to minimize the damage of buffer overruns. By default, your memory is not executable. If you use VirtualAllocEx with the PAGE_EXECUTE_READWRITE protection mode, you'll be able to write to a block of memory and then execute from it.
As for your question of "when do you reach the end of a function," that is actually not answerable. There are common patterns you can look for, but x86 lacks any way of identifying the "end" of a function.
It looks like you need to follow the jump. When you follow the jump, the code you're seeing matches what you compiled above.
Also, your DumpMemory is using the address of pAddressIn. Your function is being passed a variable called pAddress. Either this is a typo, or you're referencing a variable declared somewhere else. I assume you meant to use pAddress.
The memory you allocate may need special privilege to be allowed to run. As is, the memory you allocate with the raw function data will be marked as "data". "Data Execution Prevention" may stop this depending on your environment.

C vs C++ Static Initialization of Objects

I have a question regarding initialization of fairly large sets of static data.
See my three examples below of initializing sets of static data. I'd like to understand the program load time & memory footprint implications of the methods shown below. I really don't know how to evaluate that on my own at the moment. My build environment is still on the desktop using Visual Studio, however the embedded targets will be compiled for VxWorks using GCC.
Traditionally, I've used basic C-structs for this sort of thing, although there's good reason to move this data into C++ classes moving forward. Dynamic memory allocation is frowned upon in the embedded application and is avoided wherever possible.
As far as I know, the only way to initialize a C++ class is through its constructor, shown below in Method 2. I am wondering how that compares to Method 1. Is there any appreciable additional overhead in terms of ROM (Program footprint), RAM (Memory Footprint), or program load time? It seems to me that the compiler may be able to optimize away the rather trivial constructor, but I'm not sure if that's common behavior.
I listed Method 3, as I've considered it, although it just seems like a bad idea. Is there something else that I'm missing here? Anyone else out there initialize data in a similar manner?
/* C-Style Struct Storage */
typedef struct{
int a;
int b;
}DATA_C;
/* CPP Style Class Storage */
class DATA_CPP{
public:
int a;
int b;
DATA_CPP(int,int);
};
DATA_CPP::DATA_CPP(int aIn, int bIn){
a = aIn;
b = bIn;
}
/* METHOD 1: Direct C-Style Static Initialization */
DATA_C MyCData[5] = { {1,2},
{3,4},
{5,6},
{7,8},
{9,10}
};
/* METHOD 2: Direct CPP-Style Initialization */
DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
DATA_CPP(3,4),
DATA_CPP(5,6),
DATA_CPP(7,8),
DATA_CPP(9,10),
};
/* METHOD 3: Cast C-Struct to CPP class */
DATA_CPP* pMyCppData2 = (DATA_CPP*) MyCData;
In C++11 , you can write this:
DATA_CPP obj = {1,2}; //Or simply : DATA_CPP obj {1,2}; i.e omit '='
instead of
DATA_CPP obj(1,2);
Extending this, you can write:
DATA_CPP MyCppData[5] = { {1,2},
{3,4},
{5,6},
{7,8},
{9,10},
};
instead of this:
DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
DATA_CPP(3,4),
DATA_CPP(5,6),
DATA_CPP(7,8),
DATA_CPP(9,10),
};
Read about uniform initialization.
I've done a bit of research in this, thought I'd post the results. I used Visual Studio 2008 in all cases here.
Here's the disassembly view of the code from Visual Studio in Debug Mode:
/* METHOD 1: Direct C-Style Static Initialization */
DATA_C MyCData[5] = { {1,2},
{3,4},
{5,6},
{7,8},
{9,10},
};
/* METHOD 2: Direct CPP-Style Initialization */
DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
010345C0 push ebp
010345C1 mov ebp,esp
010345C3 sub esp,0C0h
010345C9 push ebx
010345CA push esi
010345CB push edi
010345CC lea edi,[ebp-0C0h]
010345D2 mov ecx,30h
010345D7 mov eax,0CCCCCCCCh
010345DC rep stos dword ptr es:[edi]
010345DE push 2
010345E0 push 1
010345E2 mov ecx,offset MyCppData (1038184h)
010345E7 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(3,4),
010345EC push 4
010345EE push 3
010345F0 mov ecx,offset MyCppData+8 (103818Ch)
010345F5 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(5,6),
010345FA push 6
010345FC push 5
010345FE mov ecx,offset MyCppData+10h (1038194h)
01034603 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(7,8),
01034608 push 8
0103460A push 7
0103460C mov ecx,offset MyCppData+18h (103819Ch)
01034611 call DATA_CPP::DATA_CPP (103119Ah)
DATA_CPP(9,10),
01034616 push 0Ah
01034618 push 9
0103461A mov ecx,offset MyCppData+20h (10381A4h)
0103461F call DATA_CPP::DATA_CPP (103119Ah)
};
01034624 pop edi
01034625 pop esi
01034626 pop ebx
01034627 add esp,0C0h
0103462D cmp ebp,esp
0103462F call #ILT+325(__RTC_CheckEsp) (103114Ah)
01034634 mov esp,ebp
01034636 pop ebp
Interesting thing to note here is that there is definitely some overhead in program memory usage and load time, at least in non-optimized debug mode. Notice that Method 1 has zero assembly instructions, while method two has about 44 instructions.
I also ran compiled the program in Release mode with optimization enabled, here is the abridged assembly output:
?MyCData##3PAUDATA_C##A DD 01H ; MyCData
DD 02H
DD 03H
DD 04H
DD 05H
DD 06H
DD 07H
DD 08H
DD 09H
DD 0aH
?MyCppData##3PAVDATA_CPP##A DD 01H ; MyCppData
DD 02H
DD 03H
DD 04H
DD 05H
DD 06H
DD 07H
DD 08H
DD 09H
DD 0aH
END
Seems like the compiler indeed optimized away the calls to the C++ constructor. I could find no evidence of the constructor ever being called anywhere in the assembly code.
I thought I'd try something a bit more. I changed the constructor to:
DATA_CPP::DATA_CPP(int aIn, int bIn){
a = aIn + bIn;
b = bIn;
}
Again, the compiler optimized this away, resulting in a static dataset:
?MyCppData##3PAVDATA_CPP##A DD 03H ; MyCppData
DD 02H
DD 07H
DD 04H
DD 0bH
DD 06H
DD 0fH
DD 08H
DD 013H
DD 0aH
END
Interesting, the compiler was able to evaluate the constructor code on all the static data during compilation and create a static dataset, still not calling the constructor.
I thought I'd try something still a bit more, operate on a global variable in the constructor:
int globalvar;
DATA_CPP::DATA_CPP(int aIn, int bIn){
a = aIn + globalvar;
globalvar += a;
b = bIn;
}
And in this case, the compiler now generated assembly code to call the constructor during initialization:
??__EMyCppData##YAXXZ PROC ; `dynamic initializer for 'MyCppData'', COMDAT
; 35 : DATA_CPP MyCppData[5] = { DATA_CPP(1,2),
00000 a1 00 00 00 00 mov eax, DWORD PTR ?globalvar##3HA ; globalvar
00005 8d 48 01 lea ecx, DWORD PTR [eax+1]
00008 03 c1 add eax, ecx
0000a 89 0d 00 00 00
00 mov DWORD PTR ?MyCppData##3PAVDATA_CPP##A, ecx
; 36 : DATA_CPP(3,4),
00010 8d 48 03 lea ecx, DWORD PTR [eax+3]
00013 03 c1 add eax, ecx
00015 89 0d 08 00 00
00 mov DWORD PTR ?MyCppData##3PAVDATA_CPP##A+8, ecx
; 37 : DATA_CPP(5,6),
0001b 8d 48 05 lea ecx, DWORD PTR [eax+5]
0001e 03 c1 add eax, ecx
00020 89 0d 10 00 00
00 mov DWORD PTR ?MyCppData##3PAVDATA_CPP##A+16, ecx
; 38 : DATA_CPP(7,8),
00026 8d 48 07 lea ecx, DWORD PTR [eax+7]
00029 03 c1 add eax, ecx
0002b 89 0d 18 00 00
00 mov DWORD PTR ?MyCppData##3PAVDATA_CPP##A+24, ecx
; 39 : DATA_CPP(9,10),
00031 8d 48 09 lea ecx, DWORD PTR [eax+9]
00034 03 c1 add eax, ecx
00036 89 0d 20 00 00
00 mov DWORD PTR ?MyCppData##3PAVDATA_CPP##A+32, ecx
0003c a3 00 00 00 00 mov DWORD PTR ?globalvar##3HA, eax ; globalvar
; 40 : };
00041 c3 ret 0
??__EMyCppData##YAXXZ ENDP ; `dynamic initializer for 'MyCppData''
FYI, I found this page helpful in setting up visual studio to output assembly:
How do I get the assembler output from a C file in VS2005

Does empty "else" cost any time

Is there any difference in computational cost of
if(something){
return something;
}else{
return somethingElse;
}
and
if(something){
return something;
}
//else (put in comments for readibility purposes)
return somethingElse;
In theory we have command (else) but it doesn't seem it should make an actuall difference.
Edit:
After running code for different set sizes, I found that there actually is a differrence, code without else appears to be about 1.5% more efficient. But it most likely depends on compiler, as stated by many people below. Code I tested it on:
int withoutElse(bool a){
if(a)
return 0;
return 1;
}
int withElse(bool a){
if(a)
return 0;
else
return 1;
}
int main(){
using namespace std;
bool a=true;
clock_t begin,end;
begin= clock();
for(__int64 i=0;i<1000000000;i++){
a=!a;
withElse(a);
}
end = clock();
cout<<end-begin<<endl;
begin= clock();
for(__int64 i=0;i<1000000000;i++){
a=!a;
withoutElse(a);
}
end = clock();
cout<<end-begin<<endl;
return 0;
}
Checked on loops from 1 000 000 to 1 000 000 000, and results were consistently different
Edit 2:
Assembly code (once again, generated using Visual Studio 2010) also shows small difference (appareantly, I'm no good with assemblers :()
?withElse##YAH_N#Z PROC ; withElse, COMDAT
; Line 12
push ebp
mov ebp, esp
sub esp, 192 ; 000000c0H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-192]
mov ecx, 48 ; 00000030H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 13
movzx eax, BYTE PTR _a$[ebp]
test eax, eax
je SHORT $LN2#withElse
; Line 14
xor eax, eax
jmp SHORT $LN3#withElse
; Line 15
jmp SHORT $LN3#withElse
$LN2#withElse:
; Line 16
mov eax, 1
$LN3#withElse:
; Line 17
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?withElse##YAH_N#Z ENDP ; withElse
and
?withoutElse##YAH_N#Z PROC ; withoutElse, COMDAT
; Line 4
push ebp
mov ebp, esp
sub esp, 192 ; 000000c0H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-192]
mov ecx, 48 ; 00000030H
mov eax, -858993460 ; ccccccccH
rep stosd
; Line 5
movzx eax, BYTE PTR _a$[ebp]
test eax, eax
je SHORT $LN1#withoutEls
; Line 6
xor eax, eax
jmp SHORT $LN2#withoutEls
$LN1#withoutEls:
; Line 7
mov eax, 1
$LN2#withoutEls:
; Line 9
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?withoutElse##YAH_N#Z ENDP ; withoutElse
It is generically different, but the compiler may decide to execute the same jump in both cases (it will practically do this always).
The best way to see what a compiler does is reading the assembler. Assuming that you are using gcc you can try with
gcc -g -c -fverbose-asm myfile.c; objdump -d -M intel -S myfile.o > myfile.s
which creates a mix of assembler/c code and makes the job easier at the beginning.
As for your example it is:
CASE1
if(something){
23: 83 7d fc 00 cmp DWORD PTR [ebp-0x4],0x0
27: 74 05 je 2e <main+0x19>
return something;
29: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
2c: eb 05 jmp 33 <main+0x1e>
}else{
return 0;
2e: b8 00 00 00 00 mov eax,0x0
}
CASE2
if(something){
23: 83 7d fc 00 cmp DWORD PTR [ebp-0x4],0x0
27: 74 05 je 2e <main+0x19>
return something;
29: 8b 45 fc mov eax,DWORD PTR [ebp-0x4]
2c: eb 05 jmp 33 <main+0x1e>
return 0;
2e: b8 00 00 00 00 mov eax,0x0
As you could imagine there are no differences!
It won't compile if you type `return
Think that once the code gets compiled, all ifs, elses and loops are changed to goto's
If (cond) { code A } code B
turns to
if cond is false jump to code b
code A
code B
and
If (cond) { code A } else { code B } code C
turns to
if cond is false jump to code B
code A
ALWAYS jump to code C
code B
code C
Most processors 'guess' whether they're going to jump or not before checking if they actually jump. Depending on the processor, it might affect the performance to fail a guess.
So the answer is YES! (unless there's an ALWAYS jump at the end of first comparison) It will take 2-3 cycles to do the ALWAYS jump which isn't in the first if.

_asm swap and compiler additions

I've just written a bubble_sort of an integer array (see previous question) and decided to ignore the standard swap and implement an assembly swap, which looks like this:
int swap(int* x, int* y)
{
if(x != y)
{
_asm
{
mov eax,[x];
mov ebx, [y];
mov [y],eax;
mov [x], ebx;
}
}
return 0;
}
I was actually sure that it will be inserted into the resulting code as is and will work.
Well, my code which uses this swap does work, but I've looked into what the complier turned it into, and my swap was changed into this:
if(x != y)
00E01A6F inc ebp
00E01A70 or byte ptr [ebx],bh
00E01A72 inc ebp
00E01A73 or al,74h
if(x != y)
00E01A75 or al,8Bh
{
_asm
{
mov eax,[x];
00E01A77 inc ebp
00E01A78 or byte ptr [ebx+45890C5Dh],cl
mov [y],eax;
00E01A7E or al,89h
mov [x], ebx;
00E01A80 pop ebp
00E01A81 or byte ptr [ebx],dh
}
}
return 0;
00E01A83 rcr byte ptr [edi+5Eh],5Bh
}
I've compiled it in MS VS 2012.
What do all those extra lines mean, and why are they there? Why can't my _asm fragment just be used?
Can you tell us how you've compiled that function and how you got the disassembly?
When I compile using
cl /FAsc -c test.c
I get the following in the assembly listing for the inline assembler part:
; 4 : {
; 5 : _asm
0000a 53 push ebx
; 6 : {
; 7 : mov eax,[x];
0000b 8b 44 24 08 mov eax, DWORD PTR _x$[esp]
; 8 : mov ebx, [y];
0000f 8b 5c 24 0c mov ebx, DWORD PTR _y$[esp]
; 9 : mov [y],eax;
00013 89 44 24 0c mov DWORD PTR _y$[esp], eax
; 10 : mov [x], ebx;
00017 89 5c 24 08 mov DWORD PTR _x$[esp], ebx
; 4 : {
; 5 : _asm
0001b 5b pop ebx
$LN4#swap:
; 11 : }
One thing to note is that you aren't swapping what you'd really like to swap - your swapping the pointers that are passed to the function, not the items the pointers refer to. So when the function returns, the swapped data is thrown away. The function is just one big nop.
You might want to try somethign like:
_asm
{
mov eax,[x];
mov ebx,[y];
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
But frankly, performing the swap in C would likely result in similar (or better) code.
It's missing the first and last bytes. If you look at what the code is now:
inc ebp ; 45
or byte ptr [ebx],bh ; 08 3B
inc ebp ; 45
or al,74h ; 0C 74
or al,8Bh ; 0C 8B
inc ebp ; 45
or byte ptr [ebx+45890C5Dh],cl ; 08 8B 5D 0C 89 45
or al,89h ; 0C 89
pop ebp ; 5B
or byte ptr [ebx],dh ; 08 33
rcr byte ptr [edi+5Eh],5Bh ; C0 5F 5E 5B
If you ignore the first two bytes, you get this:
cmp eax, [ebp + 12] ; 3B 45 0C
jz skip ; 74 0C
mov eax, [ebx + 8] ; 8B 45 08
mov ebx, [ebp + 12] ; 8B 5D 0C
mov [ebp + 12], eax ; 89 45 0C
mov [ebx + 8], ebx ; 89 5B 08
skip:
xor eax, eax ; 33 C0
pop edi ; 5F
pop esi ; 5E
pop ebp ; 5B
It's missing the ret at the end, and, crucially, some instruction that has eax and [ebp + 8] as arguments (a mov would make sense there). The missing first byte desynchronized the disassembly with the instruction stream.
It's also missing the prologue, of course.
You need to push at the beginning and pop in the end if you want to see the end of main() :)
_asm
{
push eax //back-up of registers
push ebx
mov eax,[x];
mov ebx, [y];
mov [y],eax;
mov [x], ebx;
pop ebx //resume the registers where they were
pop eax // so, compiler can continue working normally
}
Because compiler uses them for other things too!
You could also have used xchg
mov eax,[x]
xchg eax, [y]
mov [x],eax
You 64 bit? Then there is a single read, single swap, single write. You can search it.
Good day!