assembly output of a simple C++ program - c++

I am trying to understand the assembly output of a simple c++ program. This is my C++ program.
void func()
{}
int main()
{
func();
}
when I use g++ with --save-temps option to get the assembly code for the above program I get the following assembly code.
.file "main.cpp"
.text
.globl _Z4funcv
.type _Z4funcv, #function
_Z4funcv:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size _Z4funcv, .-_Z4funcv
.globl main
.type main, #function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
call _Z4funcv
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2"
.section .note.GNU-stack,"",#progbits
According to my knowledge on assembly there should be 3 sections of any assembly program which are data, text and bss. Also text section should start with 'global _start'. I can't see any of them in this assembly code.
Can someone please help me to understand the above assembly code. If you can relate to C++ code as well, It would be great.
Any kind of help is greatly appreciated.

Well, here it is line by line...
.file "main.cpp" # Debugging info (not essential)
.text # Start of text section (i.e. your code)
.globl _Z4funcv # Let the function _Z4funcv be callable
# from outside (e.g. from your main routine)
.type _Z4funcv, #function # Debugging info (possibly not essential)
_Z4funcv: # _Z4funcv is effectively the "name" of your
# function (C++ "mangles" the name; exactly
# how depends on your compiler -- Google "C++
# name mangling" for more).
.LFB0: # Debugging info (possibly not essential)
.cfi_startproc # Provides additional debug info (ditto)
pushq %rbp # Store base pointer of caller function
# (standard function prologue -- Google
# "calling convention" or "cdecl")
.cfi_def_cfa_offset 16 # Provides additional debug info (ditto)
.cfi_offset 6, -16 # Provides additional debug info (ditto)
movq %rsp, %rbp # Reset base pointer to a sensible place
# for this function to put its local
# variables (if any). Standard function
# prologue.
.cfi_def_cfa_register 6 # Debug ...
popq %rbp # Restore the caller's base pointer
# Standard function epilogue
.cfi_def_cfa 7, 8 # Debug...
ret # Return from function
.cfi_endproc # Debug...
.LFE0: # Debug...
.size _Z4funcv, .-_Z4funcv # Debug...
.globl main # Declares that the main function
# is callable from outside
.type main, #function # Debug...
main: # Your main routine (name not mangled)
.LFB1: # Debug...
.cfi_startproc # Debug...
pushq %rbp # Store caller's base pointer
# (standard prologue)
.cfi_def_cfa_offset 16 # Debug...
.cfi_offset 6, -16 # Debug...
movq %rsp, %rbp # Reset base pointer
# (standard prologue)
.cfi_def_cfa_register 6 # Debug...
call _Z4funcv # Call `func` (note name mangled)
movl $0, %eax # Put `0` in eax (eax is return value)
popq %rbp # Restore caller's base pointer
# (standard epilogue)
.cfi_def_cfa 7, 8 # Debug...
ret # Return from main function
.cfi_endproc # Debug...
.LFE1:
.size main, .-main # Debug...
.ident "GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2" # fluff
.section .note.GNU-stack,"",#progbits # fluff
The linker knows to look for main (and not start) if it is using the standard C or C++ library (which it usually is, unless you tell it otherwise). It links some stub code (which contains start) into the final executable.
So, really, the only important bits are...
.text
.globl _Z4funcv
_Z4funcv:
pushq %rbp
movq %rsp, %rbp
popq %rbp
ret
.globl main
main:
pushq %rbp
movq %rsp, %rbp
call _Z4funcv
movl $0, %eax
popq %rbp
ret
If you want to start from scratch, and not have all the complicated standard library stuff getting in the way of your discovery, you can do something like this and achieve the same result as your C++ code:
.text
.globl _func
_func: # Just as above, really
push %ebp
mov %esp, %ebp
pop %ebp
ret
.globl _start
_start: # A few changes here
push %ebp
mov %esp, %ebp
call _func
movl $1, %eax # Invoke the Linux 'exit' syscall
movl $0, %ebx # With a return value of 0 (pick any char!)
int $0x80 # Actual invocation
The exit syscall is a bit painful, but necessary. If you don't have it, it tries to keep going and run the code that is "past" your code. As that could be important code or data, the machine should stop you with a Segmentation Fault error. Having the exit call avoids all this. If you are using the standard library (as will happen automatically in your C++ example) the exit stuff is taken care of by the linker.
Compile with gcc -nostdlib -o test test.s (noting that gcc is specifically told not to use the standard library). I should say that this is for a 32-bit system, and quite likely will not work on 64-bit. I don't have a 64-bit system to test on, but perhaps some helpful StackOverflower will chip in with a 64-bit translation.

Related

Will string literals always be compiled into binary? [duplicate]

In my rendering code, I pass string literals as a parameter to my rendering blocks so I can identify them in debug dumps and profiling. It looks similar to this:
// rendering client
draw.begin("east abbey foyer");
// rendering code
draw.end();
void Draw::begin(const char * debugName){
#ifdef DEBUG
// code that uses debugName
#endif
// the rest of the code doesn't use debugName at all
}
In the final program, I wouldn't want these strings to be around anymore. But I also want to avoid using macros in my rendering client code to do this; in fact, I would want the rendering client code to KEEP the strings (in the code itself) but not actually compile them into the final program.
So what I'm wondering is, if I change the code of draw.begin(const char*) to not use it's parameter at all, will my compiler optimize that parameter and it's associated costs away (perhaps even going so far as to exclude it from the string table)?
Beware of anyone giving you concrete answers to questions like this. The only way you can be really certain what your compiler is doing in your configuration is to look at the results.
One of the easiest ways to do this is to step through the disassembly in the debugger.
Or you can get the compiler to output an assembly listing (see How to view the assembly behind the code using Visual C++?) and examine exactly what it is really doing.
In practice, it depends on whether the compiler has access to the source code of the implementation of Draw::begin()
here's an illustration:
#include <iostream>
void do_nothing(const char* txt)
{
}
int main()
{
using namespace std;
do_nothing("hello");
return 0;
}
compile with -O3:
yields:
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp3:
.cfi_def_cfa_offset 16
Ltmp4:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp5:
.cfi_def_cfa_register %rbp
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
i.e. the string and the call to do_nothing is entirely optimised away
However, define do_nothing in another module and we get:
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
leaq L_.str(%rip), %rdi
callq __Z10do_nothingPKc
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## #.str
.asciz "hello"
i.e. a redundant load, call and the address of the string is passed.
So if you want to optimise debug info away you'll want to implement Draw::begin() inline (in the same way the stl does).
Yes, you could muck about with optimization and that may or may not work depending on the version of your compiler etc.
A much better way to do this is explicitly, not with MACROs (god forbid), but with a simple function that either generates the tag you want for debug purposes or does nothing for production code.
You want to look at std::assert and perhaps use it.
The possibility of doing so will depend on visibility of function in the call site. If function body is not visible, optimization discussed will not be possible at all. If function is visible, than yes - you can not even guarantee there will be a function call!

C++: How aggressively are unused parameters optimized?

In my rendering code, I pass string literals as a parameter to my rendering blocks so I can identify them in debug dumps and profiling. It looks similar to this:
// rendering client
draw.begin("east abbey foyer");
// rendering code
draw.end();
void Draw::begin(const char * debugName){
#ifdef DEBUG
// code that uses debugName
#endif
// the rest of the code doesn't use debugName at all
}
In the final program, I wouldn't want these strings to be around anymore. But I also want to avoid using macros in my rendering client code to do this; in fact, I would want the rendering client code to KEEP the strings (in the code itself) but not actually compile them into the final program.
So what I'm wondering is, if I change the code of draw.begin(const char*) to not use it's parameter at all, will my compiler optimize that parameter and it's associated costs away (perhaps even going so far as to exclude it from the string table)?
Beware of anyone giving you concrete answers to questions like this. The only way you can be really certain what your compiler is doing in your configuration is to look at the results.
One of the easiest ways to do this is to step through the disassembly in the debugger.
Or you can get the compiler to output an assembly listing (see How to view the assembly behind the code using Visual C++?) and examine exactly what it is really doing.
In practice, it depends on whether the compiler has access to the source code of the implementation of Draw::begin()
here's an illustration:
#include <iostream>
void do_nothing(const char* txt)
{
}
int main()
{
using namespace std;
do_nothing("hello");
return 0;
}
compile with -O3:
yields:
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp3:
.cfi_def_cfa_offset 16
Ltmp4:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp5:
.cfi_def_cfa_register %rbp
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
i.e. the string and the call to do_nothing is entirely optimised away
However, define do_nothing in another module and we get:
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
leaq L_.str(%rip), %rdi
callq __Z10do_nothingPKc
xorl %eax, %eax
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## #.str
.asciz "hello"
i.e. a redundant load, call and the address of the string is passed.
So if you want to optimise debug info away you'll want to implement Draw::begin() inline (in the same way the stl does).
Yes, you could muck about with optimization and that may or may not work depending on the version of your compiler etc.
A much better way to do this is explicitly, not with MACROs (god forbid), but with a simple function that either generates the tag you want for debug purposes or does nothing for production code.
You want to look at std::assert and perhaps use it.
The possibility of doing so will depend on visibility of function in the call site. If function body is not visible, optimization discussed will not be possible at all. If function is visible, than yes - you can not even guarantee there will be a function call!

how does the std::sqrt() function work? [duplicate]

This question already has answers here:
How is the square root function implemented? [closed]
(15 answers)
Closed 4 years ago.
Does anyone know how the std::sqrt() function works? (or at least have an idea?)
I've seen methods on the internet that seemed really slow, using lots of approximations and iterations.
Everyone knows sqrt() function is slow, but I'd like to know how the one from std works so I could have a vague idea of when it is beneficial to avoid it. (yes, if I want to be sure I can profile, but it's still nice to have a vague idea)
EDIT: Didn't really formulate the question too well... What I'm interested in:
how would the fastest C++ function calculating a square root look like? (more or less, I just want to know the actual logic behind it)
Nowadays, on modern machines, floating point functions are passed off to the hardware (floating point unit or math-coprocessor).
Sometimes, it uses what the CPU offers:
$ cat main.cc
#include <cmath>
#include <ctime>
#include <cstdlib>
int main(){
srand (clock());
const double d = rand();
return std::sqrt(d) > 2 ? 1 : 0;
}
(the blahblah is just so nothing relevant is optimized away, don't run that program!)
$ g++ -S main.cc
$ cat main.s
.file "main.cc"
.text
.p2align 4,,15
.globl main
.type main, #function
main:
.LFB106:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call clock
movl %eax, %edi
call srand
call rand
cvtsi2sd %eax, %xmm1
sqrtsd %xmm1, %xmm0
ucomisd %xmm0, %xmm0
jp .L5
.L2:
xorl %eax, %eax
ucomisd .LC0(%rip), %xmm0
seta %al
addq $8, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L5:
.cfi_restore_state
movapd %xmm1, %xmm0
call sqrt
jmp .L2
.cfi_endproc
.LFE106:
.size main, .-main
.section .rodata.cst8,"aM",#progbits,8
.align 8
.LC0:
.long 0
.long 1073741824
.ident "GCC: (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2"
.section .note.GNU-stack,"",#progbits
(hint: it is using a sqrt-cpu-instruction)
sqrt(); function Behind the scenes.
It always checks for the mid-points in a graph.
Example: sqrt(16)=4;
sqrt(4)=2;
Now if you give any input inside 16 or 4 like sqrt(10)==?
It finds the mid point of 2 and 4 i.e = x ,then again it finds the mid point of x and 4 (It excludes lower bound in this input). It repeats this step again and again until it gets the perfect answer i.e sqrt(10)==3.16227766017 .It lies b/w 2 and 4.All this in-built function are created using calculus,differentiation and Integration.
The standard does not specify a particular implementation.
One option is to look at a typical implementation, but you'll probably find it's heavily-optimised assembler.

Questions re: assembly generated from my C++ by gcc

Compiling this code:
int main ()
{
return 0;
}
using:
gcc -S filename.cpp
...generates this assembly:
.file "heloworld.cpp"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl %ebp
.cfi_def_cfa_offset 8
movl %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
movl $0, %eax
popl %ebp
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
My questions:
Is everything after "." a comment?
What is .LFB0:?
What is .LFE0:?
Why is it so big code only for "int main ()" and "return 0;"?
P.S. I read alot of assembly net books, a lot (at least 30) of tutorials and all I can do is copy code and paste it or rewrite it. Now I'm trying a different approach to try to learn it somehow. The problem is I do understand what are movl, pop, etc, but don't understand how to combine these things to make code "flow". I don't know where or how to correctly start writing a program in asm is. I'm still static not dynamic as in C++ but I want to learn assembly.
As other have said, .file, .text, ... are assembler directives and .LFB0, .LFE0 are local labels. The only instruction in the generated code are:
pushl %ebp
movl %esp, %ebp
movl $0, %eax
popl %ebp
ret
The first two instruction are the function prologue. The frame pointer is stored on the stack and updated. The next intruction store 0 in eax register (i386 ABI states that integer return value are returned via the eax register). The two last instructions are function epilogue. The frame pointer is restored, and then the function return to its caller via the ret instruction.
If you compile your code with -O3 -fomit-frame-pointer, the code will be compiled to just two instructions:
xorl %eax,%eax
ret
The first set eax to 0 (it only takes two bytes to encode, while movl 0,%eax take 5 bytes), and the second is the ret instruction. The frame pointer manipulation is there to ease debugging (it is possible to get backtrace without it, but it is more difficult).
.file, .text, etc are assembler directives.
.LFB0, .LFE0 are local labels, which are normally used as branch destinations within a function.
As for the size, there are really only a few actual instructions - most of the above listing consists of directives, etc. For future reference you might also want to turn up the optimisation level to remove otherwise redudant instructions, i.e. gcc -Wall -O3 -S ....
It's just that there's a lot going on behind your simple program.
If you intend to read assembler outputs, by no means compile C++. Use plain C, the output is far clearer for a number of reasons.

How do exceptions work (behind the scenes) in c++

I keep seeing people say that exceptions are slow, but I never see any proof. So, instead of asking if they are, I will ask how do exceptions work behind the scenes, so I can make decisions of when to use them and whether they are slow.
From what I know, exceptions are the same as doing a return bunch of times, except that it also checks after each return whether it needs to do another one or to stop. How does it check when to stop returning? I guess there is a second stack that holds the type of the exception and a stack location, it then does returns until it gets there. I am also guessing that the only time this second stack is touched is on a throw and on each try/catch. AFAICT implementing a similar behaviour with return codes would take the same amount of time. But this is all just a guess, so I want to know what really happens.
How do exceptions really work?
Instead of guessing, I decided to actually look at the generated code with a small piece of C++ code and a somewhat old Linux install.
class MyException
{
public:
MyException() { }
~MyException() { }
};
void my_throwing_function(bool throwit)
{
if (throwit)
throw MyException();
}
void another_function();
void log(unsigned count);
void my_catching_function()
{
log(0);
try
{
log(1);
another_function();
log(2);
}
catch (const MyException& e)
{
log(3);
}
log(4);
}
I compiled it with g++ -m32 -W -Wall -O3 -save-temps -c, and looked at the generated assembly file.
.file "foo.cpp"
.section .text._ZN11MyExceptionD1Ev,"axG",#progbits,_ZN11MyExceptionD1Ev,comdat
.align 2
.p2align 4,,15
.weak _ZN11MyExceptionD1Ev
.type _ZN11MyExceptionD1Ev, #function
_ZN11MyExceptionD1Ev:
.LFB7:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
popl %ebp
ret
.LFE7:
.size _ZN11MyExceptionD1Ev, .-_ZN11MyExceptionD1Ev
_ZN11MyExceptionD1Ev is MyException::~MyException(), so the compiler decided it needed a non-inline copy of the destructor.
.globl __gxx_personality_v0
.globl _Unwind_Resume
.text
.align 2
.p2align 4,,15
.globl _Z20my_catching_functionv
.type _Z20my_catching_functionv, #function
_Z20my_catching_functionv:
.LFB9:
pushl %ebp
.LCFI2:
movl %esp, %ebp
.LCFI3:
pushl %ebx
.LCFI4:
subl $20, %esp
.LCFI5:
movl $0, (%esp)
.LEHB0:
call _Z3logj
.LEHE0:
movl $1, (%esp)
.LEHB1:
call _Z3logj
call _Z16another_functionv
movl $2, (%esp)
call _Z3logj
.LEHE1:
.L5:
movl $4, (%esp)
.LEHB2:
call _Z3logj
addl $20, %esp
popl %ebx
popl %ebp
ret
.L12:
subl $1, %edx
movl %eax, %ebx
je .L16
.L14:
movl %ebx, (%esp)
call _Unwind_Resume
.LEHE2:
.L16:
.L6:
movl %eax, (%esp)
call __cxa_begin_catch
movl $3, (%esp)
.LEHB3:
call _Z3logj
.LEHE3:
call __cxa_end_catch
.p2align 4,,3
jmp .L5
.L11:
.L8:
movl %eax, %ebx
.p2align 4,,6
call __cxa_end_catch
.p2align 4,,6
jmp .L14
.LFE9:
.size _Z20my_catching_functionv, .-_Z20my_catching_functionv
.section .gcc_except_table,"a",#progbits
.align 4
.LLSDA9:
.byte 0xff
.byte 0x0
.uleb128 .LLSDATT9-.LLSDATTD9
.LLSDATTD9:
.byte 0x1
.uleb128 .LLSDACSE9-.LLSDACSB9
.LLSDACSB9:
.uleb128 .LEHB0-.LFB9
.uleb128 .LEHE0-.LEHB0
.uleb128 0x0
.uleb128 0x0
.uleb128 .LEHB1-.LFB9
.uleb128 .LEHE1-.LEHB1
.uleb128 .L12-.LFB9
.uleb128 0x1
.uleb128 .LEHB2-.LFB9
.uleb128 .LEHE2-.LEHB2
.uleb128 0x0
.uleb128 0x0
.uleb128 .LEHB3-.LFB9
.uleb128 .LEHE3-.LEHB3
.uleb128 .L11-.LFB9
.uleb128 0x0
.LLSDACSE9:
.byte 0x1
.byte 0x0
.align 4
.long _ZTI11MyException
.LLSDATT9:
Surprise! There are no extra instructions at all on the normal code path. The compiler instead generated extra out-of-line fixup code blocks, referenced via a table at the end of the function (which is actually put on a separate section of the executable). All the work is done behind the scenes by the standard library, based on these tables (_ZTI11MyException is typeinfo for MyException).
OK, that was not actually a surprise for me, I already knew how this compiler did it. Continuing with the assembly output:
.text
.align 2
.p2align 4,,15
.globl _Z20my_throwing_functionb
.type _Z20my_throwing_functionb, #function
_Z20my_throwing_functionb:
.LFB8:
pushl %ebp
.LCFI6:
movl %esp, %ebp
.LCFI7:
subl $24, %esp
.LCFI8:
cmpb $0, 8(%ebp)
jne .L21
leave
ret
.L21:
movl $1, (%esp)
call __cxa_allocate_exception
movl $_ZN11MyExceptionD1Ev, 8(%esp)
movl $_ZTI11MyException, 4(%esp)
movl %eax, (%esp)
call __cxa_throw
.LFE8:
.size _Z20my_throwing_functionb, .-_Z20my_throwing_functionb
Here we see the code for throwing an exception. While there was no extra overhead simply because an exception might be thrown, there is obviously a lot of overhead in actually throwing and catching an exception. Most of it is hidden within __cxa_throw, which must:
Walk the stack with the help of the exception tables until it finds a handler for that exception.
Unwind the stack until it gets to that handler.
Actually call the handler.
Compare that with the cost of simply returning a value, and you see why exceptions should be used only for exceptional returns.
To finish, the rest of the assembly file:
.weak _ZTI11MyException
.section .rodata._ZTI11MyException,"aG",#progbits,_ZTI11MyException,comdat
.align 4
.type _ZTI11MyException, #object
.size _ZTI11MyException, 8
_ZTI11MyException:
.long _ZTVN10__cxxabiv117__class_type_infoE+8
.long _ZTS11MyException
.weak _ZTS11MyException
.section .rodata._ZTS11MyException,"aG",#progbits,_ZTS11MyException,comdat
.type _ZTS11MyException, #object
.size _ZTS11MyException, 14
_ZTS11MyException:
.string "11MyException"
The typeinfo data.
.section .eh_frame,"a",#progbits
.Lframe1:
.long .LECIE1-.LSCIE1
.LSCIE1:
.long 0x0
.byte 0x1
.string "zPL"
.uleb128 0x1
.sleb128 -4
.byte 0x8
.uleb128 0x6
.byte 0x0
.long __gxx_personality_v0
.byte 0x0
.byte 0xc
.uleb128 0x4
.uleb128 0x4
.byte 0x88
.uleb128 0x1
.align 4
.LECIE1:
.LSFDE3:
.long .LEFDE3-.LASFDE3
.LASFDE3:
.long .LASFDE3-.Lframe1
.long .LFB9
.long .LFE9-.LFB9
.uleb128 0x4
.long .LLSDA9
.byte 0x4
.long .LCFI2-.LFB9
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI3-.LCFI2
.byte 0xd
.uleb128 0x5
.byte 0x4
.long .LCFI5-.LCFI3
.byte 0x83
.uleb128 0x3
.align 4
.LEFDE3:
.LSFDE5:
.long .LEFDE5-.LASFDE5
.LASFDE5:
.long .LASFDE5-.Lframe1
.long .LFB8
.long .LFE8-.LFB8
.uleb128 0x4
.long 0x0
.byte 0x4
.long .LCFI6-.LFB8
.byte 0xe
.uleb128 0x8
.byte 0x85
.uleb128 0x2
.byte 0x4
.long .LCFI7-.LCFI6
.byte 0xd
.uleb128 0x5
.align 4
.LEFDE5:
.ident "GCC: (GNU) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)"
.section .note.GNU-stack,"",#progbits
Even more exception handling tables, and assorted extra information.
So, the conclusion, at least for GCC on Linux: the cost is extra space (for the handlers and tables) whether or not exceptions are thrown, plus the extra cost of parsing the tables and executing the handlers when an exception is thrown. If you use exceptions instead of error codes, and an error is rare, it can be faster, since you do not have the overhead of testing for errors anymore.
In case you want more information, in particular what all the __cxa_ functions do, see the original specification they came from:
Itanium C++ ABI
Exceptions being slow was true in the old days.
In most modern compiler this no longer holds true.
Note: Just because we have exceptions does not mean we do not use error codes as well. When error can be handled locally use error codes. When errors require more context for correction use exceptions: I wrote it much more eloquently here: What are the principles guiding your exception handling policy?
The cost of exception handling code when no exceptions are being used is practically zero.
When an exception is thrown there is some work done.
But you have to compare this against the cost of returning error codes and checking them all the way back to to point where the error can be handled. Both more time consuming to write and maintain.
Also there is one gotcha for novices:
Though Exception objects are supposed to be small some people put lots of stuff inside them. Then you have the cost of copying the exception object. The solution there is two fold:
Don't put extra stuff in your exception.
Catch by const reference.
In my opinion I would bet that the same code with exceptions is either more efficient or at least as comparable as the code without the exceptions (but has all the extra code to check function error results). Remember you are not getting anything for free the compiler is generating the code you should have written in the first place to check error codes (and usually the compiler is much more efficient than a human).
There are a number of ways you could implement exceptions, but typically they will rely on some underlying support from the OS. On Windows this is the structured exception handling mechanism.
There is decent discussion of the details on Code Project: How a C++ compiler implements exception handling
The overhead of exceptions occurs because the compiler has to generate code to keep track of which objects must be destructed in each stack frame (or more precisely scope) if an exception propagates out of that scope. If a function has no local variables on the stack that require destructors to be called then it should not have a performance penalty wrt exception handling.
Using a return code can only unwind a single level of the stack at a time, whereas an exception handling mechanism can jump much further back down the stack in one operation if there is nothing for it to do in the intermediate stack frames.
Matt Pietrek wrote an excellent article on Win32 Structured Exception Handling. While this article was originally written in 1997, it still applies today (but of course only applies to Windows).
This article examines the issue and basically finds that in practice there is a run-time cost to exceptions, although the cost is fairly low if the exception isn't thrown. Good article, recommended.
A friend of me wrote a bit how Visual C++ handles exceptions some years ago.
http://www.xyzw.de/c160.html
All good answers.
Also, think about how much easier it is to debug code that does 'if checks' as gates at the top of methods instead of allowing the code to throw exceptions.
My motto is that it's easy to write code that works. The most important thing is to write the code for the next person who looks at it. In some cases, it's you in 9 months, and you don't want to be cursing your name!