How does C++ exception handling translate to machine code

Mentally, I've always wondered how try/throw/catch looks behind the scenes, when the C++ compiler translates it to assembler. But since I never use it, I never got around to checking it out (some people would say lazy).
Is the normal stack used for keeping track of try blocks, or is a separate per-thread stack kept for this purpose alone? Is the difference in implementation between MSVC and g++ big or small? Please show me some pseudo asm (IA-32 is ok too) so I never have to check it out myself! :)
Edit: Now I get the basics of MSVC's exception handling implementation on IA-32. Anybody know how g++ does it on IA-32, or on any other CPU for that matter?
Edit 2 (11 years later): Here are some data on performance. They've also made source code freely available.

Poor implementations of exception handling push some kind of exception handler block for each try clause onto the runtime stack as the try clause is entered, and pop it off as the try clause is exited. A location holding the address of the most recently pushed exception handler block is also maintained. Typically these exception handlers are chained together so they can be found by following links from the most recent to older versions. When an exception occurs, a pointer to the last-pushed EH block is found, and that "try" clause's EH cases are checked. A hit on an EH case causes stack cleanup back to the point of the pushed EH, and control transfers to the EH case. If no EH case matches, the next EH block is found and the process repeats. The Windows 32-bit SEH scheme is a version of this.
This is a poor implementation because the program pays a runtime price for each try clause (push then pop) even when no exception occurs.
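A minimal sketch of that push/pop scheme, using setjmp/longjmp and invented names (HandlerRecord, g_top_handler, raise_error); real Win32 SEH chains its records through FS:[0] and dispatches through the OS rather than a global pointer, but the shape is the same:
    #include <csetjmp>
    #include <cstdio>

    // Hypothetical handler record: one per active "try" clause, linked to the
    // previously pushed record so the runtime can walk from newest to oldest.
    struct HandlerRecord {
        HandlerRecord* prev;   // next-older handler in the chain
        std::jmp_buf   env;    // where to resume when this handler catches
    };

    // In real Win32 SEH this "most recent handler" pointer lives at FS:[0];
    // a thread-local global stands in for it here.
    static thread_local HandlerRecord* g_top_handler = nullptr;

    void raise_error(int code) {
        HandlerRecord* h = g_top_handler;
        if (!h) { std::fprintf(stderr, "unhandled error %d\n", code); return; }
        g_top_handler = h->prev;       // "unwind" to the enclosing handler
        std::longjmp(h->env, code);    // transfer control to the matching "catch"
    }

    void risky() { raise_error(42); }

    int main() {
        HandlerRecord rec;
        rec.prev = g_top_handler;
        g_top_handler = &rec;              // entering the "try": push a record (runtime cost)
        switch (setjmp(rec.env)) {
        case 0:                            // normal path: run the "try" body
            risky();
            g_top_handler = rec.prev;      // leaving the "try": pop the record (runtime cost)
            break;
        default:                           // we longjmp'd here: the "catch" clause
            std::puts("caught an error");
            break;
        }
    }
Entering the "try" costs a push and every normal exit costs a pop, which is exactly the overhead the table-based scheme described next avoids.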
Good implementations simply record a table of ranges where try clauses occur. This means there's zero overhead to enter/exit a try clause. (My PARLANSE parallel programming language uses this technique.) An exception looks up the PC of the exception point in the table, and passes control to the EH selected by the table. The EH code resets the stack as appropriate. Fast and pretty.
I think the Windows 64 bit EH is of this type, but I haven't looked carefully.
[EDIT April 2020: Just measured the cost of PARLANSE exceptions recently. 0 ns (by design) if no exception; 25 ns on a 3 GHz i7 from "throw" to "catch" to "acknowledge" (ending an empty catch). The OP added a link measuring C++ exception handling at roughly 1000 ns for the simplest kind, and a literally non-standard handling scheme that clocks in at 57 ns whether or not an exception occurs; CPU clock rates for the C++ versions are a bit slower, so these numbers are only for rough comparison.]
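A toy illustration of that range-table lookup (the UnwindEntry struct, raise_at, and the addresses are invented for this sketch; real tables such as .eh_frame key on return addresses and also describe how to restore registers and which catch clause matches):
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <iterator>

    // Hypothetical table entry: "if the faulting PC falls in [start, end), this
    // handler is responsible". Real tables (.eh_frame / .gcc_except_table) also
    // describe how to restore registers and which catch clause matches the type.
    struct UnwindEntry {
        std::uintptr_t start;
        std::uintptr_t end;
        void (*handler)(int);
    };

    void example_catch(int code) { std::printf("caught %d\n", code); }

    // Built at compile/link time and sorted by address; nothing is pushed or
    // popped at run time, so the non-throwing path costs nothing.
    static const UnwindEntry g_unwind_table[] = {
        { 0x1000, 0x1080, nullptr },          // range with no handler
        { 0x1080, 0x1200, example_catch },    // range covered by a try clause
    };

    void raise_at(std::uintptr_t pc, int code) {
        // Find the first entry whose range ends beyond the throwing PC.
        const UnwindEntry* it = std::upper_bound(
            std::begin(g_unwind_table), std::end(g_unwind_table), pc,
            [](std::uintptr_t p, const UnwindEntry& e) { return p < e.end; });
        if (it != std::end(g_unwind_table) && pc >= it->start && it->handler)
            it->handler(code);                // transfer control to the selected handler
        else
            std::puts("no handler: terminate");
    }

    int main() { raise_at(0x10F0, 7); }   // pretend the throw happened at PC 0x10F0
Nothing is executed when a try clause is entered or left; the whole cost is paid in the table search, and only when something is actually thrown.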

The C++ standard committee published a technical report on "C++ performance" to debunk many myths about how C++ features supposedly slow you down. This also includes details about how exception handling could be implemented. The draft of this technical report is available for free. Check section 5.4.1. "Exception Handling Implementation Issues and Techniques".

Asm from the Godbolt compiler explorer, for the x86-64 System V calling convention with g++ 8.2's C++ ABI, for a function that catches, and one that throws.
x86-64 System V uses the .eh_frame section for stack-unwind metadata, so the exception-helper library functions know how to walk the stack and restore registers. That's what .cfi directives do.
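For reference, a pair of functions along these lines (my own minimal example, not necessarily the exact code behind the Godbolt link) is enough to see the .cfi directives, the call to __cxa_throw, and the catch-side landing pad in the generated asm:
    // Compile with: g++ -O2 -S example.cpp   (or paste into godbolt.org)
    #include <stdexcept>

    int might_throw(int x) {
        if (x < 0)
            throw std::invalid_argument("negative");   // shows up as a call to __cxa_throw
        return x * 2;
    }

    int catches(int x) {
        try {
            return might_throw(x);
        } catch (const std::exception&) {              // landing pad reached via the unwinder
            return -1;
        }
    }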

Related

Why do exceptions always incur overhead in non-leaf functions with destructible stack objects?

I came across the claim in the title here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
via here:
http://www.boost.org/doc/libs/1_57_0/doc/html/container/exception_handling.html
Exception handling violates the don't-pay-for-what-you-don't-use design of C++, as it incurs overhead in any non-leaf function that has destructible stack objects regardless of whether they use exception handling.
What is this referring to?
I take this bullet point to mean that any strategy for properly unwinding the stack in the event of an exception requires non-leaf functions to store some sort of information about destructible objects they placed on the stack. If that's correct, then my specific questions are:
What is this information that must be stored?
Why is it not possible to correctly unwind the stack given only an instruction address at which a throw occurred and tables of address ranges computed before run-time?
Modern exception handling is indeed table-based and zero-cost. Unfortunately that was not the case for Windows x86, one of the most popular targets for game development. Most likely it was due to binary compatibility reasons, but even Raymond Chen doesn't know the reason. In x64 they implemented it the way it should have been from the very beginning.
You pay in binary size.
All the code that deals with exceptions needs to be there whether or not you use exceptions, since in general a compiler cannot know if a function can throw or not, unless it is marked noexcept (noexcept exists mostly for this reason).
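As a small compile-only sketch of that point (my own example; the two helpers are assumed to be defined in some other translation unit, so the compiler only sees their declarations at the call site):
    // Compile-only sketch: g++ -O2 -S noexcept_demo.cpp and compare the unwind code.
    #include <vector>

    void may_throw(int& x);               // as far as the compiler knows, this can throw
    void never_throws(int& x) noexcept;   // promises not to throw (std::terminate if it lies)

    void caller() {
        std::vector<int> v(16);   // non-trivial destructor: unwinding must run it
        may_throw(v[0]);          // the compiler keeps a cleanup/unwind path for this call
        never_throws(v[1]);       // no unwind path is needed after this call
    }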
The increased binary size might also hurt actual runtime performance if the exception-handling code ends up in the CPU cache, displacing hot code. A good compiler should be able to avoid this problem by placing all the code that performs the exception handling as far as possible from the "hot" runtime path.
Moreover, some ABIs (SJLJ) implement exceptions with some runtime overhead even on the non-exceptional path. The Itanium and Windows ABIs both have zero overhead on the non-exceptional path (and hence on these ABIs you can expect exceptions to be faster than return-error-code error handling).
This LLVM doc is a good starting point if you are interested in the differences between exception handling in the various ABIs.

How exactly does the SEH (structured exception handling) mechanism work on ARM?

I'm aware of the general overview: http://msdn.microsoft.com/en-us/library/dn743843.aspx ; there is plenty of information available on the exception mechanism in win32 and some for win64, but ARM is not so clear.
I would like to know: when a machine exception is generated (memory protection violation), exactly where is control transferred to (so I can set a breakpoint)? Does it transfer into kernel mode? Presumably it looks up the address of a handler from the vector table and transfers there. Does the kernel code perform the SEH unwind, or is the unwind done in user mode? If in user mode, where exactly - part of the .exe, coredll, or elsewhere?
Background
We have a large application which places __try / __except handlers around the top level of each thread in order to log crashes in an informative way. This has worked fine for years on MIPS. We're now porting it to WinCE 7 on ARM, building under VS2008. We found that exceptions would sometimes not be handled properly in release builds. This seems to depend on exactly where in the code the exception is generated (I have a test function which deliberately accesses a NULL pointer to raise a SEH exception, and I'm calling this from various locations in the code).
We're using the /EHsc option to enable SEH. Release builds use /O2 /Os ; removing /O2 makes it work but significantly slower. That leads me to believe that the optimizer is deleting something that's required for SEH to work - but what exactly? Not all functions have the "mov r12, sp ..." prologue, but some that don't still work properly. Is it some error in the PDATA? Is there an upper limit on the number of PDATA entries? Is it normal that the output of "dumpbin /pdata" doesn't have a function name on every line?
By using the debugger (despite release build), I can place a breakpoint at the start of my exception filtering function (called from __except) and observe that it is never entered in the failure case. The program just exits.
I'm not familiar with the low-level kernel details, but the pointer to the CRT exception handler is stored in the dwords just before the function:
0001106C DCD _C_specific_handler ; handler function
00011070 DCD _scope_table ; scope table (__try/__except map)
00011074 start
00011074 MOV R12, SP
Maybe put your breakpoint on the _C_specific_handler and see if it's invoked. The actual function is in COREDLL. If it does get called but does not pass the exception to your code, possibly the scope table info is wrong for some reason.
EDIT: the above applies to WinCE 6 and earlier. Since you mention VS2008, I'm pretty sure WinCE 7 still uses the same exception handling model. The link you mention (unwind code-based) applies to the new, ARMv7-only WinRT kernel (I think Windows Embedded Compact 2013 uses it too).
We're using the /EHsc option to enable SEH
You're supposed to use /EHa with SEH. /EHsc allows the compiler to optimize away all exception-handling logic if it can prove that no C++ throw statement can be reached inside the block.
Structured Exception Handling's __try and __except will work even with /EHsc, but then C++ stack unwinding won't take place for structured exceptions. As a result, global state may be inconsistent after an exception, which may cause your handler itself to fail.
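For reference, the kind of top-level handler described in the question looks roughly like this (an MSVC-only sketch of mine, with a simplified filter standing in for a real crash logger); with /EHa, C++ destructors in the unwound frames also run:
    // MSVC-only sketch (compile with: cl /EHa crashlog.cpp); __try/__except is a
    // Microsoft extension.
    #include <windows.h>
    #include <cstdio>

    LONG crash_filter(unsigned int code, EXCEPTION_POINTERS* info) {
        std::printf("structured exception 0x%08X at %p\n",
                    code, info->ExceptionRecord->ExceptionAddress);
        return EXCEPTION_EXECUTE_HANDLER;     // run the __except block below
    }

    void crash() {
        volatile int* p = nullptr;
        *p = 1;                               // access violation -> SEH exception
    }

    int main() {
        __try {
            crash();
        } __except (crash_filter(GetExceptionCode(), GetExceptionInformation())) {
            std::printf("logged the crash, exiting cleanly\n");
        }
        return 0;
    }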
This might not be the exact answer to your question, but if your goal is to catch critical hardware exceptions and log their call stacks, then since Windows CE 6.0 you can use AddVectoredExceptionHandler. In the application I work on (VS2005, ARM only) I am able to get call stacks with the help of this function. You also don't need to add SEH catch/throw; the exception handler will always get called.
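A minimal sketch of that approach, as it looks on desktop Windows (I'd expect the Windows CE call to be the same, but treat the details as an assumption); the logging here is a placeholder for a real stack-walk routine:
    #include <windows.h>
    #include <cstdio>

    LONG CALLBACK on_any_exception(PEXCEPTION_POINTERS info) {
        std::fprintf(stderr, "exception 0x%08lX at %p\n",
                     info->ExceptionRecord->ExceptionCode,
                     info->ExceptionRecord->ExceptionAddress);
        // Returning CONTINUE_SEARCH lets normal SEH/C++ handling proceed;
        // the vectored handler is only used for logging here.
        return EXCEPTION_CONTINUE_SEARCH;
    }

    int main() {
        AddVectoredExceptionHandler(1 /* call this handler first */, on_any_exception);
        // ... rest of the application ...
    }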
This is very easy. Type "ARM ARM" (the ARM Architecture Reference Manual) into your favorite search engine.
A narrower search: "ARM data abort".
In essence, when the processor tries to access data in an undefined memory area, a Data Abort exception is generated. The processor transfers execution to a predefined address called an exception vector. The rest of the behavior is up to the programmer or OS. For example, on an embedded system, a System Failure function may be called. On a desktop platform, the OS would generate a signal or exception and terminate the program.
Other platforms may have processors with an advanced Memory Management Unit (MMU) that can set up protection regions (fences). When a program accesses memory outside of the fenced area, an exception or interrupt is generated. The region would be programmed into the MMU. This enables an OS to protect regions of memory that a user's program should not access, even though the memory does exist.
Accessing undefined regions of memory or memory your program does not have access to is defined as "undefined behavior". This behavior may vary by compiler or platform and there is no standard behavior. Some platforms may generate "signals", others may generate "exceptions", some pass messages, others just crash. If you want portable memory exception handling, you will need to write platform specific code.

run-time penalty of C++ try blocks [duplicate]

Possible Duplicate:
Measuring exception handling overhead in C++
Performance when exceptions are not thrown (C++)
I have heard anecdotally that using "try" blocks in C++ slows down the code at run-time even if no exceptions occur. I have searched but have been unable to find any explanation or substantiation for this. Does anyone know if this is true and, if so, why?
The answer, as usual, is "it depends".
It depends on how exception handling is implemented by your compiler.
If you're using MSVC and targeting 32-bit Windows, it uses a stack-based mechanism, which requires some setup code every time you enter a try block, so yes, that means you incur a penalty any time you enter such a block, even if no exception is thrown.
Practically every other platform (other compilers, as well as MSVC targeting 64-bit Windows) uses a table-based approach where some static tables are generated at compile-time, and when an exception is thrown, a simple table lookup is performed; no setup code has to be injected into the try blocks.
There are two common ways of implementing exceptions.
One, sometimes referred to as "table-based" or "DWARF", uses static data to specify how to unwind the stack from any given point; this has no runtime overhead except when an exception is thrown.
The other, sometimes referred to as "stack-based", "setjmp-longjmp" or "sjlj", maintains dynamic data specifying how to unwind the current call stack. This has some runtime overhead whenever you enter or leave a try block, and whenever you create or destroy an automatic object with a non-trivial destructor.
The first is more common in modern compilers (certainly GCC has done this by default for many years); you'll have to check your compiler documentation to see which it uses, and whether it's configurable.

Performance when exceptions are not thrown (C++)

I have already read a lot about C++ exceptions, and what I see is that exception performance in particular is a hard topic. I even tried to look under g++'s hood to see how exceptions are represented in assembly.
I'm a C programmer, because I prefer low level languages. Some time ago I decided to use C++ over C because at a small cost it can make my life much easier (classes over structures, templates etc.).
Returning to my question: as I see it, exceptions do generate overhead, but only when they occur, because they require a long sequence of jump and comparison instructions to find an appropriate exception handler. In normal program execution (where there is no error) the overhead of exceptions equals that of normal return-code checking. Am I right?
Please see my detailed response to a similar question here.
Exception handling overhead is platform specific and depends on the OS, the compiler, and the CPU architecture you're running on.
For Visual Studio, Windows, and x86, there is a cost even when exceptions are not thrown. The compiler generates additional code to keep track of the current "scope" which is later used to determine what destructors to call and where to start searching for exception filters and handlers. Scope changes are triggered by try blocks and the creation of objects with destructors.
For Visual Studio, Windows, and x86-64, the cost is essentially zero when exceptions are not thrown. The x86-64 ABI has a much stricter protocol around exception handling than x86, and the OS does a lot of heavy lifting, so the program itself does not need to keep track of as much information in order to handle exceptions.
When exceptions occur, the cost is significant, which is why they should only happen in truly exceptional cases. Handling exceptions on x86-64 is more expensive than on x86, because the architecture is optimized for the more common case of exceptions not happening.
Here's a detailed review of the cost of the exception handling when no exceptions are actually thrown:
http://www.nwcpp.org/old/Meetings/2006/10.html
In general, in every function that uses exception handling (has either try/catch blocks or automatic objects with destructors) the compiler generates some extra prolog/epilog code to deal with the exception registration record.
Plus, after every automatic object is constructed and destructed, a few more assembler instructions are added to adjust the exception registration record.
In addition, some optimizations may be disabled. This is especially the case when you work in the so-called "asynchronous" exception handling model.
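Conceptually, the registration record that the x86 prolog links into the chain at FS:[0] looks something like the following (a rough sketch: the field names, handler names and pseudo-asm are approximate, and MSVC's real C++ EH record carries extra fields such as the current state/trylevel index):
    // Approximate shape of the per-frame record that the x86 prolog links into
    // the thread's handler chain at FS:[0].
    struct SehRegistrationSketch {
        SehRegistrationSketch* Next;     // previously registered frame's record
        void*                  Handler;  // e.g. __CxxFrameHandler / _except_handler3
    };

    // Simplified pseudo-prolog / epilog for a function that uses EH:
    //   push  OFFSET handler      ; this frame's handler routine
    //   push  dword ptr fs:[0]    ; link to the previous registration
    //   mov   fs:[0], esp         ; this frame is now the head of the chain
    //   ... function body; constructing/destructing objects updates the state index ...
    //   mov   ecx, [esp]          ; saved link to the previous record
    //   mov   fs:[0], ecx         ; unregister on the way out
    //   add   esp, 8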

C++ return value versus exception performance

Somewhere I have read that modern Intel processors have low-level hardware support for implementing exceptions and that most compilers take advantage of it, to the effect that exceptions become faster than returning status through return values.
Is it true? Are exceptions faster than return values for reporting state / responding to state? Reading Stack Overflow on the topic seems to contradict that.
Thank you
Be aware that there's ambiguity in the term "exception handler." I believe you'll find that hardware folks when talking about exceptions mean things like:
Hardware interrupts, aka signals, whose handlers are sometimes called exception handlers (see http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/interrupts.html)
Machine check exceptions, which halt the computer if something in hardware goes wrong (see http://en.wikipedia.org/wiki/Machine_Check_Exception)
Neither of those has anything to do with C++'s exception handling facility.
As a counterexample, I have at least one anecdotal data point where exceptions were way slower than return codes: that was on Intel hardware alright, but with gcc 2.95 and a very large set of code with a very large exception table, that was constructed the first time an exception was thrown. Subsequent exceptions were fast, but by then the damage was usually done. Admittedly, gcc 2.95 is pretty ancient, but it should be enough to caution you about making generalizations about the speed of C++ exception handling, even on Intel hardware.
I don't know where you read this, but it is surely incorrect. No hardware designer would make exceptional circumstances, which are by definition uncommon, work FASTER than normal ones. Also keep in mind that C, which according to TIOBE is the most popular systems language, does not even support exceptions. It seems EXTREMELY unlikely that processors are optimized for ONE language's exception handling, whose implementation is not even standardized among compilers.
Even if, somehow, exceptions were faster, you still should not use them outside their intended purpose, lest you confuse every other programmer in the world.
No. Nothing is going to be faster than sticking a variable into a register. Even with explicit hardware support, exceptions are still going to require things like memory accesses.
C++ exceptions couldn't, for the most part, be implemented that way, because C++ requires that the stack be unwound and objects destructed.
The answer is technically correct, but highly misleading.
At the core of the issue is the observation that exceptions are exceptional. They usually do not happen. This is not the case when you return an error code. This happens always, even if there is no error. In that case the function still has to return 0, or true, or -1, or ...
Now this means that a CPU and a compiler can specifically optimize functions that fail by exception. But it's important to realize what they optimize, and that's the non-failure, non-exception case - at the cost of the exceptional cases.
Once we realize that, we can look at how the compiler and CPU optimize such cases. One common method is putting the exception code separate from the normal code. As a result, that code will normally not end up in the CPU cache, which can then hold more useful code. In fact, the exception code might not end up in RAM at all, and stay on disk.
Another supporting mechanism is the CPU branch predictor. It will remember that the branches leading to exception code are usually not taken, and therefore predict that the next time they're not taken either. The compiler can even put this in as a hint. However, this hint feature was abandoned after the Intel Pentium 4; modern CPUs predict branches well enough.
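At the source level you can ask for the same layout explicitly; for example, the C++20 [[unlikely]] attribute (or __builtin_expect / __attribute__((cold)) on GCC and Clang) marks the throwing branch as cold so the compiler can move it away from the hot path:
    #include <stdexcept>

    int parse_positive(int x) {
        if (x < 0) [[unlikely]] {   // C++20 hint: this is the cold, exceptional branch
            throw std::invalid_argument("negative input");
        }
        return x * 2;               // the hot path stays compact and cache-friendly
    }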
Even if they were faster, you should not use them for anything other than exceptional conditions. If you misuse them you make your program much harder to debug. In gdb you can do a 'catch throw' and easily find out where your program is going wrong and throwing an exception, but not if you're throwing exceptions as part of your regular processing.
Your question is a little unclear, because what you mean by implementing exceptions covers three things:
Entering a try block. This can have no cost, but tends to make a throw more expensive. There is a more specific question about this on SO.
Executing a throw. There is a more specific question about this on SO.
Unwinding the stack to get from a throw to its catch, and loading the error handling code (in the catch) into the CPU cache. You should ignore this cost, because you would have to pay it anyway if using status codes rather than exceptions.
Here is a blog article where someone did some actual benchmarks: https://pspdfkit.com/blog/2020/performance-overhead-of-exceptions-in-cpp/
tl;dr: The throw/catch mechanism is about an order of magnitude slower than returning a value, so if you care about performance you should only use it in exceptional situations.
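If you want to reproduce that kind of measurement yourself, a rough micro-benchmark along these lines (my own sketch, not the blog's code) shows the gap; the exact ratio depends heavily on the compiler, optimization flags, and how many frames the throw has to unwind:
    #include <chrono>
    #include <cstdio>
    #include <stdexcept>

    // Error reported via a return code vs. via an exception; 'fail' controls
    // whether the error path is actually taken.
    int by_code(int x, bool fail, int& out) {
        if (fail) return -1;
        out = x * 2;
        return 0;
    }

    int by_exception(int x, bool fail) {
        if (fail) throw std::runtime_error("failed");
        return x * 2;
    }

    // Average nanoseconds per call of f over 'iterations' runs.
    template <class F>
    double time_ns(F&& f, int iterations) {
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i) f(i);
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::nano>(end - start).count() / iterations;
    }

    int main() {
        const int n = 100000;
        long long sink = 0;   // accumulated and printed so the work isn't optimized away

        double code_ns = time_ns([&](int i) {
            int out = 0;
            if (by_code(i, /*fail=*/true, out) != 0) sink += 1; else sink += out;
        }, n);

        double exc_ns = time_ns([&](int i) {
            try { sink += by_exception(i, /*fail=*/true); }
            catch (const std::runtime_error&) { sink += 1; }
        }, n);

        std::printf("error via return code: %8.1f ns/op\n", code_ns);
        std::printf("error via exception:   %8.1f ns/op\n", exc_ns);
        std::printf("(sink = %lld)\n", sink);
    }
Don't read too much into the absolute numbers from such a small loop; the point is only that the throwing variant is markedly slower than the return-code variant when the error path is actually taken.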