run-time penalty of C++ try blocks [duplicate] - c++

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Measuring exception handling overhead in C++
Performance when exceptions are not thrown (C++)
I have heard anecdotally that using "try" blocks in C++ slows down the code at run-time even if no exceptions occur. I have searched but have been unable to find any explanation or substantiation for this. Does anyone know if this is true & if so why?

The answer, as usually, is "it depends".
It depends on how exception handling is implemented by your compiler.
If you're using MSVC and targeting 32-bit Windows, it uses a stack-based mechanism, which requires some setup code every time you enter a try block, so yes, that means you incur a penalty any time you enter such a block, even if no exception is thrown.
Practically every other platform (other compilers, as well as MSVC targeting 64-bit Windows) use a table-based approach where some static tables are generated at compile-time, and when an exception is thrown, a simple table lookup is performed, and no setup code has to be injected into the try blocks.

There are two common ways of implementing exceptions.
One, sometimes refered to as "table-based" or "DWARF", uses static data to specify how to unwind the stack from any given point; this has no runtime overhead except when an exception is thrown.
The other, sometime referred to as "stack-based", "setjmp-longjmp" or "sjlj", maintains dynamic data to specify how to unwind the current call stack. This has some runtime overhead whenever you enter or leave a try block, and whenever you create or destroy an automatic object with a non-trivial destructor.
The first is more common in modern compilers (certainly GCC has done this by default for many years); you'll have to check your compiler documentation to see which it uses, and whether it's configurable.

Related

When should I use VULKAN_HPP_NO_EXCEPTIONS?

This question is regarding exception handling in Vulkan-Hpp (official Vulkan C++ bindings).
I wrote a small application using Vulkan-Hpp without VULKAN_HPP_NO_EXCEPTIONS defined (and with exception handlers). But after coming across this stackoverflow question (Are Exceptions in C++ really slow) I started worrying about the penalty for using exceptions there. Then I found out about the define VULKAN_HPP_NO_EXCEPTIONS, but it changes the syntax completely for all calls which could throw an exception (because of different return values): that means, one has to decide before starting implementation to either use VULKAN_HPP_NO_EXCEPTIONS or not (i.e. they can't be enabled for the "Debug" Configuration and disabled for the "Release" Configuration easily).
If exception handling is disabled by defining VULKAN_HPP_NO_EXCEPTIONS
ResultValue<SomeType>::type is a struct which contains the return
value and the error code in the fields result and value.
Source
i.e.
surface = instance.createWin32SurfaceKHR(surfaceCreateInfo);
becomes
vk::ResultValue<vk::SurfaceKHR> surfaceResult = instance.createWin32SurfaceKHR(surfaceCreateInfo);
if (surfaceResult.result == vk::Result::eSuccess) {
surface = surfaceResult.value;
}
So given that it is not trivial to change the strategy regarding VULKAN_HPP_NO_EXCEPTIONS at a later stage in development, I wonder in which situations I should use VULKAN_HPP_NO_EXCEPTIONS for my project and in which situations I shouldn't?
I assume there must be some technical rationale behind it, other than just personal taste/opinion.
The principle reason exceptions can be disabled is because many game developers for various platforms turn off exception handling at the compiler level. On some platforms, exception handling is flat out not supported. Those platforms still need a reasonable means to deal with errors, and that requires a different API.
Exceptions have been a hotly debated subject in C++ and likely always will be. While C++ programmers will agree that exceptions should only be used in exceptional circumstances, the line between "exceptional circumstances" and "expected behavior" is ultimately in the eye of the beholder.
Personally, I would consider Vulkan errors to be "exceptional circumstances". Device lost and OOM errors are not things you frequently expect to happen. Plus, your response to them will likely be decidedly non-local; code higher up in the call stack will be what actually deals with it.
Furthermore, many of the functions that error are not the functions commonly encountered in performance critical Vulkan code (vkCmd*, and such). After all, usage errors are supposed to be handled by validation layers and should be impossible at runtime. Errors are usually given for object creation/destruction, and allocations, which are not things you do in the middle of building command buffers.
The erroring function most likely to be found in performance-critical code is vkAllocateDescriptorSets. And while it can error out, can only do so for memory fragmentation reasons. The standard actually requires this:
Any returned error other than VK_ERROR_OUT_OF_POOL_MEMORY_KHR or VK_ERROR_FRAGMENTED_POOL does not imply its usual meaning: applications should assume that the allocation failed due to fragmentation, and create a new descriptor pool.
Fragmentation is something that you can usually prevent, if you have firm control over your input data. Given such control, you can ensure that you never get errors when allocating from descriptor pools.
vkBegin/EndCommandBuffer can error, but only for OOM reasons. Which typically means that there's little you can do to recover, so performance is irrelevant.
The commands that give you serious runtime errors that require actions are typically device commands. And you don't issue such commands in the middle of rendering; vkQueueSubmit is the one exception, and that's at the end of rendering (or beginning; however you want to see it).
This is probably why throwing in VK_HPP is the default.

Why do exceptions always incur overhead in non-leaf functions with destructible stack objects?

I came across the claim in the title here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
via here:
http://www.boost.org/doc/libs/1_57_0/doc/html/container/exception_handling.html
Exception handling violates the don't-pay-for-what-you-don't-use design of C++, as it incurs overhead in any non-leaf function that has
destructable stack objects regardless of whether they use exception
handling.
What is this referring to?
I take this bullet point to mean that any strategy for properly unwinding the stack in the event of an exception requires non-leaf functions to store some sort of information about destructible objects they placed on the stack. If that's correct, then my specific questions are:
What is this information that must be stored?
Why is it not possible to correctly unwind the stack given only an instruction address at which a throw occurred and tables of address ranges computed before run-time?
Modern exception handling is indeed table based and zero cost. Unfortunately it was not the case for Windows x86 - one of the most popular targets for game development. Most likely it was due to binary compatibility reasons but even Raymond Chen doesn't now the reason. In x64 they implemented it the way should be from the very beginning.
You pay in binary size.
All the code that deals with exceptions needs to be there no matter if you use exceptions or not, since in general a compiler can not know if a function can throw or not, unless it is marked noexcept (noexcept exists mostly for this reason).
The increased binary size might also hurt actual runtime performance if the code that contains the exception handling enters the CPU cache, wasting cache memory. A good compiler should be able to avoid this problem by storing all the code that performs the exception handling as far as possible from the "hot" runtime path.
Moreover, some ABI (SJLJ) implements exceptions with some runtime overhead even in the non exceptional path. Itanium and windows ABI both have zero overhead on the non-exceptional paths (and hence on these ABI you can expect exceptions to be faster than return-error-code error handling).
This llvm doc is a good starting point if you are interested in the differences between exception handling in the various ABIs.

Are Exceptions still undesirable in Realtime environment?

A couple of years ago I was taught, that in real-time applications such as Embedded Systems or (Non-Linux-)Kernel-development C++-Exceptions are undesirable. (Maybe that lesson was from before gcc-2.95). But I also know, that Exception Handling has become better.
So, are C++-Exceptions in the context of real-time applications in practice
totally unwanted?
even to be switched off via via compiler-switch?
or very carefully usable?
or handled so well now, that one can use them almost freely, with a couple of things in mind?
Does C++11 change anything w.r.t. this?
Update: Does exception handling really require RTTI to be enabled (as one answerer suggested)? Are there dynamic casts involved, or similar?
Exceptions are now well-handled, and the strategies used to implement them make them in fact faster than testing return code, because their cost (in terms of speed) is virtually null, as long as you do not throw any.
However they do cost: in code-size. Exceptions usually work hand in hand with RTTI, and unfortunately RTTI is unlike any other C++ feature, in that you either activate or deactivate it for the whole project, and once activated it will generated supplementary code for any class that happens to have a virtual method, thus defying the "you don't pay for what you don't use mindset".
Also, it does require supplementary code for its handling.
Therefore the cost of exceptions should be measured not in terms of speed, but in terms of code growth.
EDIT:
From #Space_C0wb0y: This blog article gives a small overview, and introduces two widespread methods for implementing exceptions Jumps and Zero-Cost. As the name implies, good compilers now use the Zero-Cost mechanism.
The Wikipedia article on Exception Handling talk about the two mechanisms used. The Zero-Cost mechanism is the Table-Driven one.
EDIT:
From #Vlad Lazarenko whose blog I had referenced above, the presence of exception thrown might prevent a compiler from inlining and optimizing code in registers.
Answer just to the update:
Does exception handling really require
RTTI to be enabled
Exception-handling actually requires something more powerful than RTTI and dynamic cast in one respect. Consider the following code:
try {
some_function_in_another_TU();
} catch (const int &i) {
} catch (const std::logic_error &e) {}
So, when the function in the other TU throws, it's going to look up the stack (either check all levels immediately, or check one level at a time during stack unwinding, that's up to the implementation) for a catch clause that matches the object being thrown.
To perform this match, it might not need the aspect of RTTI that stores the type in each object, since the type of a thrown exception is the static type of the throw expression. But it does need to compare types in an instanceof way, and it needs to do this at runtime, because some_function_in_another_TU could be called from anywhere, with any type of catch on the stack. Unlike dynamic_cast, it needs to perform this runtime instanceof check on types which have no virtual member functions, and for that matter types which are not class types. That last part doesn't add difficulty, because non-class types have no hierarchy, and so all that's needed is type equality, but you still need type identifiers that can be compared at runtime.
So, if you enable exceptions then you need the part of RTTI that does type comparisons, like dynamic_cast's type comparisons but covering more types. You don't necessarily need the part of RTTI that stores the data used to perform this comparison in each class's vtable, where it's reachable from the object -- the data could instead only be encoded at the point of each throw expression and each catch clause. But I doubt that's a significant saving, since typeid objects aren't exactly massive, they contain a name that's often needed anyway in a symbol table, plus some implementation-defined data to describe the type hierarchy. So probably you might as well have all of RTTI by that point.
The problem with exceptions is not necessarily the speed (which may differ greatly, depending on the implementation), but it's what they actually do.
In the real-time world, when you have a time constraint on an operation, you need to know exactly what your code does. Exceptions provide shortcuts that may influence the overall run time of your code (exception handler may not fit into the real-time constraint, or due to an exception you might not return the query response at all, for example).
If you mean "real-time" as in fact "embedded", then the code size, as mentioned, becomes an issue. Embedded code may not necessarily be real-time, but it can have size constraint (and often does).
Also, embedded systems are often designed to run forever, in an infinite event loop. Exception may take you somewhere out of that loop, and also corrupt your memory and data (because of the stack unwinding) - again, depends on what you do with them, and how the compiler actually implements it.
So better safe than sorry: don't use exceptions. If you can sustain occasional system failures, if you're running in a separate task than can be easily restarted, if you're not really real-time, just pretend to be - then you probably can give it a try. If you're writing software for a heart-pacer - I would prefer to check return codes.
C++ exceptions still aren't supported by every realtime environment in a way that makes them acceptable everywhere.
In the particular example of video games (which have a soft 16.6ms deadline for every frame), the leading compilers implement C++ exceptions in such a way that simply turning on exception handling in your program will significantly slow it down and increase code size, regardless of whether you actually throw exceptions or not. Given that both performance and memory are critical on a game console, that's a dealbreaker: the PS3's SPU units, for example, have 256kb of memory for both code and data!
On top of this, throwing exceptions is still quite slow (measure it if you don't believe me) and can cause heap deallocations which are also undesirable in cases where you haven't got microseconds to spare.
The one... er... exception I have seen to this rule is cases where the exception might get thrown once per app run -- not once per frame, but literally once. In that case, structured exception handling is an acceptable way to catch stability data from the OS when a game crashes and relay it back to the developer.
The implementation of the exception mechanism is usually very slow when an exception is thrown, otherwise the costs of using them is almost none. In my opinion exceptions are very useful if you use them correctly.
In RT applications, exceptions should be thrown only when something goes bad and the program has to stop and fix the issue (and possible wait for the user interaction). Under such circumstances, it takes longer to fix the issue.
Exceptions provide hidden path of reporting an error. They make the code more shorter and more readable, therefore easier maintenance.
Typical implementations of C++ exception handling were still not ideal, and might cause the entire language implementation almost unusable for some embedded targets with extremely limited resources, even if the user code is not explicitly using these features. This is referred as "zero overhead principle violation" by recent WG21 papers, see N4049 and N4234 for details. In such environments, exception handling does not work as expected (consuming reasonable system resources) whether the application is real-time or not.
However, there should be real-time applications in embedded environments which can afford these overhead, e.g. a video player in a handheld device.
Exception handling should always be used carefully. Throwing and catching exceptions per frame in a real-time application for any platforms (not only for embedded environments) is a bad design/implementation and not acceptable in general.
There are generally 3 or 4 constraints in embedded / realtime development - especially when that implies kernel mode development
at various points - usually while handling hardware exceptions - operations MUST NOT throw more hardware exceptions. c++'s implicit data structures (vtables) and code (default constructors & operators & other implicitly generated code to support the c++ exception mechanisim) are not placeable, and cannot as a result be guaranteed to be placed in non paged memory when executed in this context.
Code quality - c++ code in general can hide a lot of complexity in statements that look trivial making code difficult to visually audit for errors. exceptions decouple handling from location, making proving code coverage of tests difficult.
C++ exposes a very simple memory model: new allocates from an infinite free store, until you run out, and it throws an exception. In memory constrained devices, more efficient code can be written that makes explicit use of fixed size blocks of memory. C+'s implicit allocations on almost any operation make it impossible to audit memory use. Also, most c++ heaps exhibit the disturbing property that there is no computable upper limit on how long a memory allocation can take - which again makes it difficult to prove the response time of algorithms on realtime devices where fixed upper limits are desirable.

Performance when exceptions are not thrown (C++)

I have already read a lot about C++ exceptions and what i see, that especially exceptions performance is a hard topic. I even tried to look under the g++'s hood to see how exceptions are represented in assembly.
I'm a C programmer, because I prefer low level languages. Some time ago I decided to use C++ over C because with small cost it can make my life much easier (classes over structures, templates etc.).
Returning back to my question, as I see exceptions do generate overhead bud only when they occur, because it require a long sequence of jumps and comparisons instructions to find a appropriate exception handler. In normal program execution (where is no error) exceptions overhead equals to normal return code checking. Am I right?
Please see my detailed response to a similar question here.
Exception handling overhead is platform specific and depends on the OS, the compiler, and the CPU architecture you're running on.
For Visual Studio, Windows, and x86, there is a cost even when exceptions are not thrown. The compiler generates additional code to keep track of the current "scope" which is later used to determine what destructors to call and where to start searching for exception filters and handlers. Scope changes are triggered by try blocks and the creation of objects with destructors.
For Visual Studio, Windows, and x86-64, the cost is essentially zero when exceptions are not thrown. The x86-64 ABI has a much stricter protocol around exception handling than x86, and the OS does a lot of heavy lifting, so the program itself does not need to keep track of as much information in order to handle exceptions.
When exceptions occur, the cost is significant, which is why they should only happen in truly exceptional cases. Handling exceptions on x86-64 is more expensive than on x86, because the architecture is optimized for the more common case of exceptions not happening.
Here's a detailed review of the cost of the exception handling when no exceptions are actually thrown:
http://www.nwcpp.org/old/Meetings/2006/10.html
In general, in every function that uses exception handling (has either try/catch blocks or automatic objects with destructor) - the compiler generates some extra prolog/epilog code to deal with the expcetion registration record.
Plus after every automatic object is constructed and destructed - a few more assembler commands are added (adjust the exception registration record).
In addition some optimizations may be disabled. Especially this is the case when you work in the so-called "asynchronous" exception handling model.

How does C++ exception handling translate to machine code

Mentally, I've always wondered how try/throw/catch looks behind the scenes, when the C++ compiles translates it to assembler. But since I never use it, I never got around to checking it out (some people would say lazy).
Is the normal stack used for keeping track of trys, or is a separate per-thread stack kept for this purpose alone? Is the implementation between MSVC and g++ big or small? Please show me some pseudo asm (IA-32 is ok too) so I never have to check it out myself! :)
Edit: Now I get the basics of MSVC's implementation on IA-32 handling. Anybody know for g++ on IA-32, or any other CPU for that matter?
Edit 2 (11 years later): Here are some data on performance. They've also made source code freely available.
Poor implementations of exception handlers push some kind of exception handler block for each try clause on the runtime stack as the try clause is entered, and pop it off as the try clause is exited. A location holding the address of the most recently pushed exception handler block is also maintained. Typically these exception handlers are chained together so they can be found by following links from the most recent to older versions. When an exception occurs, a pointer to the last-pushed EH handler block is found, and processing of that "try" clause's EH cases is checked. A hit on an EH case causes stack cleanup to occur back to the point of pushed EH, and control transfers to the EH case. No hits on the EH causes the next EH to be found, and the process repeats. The Windows 32-bit SEH scheme is a version of this.
This is a poor implementation because the program pays a runtime price for each try clause (push then pop) even when no exception occurs.
Good implementations simply record a table of ranges where try clauses occur. This means there's zero overhead to enter/exit a try clause. (My PARLANSE parallell programming langauge uses this technique). An exception looks up the PC of the exception point in the table, and passes control to the EH selected by the table. The EH code resets the stack as appropriate. Fast and pretty.
I think the Windows 64 bit EH is of this type, but I haven't looked carefully.
[EDIT April 2020: Just measured the cost of PARLANSE exceptions recently. 0nS (by design) if no exception; 25ns on an 3Ghz i7 from "throw" to "catch" to "acknowledge" (end empty catch). OP added a link measuring C++ exception handling at roughly 1000ns for the simplest kind, and a literally nonStandard handling scheme that clocks in at 57ns for exception or no exception; CPU clock rates for the C++ versions are a bit slower so these numbers are only for rough comparison.]
The C++ standard committee published a technical report on "C++ performance" to debunk many myths about how C++ features supposedly slow you down. This also includes details about how exception handling could be implemented. The draft of this technical report is available for free. Check section 5.4.1. "Exception Handling Implementation Issues and Techniques".
Asm from the Godbolt compiler explorer, for the x86-64 System V calling convention with g++8.2's C++ABI, for a function that catches, and one that throws.
x86-64 System V uses the .eh_frame section for stack-unwind metadata, so the exception-helper library functions know how to walk the stack and restore registers. That's what .cfi directives do.