Are Exceptions still undesirable in Realtime environment? - c++

A couple of years ago I was taught, that in real-time applications such as Embedded Systems or (Non-Linux-)Kernel-development C++-Exceptions are undesirable. (Maybe that lesson was from before gcc-2.95). But I also know, that Exception Handling has become better.
So, are C++-Exceptions in the context of real-time applications in practice
totally unwanted?
even to be switched off via a compiler switch?
or very carefully usable?
or handled so well now, that one can use them almost freely, with a couple of things in mind?
Does C++11 change anything w.r.t. this?
Update: Does exception handling really require RTTI to be enabled (as one answerer suggested)? Are there dynamic casts involved, or similar?

Exceptions are now well-handled, and the strategies used to implement them in fact make them faster than testing return codes, because their cost (in terms of speed) is virtually nil as long as you do not throw any.
However, they do cost: in code size. Exceptions usually work hand in hand with RTTI, and unfortunately RTTI is unlike any other C++ feature, in that you either activate or deactivate it for the whole project, and once activated it will generate supplementary code for any class that happens to have a virtual method, thus defying the "you don't pay for what you don't use" mindset.
Also, it does require supplementary code for its handling.
Therefore the cost of exceptions should be measured not in terms of speed, but in terms of code growth.
EDIT:
From @Space_C0wb0y: This blog article gives a small overview, and introduces two widespread methods for implementing exceptions: Jumps and Zero-Cost. As the name implies, good compilers now use the Zero-Cost mechanism.
The Wikipedia article on Exception Handling talks about the two mechanisms used. The Zero-Cost mechanism is the Table-Driven one.
EDIT:
From @Vlad Lazarenko, whose blog I had referenced above: the possibility of an exception being thrown might prevent a compiler from inlining code and from keeping values in registers.

Answer just to the update:
Does exception handling really require RTTI to be enabled
Exception-handling actually requires something more powerful than RTTI and dynamic cast in one respect. Consider the following code:
try {
    some_function_in_another_TU();
} catch (const int &i) {
} catch (const std::logic_error &e) {}
So, when the function in the other TU throws, it's going to look up the stack (either check all levels immediately, or check one level at a time during stack unwinding, that's up to the implementation) for a catch clause that matches the object being thrown.
To perform this match, it might not need the aspect of RTTI that stores the type in each object, since the type of a thrown exception is the static type of the throw expression. But it does need to compare types in an instanceof way, and it needs to do this at runtime, because some_function_in_another_TU could be called from anywhere, with any type of catch on the stack. Unlike dynamic_cast, it needs to perform this runtime instanceof check on types which have no virtual member functions, and for that matter types which are not class types. That last part doesn't add difficulty, because non-class types have no hierarchy, and so all that's needed is type equality, but you still need type identifiers that can be compared at runtime.
So, if you enable exceptions then you need the part of RTTI that does type comparisons, like dynamic_cast's type comparisons but covering more types. You don't necessarily need the part of RTTI that stores the data used to perform this comparison in each class's vtable, where it's reachable from the object -- the data could instead only be encoded at the point of each throw expression and each catch clause. But I doubt that's a significant saving, since typeid objects aren't exactly massive, they contain a name that's often needed anyway in a symbol table, plus some implementation-defined data to describe the type hierarchy. So probably you might as well have all of RTTI by that point.
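To make the matching rule above concrete, here is a minimal sketch. The names which_handler, PlainBase, PlainDerived and the throw_* helpers are made up for illustration. It shows the runtime picking a handler by type for a non-class type (int), for a standard library hierarchy, and for a hierarchy with no virtual functions at all, which is the case dynamic_cast cannot handle.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical types: a hierarchy with no virtual member functions.
struct PlainBase {};
struct PlainDerived : PlainBase {};

// Calls `thrower` and reports which catch clause the runtime selected.
inline std::string which_handler(void (*thrower)()) {
    try {
        thrower();
    } catch (const int&) {               // non-class type: plain type equality
        return "int";
    } catch (const std::logic_error&) {  // matches derived types such as out_of_range
        return "logic_error";
    } catch (const PlainBase&) {         // derived-to-base match without any vtable
        return "PlainBase";
    } catch (...) {
        return "unknown";
    }
    return "none";
}

inline void throw_int() { throw 42; }
inline void throw_out_of_range() { throw std::out_of_range("oops"); }
inline void throw_plain_derived() { throw PlainDerived{}; }
inline void throw_double() { throw 1.5; }  // no implicit double -> int conversion in catch
inline void throw_nothing() {}
```

Note that throwing a double does not land in the int handler: catch clauses only apply a small set of conversions (such as derived-to-base), not the usual arithmetic ones.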

The problem with exceptions is not necessarily the speed (which may differ greatly, depending on the implementation), but it's what they actually do.
In the real-time world, when you have a time constraint on an operation, you need to know exactly what your code does. Exceptions provide shortcuts that may influence the overall run time of your code (exception handler may not fit into the real-time constraint, or due to an exception you might not return the query response at all, for example).
If you mean "real-time" as in fact "embedded", then the code size, as mentioned, becomes an issue. Embedded code may not necessarily be real-time, but it can have size constraints (and often does).
Also, embedded systems are often designed to run forever, in an infinite event loop. An exception may take you somewhere out of that loop, and also corrupt your memory and data (because of the stack unwinding) - again, it depends on what you do with them, and how the compiler actually implements them.
So better safe than sorry: don't use exceptions. If you can sustain occasional system failures, if you're running in a separate task that can be easily restarted, if you're not really real-time and just pretend to be - then you probably can give it a try. If you're writing software for a heart-pacer - I would prefer to check return codes.
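As a rough sketch of the return-code style recommended above (the sensor conversion, the 12-bit range and the 3.3 V reference are all assumptions chosen for illustration), every failure path stays visible in the loop body and nothing can leave the loop by a hidden route:

```cpp
#include <cassert>
#include <cstdint>

enum class Status : std::uint8_t { Ok, OutOfRange, NotReady };

// Hypothetical conversion of a raw ADC sample to millivolts.
// Assumes a 12-bit converter and a 3300 mV reference.
[[nodiscard]] inline Status convert_reading(std::int32_t raw,
                                            std::int32_t& millivolts) noexcept {
    if (raw < 0) return Status::NotReady;
    if (raw > 4095) return Status::OutOfRange;
    millivolts = raw * 3300 / 4095;
    return Status::Ok;
}

struct LoopStats {
    std::uint32_t ok = 0;
    std::uint32_t errors = 0;
};

// One iteration of the event loop. It is noexcept, so control always
// returns to the caller and the loop itself can never be abandoned.
inline void loop_iteration(std::int32_t raw, LoopStats& stats,
                           std::int32_t& last_millivolts) noexcept {
    std::int32_t mv = 0;
    switch (convert_reading(raw, mv)) {
        case Status::Ok:
            last_millivolts = mv;
            ++stats.ok;
            break;
        case Status::OutOfRange:
        case Status::NotReady:
            ++stats.errors;  // keep the previous reading, count the failure
            break;
    }
}
```

The [[nodiscard]] attribute is there so that a forgotten check produces a compiler warning, which addresses the usual objection that return codes are easy to ignore.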

C++ exceptions still aren't supported by every realtime environment in a way that makes them acceptable everywhere.
In the particular example of video games (which have a soft 16.6ms deadline for every frame), the leading compilers implement C++ exceptions in such a way that simply turning on exception handling in your program will significantly slow it down and increase code size, regardless of whether you actually throw exceptions or not. Given that both performance and memory are critical on a game console, that's a dealbreaker: the PS3's SPUs, for example, have 256 KB of memory for both code and data!
On top of this, throwing exceptions is still quite slow (measure it if you don't believe me) and can cause heap deallocations which are also undesirable in cases where you haven't got microseconds to spare.
The one... er... exception I have seen to this rule is cases where the exception might get thrown once per app run -- not once per frame, but literally once. In that case, structured exception handling is an acceptable way to catch stability data from the OS when a game crashes and relay it back to the developer.

The implementation of the exception mechanism is usually very slow when an exception is thrown; otherwise the cost of using exceptions is almost none. In my opinion exceptions are very useful if you use them correctly.
In RT applications, exceptions should be thrown only when something goes bad and the program has to stop and fix the issue (and possibly wait for user interaction). Under such circumstances, fixing the issue takes longer than the throw itself anyway.
Exceptions provide a hidden path for reporting an error. They make the code shorter and more readable, and therefore easier to maintain.

Typical implementations of C++ exception handling are still not ideal, and might make the entire language implementation almost unusable for some embedded targets with extremely limited resources, even if the user code does not explicitly use these features. This is referred to as a "zero overhead principle violation" by recent WG21 papers; see N4049 and N4234 for details. In such environments, exception handling does not work as expected (that is, while consuming reasonable system resources) whether the application is real-time or not.
However, there should be real-time applications in embedded environments which can afford this overhead, e.g. a video player in a handheld device.
Exception handling should always be used carefully. Throwing and catching exceptions per frame in a real-time application on any platform (not only in embedded environments) is bad design/implementation and not acceptable in general.

There are generally 3 or 4 constraints in embedded / realtime development, especially when that implies kernel-mode development:
At various points, usually while handling hardware exceptions, operations MUST NOT throw more hardware exceptions. C++'s implicit data structures (vtables) and code (default constructors, operators, and other implicitly generated code that supports the C++ exception mechanism) are not placeable, and as a result cannot be guaranteed to sit in non-paged memory when executed in this context.
Code quality: C++ code in general can hide a lot of complexity in statements that look trivial, making code difficult to visually audit for errors. Exceptions decouple handling from location, making it difficult to prove code coverage of tests.
C++ exposes a very simple memory model: new allocates from an infinite free store until you run out, and then it throws an exception. In memory-constrained devices, more efficient code can be written that makes explicit use of fixed-size blocks of memory. C++'s implicit allocations on almost any operation make it impossible to audit memory use. Also, most C++ heaps exhibit the disturbing property that there is no computable upper limit on how long a memory allocation can take, which again makes it difficult to prove the response time of algorithms on realtime devices where fixed upper limits are desirable.
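The "explicit use of fixed size blocks" mentioned above could look roughly like the following. FixedPool is a made-up name and this is only a sketch under simple assumptions (single-threaded use, all blocks the same size), not a production allocator. Both operations are a constant number of pointer moves, and exhaustion is reported with nullptr rather than by throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity pool of equally sized blocks.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
    static_assert(BlockSize >= sizeof(void*), "a block must be able to hold a free-list link");

    struct Node { Node* next; };
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    // Round the block size up so every block is suitably aligned.
    static constexpr std::size_t kStride = (BlockSize + kAlign - 1) / kAlign * kAlign;

public:
    FixedPool() noexcept {
        // Thread every block onto the free list once, up front.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            free_ = ::new (static_cast<void*>(storage_ + i * kStride)) Node{free_};
        }
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // O(1), never throws; returns nullptr when the pool is exhausted.
    void* allocate() noexcept {
        if (free_ == nullptr) return nullptr;
        Node* n = free_;
        free_ = n->next;
        ++in_use_;
        return n;
    }

    // O(1); `p` must have come from allocate() on this pool.
    void release(void* p) noexcept {
        if (p == nullptr) return;
        free_ = ::new (p) Node{free_};
        --in_use_;
    }

    std::size_t in_use() const noexcept { return in_use_; }

private:
    alignas(std::max_align_t) unsigned char storage_[kStride * BlockCount];
    Node* free_ = nullptr;
    std::size_t in_use_ = 0;
};
```

Because the storage is a plain member array, the memory footprint is known at compile time and the pool can be placed wherever the object itself is placed, which is the kind of auditability a general-purpose heap does not offer.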

Related

When should I use VULKAN_HPP_NO_EXCEPTIONS?

This question is regarding exception handling in Vulkan-Hpp (official Vulkan C++ bindings).
I wrote a small application using Vulkan-Hpp without VULKAN_HPP_NO_EXCEPTIONS defined (and with exception handlers). But after coming across this stackoverflow question (Are Exceptions in C++ really slow) I started worrying about the penalty for using exceptions there. Then I found out about the define VULKAN_HPP_NO_EXCEPTIONS, but it changes the syntax completely for all calls which could throw an exception (because of different return values): that means, one has to decide before starting implementation to either use VULKAN_HPP_NO_EXCEPTIONS or not (i.e. they can't be enabled for the "Debug" Configuration and disabled for the "Release" Configuration easily).
If exception handling is disabled by defining VULKAN_HPP_NO_EXCEPTIONS
ResultValue<SomeType>::type is a struct which contains the return
value and the error code in the fields result and value.
Source
i.e.
surface = instance.createWin32SurfaceKHR(surfaceCreateInfo);
becomes
vk::ResultValue<vk::SurfaceKHR> surfaceResult = instance.createWin32SurfaceKHR(surfaceCreateInfo);
if (surfaceResult.result == vk::Result::eSuccess) {
    surface = surfaceResult.value;
}
So given that it is not trivial to change the strategy regarding VULKAN_HPP_NO_EXCEPTIONS at a later stage in development, I wonder in which situations I should use VULKAN_HPP_NO_EXCEPTIONS for my project and in which situations I shouldn't?
I assume there must be some technical rationale behind it, other than just personal taste/opinion.
The principal reason exceptions can be disabled is that many game developers for various platforms turn off exception handling at the compiler level. On some platforms, exception handling is flat out not supported. Those platforms still need a reasonable means to deal with errors, and that requires a different API.
Exceptions have been a hotly debated subject in C++ and likely always will be. While C++ programmers will agree that exceptions should only be used in exceptional circumstances, the line between "exceptional circumstances" and "expected behavior" is ultimately in the eye of the beholder.
Personally, I would consider Vulkan errors to be "exceptional circumstances". Device lost and OOM errors are not things you frequently expect to happen. Plus, your response to them will likely be decidedly non-local; code higher up in the call stack will be what actually deals with it.
Furthermore, many of the functions that error are not the functions commonly encountered in performance critical Vulkan code (vkCmd*, and such). After all, usage errors are supposed to be handled by validation layers and should be impossible at runtime. Errors are usually given for object creation/destruction, and allocations, which are not things you do in the middle of building command buffers.
The erroring function most likely to be found in performance-critical code is vkAllocateDescriptorSets. And while it can error out, it can only do so for memory fragmentation reasons. The standard actually requires this:
Any returned error other than VK_ERROR_OUT_OF_POOL_MEMORY_KHR or VK_ERROR_FRAGMENTED_POOL does not imply its usual meaning: applications should assume that the allocation failed due to fragmentation, and create a new descriptor pool.
Fragmentation is something that you can usually prevent, if you have firm control over your input data. Given such control, you can ensure that you never get errors when allocating from descriptor pools.
vkBegin/EndCommandBuffer can error, but only for OOM reasons. Which typically means that there's little you can do to recover, so performance is irrelevant.
The commands that give you serious runtime errors that require actions are typically device commands. And you don't issue such commands in the middle of rendering; vkQueueSubmit is the one exception, and that's at the end of rendering (or beginning; however you want to see it).
This is probably why throwing in VK_HPP is the default.
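If the worry is that the two styles force different call-site syntax, one way to keep the decision in a single place is a small adapter. The sketch below uses made-up stand-ins (Result, ResultValue, value_or_throw); they only loosely mirror the result/value pair quoted in the question and are not the Vulkan-Hpp types or API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins, NOT the real Vulkan-Hpp definitions.
enum class Result { Success, OutOfMemory, DeviceLost };

template <typename T>
struct ResultValue {
    Result result;
    T value;
};

inline const char* to_string(Result r) noexcept {
    switch (r) {
        case Result::Success: return "Success";
        case Result::OutOfMemory: return "OutOfMemory";
        case Result::DeviceLost: return "DeviceLost";
    }
    return "Unknown";
}

// Turns a result/value pair back into "return the value or throw".
// Call sites that want exceptions go through this one function, so the
// error policy can be changed here instead of at every call.
template <typename T>
T value_or_throw(ResultValue<T> rv, const char* what) {
    if (rv.result != Result::Success) {
        throw std::runtime_error(std::string(what) + ": " + to_string(rv.result));
    }
    return std::move(rv.value);
}
```

Whether such a layer is worth having depends on how many calls are involved; for a handful of object-creation calls, writing the checks out by hand may be simpler.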

Why do exceptions always incur overhead in non-leaf functions with destructible stack objects?

I came across the claim in the title here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html
via here:
http://www.boost.org/doc/libs/1_57_0/doc/html/container/exception_handling.html
Exception handling violates the don't-pay-for-what-you-don't-use design of C++, as it incurs overhead in any non-leaf function that has
destructable stack objects regardless of whether they use exception
handling.
What is this referring to?
I take this bullet point to mean that any strategy for properly unwinding the stack in the event of an exception requires non-leaf functions to store some sort of information about destructible objects they placed on the stack. If that's correct, then my specific questions are:
What is this information that must be stored?
Why is it not possible to correctly unwind the stack given only an instruction address at which a throw occurred and tables of address ranges computed before run-time?
Modern exception handling is indeed table based and zero cost. Unfortunately that was not the case for Windows x86, one of the most popular targets for game development. Most likely it was due to binary compatibility reasons, but even Raymond Chen doesn't know the reason. In x64 they implemented it the way it should have been from the very beginning.
You pay in binary size.
All the code that deals with exceptions needs to be there no matter if you use exceptions or not, since in general a compiler can not know if a function can throw or not, unless it is marked noexcept (noexcept exists mostly for this reason).
The increased binary size might also hurt actual runtime performance if the code that contains the exception handling enters the CPU cache, wasting cache memory. A good compiler should be able to avoid this problem by storing all the code that performs the exception handling as far as possible from the "hot" runtime path.
Moreover, some ABIs (SJLJ) implement exceptions with some runtime overhead even on the non-exceptional path. The Itanium and Windows ABIs both have zero overhead on the non-exceptional paths (and hence on these ABIs you can expect exceptions to be faster than return-error-code error handling).
This llvm doc is a good starting point if you are interested in the differences between exception handling in the various ABIs.
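To see what the quoted bullet point is about, here is a small sketch (Tracer, may_throw, non_leaf and cannot_throw are names invented for the example). The function non_leaf has a local object with a destructor and calls something that may throw, so the compiler has to emit a cleanup path for it, in tables or in code, whether or not the program ever throws. Marking the callee noexcept is what lets the compiler drop that path.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Records its own destruction so the cleanup done during unwinding is observable.
struct Tracer {
    std::vector<int>& log;
    int id;
    Tracer(std::vector<int>& l, int i) : log(l), id(i) {}
    ~Tracer() { log.push_back(id); }
};

inline void may_throw(bool fail) {
    if (fail) throw std::runtime_error("failure in callee");
}

// Non-leaf function with a destructible local: if may_throw() exits by
// exception, `t` must still be destroyed before the exception propagates.
inline void non_leaf(std::vector<int>& log, bool fail) {
    Tracer t(log, 1);
    may_throw(fail);
    log.push_back(0);  // reached only on the normal path
}

// A callee that promises not to throw; callers need no cleanup path for it.
inline int cannot_throw(int x) noexcept { return x + 1; }

static_assert(noexcept(cannot_throw(1)), "the promise is visible at compile time");
static_assert(!noexcept(may_throw(true)), "may_throw is potentially throwing");
```

On the normal path the log ends up as 0 then 1; on the throwing path it contains only 1, written by the destructor that ran during unwinding.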

Debugging crashes in production environments

First, I should give you a bit of context. The program in question is
a fairly typical server application implemented in C++. Across the
project, as well as in all of the underlying libraries, error
management is based on C++ exceptions.
My question is pertinent to dealing with unrecoverable errors and/or
programmer errors---the loose equivalent of "unchecked" Java
exceptions, for want of a better parallel. I am especially interested
in common practices for dealing with such conditions in production
environments.
For production environments in particular, two conflicting goals stand
out in the presence of the above class of errors: ease of debugging
and availability (in the sense of operational performance). Each of
these suggests in turn a specific strategy:
Install a top-level exception handler to absorb all uncaught
exceptions, thus ensuring continuous availability. Unfortunately,
this makes error inspection more involved, forcing the programmer to
rely on fine-grained logging or other code "instrumentation"
techniques.
Crash as hard as possible; this enables one to perform a post-mortem
analysis of the condition that led to the error via a core
dump. Naturally, one has to provide a means for the system to resume
operation in a timely manner after the crash, and this may be far
from trivial.
So I end-up with two half-baked solutions; I would like a compromise
between service availability and debugging facilities. What am I
missing ?
Note: I have flagged the question as C++ specific, as I am interested
in solutions and idiosyncrasies that apply to it in particular;
nonetheless, I am aware there will be considerable overlap with other
languages/environments.
Disclaimer: Much like the OP I code for servers, thus this entire answer is focused on this specific use case. The strategy for embedded software or deployed applications should probably be widely different, no idea.
First of all, there are two important (and rather different) aspects to this question:
Easing investigation (as much as possible)
Ensuring recovery
Let us treat both separately, as dividing is conquering. And let's start by the tougher bit.
Ensuring Recovery
The main issue with C++/Java style of try/catch is that it is extremely easy to corrupt your environment because try and catch can mutate what is outside their own scope. Note: contrast to Rust and Go in which a task should not share mutable data with other tasks and a fail will kill the whole task without hope of recovery.
As a result, there are 3 recovery situations:
unrecoverable: the process memory is corrupted beyond repairs
recoverable, manually: the process can be salvaged in the top-level handler at the cost of reinitializing a substantial part of its memory (caches, ...)
recoverable, automatically: okay, once we reach the top-level handler, the process is ready to be used again
A completely unrecoverable error is best addressed by crashing. Actually, in a number of cases (such as a pointer outside your process memory), the OS will help in making it crash. Unfortunately, in some cases it won't (a dangling pointer may still point within your process memory); that's how memory corruptions happen. Oops. Valgrind, Asan, Purify, etc... are tools designed to help you catch those unfortunate errors as early as possible; the debugger will assist (somewhat) for those which make it past that stage.
An error that can be recovered, but requires manual cleanup, is annoying. You will forget to clean in some rarely hit cases. Thus it should be statically prevented. A simple transformation (moving caches inside the scope of the top-level handler) allows you to transform this into an automatically recoverable situation.
In the latter case, obviously, you can just catch, log, and resume your process, waiting for the next query. Your goal should be for this to be the only situation occurring in Production (cookie points if it does not even occur).
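A minimal sketch of the "automatically recoverable" setup, with invented names (ServeReport, Handler, serve_all) and a vector of strings standing in for a real logger: each query runs inside its own top-level handler, and all per-query state lives inside the try block so that nothing half-updated survives a failure.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

struct ServeReport {
    int served = 0;
    std::vector<std::string> failures;  // stands in for a real log sink
};

using Handler = std::function<std::string(const std::string&)>;

// Runs every query under a top-level handler: catch, log, resume.
inline ServeReport serve_all(const std::vector<std::string>& queries,
                             const Handler& handle) {
    ServeReport report;
    for (const std::string& q : queries) {
        try {
            // Per-query state is created here and destroyed on any exit,
            // so the next query starts from a clean slate.
            std::string response = handle(q);
            (void)response;  // a real server would send it to the client
            ++report.served;
        } catch (const std::exception& e) {
            report.failures.push_back(q + ": " + e.what());
        } catch (...) {
            report.failures.push_back(q + ": unknown exception");
        }
    }
    return report;
}
```

This only helps with errors that are reported as C++ exceptions; memory corruption of the kind described earlier will not be caught here and is still best handled by crashing.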
Easing Investigation
Note: I will take the opportunity to promote a project by Mozilla called rr which could really, really, help investigating once it matures. Check the quick note at the end of this section.
Without surprise, in order to investigate you will need data. Preferably, as much as possible, and well ordered/labelled.
There are two (practiced) ways to obtain data:
continuous logging, so that when an exception occurs, you have as much context as possible
exception logging, so that upon an exception, you log as much as possible
Logging continuously implies performance overhead and (when everything goes right) a flood of useless logs. On the other hand, exception logging implies having enough trust in the system's ability to perform some actions in case of exceptions (which in case of bad_alloc... oh well).
In general, I would advise a mix of both.
Continuous Logging
Each log should contain:
a timestamp (as precise as possible)
(possibly) the server name, the process ID and thread ID
(possibly) a query/session correlator
the filename, line number and function name of where this log came from
of course, a message, which should contain dynamic information (if you have a static message, you can probably enrich it with dynamic information)
What is worth logging ?
At least I/O. All inputs, at least, and outputs can help spotting the first deviation from expected behavior. I/O include: inbound query and corresponding response, as well as interactions with other servers, databases, various local caches, timestamps (for time-related decisions), ...
The goal of such logging is to be able to reproduce the issue spotted in a control environment (which can be setup thanks to all this information). As a bonus, it can be useful as crude performance monitor since it gives some check-points during the process (note: I am talking about monitoring and not profiling for a reason, this can allow you to raise alerts and spot where, roughly, time is spent, but you will need more advanced analysis to understand why).
Exception Logging
The other option is to enrich exceptions. As an example of a crude exception: std::out_of_range yields the following reason (from what): vector::_M_range_check when thrown from libstdc++'s vector.
This is pretty much useless if, like me, vector is your container of choice and therefore there are about 3,640 locations in your code where this could have been thrown.
The basics, to get a useful exception, are:
a precise message: "access to index 32 in vector of size 4" is slightly more helpful, no ?
a call stack: it requires platform specific code to retrieve it, though, but can be automatically inserted in your base exception constructor, so go for it!
Note: once you have a call-stack in your exceptions, you will quickly find yourself addicted and wrapping lesser-abled 3rd party software into an adapter layer if only to translate their exceptions into yours; we all did it ;)
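For the "precise message" part, a small helper is enough. checked_at is a name made up for this sketch, and the call stack capture is left out because it is platform specific.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Bounds-checked access whose exception says what was asked for and
// what was available, instead of a generic range-check message.
template <typename T>
const T& checked_at(const std::vector<T>& v, std::size_t index) {
    if (index >= v.size()) {
        throw std::out_of_range("access to index " + std::to_string(index) +
                                " in vector of size " + std::to_string(v.size()));
    }
    return v[index];
}
```

Building the message allocates, so this helper makes exactly the trade-off discussed in the note on std::bad_alloc.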
On top of those basics, there is a very interesting feature of RAII: attaching notes to the current exception during unwinding. A simple handler retaining a reference to a variable and checking whether an exception is unwinding in its destructor costs only a single if check in general, and does all the important logging when unwinding (but then, exception propagation is costly already, so...).
Finally, you can also enrich and rethrow in catch clauses, but this quickly litters the code with try/catch blocks so I advise using RAII instead.
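Here is one way the RAII note could be written, assuming C++17 for std::uncaught_exceptions(). UnwindNote and process are invented names, and the sink is a plain vector of strings standing in for whatever your exception or logging layer provides. The guard remembers how many exceptions were in flight when it was created and only does work in its destructor if that number has gone up.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard: records "label = value" only when destroyed by unwinding.
class UnwindNote {
public:
    UnwindNote(std::vector<std::string>& sink, std::string label, const int& watched)
        : sink_(sink),
          label_(std::move(label)),
          watched_(watched),
          exceptions_on_entry_(std::uncaught_exceptions()) {}

    ~UnwindNote() {
        // More exceptions in flight than at construction: we are being unwound.
        if (std::uncaught_exceptions() > exceptions_on_entry_) {
            try {
                sink_.push_back(label_ + " = " + std::to_string(watched_));
            } catch (...) {
                // Logging must never throw out of a destructor.
            }
        }
    }

    UnwindNote(const UnwindNote&) = delete;
    UnwindNote& operator=(const UnwindNote&) = delete;

private:
    std::vector<std::string>& sink_;
    std::string label_;
    const int& watched_;  // read at unwinding time, not at construction time
    int exceptions_on_entry_;
};

// Example use: if an item fails, the note says which one was being processed.
inline void process(std::vector<std::string>& sink, int items, int fail_at) {
    int i = 0;
    UnwindNote note(sink, "item index", i);
    for (; i < items; ++i) {
        if (i == fail_at) throw std::runtime_error("bad item");
    }
}
```

On the normal path the destructor costs one integer comparison, which matches the "single if check" mentioned above.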
Note: there is a reason that std exceptions do NOT allocate memory, it allows throwing exceptions without the throw being itself preempted by a std::bad_alloc; I advise to consciously pick having richer exceptions in general with the potential of a std::bad_alloc thrown when attempting to create an exception (which I have yet to see happening). You have to make your own choice.
And Delayed Logging ?
The idea behind delayed logging is that instead of calling your log handler, as usual, you will instead defer logging all finer-grained traces and only get to them in case of issue (aka, exception).
The idea, therefore, is to split logging:
important information is logged immediately
finer-grained information is written to a scratch-pad, which can be called to log them in case of exception
Of course, there are questions:
the scratch pad is (mostly) lost in case of crash; you should be able to access it via your debugger if you get a memory dump though it's not as pleasant.
the scratch pad requires a policy: when to discard it ? (end of the session ? end of the transaction ? ...), how much memory ? (as much as it wants ? bounded ? ...)
what of the performance cost: even if not writing the logs to disk/network, it still costs to format them!
I have actually never used such a scratch pad, for now all non-crasher bugs that I ever had were solved solely using I/O logging and rich exceptions. Still, should I implement it I would recommend making it:
transaction local: since I/O is logged, we should not need more insight than this
memory bounded: evicting older traces as we progress
log-level driven: just as regular logging, I would want to be able to only enable some logs to get into the scratch pad
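Since I have not used such a scratch pad myself, take the following as an untested-in-production sketch of the three properties listed above. ScratchPad and Level are invented names; one instance is meant to live for one transaction.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

enum class Level { Debug = 0, Info = 1, Warning = 2 };

// Hypothetical per-transaction scratch pad for delayed logging.
class ScratchPad {
public:
    ScratchPad(std::size_t max_entries, Level min_level)
        : max_entries_(max_entries), min_level_(min_level) {}

    // Log-level driven: traces below the threshold never enter the pad.
    // Memory bounded: the oldest trace is evicted once the pad is full.
    void trace(Level level, std::string message) {
        if (level < min_level_ || max_entries_ == 0) return;
        if (entries_.size() == max_entries_) entries_.pop_front();
        entries_.push_back(std::move(message));
    }

    // Exception path: hand back the retained traces, oldest first, and reset.
    std::vector<std::string> drain() {
        std::vector<std::string> out(entries_.begin(), entries_.end());
        entries_.clear();
        return out;
    }

    // Success path: the transaction ended normally, drop everything.
    void discard() noexcept { entries_.clear(); }

    std::size_t size() const noexcept { return entries_.size(); }

private:
    std::size_t max_entries_;
    Level min_level_;
    std::deque<std::string> entries_;
};
```

The formatting cost mentioned above is still paid by the caller of trace(); deferring the formatting itself would need a different interface (for example, storing a callable), which is left out here.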
And Conditional / Probabilistic Logging ?
Writing one trace every N is not really interesting; it's actually more confusing than anything. On the other hand, logging in-depth one transaction every N can help!
The idea here is to reduce the amount of logs written, in general, whilst still getting a chance to observe bugs traces in detail in the wild. The reduction is generally driven by the logging infrastructure constraints (there is a cost to transferring and writing all those bytes) or by the performance of the software (formatting the logs slows software down).
The idea of probabilistic logging is to "flip a coin" at the start of each session/transaction to decide whether it'll be a fast one or a slow one :)
A similar idea (conditional logging) is to read a special debug field in a transaction field that initiates a full logging (at the cost of speed).
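Both ideas fit in a few lines. In this sketch (TransactionSampler is an invented name) the coin flip is replaced by a deterministic "one transaction every N" counter so that the behavior is reproducible; swapping in a random draw is straightforward. The debug_flag parameter models the special debug field of conditional logging.

```cpp
#include <cassert>
#include <cstdint>

// Decides once, at the start of a transaction, whether to log it in depth.
class TransactionSampler {
public:
    // every_n == 0 disables sampling; flagged transactions are still logged.
    explicit TransactionSampler(std::uint32_t every_n) noexcept : every_n_(every_n) {}

    bool should_log_in_depth(bool debug_flag) noexcept {
        const std::uint32_t seen = count_++;  // every transaction is counted
        if (debug_flag) return true;          // conditional logging
        return every_n_ != 0 && seen % every_n_ == 0;  // sampled logging
    }

private:
    std::uint32_t every_n_;
    std::uint32_t count_ = 0;
};
```

The decision is taken per transaction, not per trace, so a sampled transaction is logged from start to finish and its logs can be read as a whole.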
A quick note on rr
With an overhead of only 20%, and this overhead applying only on the CPU processing, it might actually be worth using rr systematically. If this is not feasible, however, it could be feasible to have 1 out of N servers being launched under rr and used to catch hard to find bugs.
This is similar to A/B testing, but for debugging purposes, and can be driven either by a willing commitment of the client (flag in the transaction) or with a probabilistic approach.
Oh, and in the general case, when you are not hunting down anything, it can be easily deactivated altogether. No sense in paying those 20% then.
That's all folks
I could apologize for the lengthy read, but the truth is I probably just skimmed the topic. Error Recovery is hard. I would appreciate comments and remarks, to help improve this answer.
If the error is unrecoverable, by definition there is nothing the application can do in production environment, to recover from the error. In other words, the top-level exception handler is not really a solution. Even if the application displays a friendly message like "access violation", "possible memory corruption", etc, that doesn't actually increase availability.
When the application crashes in a production environment, you should get as much information as possible for post-mortem analysis (your second solution).
That said, if you get unrecoverable errors in a production environment, the main problems are your product QA process (it's lacking), and (much before that), writing unsafe/untested code.
When you finish investigating such a crash, you should not only fix the code, but fix your development process so that such crashes are no longer possible (i.e. if the corruption is an uninitialized pointer write, go over your code base and initialize all pointers and so on).

Performance when exceptions are not thrown (C++)

I have already read a lot about C++ exceptions, and what I see is that exception performance in particular is a hard topic. I even tried to look under g++'s hood to see how exceptions are represented in assembly.
I'm a C programmer, because I prefer low level languages. Some time ago I decided to use C++ over C because with small cost it can make my life much easier (classes over structures, templates etc.).
Returning to my question: as I see it, exceptions do generate overhead, but only when they occur, because finding an appropriate exception handler requires a long sequence of jump and comparison instructions. In normal program execution (where there is no error), the overhead of exceptions equals that of normal return code checking. Am I right?
Please see my detailed response to a similar question here.
Exception handling overhead is platform specific and depends on the OS, the compiler, and the CPU architecture you're running on.
For Visual Studio, Windows, and x86, there is a cost even when exceptions are not thrown. The compiler generates additional code to keep track of the current "scope" which is later used to determine what destructors to call and where to start searching for exception filters and handlers. Scope changes are triggered by try blocks and the creation of objects with destructors.
For Visual Studio, Windows, and x86-64, the cost is essentially zero when exceptions are not thrown. The x86-64 ABI has a much stricter protocol around exception handling than x86, and the OS does a lot of heavy lifting, so the program itself does not need to keep track of as much information in order to handle exceptions.
When exceptions occur, the cost is significant, which is why they should only happen in truly exceptional cases. Handling exceptions on x86-64 is more expensive than on x86, because the architecture is optimized for the more common case of exceptions not happening.
Here's a detailed review of the cost of the exception handling when no exceptions are actually thrown:
http://www.nwcpp.org/old/Meetings/2006/10.html
In general, in every function that uses exception handling (has either try/catch blocks or automatic objects with destructors), the compiler generates some extra prolog/epilog code to deal with the exception registration record.
Plus after every automatic object is constructed and destructed - a few more assembler commands are added (adjust the exception registration record).
In addition some optimizations may be disabled. Especially this is the case when you work in the so-called "asynchronous" exception handling model.

C++ return value versus exception performance

Somewhere I have read that modern Intel processors have low-level hardware for implementing exceptions and most compilers take advantage of it, to the effect that exceptions become faster than returning results state using variables.
Is it true? are exceptions faster than variables as far as returning state/responding to state? reading stack overflow on the topic seems to contradict that.
Thank you
Be aware that there's ambiguity in the term "exception handler." I believe you'll find that hardware folks when talking about exceptions mean things like:
Hardware interrupts, aka signals, whose handlers are sometimes called exception handlers (see http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/interrupts.html)
Machine check exceptions, which halt the computer if something in hardware goes wrong (see http://en.wikipedia.org/wiki/Machine_Check_Exception)
Neither of those has anything to do with C++'s exception handling facility.
As a counterexample, I have at least one anecdotal data point where exceptions were way slower than return codes: that was on Intel hardware alright, but with gcc 2.95 and a very large set of code with a very large exception table, that was constructed the first time an exception was thrown. Subsequent exceptions were fast, but by then the damage was usually done. Admittedly, gcc 2.95 is pretty ancient, but it should be enough to caution you about making generalizations about the speed of C++ exception handling, even on Intel hardware.
I don't know where you read this, but it is surely incorrect. No hardware designer would make exceptional circumstances, which are by definition uncommon, work FASTER than normal ones. Also keep in mind that C, which according to TIOBE is the most popular systems language, does not even support exceptions. It seems EXTREMELY unlikely that processors are optimized for ONE language's exception handling, whose implementation is not even standardized among compilers.
Even if, somehow, exceptions were faster, you still should not use them outside their intended purpose, lest you confuse every other programmer in the world.
No. Nothing is going to be faster than sticking a variable into a register. Even with explicit hardware support, exceptions are still going to require things like memory accesses.
C++ exceptions couldn't be implemented for the most part in that way, because c++ requires that the stack be unwound and objects destructed.
The answer is technically correct, but highly misleading.
At the core of the issue is the observation that exceptions are exceptional. They usually do not happen. This is not the case when you return an error code. This happens always, even if there is no error. In that case the function still has to return 0, or true, or -1, or ...
Now this means that a CPU and a compiler can specifically optimize functions that fail by exception. But it's important to realize what they optimize, and that's the non-failure, non-exception case - at the cost of the exceptional cases.
Once we realize that, we can look at how the compiler and CPU optimize such cases. One common method is putting the exception code separate from the normal code. As a result, that code will normally not end up in the CPU cache, which can contain more useful code as a result. In fact, the exception code might not end up in RAM at all, and stay on disk.
Another supporting mechanism is the CPU branch predictor. It will remember that the branches that lead to exception code are usually not taken, and therefore predict that they will not be taken the next time either. The compiler can even put this in as a hint. However, this hint feature was abandoned after the Intel Pentium 4; modern CPUs predict branches well enough.
Even if they were faster, you should not use them for anything other than exceptional conditions. If you misuse them you make your program much harder to debug. In gdb you can do a 'catch throw' and easily find out where your program is going wrong and throwing an exception, but not if you're throwing exceptions as part of your regular processing.
Your question is a little unclear, because what you mean by implementing exceptions covers three things:
Entering a try block. This can have no cost, but tends to make a throw more expensive. There is a more specific question about this on SO.
Executing a throw. There is a more specific question about this on SO.
Unwinding the stack to get from a throw to its catch, and loading the error handling code (in the catch) into the CPU cache. You should ignore this cost, because you must also pay it when using status codes rather than exceptions.
Here is a blog article where someone did some actual benchmarks: https://pspdfkit.com/blog/2020/performance-overhead-of-exceptions-in-cpp/
tl;dr: The throw/catch mechanism is about an order of magnitude slower than returning a value, so if you care about performance you should only use it in exceptional situations.
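If you want to check this on your own toolchain, a rough harness like the following is a starting point. All names here are invented for the sketch, and it is not a rigorous benchmark: the optimizer may simplify the return-code loop, and results vary with compiler, flags and failure rate, so treat any numbers it gives as indicative only.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

// Same check, two ways of reporting failure.
inline bool check_with_code(int i, int fail_every) noexcept {
    return fail_every <= 0 || i % fail_every != 0;  // false signals failure
}

inline void check_with_throw(int i, int fail_every) {
    if (fail_every > 0 && i % fail_every == 0) throw std::runtime_error("failed");
}

struct Timing {
    int failures;
    long long nanoseconds;
};

// Runs body(i) for i = 1..iterations and measures the wall-clock time.
template <typename Body>
Timing time_loop(int iterations, Body body) {
    const auto start = std::chrono::steady_clock::now();
    int failures = 0;
    for (int i = 1; i <= iterations; ++i) {
        failures += body(i) ? 0 : 1;
    }
    const auto stop = std::chrono::steady_clock::now();
    return {failures,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count()};
}

inline Timing run_with_codes(int iterations, int fail_every) {
    return time_loop(iterations,
                     [fail_every](int i) { return check_with_code(i, fail_every); });
}

inline Timing run_with_exceptions(int iterations, int fail_every) {
    return time_loop(iterations, [fail_every](int i) {
        try {
            check_with_throw(i, fail_every);
            return true;
        } catch (const std::runtime_error&) {
            return false;
        }
    });
}
```

Varying fail_every is the interesting part: with failures that almost never happen the two variants should be close, and the gap should widen as failures become frequent, which is the point made in the answers above.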