Related
I haven't found any resources that exactly answer what I am trying to understand with an issue I saw in a piece of software I am working on, so I'll ask the geniuses here!
For starters, I'm running with VxWorks on a PowerPC processor.
In trying to debug a separate issue, I tried throwing some quick and dirty debug code in an interrupt handling routine. It involved a double precision floating point operation to store a value of interest (namely, how long it had been since I saw the last interrupt come in) which I used later outside the handler in my running thread. I didn't see a problem in this (sure, it takes longer, but time-wise I had pleanty; the interrupts aren't coming in too quickly) however VxWorks sure didn't like it. It consistently crashes the when it reaches that code, one of the bad crashes that reboots the system. It took me a bit to track down the double operation as the source of the issue, and I realized it's not even double "operations", even returning a constant double from a routine called in the interrupt failed miserably.
On PowerPC (or other architectures in general) are there generally issues doing floating point operations in interrupt handlers and returning floating point (or other type) values in functions called by an interrupt handler? I'm at a loss for why this would cause a program to crash.
(The workaround was to delay the conversion of "ticks" since last interrupt to "time" since laster interrupt until the code is out of the handler, since it seems to handle long integer operations just fine.)
In VxWorks, each task that utilises floating point has to be specified as such in the task creation so that the FP registers are saved during context switches, but only when switching from tasks that use floating point. This allows non-floating point tasks to have faster context switch times.
When an interrupt pre-empts a floating point task however, it is most likely the case that FP registers are not saved. To do so, the interrupt handler would need to determine what task was pre-empted and whether it had been specified as a floating point task; this would make the interrupt latency both higher and variable, which is generally undesirable in a real-time system.
So to make it work any interrupt routine using floating point must explicitly save and restore the FP registers itself. Any task that uses floating point must be specified as such in any case, though you can get away with it if you only have one such task.
If a floating-point task is pre-empted, your interrupt will modify floating point register values in use by that task, the result of this when the FP task resumes is non-deterministic but includes causing a floating point exception - if a previously non-zero register for example, becomes zero, and is subsequently used as the right-hand of a division operation.
It seems to me however that in this case the floating point operation is probably entirely unnecessary. Your "workaround" is in fact the conventional, safest and most deterministic method, and should probably be regarded as a correction of your design rather than a workaround.
Does your ISR call the fppSave()/fppRestore() functions?
If it doesn't, then the ISR is stomping on FP registers that might be in use by existing tasks.
Specifically, FP registers are used by the C++ compiler on the PPC architecture (I think dealing with throw/catch).
In VxWorks, at least for the PPC architectures, a floating point operation will cause a FP Unavilable Exception. This is because when an interrupt occurs the FP bit in MSR is cleared because VxWorks assumes that there will be no FP operations. This speeds up ISR/Task context switching because the FP registers do not have to saved/restored.
That being said, there was a time when we had some debug code that we needed FP operations in the interrupt context. We changed the VxWorks code that calls the specific ISR to 1) set the MSR[FP], do a fpsave call, call the ISR, do a fprestore call, then clear the MSR[FP]. This got us around the problem.
That being said, I agree with the rest of the folks here that FP operations should not be used in an ISR context because that ISRs should be fast and FP operations at typically not.
I have worked with e300 core while developing bare-metal applications and I can say that when an interrupt occurs, core closes the FPU, that you can observe by checking FP bit of MSR. Before doing anything with the floating point registers, you must re-enable FPU by writing 1 to FP bit of MSR. Then you make operations on FPU registers as you want in an ISR.
The general assumption in VxWorks is that Floating Point registers don't need to be saved and restored by ISRs. Primarily because ISRs usually don't mess with them. Historically, most real-time tasks didn't do FP either, but that's obviously changed. What's not obvious is that many tasks that don't explicitly use floating point nevertheless use the floating point registers. I believe that any task with code written in C++ uses the floating point registers (at least on some processors/compilers), even though no floating point operations are obvious. Such tasks should be given the FP_? (I forget the exact spelling) task attribute, causing their FP regs to be saved during context switches.
I think you will find this article interesting. Maybe you are getting into a floating point exception.
I never used PowerPC, but I'm good with Google :P
A couple of years ago I was taught, that in real-time applications such as Embedded Systems or (Non-Linux-)Kernel-development C++-Exceptions are undesirable. (Maybe that lesson was from before gcc-2.95). But I also know, that Exception Handling has become better.
So, are C++-Exceptions in the context of real-time applications in practice
totally unwanted?
even to be switched off via via compiler-switch?
or very carefully usable?
or handled so well now, that one can use them almost freely, with a couple of things in mind?
Does C++11 change anything w.r.t. this?
Update: Does exception handling really require RTTI to be enabled (as one answerer suggested)? Are there dynamic casts involved, or similar?
Exceptions are now well-handled, and the strategies used to implement them make them in fact faster than testing return code, because their cost (in terms of speed) is virtually null, as long as you do not throw any.
However they do cost: in code-size. Exceptions usually work hand in hand with RTTI, and unfortunately RTTI is unlike any other C++ feature, in that you either activate or deactivate it for the whole project, and once activated it will generated supplementary code for any class that happens to have a virtual method, thus defying the "you don't pay for what you don't use mindset".
Also, it does require supplementary code for its handling.
Therefore the cost of exceptions should be measured not in terms of speed, but in terms of code growth.
EDIT:
From #Space_C0wb0y: This blog article gives a small overview, and introduces two widespread methods for implementing exceptions Jumps and Zero-Cost. As the name implies, good compilers now use the Zero-Cost mechanism.
The Wikipedia article on Exception Handling talk about the two mechanisms used. The Zero-Cost mechanism is the Table-Driven one.
EDIT:
From #Vlad Lazarenko whose blog I had referenced above, the presence of exception thrown might prevent a compiler from inlining and optimizing code in registers.
Answer just to the update:
Does exception handling really require
RTTI to be enabled
Exception-handling actually requires something more powerful than RTTI and dynamic cast in one respect. Consider the following code:
try {
some_function_in_another_TU();
} catch (const int &i) {
} catch (const std::logic_error &e) {}
So, when the function in the other TU throws, it's going to look up the stack (either check all levels immediately, or check one level at a time during stack unwinding, that's up to the implementation) for a catch clause that matches the object being thrown.
To perform this match, it might not need the aspect of RTTI that stores the type in each object, since the type of a thrown exception is the static type of the throw expression. But it does need to compare types in an instanceof way, and it needs to do this at runtime, because some_function_in_another_TU could be called from anywhere, with any type of catch on the stack. Unlike dynamic_cast, it needs to perform this runtime instanceof check on types which have no virtual member functions, and for that matter types which are not class types. That last part doesn't add difficulty, because non-class types have no hierarchy, and so all that's needed is type equality, but you still need type identifiers that can be compared at runtime.
So, if you enable exceptions then you need the part of RTTI that does type comparisons, like dynamic_cast's type comparisons but covering more types. You don't necessarily need the part of RTTI that stores the data used to perform this comparison in each class's vtable, where it's reachable from the object -- the data could instead only be encoded at the point of each throw expression and each catch clause. But I doubt that's a significant saving, since typeid objects aren't exactly massive, they contain a name that's often needed anyway in a symbol table, plus some implementation-defined data to describe the type hierarchy. So probably you might as well have all of RTTI by that point.
The problem with exceptions is not necessarily the speed (which may differ greatly, depending on the implementation), but it's what they actually do.
In the real-time world, when you have a time constraint on an operation, you need to know exactly what your code does. Exceptions provide shortcuts that may influence the overall run time of your code (exception handler may not fit into the real-time constraint, or due to an exception you might not return the query response at all, for example).
If you mean "real-time" as in fact "embedded", then the code size, as mentioned, becomes an issue. Embedded code may not necessarily be real-time, but it can have size constraint (and often does).
Also, embedded systems are often designed to run forever, in an infinite event loop. Exception may take you somewhere out of that loop, and also corrupt your memory and data (because of the stack unwinding) - again, depends on what you do with them, and how the compiler actually implements it.
So better safe than sorry: don't use exceptions. If you can sustain occasional system failures, if you're running in a separate task than can be easily restarted, if you're not really real-time, just pretend to be - then you probably can give it a try. If you're writing software for a heart-pacer - I would prefer to check return codes.
C++ exceptions still aren't supported by every realtime environment in a way that makes them acceptable everywhere.
In the particular example of video games (which have a soft 16.6ms deadline for every frame), the leading compilers implement C++ exceptions in such a way that simply turning on exception handling in your program will significantly slow it down and increase code size, regardless of whether you actually throw exceptions or not. Given that both performance and memory are critical on a game console, that's a dealbreaker: the PS3's SPU units, for example, have 256kb of memory for both code and data!
On top of this, throwing exceptions is still quite slow (measure it if you don't believe me) and can cause heap deallocations which are also undesirable in cases where you haven't got microseconds to spare.
The one... er... exception I have seen to this rule is cases where the exception might get thrown once per app run -- not once per frame, but literally once. In that case, structured exception handling is an acceptable way to catch stability data from the OS when a game crashes and relay it back to the developer.
The implementation of the exception mechanism is usually very slow when an exception is thrown, otherwise the costs of using them is almost none. In my opinion exceptions are very useful if you use them correctly.
In RT applications, exceptions should be thrown only when something goes bad and the program has to stop and fix the issue (and possible wait for the user interaction). Under such circumstances, it takes longer to fix the issue.
Exceptions provide hidden path of reporting an error. They make the code more shorter and more readable, therefore easier maintenance.
Typical implementations of C++ exception handling were still not ideal, and might cause the entire language implementation almost unusable for some embedded targets with extremely limited resources, even if the user code is not explicitly using these features. This is referred as "zero overhead principle violation" by recent WG21 papers, see N4049 and N4234 for details. In such environments, exception handling does not work as expected (consuming reasonable system resources) whether the application is real-time or not.
However, there should be real-time applications in embedded environments which can afford these overhead, e.g. a video player in a handheld device.
Exception handling should always be used carefully. Throwing and catching exceptions per frame in a real-time application for any platforms (not only for embedded environments) is a bad design/implementation and not acceptable in general.
There are generally 3 or 4 constraints in embedded / realtime development - especially when that implies kernel mode development
at various points - usually while handling hardware exceptions - operations MUST NOT throw more hardware exceptions. c++'s implicit data structures (vtables) and code (default constructors & operators & other implicitly generated code to support the c++ exception mechanisim) are not placeable, and cannot as a result be guaranteed to be placed in non paged memory when executed in this context.
Code quality - c++ code in general can hide a lot of complexity in statements that look trivial making code difficult to visually audit for errors. exceptions decouple handling from location, making proving code coverage of tests difficult.
C++ exposes a very simple memory model: new allocates from an infinite free store, until you run out, and it throws an exception. In memory constrained devices, more efficient code can be written that makes explicit use of fixed size blocks of memory. C+'s implicit allocations on almost any operation make it impossible to audit memory use. Also, most c++ heaps exhibit the disturbing property that there is no computable upper limit on how long a memory allocation can take - which again makes it difficult to prove the response time of algorithms on realtime devices where fixed upper limits are desirable.
Somewhere I have read that modern Intel processors have low-level hardware for implementing exceptions and most compilers take advantage of it, to the effect that exceptions become faster than returning results state using variables.
Is it true? are exceptions faster than variables as far as returning state/responding to state? reading stack overflow on the topic seems to contradict that.
Thank you
Be aware that there's ambiguity in the term "exception handler." I believe you'll find that hardware folks when talking about exceptions mean things like:
Hardware interrupts, aka signals, whose handlers are sometimes called exception handlers (see http://pages.cs.wisc.edu/~smoler/x86text/lect.notes/interrupts.html)
Machine check exceptions, which halt the computer if something in hardware goes wrong (see http://en.wikipedia.org/wiki/Machine_Check_Exception)
Neither of those has anything to do with C++'s exception handling facility.
As a counterexample, I have at least one anecdotal data point where exceptions were way slower than return codes: that was on Intel hardware alright, but with gcc 2.95 and a very large set of code with a very large exception table, that was constructed the first time an exception was thrown. Subsequent exceptions were fast, but by then the damage was usually done. Admittedly, gcc 2.95 is pretty ancient, but it should be enough to caution you about making generalizations about the speed of C++ exception handling, even on Intel hardware.
I don't know where you read this, but it is surely incorrect. No hardware designer would make exceptional circumstances, which are by definition uncommon, work FASTER than normal ones. Also keep in mind that C, which according to TIOBE is the most popular systems language, does not even support exceptions. It seems EXTREMELY unlikely that processors are optimized for ONE language's exception handling, whose implementation is not even standardized among compilers.
Even if, somehow, exceptions were faster, you still should not use them outside their intended purpose, lest you confuse every other programmer in the world.
No. Nothing is going to be faster than sticking a variable into a register. Even with explicit hardware support, exceptions are still going to require things like memory accesses.
C++ exceptions couldn't be implemented for the most part in that way, because c++ requires that the stack be unwound and objects destructed.
The answer is technically correct, but highly misleading.
At the core of the issue is the observation that exceptions are exceptional. They usually do not happen. This is not the case when you return an error code. This happens always, even if there is no error. In that case the function still has to return 0, or true, or -1, or ...
Now this means that a CPU and a compiler can specifically optimize functions that fail by exception. But it's important to realize what they optimize, and that's the non-failure, non-exception case - at the cost of the exceptional cases.
Once we realize that, we can look at how the compiler and CPU optimzie such cases. One common method is putting the exception code separate from the normal code. As a result, that code will normally not end up in the CPU cache, which can contain more useful code as a result. In fact, the exception code might not end up in RAM at all, and stay on disk.
Another supporting mechanism is the CPU branch predictor. It will remember that the branches that lead to exception code are usually not taken, and therefore predict that the next time they're not taken either. The compiler can even put this in as a hint. However, this hint feature was abandoned past the Intel Pentium 4; modern CPUs predicted branches well enough.
Even if they were faster, you should not use them for anything other than exceptional conditions. If you misuse them you make your program much harder to debug. In gdb you can do a 'catch throw' and easily find out where your program is going wrong and throwing an exception, but not if you're throwing exceptions as part of your regular processing.
Your question is a little unclear, because what you mean by implementing exceptions covers three things:
Entering a try block. This can have no cost, but tends to make a throw more expensive. There is a more specific question about this on SO.
Executing a throw. There is a more specific question about this on SO.
Unwinding the stack to get from a throw to its catch, and loading the error handling code (in the catch) into the CPU cache. Your should ignore this cost, because you must pay this cost if using status codes rather than exceptions.
Here is blog article where someone did some actual benchmarks: https://pspdfkit.com/blog/2020/performance-overhead-of-exceptions-in-cpp/
tl;dr: The throw/catch mechanism is about an order of magnitude slower than returning a value, so if you care about performance you should only use it in exceptional situations.
Mentally, I've always wondered how try/throw/catch looks behind the scenes, when the C++ compiles translates it to assembler. But since I never use it, I never got around to checking it out (some people would say lazy).
Is the normal stack used for keeping track of trys, or is a separate per-thread stack kept for this purpose alone? Is the implementation between MSVC and g++ big or small? Please show me some pseudo asm (IA-32 is ok too) so I never have to check it out myself! :)
Edit: Now I get the basics of MSVC's implementation on IA-32 handling. Anybody know for g++ on IA-32, or any other CPU for that matter?
Edit 2 (11 years later): Here are some data on performance. They've also made source code freely available.
Poor implementations of exception handlers push some kind of exception handler block for each try clause on the runtime stack as the try clause is entered, and pop it off as the try clause is exited. A location holding the address of the most recently pushed exception handler block is also maintained. Typically these exception handlers are chained together so they can be found by following links from the most recent to older versions. When an exception occurs, a pointer to the last-pushed EH handler block is found, and processing of that "try" clause's EH cases is checked. A hit on an EH case causes stack cleanup to occur back to the point of pushed EH, and control transfers to the EH case. No hits on the EH causes the next EH to be found, and the process repeats. The Windows 32-bit SEH scheme is a version of this.
This is a poor implementation because the program pays a runtime price for each try clause (push then pop) even when no exception occurs.
Good implementations simply record a table of ranges where try clauses occur. This means there's zero overhead to enter/exit a try clause. (My PARLANSE parallell programming langauge uses this technique). An exception looks up the PC of the exception point in the table, and passes control to the EH selected by the table. The EH code resets the stack as appropriate. Fast and pretty.
I think the Windows 64 bit EH is of this type, but I haven't looked carefully.
[EDIT April 2020: Just measured the cost of PARLANSE exceptions recently. 0nS (by design) if no exception; 25ns on an 3Ghz i7 from "throw" to "catch" to "acknowledge" (end empty catch). OP added a link measuring C++ exception handling at roughly 1000ns for the simplest kind, and a literally nonStandard handling scheme that clocks in at 57ns for exception or no exception; CPU clock rates for the C++ versions are a bit slower so these numbers are only for rough comparison.]
The C++ standard committee published a technical report on "C++ performance" to debunk many myths about how C++ features supposedly slow you down. This also includes details about how exception handling could be implemented. The draft of this technical report is available for free. Check section 5.4.1. "Exception Handling Implementation Issues and Techniques".
Asm from the Godbolt compiler explorer, for the x86-64 System V calling convention with g++8.2's C++ABI, for a function that catches, and one that throws.
x86-64 System V uses the .eh_frame section for stack-unwind metadata, so the exception-helper library functions know how to walk the stack and restore registers. That's what .cfi directives do.
It is often hard to find the origin of a NaN, since it can happen at any step of a computation and propagate itself.
So is it possible to make a C++ program halt when a computation returns NaN or inf? The best in my opinion would be to have a crash with a nice error message:
Foo: NaN encoutered at Foo.c:624
Is something like this possible? Do you have a better solution? How do you debug NaN problems?
EDIT: Precisions: I'm working with GCC under Linux.
You can't do it in a completely portable way, but many platforms provide C APIs that allow you to access the floating point status control register(s).
Specifically, you want to unmask the overflow and invalid floating-point exceptions, which will cause the processor to signal an exception when arithmetic in your program produces a NaN or infinity result.
On your linux system this should do the trick:
#include <fenv.h>
...
feenableexcept(FE_INVALID | FE_OVERFLOW);
You may want to learn to write a trap handler so that you can print a diagnostic message or otherwise continue execution when one of these exceptions is signaled.
Yes! Set (perhaps more or less portably) your IEEE 754-compliant processor to generate an interrupt when a NaN or infinite is encountered.
I googled and found these slides, which are a start. The slide on page 5 summarizes all the information you need.
I'm no C expert, but I expect the answer is no.
This would require every float calculation to have this check. A huge performance impact.
NaN and Inf aren't evil. They may be legitimately used in some library your app uses, and break it.