Related
Why is this a common practice to have assert macro do something useful only in debug configuration? If it exists to test invariants and detect coding bugs, then would not it be easier to go ahead and do the same big boom in production software?
I have some S60 background and there exist __ASSERT_ALWAYS and __ASSERT_DEBUG, where the latter is equivalent to assert.
Asserts are made for stuff that should never happen, i.e. if it does then there is a bug in your code that you need to fix. Releases are builds that are "supposed" to be bug free, and killing application with the assert for the user is as bad as any other faulty behavior.
Checking for assertion costs. You have extra operations that you may not want to exist in the final product. If assertions were always going to work, then people would start using them less "to not kill performance". And believe me, there are a lot of people out there who consider the extra checks a performance kill and would avoid it. These are the same people who actually have to use assert more!
A more important reason is that, assert, if failed, will just abort your program. There is no usefulness in that whatsoever for the end user (except perhaps for security, to avoid running code with unexpected data, but that's arguably better handled by actual error checking). If you want your program to actually terminate with a message or do something useful, you would have to write your own assert. In that case, you can of course choose to keep it in release mode also.
Finally, assertion helps you find bugs, especially hidden bugs, but in the execution of the software, they may actually not happen. Imagine the following code:
struct X
{
// other stuff
int stage;
};
X x;
... do some stuff
assert(x.stage == STAGE_2);
x.stage = STAGE_3; // go to next stage
... do more stuff
In such an example, your logic says x should be in STAGE_2. If it is not, it's a bug. However, if you remove the assert, fix x.stage and move on, there is hope that the bug is not so severe. In such a case, the end-user can actually continue working without noticing this. If you had kept assert in release mode too, you would force the application to exit over a bug that didn't have any visible effect.
In reality, you get updates all the times for your software, in which they claim they have fixed bugs. Some of those, are indeed bugs that assert would have caught. However, you as the end-user didn't have any problem and were actually happy that you weren't interrupted due to those asserts, weren't you?
I think it's a cultural thing. The arguments in favour removing this kind of check in production code go like this:
It makes your code run slower.
It makes the final executable larger.
Your code shouldn't have bugs in once it ships.
It will cause your program to exit suddenly and violently, with possible loss of data.
The arguments against run as follows
You're shipping the exact code that you tested.
Debugging problems reported in the field gets much easier
Regardless of what you'd like to think, the code you ship WILL have bugs
Performance and size effects are typically minimal.
Failing fast may be preferable to attempting to continue when your program is in an undesired state.
Personally, I ship software that's built exactly as it's tested, asserts and all. But, a lot depends on your customer base and how you hope to schedule releases...
This article is worth a read:- http://www.martinfowler.com/ieeeSoftware/failFast.pdf
but what about when you deploy the software to customers? We don’t
want the application to crash just because there’s a typo in a
configuration file. One reaction to this fear is to disable assertions
in the field.
Don’t do that! Remember, an error that occurs at the
customer’s site made it through your testing process. You’ll probably
have trouble reproducing it. These errors are the hardest to find, and
a well-placed assertion explaining the problem could save you days of
effort.
One other thing - in C++, using BOOST_ASSERT you can set it to throw an exception on assertion failure, which makes handling and potentially recovering from assertion failures even more useful. we use this in conjunction with MadExcept so that any assert failures in the field can be easily posted by the user into our bug tracker, with complete call stacks, screenshots, whathaveyou.
assert works unless you explicitly turn it off. There's (usually) no
reason to turn it off, even in "release" builds, and most of the
released code I've delivered has had assert active.
The main reason for turning it off is performance. In this case, you
turn it off very locally, in the function where the performance is
critical, with something like:
#ifdef TURNOFFCRITICALASSERTS
#define NDEBUG
#include <assert.h>
#endif
// Function with critical code here...
#undef NDEBUG
#include <assert.h>
This is the way the C standards committee designed assert to be used.
Generally, you should not define NDEBUG except locally, like this.
In developing a large C++ programming project with many developers, we have run into issues with inappropriate use of assert() in the code which results in poor quality where the assertion does indeed occur and the product crashes.
The question is what are good principles to apply to use assert() appropriately? When is it proper to use an assert() and when is it not? Is there a list of criteria that each assertion should pass in order to be legitimate? How can we encourage proper use of assert()?
As a first crack at this, I would say that assert() should only be used to document a condition that is believed to be impossible to reach and which should be identified as an assert() failure at run time where it ever to arise because programming assumptions are being violated.
Can folks do better than this? What is your experience with assert()?
Use Exceptions for error condition which come from the outside (outside the method or outside the program) like parameter checking and missing/defective external ressources like files or connections or user input.
Use Assertions to indicate an internal defects like programming errors, conditions that shouldn't occur, e.g. class/method invariants and invalid program state.
You should use assert to check all conditions that should never happen:
Preconditions on input parameters
Results of intermediate calculations
Postconditions on object state
But you should include those asserts only in debug builds or when explicitly activated for release (not in builds released to the customers).
I use asserts to check for any unwanted program state:
Function preconditions
Sometimes I insert them in a macro after every API call: glDrawArray(); checkOpenGLError();--checkOpenGLError() will call getGLError() if turned on
Data structure integrity: assert(something == null);
Sometimes GDB lies to me (iOS SDK 3.2). I use asserts to prove it.
Note that "unwanted program state" excludes errors that naturally occur at runtime, such as being unable to open a user-selected file due to permissions or HD failure. In these cases it is not wise to use assertions.
Much code nowadays has a lot of external dependencies and connections. I don't tend to use traditional assertions much these days, I favor exceptions. I don't feel like I can assume "this can never happen" and the checks can safely be removed in a non-debug build.
Most advice concerning error handling boils down to a handful of tips and tricks (see this post for example). These hints are helpful but I think they don't answer all questions. I feel that I should design my application according to a certain philosophy, a school of thought that provides a strong foundation to build upon. Is there such a theory on the topic of error handling?
Here's a few practical questions:
How to decide if an error should be handled locally or propagated to higher level code?
How to decide between logging an error, or showing it as an error message to the user?
Is logging something that should only be done in application code? Or is it ok to do some logging from library code.
In case of exceptions, where should you generally catch them? In low-level or higher level code?
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).
Does it make sense to create a list of error codes? Or is that old fashioned these days?
In many cases common sense is sufficient for developing a good-enough strategy to deal with error conditions. However, I would like to know if there is a more formal/"scholarly" approach?
PS: this is a general question, but C++ specific answers are welcome too (C++ is my main programming language for work).
Is logging something that should only
be done in application code? Or is it
ok to do some logging from library
code.
Just wanted to comment on this. My view is to never logg directly in the library code, but provide hooks or callbacks to implement this in the application code, so the application can decide what to do with the output from the log (if anything at all).
A couple of years ago I thought exactly about the same question :)
After searching and reading several things, I think that the most interesting reference I found was Patterns for Generation, Handling and Management of Errors from Andy Longshaw and Eoin Woods. It is a short and systematic attempt to cover the basic idioms you mention and some others.
The answer to these questions is quite controversial, but the authors above were brave enough to expose themselves in a conference, and then put their thoughts on paper.
Introduction
To understand what needs to be done for error handling, I think one needs clearly to understand the types of errors one encounters, and the contexts in which one encounters them.
To me, it has been extremely useful to consider the two major types of errors as:
Errors that should never happen, and are typically due to a bug in the code.
Errors which are expected and cannot be prevented in normal operation, such as a database connection going down because of a database issue over which the application has no control.
The way an error should be handled depends heavily on which type of error it is.
The differing contexts which affect how errors should be handled are:
Application code
Library code
The handling of errors in library code differs somewhat from the handling in application code.
A philosophy for handling of the two major types of errors is discussed below. The special considerations for library code are also addressed. Finally, the specific practical questions in the original post are addressed in the context of the philosophy presented.
Types of errors
Programming errors - bugs - and other errors that should never happen
Many errors are the result of programming mistakes. These errors typically cannot be corrected, since the specific programming mistake cannot be anticipated. That means we can't know in advance what condition the mistake leaves the application in, so we can't recover from that condition and shouldn't try.
Ultimately, the fix to this kind of error is to fix the programming mistake. To facilitate that, the error should be surfaced as quickly as possible. Ideally, the program should exit immediately after identifying such an error and providing the relevant information. A quick and obvious exit reduces the time required to complete the debug and retest cycle, permitting more bugs to be fixed in the same amount of testing time; that in turn results in having a more robust application with fewer bugs when it comes time to deploy.
The other major objective in handling this type of error should be to provide sufficient information to make it easy to identify the bug. In Java, for example, throwing a RuntimeException often provides sufficient information in the stack trace to identify the bug immediately; in clean code, immediate fixes can often be identified just from examining the stack trace. In other languages, one might log the call stack or otherwise preserve the necessary information. It is critical not to suppress information in the interests of brevity; don't worry about how much log space you are taking up when this type of error occurs. The more information that is provided, the quicker the bugs can be fixed, and the fewer bugs will remain to pollute the logs when the application makes it to production.
Server applications
Now, in some server applications, it's important that the server be sufficiently fault tolerant to continue operation even in the face of occasional programming errors. In this case, the best approach is to have a very clear separation between the server code that must continue operation and the task processing code that can be allowed to fail. For example, tasks can be relegated to threads or subprocesses, as is done in many web servers.
In such a server architecture, the thread or subprocess handling the task can then be treated like an application which can fail. All the considerations above apply to such a task: the error should be surfaced as quickly as possible by a clean exit from the task, and sufficient information should be logged to permit the bug to be easily found and fixed. When such a task exits in Java, for example, the entire stack trace of any RuntimeException causing the exit should normally be logged.
As much of the code as possible should be executed within the threads or processes handling the task, rather than in the main server thread or process. This is because any bug in the main server thread or process will still cause the entire server to go down. It's better to push the code - with the bugs it contains - into the task handling code where it won't cause a server crash when the bug manifests itself.
Errors that are expected and cannot be prevented in normal operation
Errors that are expected and cannot be prevented in normal operation, such as an exception from a database or other service separate from the application, require very different treatment. In these cases, the objective is not to fix the code, but rather to have the code handle the error when that makes sense, and inform users or operators who can fix the problem otherwise.
In these cases, for example, the application may wish to throw away any results that have accumulated thus far, and retry the operation. In database access, use of transactions can help ensure that accumulated data is discarded. In other cases, it can be useful to write one's code with such retries in mind. The concept of idempotency can also be useful here.
When automated retries won't sufficiently solve the problem, human beings should be informed. The user should be informed that the operation failed; often the user can be given the option of retrying. The user can then judge whether a retry is desirable, and can also make alterations in input that might help things go better on a retry.
For this type of error, logging and perhaps email notices can be used to inform system operators. Unlike logging of programming errors, logging of errors that are expected in normal operation should be more succinct, since the error may happen many times and appear many times in the logs; operators will often be analyzing the pattern of many errors, rather than focusing on one individual error.
Libraries and applications
The above discussion of types of errors is directly applicable to application code. The other major context for error handling is library code. Library code still has the same two basic types of errors, but it typically cannot or should not communicate directly with the user, and and it has less knowledge about the application context, including whether an immediate exit is acceptable, than does the application code.
As a result, there are differences in how libraries should handle logging, how they should handle errors that may be expected in normal operation, and how they should handle programming errors and other errors that should never happen.
With respect to logging, the library should if possible support logging in the format desired by the client application code. One valid approach is to do no logging at all, and allow the application code to do all logging based on error information provided to the application code by the library. Another approach is to use a configurable logging interface, allowing the client application to provide the implementation for the logging, for example when the library is first loaded. In Java, for example, the library might use the logback logging interface, and allow the application to worry about what logging implementation to configure for logback to use.
For bugs and other errors that should never happen, libraries still cannot simply exit the application, since that may not be acceptable to the application. Rather, libraries should exit the library call, providing the caller with sufficient information to help diagnose the problem. The information may be provided in the form of an exception with a stack trace, or the library may log the information if the configurable logging approach is being used. The application can then treat this as it would any other error of this type, typically by exiting, or in a server, by allowing the task process or thread to exit, with the same logging or error reporting that would be done for programming errors in the application code.
Errors that are expected in normal operation should be also be reported to the client code. In this case, as with this type of error when encountered in the client code, the information associated with the error can be more succinct. Typically libraries should do less local handling of this type of error, relying more on the client code to decide things like whether to retry and for how many times. The client code can then pass along the retry decision to the user if desired.
Practical questions
Now that we have the philosophy, let's apply it to the practical questions you mention.
How to decide if an error should be handled locally or propagated to higher level code?
If it is an error that is expected in normal operation, retry or possibly consult the user locally. Otherwise, propagate it to higher level code.
How to decide between logging an error, or showing it as an error message to the user?
If it is an error that is expected in normal operation, and user input would be useful to determine what action to take, get user input and log a succinct message; if it seems to be a programming error, provide the user with a brief notification and log more extensive information.
Is logging something that should only be done in application code? Or is it ok to do some logging from library code.
Logging from the library code should be under the control of the client code. At most, the library should log to an interface for which the client provides the implementation.
In case of exceptions, where should you generally catch them? In low-level or higher level code?
Exceptions that are expected in normal operation can be caught locally and the operation retried or otherwise handled. In all other cases, exceptions should be allowed to propagate.
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).
The types of errors in third party libraries are the same types of errors that occur in application code. Errors should be handled primarily according to which error type they represent, with relevant adjustments for library code.
Does it make sense to create a list of error codes? Or is that old fashioned these days?
Application code should provide a complete description of the error in the case of programming errors, and a succinct description in the case of errors that can occur in normal operation; in either case, a description is normally more appropriate than an error code. Libraries may provide an error code as a way of describing whether an error is a programming or other internal error, or whether the error is one which can occur in normal operation, with the latter type perhaps subdivided more finely; however, an exception hierarchy can be more useful than an error code in languages where such is possible. Note that applications run from the command line may act as libraries for shell scripts, however.
Disclaimer: I do not know any theory on error-handling, I did, however, thought repetitively about the subject as I explored various languages and programming paradigms, as well as toyed around with programming language designs (and discussed them). What follows, thus, is a summary of my experience so far; with objective arguments.
Note: this should cover all the questions, but I did not even try to address them in order, preferring instead a structured presentation. At the end of each section, I present a succinct answer to those questions it answered, for clarity.
Introduction
As a premise, I would like to note that whatever is subject to discussion some parameters must be kept in mind when designing a library (or reusable code).
The author cannot hope to fathom how this library will be used, and should thus avoid strategies that make integration more difficult than it should. The most glaring defect would be relying on globally shared state; thread-local shared state can also be a nightmare for interactions with coroutines/green-threads. The use of such coroutines and threads also highlight that synchronization best be left to the user, in single-threaded code it will mean none (best performance), whilst in coroutines and green-threads the user is best suited to implement (or use existing implementations of) dedicated synchronization mechanisms.
That being said, when library are for internal use only, global or thread-local variables might be convenient; if used, they should be clearly documented as a technical limitation.
Logging
There are many ways to log messages:
with extra information such as timestamp, process-ID, thread-ID, server name/IP, ...
via synchronous calls or with an asynchronous mechanism (and an overflow handling mechanism)
in files, databases, distributed databases, dedicated log-servers, ...
As the author of a library, the logs should be integrated within the client infrastructure (or turned off). This is best provided by allowing the client to provide hooks so as to deal with the logs themselves, my recommendation is:
to provide 2 hooks: one to decide whether to log or not, and one to actually log (the message being formatted and the latter hook called only when the client decided to log)
to provide, on top of the message: a severity (aka level), the filename, line and function name if open-source or otherwise the logical module (if several)
to, by default, write to stdout and stderr (depending on severity), until the client explicitly says not to log
I would note that, following the guidelines delineated in the introduction, synchronization is left to the client.
Regarding whether to log errors: do not log (as errors) what you otherwise already report via your API; you can however still log at a lower severity the details. The client can decide whether to report or not when handling the error, and for example choose not to report it if this was just a speculative call.
Note: some information should not make it into the logs and some other pieces are best obfuscated. For example, passwords should not be logged, and Credit-Card or Passport/Social Security Numbers are best obfuscated (partly at least). In a library designed for such sensitive information, this can be done during logging; otherwise the application should take care of this.
Is logging something that should only be done in application code? Or is it ok to do some logging from library code.
Application code should decide the policy. Whether a library logs or not depends on whether it needs to.
Going on after an error ?
Before we actually talk about reporting errors, the first question we should ask is whether the error should be reported (for handling) or if things are so wrong that aborting the current process is clearly the best policy.
This is certainly a tricky topic. In general, I would advise to design such that going on is an option, with a purge/reset if necessary. If this cannot be achieved in certain cases, then those cases should provoke an abortion of the process.
Note: on some systems, it is possible to get a memory-dump of the process. If an application handles sensitive data (password, credit-cards, passports, ...), it is best deactivated in production (but can be used during development).
Note: it can be interesting to have a debug switch that transforms a portion of the error-reporting calls into abortions with a memory-dump to assist debugging during development.
Reporting an error
The occurrence of an error signifies that the contract of a function/interface could not be fulfilled. This has several consequences:
the client should be warned, which is why the error should be reported
no partially correct data should escape in the wild
The latter point will be treated later on; for now let us focus on reporting the error. The client should not, ever, be able to accidentally ignore this report. Which is why using error-codes is such an abomination (in languages when return values can be ignored):
ErrorStatus_t doit(Input const* input, Output* output);
I know of two schemes that require explicit action on the client part:
exceptions
result types (optional<T>, either<T, U>, ...)
The former is well-known, the latter is very much used in functional languages and was introduced in C++11 under the guise of std::future<T> though other implementations exist.
I advise to prefer the latter, when possible, as it easier to fathom, but revert to exceptions when no result is expected. Contrast:
Option<Value&> find(Key const&);
void updateName(Client::Id id, Client::Name name);
In the case of "write-only" operations such as updateName, the client has no use for a result. It could be introduced, but it would be easy to forget the check.
Reverting to exceptions also occur when a result type is impractical, or insufficient to convey the details:
Option<Value&> compute(RepositoryInterface&, Details...);
In such a case of externally defined callback, there is an almost infinite list of potential failures. The implementation could use the network, a database, the filesystem, ... in this case, and in order to report errors accurately:
the externally defined callback should be expected to report errors via exceptions when the interface is insufficient (or impractical) to convey the full details of the error.
the functions based on this abstract callback should be transparent to those exceptions (let them pass, unmodified)
The goal is to let this exception bubble up to the layer where the implementation of the interface was decided (at least), for it's only at this level that there is a chance to correctly interpret the exception thrown.
Note: the externally defined callback is not forced to use exceptions, we should just expect it might be using some.
Using an error
In order to use an error report, the client need enough information to take a decision. Structured information, such as error codes or exception types, should be preferred (for automatic actions) and additional information (message, stack, ...) can be provided in a non-structured way (for humans to investigate).
It would be best if a function clearly documented all possible failure modes: when they occur and how they are reported. However, especially in case arbitrary code is executed, the client should be prepared to deal with unknown codes/exceptions.
A notable exception is, of course, result types: boost::variant<Output, Error0, Error1, ...> provides a compiler-checked exhaustive list of known failure modes... though a function returning this type could still throw, of course.
How to decide between logging an error, or showing it as an error message to the user?
The user should always be warned when its order could not be fulfilled, however a user-friendly (understandable) message should be displayed. If possible, advices or work-arounds should be presented as well. Details are for investigating teams.
Recovering from an error ?
Last, but certainly not least, comes the truly frightening part about errors: recovery.
This is something that databases (real ones) are so good for: transaction-like semantics. If anything unexpected occurs, the transaction is aborted as if nothing had happened.
In the real world, things are not simple. The simple example of cancelling an e-mail sent pops to mind: too late. Protocols may exist, depending on your application domain, but this is out of this discussion. The first step, though, is the ability to recover a sane in-memory state; and that is far from being simple in most languages (and STM can only do so much today).
First of all, an illustration of the challenge:
void update(Client& client, Client::Name name, Client::Address address) {
client.update(std::move(name));
client.update(std::move(address)); // Throws
}
Now, after updating the address failed, I am left with a half-updated client. What can I do ?
attempting to undo all the updates that occurred is close to impossible (the undo might fail)
copying the state prior to executing any single update is a performance hog (supposing we can even swap it back in a sure way)
In any case, the book-keeping required is such that mistakes will creep in.
And worst of all: there is no safe assumption that can be made as to the extent of the corruption (except that client is now botched). Or at least, no assumption that will endure time (and code changes).
As often, the only way to win is not to play.
A possible solution: Transactions
Wherever possible, the key idea is to define macro functions, that will either fail or produce the expected result. Those are our transactions. And their form is invariant:
Either<Output, Error> doit(Input const&);
// or
Output doit(Input const&); // throw in case of error
A transaction does not modify any external state, thus if it fails to produce a result:
the external world has not changed (nothing to rollback)
there is no partial result to observe
Any function that is not a transaction should be considered as having corrupted anything it touched, and thus the only sane way of dealing with an error from non-transactional functions is to let it bubble up until a transaction layer is reached. Any attempt to deal with the error prior is, in the end, doomed to fail.
How to decide if an error should be handled locally or propagated to higher level code ?
In case of exceptions, where should you generally catch them? In low-level or higher level code?
Deal with them whenever it is safe to do so and there is value in doing so. Most notably, it's okay to catch an error, check if it can be dealt with locally, and then either deal with it or pass it up.
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).
I did not address this question previously, however I believe it is clear than the approach I highlighted is already dual since it consists of both result-types and exceptions. As such, dealing with 3rd party libraries should be a cinch, though I do advise wrapping them anyway for other reasons (3rd party code is better insulated beyond a business-oriented interface tasked with the impedance adaption).
My view on logging (or other actions) from library code is NEVER.
A library should not impose policy on its user, and the user may have INTENDED an error to occur. Perhaps the program was deliberately soliciting a particular error, in the expectation of it arriving, to test some condition. Logging this error would be misleading.
Logging (or anything else) imposes policy on the caller, which is bad. Moreover, if a harmless error condition (which would be ignored or retried harmlessly by the caller, for example) were to happen with a high frequency, the volume of logs could mask any legitimate errors or cause robustness problems (filling discs, using excessive IO etc)
How to decide if an error should be handled locally or propagated to higher level code?
If the exception breaks the operation of a method it is a good approach to throw it to higher level. If you are familiar with MVC, Exceptions must be evaluated in Controller.
How to decide between logging an error, or showing it as an error message to the user?
Logging errors and all information available about the error is a good approach. If the error breaks the operation or user needs to know that an error is occur you should display it to user. Note that in a windows service logs are very very important.
Is logging something that should only be done in application code? Or is it ok to do some logging from library code.
I don't see any reason to log errors in a dll. It should only throw errors. There may be a specific reason to do of course. In our company a dll logs information about the process (not only errors)
In case of exceptions, where should you generally catch them? In low-level or higher level code?
Similar question: at what point should you stop propagating an error and deal with it?
In a controller.
Edit: I need to explain this a bit if you are not familiar with MVC. Model View Controller is a design pattern. In Model you develop application logic. In View you display content to user. In Controller you get user events and call Model for relevant function then invoke View to display result to the user.
Suppose that you have a form which has two textboxes and a label and a button named Add. As you might guess this is your view. Button_Click event is defined in Controller. And an add method is defined in Model. When user clicks, Button_Click event is triggered and Controller calls add method. Here textbox values can be empty or they can be letters instead of numbers. An exception occur in add function and this exception is thrown. Controller handles it. And displays error message in the label.
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).
I prefer second one. It would be easier. And I don't think you can do a general stuff for error handling. Especially for different libraries.
Does it make sense to create a list of error codes? Or is that old fashioned these days?
That depends on how will you use it. In a single application (a web site, a desktop application), i don't think it is needed. But if you develop a web service, how will you inform users for errors? Providing an error code is always important here.
If (error.Message == "User Login Failed")
{
//do something.
}
If (error.Code == "102")
{
//do something.
}
Which one do you prefer?
And there is another way for error codes these days:
If (error.Code == "LOGIN_ERROR_102") // wrong password
{
//do something.
}
The others may be: LOGIN_ERROR_103 (eg: this is user expired) etc...
This one is also human readable.
Here is an awesome blog post which explains how error handling should be done. http://damienkatz.net/2006/04/error_code_vs_e.html
How to decide if an error should be handled locally or propagated to higher level code?
Like Martin Becket says in another answer, this is a question of whether the error can be fixed here or not.
How to decide between logging an error, or showing it as an error message to the user?
You should probably never show an error to the user if you think so. Rather, show them a well formed message explaining the situation, without giving too much technical information. Then log the technical information, especially if it is an error while processing input. If your code doesn't know how to handle faulty input, then that MUST be fixed.
Is logging something that should only be done in application code? Or is it ok to do some logging from library code.
Logging in library code is not useful, because you may not even have written it. However, the application could log interaction with the library code and even through statistics detect errors.
In case of exceptions, where should you generally catch them? In low-level or higher level code?
See question one.
Similar question: at what point should you stop propagating an error and deal with it?
See question one.
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).
Throwing exceptions is an expensive operation in most heavy languages, so use them where the entire program flow is broken for that operation. On the other hand, if you can predict all outcomes of a function, put any data through a referenced variable passed as parameter to it, and return an error code (0 on success, 1+ on errors).
Does it make sense to create a list of error codes? Or is that old fashioned these days?
Make a list of error codes for a particular function, and document it inside it as a list of possible return values. See previous question as well as the link.
Always handle as soon as possible. The closer you are to its occurrence the more chance you have to do something meaningful or at the least figure out where and why it happened. In C++, it is not just a matter of context but being impossible to determine in many cases.
In general you should always halt the app if something buggy occurs that is a real error (not something like not finding a file, which is not really something that should count as an error but is labeled as such). It's not going to just sort itself out, and once the app is broken it will cause errors that are impossible to debug because they have nothing to do with the area they occur.
Why not?
see 1.
see 1.
You need to keep things simple, or you will regret it. More important to handling bugs at runtime is testing to avoid them.
It's like saying is it better to centralize or not centralize. It might make a lot of sense in some cases but be a waste of time in others. For something that is a loadable lib/module of some kind that can have errors that are data related (garbage in, garbage out), it makes tons of sense. For more general error handling or catastrophic errors, less.
Error handling is not accompanied by formal theory. It is too 'implementation specific' of a topic to be considered a science field (to be fair there is a great debate whether programming is a science on its own right).
Nontheless it a good part of a developer's work (and thus his/hers life), so practical approaches and technical guidliness have been developed on the topic.
A good view on the topic is presented by A. Alexandrescu, in his talk systematic error handling in C++
I have a repository in GitHub where the techniques presented are implemented.
Basically, what A.A does, is implement a class
template<class T>
class Expected { /* Implementation in the GitHub link */ };
that is meant to be used as a return value. This class could hold either a return value of type T or an exception (pointer). The exception could be either thrown explictly or upon request, yet the rich error information is always available. An example usage would be like this
int foo();
// ....
Expected<int> ret = foo();
if (ret.valid()) {
// do the work
}
else {
// either use the info of the exception
// or throw the exception (eg in an exception "friendly" codebase)
}
While building this framework for error handling, A.A walks us through techniques and designs that produce successfull or poor error handling and what works or what not. He also gives his definitions of 'error' and 'error handling'
My two cents.
How to decide if an error should be handled locally or propagated to higher level code?
Handle errors you can handle. Let errors propagate that you can not.
How to decide between logging an error, or showing it as an error message to the user?
Two orthogonal issues, which are not mutually exclusive. Logging the error is ultimately for you, the developer. If you would be interested in it, log it. Show it to the user if it is actionable to the user ("Error: No network connection!").
Is logging something that should only be done in application code? Or is it ok to do some logging from library code.
I see no reason why libraries can't log.
In case of exceptions, where should you generally catch them? In low-level or higher level code?
You should catch them where you can handle them (insert your definition of handle). If you can't handle them, ignore them (maybe someone up the stack can handle them..).
You certainly shouldn't put a try/catch block around each and every throwing function you call.
Similar question: at what point should you stop propagating an error and deal with it?
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).
At the first point that you can actually deal with it. That point may not exist, and your app may crash. Then you'll get a nice crash dump, and can update your error handling.
Does it make sense to create a list of error codes? Or is that old fashioned these days?
Another point of contention. I'd actually say no: one super list of all error codes implies that that list is always up to date, so you can actually do harm when it's not up to date. It's better to have each function document all the error codes it can return, rather than have one super list.
How to decide if an error should be handled locally or propagated to higher level code?
Error handling should be done at the highest affected level. If it only impacts the lower level code, then it should be handled there. If the error affects higher level code, then the error needs to be handled at the higher level. This is to prevent some higher level code from going on its merry way after an error has caused its actions to be incorrect. It should know what is going on, provided it is impacted.
How to decide between logging an error, or showing it as an error message to the user?
You should always log the error. You should show the error to the user when they are affected by it. If it is something they will never notice and does not have a direct impact (e.g. two sockets failed to open before the third finally opened, resulting in a very short delay for the user should not be reported), then they should not be notified.
Is logging something that should only be done in application code? Or is it ok to do some logging from library code.
Too much logging is rarely a bad thing. You will regret not logging things when you have to hunt down a library bug more than you will be frustrated by extra logs when hunting down other bugs.
In case of exceptions, where should you generally catch them? In low-level or higher level code?
Similar to error handling above, it should be caught where the impact is, and where the error can be corrected/handled effectively. This will vary from case to case.
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries).
This is largely a personal decision. My internal error handling is much different than the error handling I use for anything that touches a third party library. I have a general idea of what to expect from my code, but the third party stuff could have anything happen to it.
Does it make sense to create a list of error codes? Or is that old fashioned these days?
Depends how much you expect to have errors thrown. You might love your list of error codes if you spend a lot of time bug hunting, as they can help point you in the right direction. However, any time spent building these is less time spent on coding/bug fixing, so its a mixed bag. This largely comes down to personal preference.
The first question is probably what can you do about the error?
Can you fix it (in which case do you need to tell the user) or can the user fix it?
If nobody can fix it and you are going to exit, is there any value in having this reported back to you (through a crash dump or error code)?
I'm changing my design and coding philosophy so that:
If all runs smoothly, as expected,
no errors generated.
Throw an exception if something
different, or unexpected happens;
let the caller handle it.
If it can't be resolved, propagate
it up a higher level.
Hopefully, with this technique, the issues that get propagated to the User will be very important; otherwise the program tries to resolve them.
I'm currently experiencing issues that get lost in the return codes; or new return codes are created.
The book "Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries" book by Krzysztof Cwalina and Brad Abrams has some good suggestions on this. See chapter 7 on Exceptions. For example it favours throwing exceptions to returning error codes.
-Krip
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I used to work for a company where some of the lead architect/developers had mandated on various projects that assertions were not to be used, and they would routinely be removed from code and replaced with exceptions.
I feel they are extremely important in writing correct code. Can anyone suggest how such a mandate could be justified? If so, what's wrong with assertions?
We use a modified version of assert, as per JaredPar's comment, that acts like a contract. This version is compiled into the release code so there is a small size overhead, but disabled unless a diagnostics switch is set, such that performance overhead is minimized. Our assert handler in this instance can be set to disabled, silent mode (e.g. log to file), or noisy mode (e.g. display on screen with abort / ignore, where abort throws an exception).
We used automated regression testing as part of our pre-release testing, and asserts are hugely important here as they allow us to find potential internal errors that cannot be picked up at a GUI level, and may not be initially fatal at a user level. With automation, we can run the tests both with and without diagnostics, with little overhead other than the execution time, so we can also determine if the asserts are having any other side effects.
One thing to be careful of with asserts is side effects. For example, you might see something like assert(MyDatabasesIsOk()), which inadvertently corrects errors in the database. This is a bug, as asserts should never change the state of the running application.
The only really negative thing I can say about assertions is they don't run in retail code. In our team we tend to avoid assertions because of this. Instead we use contracts, which are assertions that run in both retail and debug.
The only time we use assertions now is if one of the following are true.
The assertion code has a noticable performance impact
The particular condition is not fatal
Occasionally there is a piece of code that may or may not be dead. We will add an assertion that essentially says "how did you get here." Not firing does not mean the code is indeed dead but if QA emails me and says "what does this assertion mean," we now have a repro to get to a particular piece of code (it's immediately documented of course).
assertions and exceptions are used for two different things.
Assertions are used for states that should never happen. For example, a signalton pointer should never be null and this error should be picked up during development using an assert. Handling it with an exception is alot more work for nothing.
On the other hand exceptions are used for rare states that could happen in the normal running of an application. For example using fopen and it returns a null pointer. It could happen but most times it will return a valid pointer.
Using assertions is nether wrong nor right but it comes down to personal preference as at the end of the day it is a tool to make programing easier and can be replaced by other tools.
It depends on the criticality of your system: assertions are a failfast strategy, while exceptions can be used when the system can perform some kind of recovery.
For instance, I won't use assertions in a banking application or a telecommunication system : I'd throw an exception, that will be catched upper in the call stack. There, resources can be cleaned, and the next call/transaction can be processed ; only one will be lost.
Assertions are an excellent thing, but not to be confused with parameter/return value checking. You use them in situations that you don't believe will occur, not in situations that you expect could occur.
My favourite place to use them is in code blocks that really shouldn't be reached - such as a default case in switch-statement over an enum that has a case for every possible enum value.
It's relatively common that you might extend the enum with new values, but don't update all switch-statements involving the enum, you'll want to know that as soon as possible. Failing hard and fast is the best you can wish for in such circumstances.
Granted, in those places you usually want something that breaks in production builds as well. But the principle of abort()ing under such conditions is highly recommended. A good stack trace in the debugger gives you the information to fix your bug faster than guessing around.
Is it true that an assertion exists in the debug build, but not in the release build?
If you want to verify/assert something, don't you want to do that in the release build as well as in the debug build?
The only guess is that because an exception is often non-fatal that it makes for a codebase that does not die in some odd state. The counter-point is that the fatality of an assertions points right to where the problem is, thus easy to debug.
Personally I prefer to take the risk of an assertion as I feel that it leads to more predictable code that is easier to debug.
Assertions can be left on simply by not defining NDEBUG, so that's not really an issue.
The real problem is that assertions call abort(), which instantly stops the program. This can cause problems if there is critical cleanup your program must do before it quits. Exceptions have the advantage that destructors are properly called, even if the exception is never caught.
As a result, in a case where cleanup really matters, exceptions are more appropriate. Otherwise, assertions are just fine.
We use assertions to document assumptions.
We ensure in code review that no application logic is performed in the asserts, so it is quite safe to turn them off just shortly before release.
One reason to veto assert() is that it's possible to write code that works correctly when NDEBUG is defined, but fails when NDEBUG is not defined. Or vice versa.
It's a trap that good programmers shouldn't fall into very often, but sometimes the causes can be very subtle. For example, the code in the assert() might nudge memory assignments or code positions in the executable such that a segmentation fault that would happen, does not (or vice versa).
Depending on the skill level of your team, it can be a good idea to steer them away from risky areas.
Note, throwing an exception in a destructor is undefined behaviour.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
There's a discussion going on over at comp.lang.c++.moderated about whether or not assertions, which in C++ only exist in debug builds by default, should be kept in production code or not.
Obviously, each project is unique, so my question here is not so much whether assertions should be kept, but in which cases this is recommendable/not a good idea.
By assertion, I mean:
A run-time check that tests a condition which, when false, reveals a bug in the software.
A mechanism by which the program is halted (maybe after really minimal clean-up work).
I'm not necessarily talking about C or C++.
My own opinion is that if you're the programmer, but don't own the data (which is the case with most commercial desktop applications), you should keep them on, because a failing asssertion shows a bug, and you should not go on with a bug, with the risk of corrupting the user's data. This forces you to test strongly before you ship, and makes bugs more visible, thus easier to spot and fix.
What's your opinion/experience?
See related question here
Responses and Updates
An assertion is error, pure and simple and therefore should be handled like one.
Since an error should be handled in release mode then you don't really need assertions.
That's why I prefer the word "bug" when talking about assertions. It makes things much clearer. To me, the word "error" is too vague. A missing file is an error, not a bug, and the program should deal with it. Trying to dereference a null pointer is a bug, and the program should acknowledge that something smells like bad cheese.
Hence, you should test the pointer with an assertion, but the presence of the file with normal error-handling code.
Slight off-topic, but an important point in the discussion.
As a heads-up, if your assertions break into the debugger when they fail, why not. But there are plenty of reasons a file could not exist that are completely outside of the control of your code: read/write rights, disk full, USB device unplugged, etc. Since you don't have control over it, I feel assertions are not the right way to deal with that.
Yes, I have Code Complete, and must say I strongly disagree with that particular advice.
Say your custom memory allocator screws up, and zeroes a chunk of memory that is still used by some other object. I happens to zero a pointer that this object dereferences regularly, and one of the invariants is that this pointer is never null, and you have a couple of assertions to make sure it stays that way. What do you do if the pointer suddenly is null. You just if() around it, hoping that it works?
Remember, we're talking about product code here, so there's no breaking into the debugger and inspecting the local state. This is a real bug on the user's machine.
Assertions are comments that do not become outdated. They document which theoretical states are intended, and which states should not occur. If code is changed so states allowed change, the developer is soon informed and needs to update the assertion.
Allow me to quote Steve McConnell's Code Complete. The section on Assertions is 8.2.
Normally, you don't want users to see assertion messages in production code; assertions are primarily for use during development and maintenance. Assertions are normally compiled into the code at development time and compiled out of the code for production.
However, later in the same section, this advice is given:
For highly robust code, assert and then handle the error anyway.
I think that as long as performance is not an issue, leave the assertion in, but rather than display a message, have it write to a log file. I think that advice is also in Code Complete, but I'm not finding it right now.
Leave assertions turned on in production code, unless you have measured that the program runs significantly faster with them turned off.
if it's not worth measuring to prove it's more efficient, then it's not worth sacrificing clarity for a performance gamble." - Steve McConnell 1993
http://c2.com/cgi/wiki?ShipWithAssertionsOn
If you're even thinking of leaving assertions on in production, you're probably thinking about them wrong. The whole point of assertions is that you can turn them off in production, because they are not a part of your solution. They are a development tool, used to verify that your assumptions are correct. But the time you go into production, you should already have confidence in your assumptions.
That said, there is one case where I will turn assertions on in production: If we encounter a reproducible bug in production that we're having a hard time reproducing in a test environment, it may be helpful to reproduce the bug with assertions turned on in production, to see if they provide useful information.
A more interesting question is this: In your testing phase, when do you turn assertions off?
Assertions should never stay in production code. If a particular assertion seems like it might be useful in production code, then it should not be an assertion; it should be a run time error check, i.e. something coded like this: if( condition != expected ) throw exception.
The term 'assertion' has come to mean "a development-time-only check which will not be performed on the field."
If you start thinking that assertions might make it to the field then you will inevitably also start making other dangerous thoughts, like wondering whether any given assertion is really worth making. There is no assertion which is not worth making. You should never be asking yourself "should I assert this or not?" You should only be asking yourself "Is there anything I forgot to assert?"
Unless profiling shows that the assertions are causing performance problems, I say they should stay in the production release as well.
However, I think this also requires that you handle assertion failures somewhat gracefully. For example, they should result in a general type of dialog with the option of (automatically) reporting the issue to the developers, and not just quit or crash the program. Also, you should be careful not to use assertions for conditions that you actually do allow, but possibly don't like or consider unwanted. Those conditions should be handled by other parts of the code.
In my C++ I define REQUIRE(x) which is like assert(x) except that it throws an exception if the assertion fails in a release build.
Since a failed assertion indicates a bug, it should be treated seriously even in a Release build. When my code's performance matters, I will often use REQUIRE() for higher-level code and assert() for lower-level code that must run fast. I also use REQUIRE instead of assert if the failure condition may be caused by data passed in from code written by a third party, or by file corruption (optimally I would design the code specifically to be well behaved in case of file corruption, but we don't always have time to do that.)
They say you shouldn't show those assert messages to end-users because they won't understand them. So? End users may send you an email with a screen shot or some text of the error message, which helps you debug. If the user simply says "it crashed", you have no ability to fix it. It would be better to send the assertion-failure messages to yourself automatically, but that only works if (1) the software runs on a server you control/monitor or (2) the user has internet access and you can get their permission to send a bug report.
If you want to keep them replace them with error handling. Nothing worse than a program just disappearing. I see nothing wrong with treating certain errors as serious bugs, but they should be directed to a section of your program that is equipped to deal with them by collecting data, logging it, and informing the user that your app has had some unwanted condition and is exiting.
Provided they are handled just as any other error, I don't see a problem with it. Do bear in mind though that failed assertions in C, as with other languages, will just exit the program, and this isn't usually sufficient for production systems.
There are some exceptions - PHP, for instance, allows you to create a custom handler for assertion failures so that you can display custom errors, do detailed logging, etc. instead of just exiting.
Our database server software contains both production and debug assertions. Debug assertions are just that -- they are removed in production code. Production assertions only happen if (a) some condition exists that should never exist and (b) it is not possible to reliably recover from this condition. A production assertion indicates that a bug in the software has been encountered or some kind of data corruption has occurred.
Since this is a database system and we are storing potentially enterprise-critical data, we do whatever we can to avoid corrupted data. If a condition exists that may cause us to store incorrect data, we immediately assert, rollback all transactions, and stop the server.
Having said that, we also try to avoid production assertions in performance-critical routines.
Suppose a piece of code is in production, and it hits an assertion that would normally be triggered. The assertion has found a bug! Except it hasn't, because the assertion is turned off.
So what happens now? Either the program will (1) crash in an uninformative way at a point further removed from the source of the problem, or (2) run merrily to completion, likely giving the wrong result.
Neither scenario is inviting. Leave assertions active even in production.
I see asserts as in-line unit tests. Useful for a quick test while developing, but ultimately those assertions should be refactored out to be tested externally in unit tests.
I find it best to handle all errors that are in scope, and use assertions for assumptions that we're asserting ARE true.
i.e., if your program is opening/reading/closing a file, then not being able to open the file is in scope -- it's a real possibility, which would be negligent to ignore, in other words. So, that should have error-checking code associated with it.
However, let's say your fopen() is documented as always returning a valid, open file handle. You open the file, and pass it to your readfile() function.
That readfile function, in this context, and probably according to its design specification, can pretty much assume it's going to get a valid file ptr. So, it would be wasteful to add error-handling code for the negative case, in such a simple program. However, it should at least document the assumption, somehow -- ensure somehow --- that this is actually the case, before continuing its execution. It should not ACTUALLY assume that will always be valid, in case it's called incorrectly, or it's copy/pasted into some other program, for example.
So, readfile() { assert(fptr != NULL); .. } is appropriate in this case, whilst full-blown error handling is not (ignoring the fact that actually reading the file would require some error handling system anyway).
And yes, those assertions should stay in production code, unless its absolutely necessary to disable them. Even then, you should probably disable them only within performance-critical sections.
I rarely use assertions for anything other that compile time type checking. I would use an exception instead of an assertion just because most languages are built to handle them.
I offer an example
file = create-some-file();
_throwExceptionIf( file.exists() == false, "FILE DOES NOT EXIST");
against
file = create-some-file();
ASSERT(file.exists());
How would the application handle the assertion? I prefer the old try catch method of dealing with fatal errors.
Most of the time, when i use assertion in java (the assert keyword) I automatically add some production codes after. According to the case, it can be a logging message, an exception... or nothing.
According to me, all your assertions are critical in dev release, not in production relase. Some of them must be kept, other must be discarded.
ASSERTIONS are not errors and should not be handled as errors. When an assertion is thrown, this means that there is a bug in your code or alternatively in the code calling your code.
There are a few points to avoid enabling assertions in production code:
1. You don't want your end user to see a message like "ASSERTION failed MyPrivateClass.cpp line 147. The end user is NOT you QA engineer.
2. ASSERTION might influence performance
However, there is one strong reason to leave assertions:
ASSERTION might influence performance and timing, and sadly this sometimes matter (especially in embedded systems).
I tend to vote for leaving the assertion on in production code but making sure that these assertions print outs are not exposed to end user.
~Yitzik
An assertion is error, pure and simple and therefore should be handled like one.
Since an error should be handled in release mode then you don't really need assertions.
The main benefit I see for assertions is a conditional break - they are much easier to setup than drilling through VC's windows to setup something that takes 1 line of code.