Not sure when to use exceptions in C++ - c++

After many years of coding scientific software in C++, I still can't seem to get used to exceptions, and I have no idea when I should use them. I know that using them to control program flow is a big no-no, but other than that... Consider the following example (an excerpt from a class that represents an image mask and lets the user add areas to it as polygons):
class ImageMask
{
public:
ImageMask() {}
ImageMask(const Size2DI &imgSize);
void addPolygon(const PolygonI &polygon);
protected:
Size2DI imgSize_;
std::vector<PolygonI> polygons_;
};
The default constructor for this class creates a useless instance, with an undefined image size. I don't want the user to be able to add polygons to such an object. But I'm not sure how to handle that situation. When the size is undefined, and addPolygon() is called, should I:
1) Silently return,
2) assert(imgSize_.valid) to detect violations in code using this class and fix them before a release,
3) throw an exception?
Most of the time I go either with 1) or 2) (depending on my mood), because it seems to me exceptions are costly, messy and simply overkill for such a simple scenario. Some insight please?

The general rule is that you throw an exception when you cannot perform the desired operation. So in your case, yes, it does make sense to throw an exception when addPolygon is called and the size is undefined or inconsistent.
Silently returning is almost always the wrong thing to do. assert is not a good error-handling technique (it is more of a design/documentation technique).
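As a minimal sketch of the throwing variant (not the asker's actual code): Size2DI and PolygonI below are hypothetical stand-ins, the valid() check and the choice of std::logic_error are assumptions, and polygonCount() exists only to make the effect observable.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the question's Size2DI and PolygonI types.
struct Size2DI {
    int width;
    int height;
    bool valid() const { return width > 0 && height > 0; }
};
struct PolygonI {};

class ImageMask {
public:
    ImageMask() = default;
    explicit ImageMask(const Size2DI& imgSize) : imgSize_(imgSize) {}

    // Throws std::logic_error if the mask has no valid image size.
    void addPolygon(const PolygonI& polygon) {
        if (!imgSize_.valid()) {
            throw std::logic_error("ImageMask::addPolygon: image size is undefined");
        }
        polygons_.push_back(polygon);
    }

    std::size_t polygonCount() const { return polygons_.size(); }

private:
    Size2DI imgSize_{0, 0};  // default-constructed masks start with an invalid size
    std::vector<PolygonI> polygons_;
};
```

A caller that cannot recover simply lets the exception propagate; nothing is silently dropped.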
However, in your case a redesign of the interface to make an error condition impossible or unlikely may be better. For example, something like this:
class ImageMask
{
public:
// Constructor requires collection of polygons and size.
// Neither can be changed after construction.
ImageMask(const std::vector<PolygonI>& polygons, size_t size);
};
or like this
class ImageMask
{
public:
class Builder
{
public:
Builder();
void addPolygon(const PolygonI& polygon);
};
ImageMask(const Builder& builder);
};
// used like this
ImageMask::Builder builder;
builder.addPolygon(polyA);
builder.addPolygon(polyB);
ImageMask mask(builder);

I would try to avoid any situation where it's possible to create data that is in some kind of useless state. If you need a polygon that is not empty, then don't let empty polygons be created; you save yourself much trouble because the compiler will enforce that there are no empty polygons.
I never use silent returns, because they hide bugs, and this makes finding bugs much more complicated than it has to be.
I use asserts when I detect that the program is in a state that it can only be in if there is a bug in the software. In your example, if the c'tor that takes a Size2DI checks that this size is not empty, then asserting that the stored size is not empty is useful for detecting bugs. Asserts should not have side effects, and it must be possible to remove them without changing the behavior of the software. I find them very useful for finding my own bugs and for documenting the current state of the object, function, etc.
If it's very likely that a runtime error will be handled directly by the caller of a function, I would use conventional return values. If it's very likely that the error has to be communicated across several function calls up the call stack, I prefer exceptions. When in doubt, I offer both functions.
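One way to offer both functions might look like the sketch below. Registry, tryGet and get are hypothetical names invented for illustration, and std::optional (C++17) is just one possible "conventional return value".

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical registry used only for illustration.
class Registry {
public:
    void set(const std::string& key, int value) { values_[key] = value; }

    // Variant for callers that handle the failure on the spot.
    std::optional<int> tryGet(const std::string& key) const {
        const auto it = values_.find(key);
        if (it == values_.end()) return std::nullopt;
        return it->second;
    }

    // Variant for callers that let the failure propagate up the call stack.
    int get(const std::string& key) const {
        if (const auto value = tryGet(key)) return *value;
        throw std::out_of_range("Registry::get: unknown key '" + key + "'");
    }

private:
    std::map<std::string, int> values_;
};
```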
kind regards
Torsten

To me, 1 is not an option. Whether it is 2 or 3 depends on the design of your program/library: whether you consider (and document) default-constructing an image mask and then adding polygons to be valid or invalid usage of your component. This is an important design decision. I recommend reading this article by Matthew Wilson.
Note that you have more options:
Invent your own assert that always calls std::terminate and does additional logging
Disable the default constructor (as others already pointed out) -- this is my favourite
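A rough sketch of the first option, assuming the logging goes to stderr; ALWAYS_VERIFY and verifyFailed are hypothetical names, not part of any library.

```cpp
#include <cstdio>
#include <exception>

// Hypothetical helper: logs the failed condition, then terminates.
[[noreturn]] inline void verifyFailed(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    std::terminate();
}

// Unlike assert, this check stays active when NDEBUG is defined.
// The condition is evaluated exactly once.
#define ALWAYS_VERIFY(cond) \
    ((cond) ? static_cast<void>(0) : verifyFailed(#cond, __FILE__, __LINE__))
```

Because the check is never compiled out, the condition should be cheap enough to keep in release builds.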

"Silently return" - that's the real 'big no-no'. The program should know what went wrong.
"assert" - the second rule is to use asserts only when the program's normal flow cannot be restored.
"throw exception" - yes, this is the right and good technique. Just take care of exception safety. There are many articles about exception-safe coding on GotW.
Don't be afraid of exceptions. They don't bite. :) If you practice this technique enough, you'll be a strong coder. ;)

Related

Is std::bad_optional_access a small crime against exceptions?

If std::optional's value() member function is called when the optional has no actual value initialized, a std::bad_optional_access is thrown. As it is derived directly from std::exception, you need either catch (std::bad_optional_access const&) or catch (std::exception const&) for dealing with the exception. However, both options seem sad to me:
std::exception catches every single exception
std::bad_optional_access exposes implementation details. Consider the following example:
Placement Item::get_placement() const {
// throws if the item cannot be equipped
return this->placement_optional.value();
}
void Unit::equip_item(Item acquisition) {
// lets the exception go further if it occurs
this->body[acquisition.get_placement()] = acquisition;
}
// somewhere far away:
try {
unit.equip_item(item);
} catch (std::bad_optional_access const& exception) { // what is this??
inform_client(exception.what());
}
So, to catch the exception you need to be well-informed about the usage of std::optional in Item's implementation, which leads to a list of already known issues. Nor do I want to catch and rewrap std::bad_optional_access, because (for me) the key part of exceptions is the possibility of ignoring them until needed. This is how I see the right approach:
std::exception
<- std::logic_error
<- std::wrong_state (doesn't really exist)
<- std::bad_optional_access (isn't really here)
So, the "far away" example could be written like this:
try {
unit.equip_item(item);
} catch (std::wrong_state const& exception) { // no implementation details
inform_client(exception.what());
}
Finally,
Why is std::bad_optional_access designed like it is?
Do I understand exceptions correctly? I mean, were they introduced for such usage?
Note: boost::bad_optional_access derives from std::logic_error. Nice!
Note 2: I know about catch (...) and throwing objects of types other than std::exception family. They were omitted for brevity (and sanity).
Update: unfortunately, I can't accept two answers, so: if you're interested in the topic, you can read Zuodian Hu's answer and their comments.
You say that the key appeal of exceptions is that you can ignore them for as deep of a call stack as you can. Presumably, given your ambition of avoiding to leak implementation details, you no longer can let an exception propagate beyond the point where that exception cannot be understood and fixed by its handler. That seems to be a contradiction with your ideal design: it punts fixing the exception to the user, but bad_optional_access::what has exactly no context on what just happened–leaking implementation details to the user. How do you expect a user to take meaningful action against a failure to equip an item when all they see is, at best, "could not equip item: bad_optional_access"?
You have obviously made an over-simplification, but the challenge remains. Even with a "better" exception hierarchy, std::bad_optional_access simply does not have enough context that anything beyond extremely close callers might know what to do with it.
There are several fairly distinct cases in which you might want to throw:
You want control flow to be interrupted without much syntactical overhead. For instance, you have 25 different optionals that you want to unwrap, and you want to return a special value if any of them fails. You put a try-catch block around the 25 accesses, saving yourself 25 if blocks.
You have written a library for general use that does a lot of things that can go wrong, and you want to report fine-grained errors to the calling program to give it the best chance of programmatically doing something smart to recover.
You have written a large framework that performs very high-level tasks, such that you expect that usually, the only reasonable outcome of an operation failing is to inform the user that the operation has failed.
When you run into issues with exceptions not feeling right, it's usually because you're trying to handle an error meant for a different level than the one you wish it was operating at. Wishing for changes to the exception hierarchy is just trying to bring that exception in line for your specific use, which causes tensions with how other people use it.
Clearly, the C++ committee believes that bad_optional_access belongs to the first category, and you're asking why it's not part of the third category. Instead of trying to ignore exceptions until you "need" to do something with them, I believe that you should flip the question around and ask yourself what is intended to catch the exception.
If the answer truly is "the user", then you should throw something that's not a bad_optional_access and that instead has high-level features like localized error messages and enough data on it that inform_client is able to bring up a dialog with a meaningful title, main text, subtext, buttons, an icon, etc.
If the answer is that this is a general game engine error and that it might happen in the regular course of the game, then you should throw something that says that equipping the item failed, not that there was a state error. It's more likely that you'll be able to recover from failing to equip an item than from having a nondescript state error, including if, down the road, you need to produce a pretty error for the user.
If the answer is that you might try to equip 25 items in a row and you want to stop as soon as something goes wrong, then you need no changes to bad_optional_access.
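The first category (many unwraps, one handler) could be sketched as follows, with three optionals standing in for the 25; sumOrMinusOne is a hypothetical function and -1 an arbitrary "special value".

```cpp
#include <optional>

// Hypothetical example: sum three optionals, or return -1 if any is empty.
int sumOrMinusOne(const std::optional<int>& a,
                  const std::optional<int>& b,
                  const std::optional<int>& c) {
    try {
        // value() throws std::bad_optional_access if the optional is empty,
        // so a single handler replaces three separate if blocks.
        return a.value() + b.value() + c.value();
    } catch (const std::bad_optional_access&) {
        return -1;
    }
}
```

Here the exception never leaves the function, so no caller needs to know that std::optional is involved.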
Also note that different implementations make different uses more or less convenient. In most modern C++ implementations, there is no performance overhead on code paths that do not throw, and a huge overhead on paths that do throw. This often pushes against the use of exceptions for the first category of errors.
So, to catch the exception you need to be well-informed about the usage of std::optional in the Item's implementation
No, to catch the exception, you must read the documentation for get_placement, which will tell you that it throws std::bad_optional_access. By choosing to emit that exception, the function is making the emission of that exception a part of the interface of that function.
And therefore, it is no more dependent on Item's implementation than it would be if it directly returned a std::optional. You choose to put it in your interface, so you ought to live with the consequences.
To put it another way, if you felt that putting std::optional as a parameter type or return value was wrong, then you should feel the same way about emitting bad_optional_access directly.
Ultimately, this all goes back to one of the most fundamental questions of error handling: how far away from the site of the error can you get before the specific nature of the error becomes effectively meaningless or even completely different?
Let's say you're doing text processing. You've got a file with each line containing 3 floating-point numbers. You're processing it line by line, and inserting each set of three values into a list. And you have a function that converts strings to floats, which throws an exception if that conversion fails.
So the code broadly looks like this:
for each line
    split the line into a 3-element list of number strings.
    for each number string
        convert the string into a number.
        add the number to the current element.
    push the element into the list.
Alright, so... what happens if your string-to-float converter throws? That depends; what do you want to happen? That's determined by who catches it. If you want a default value on an error, then the code in the innermost loop catches it and writes a default value into the element.
But maybe you want to log that a particular line has an error, then skip that line (don't add it to the list), but continue processing the rest of the text as normal. In that case, you catch the exception in the first loop.
At that point, the meaning of the error has changed. The error which was thrown was "this string doesn't contain a valid float", but that's not how your code handles it. In fact, the catching code has completely lost the context of the error. It doesn't know whether it was the first, second, or third string in the text which caused the failure. At best, it knows that it was somewhere along that line, and maybe the exception happens to contain a couple of pointers to the bad string range (though that's increasingly dangerous the farther that exception gets from its source, due to the possibility of dangling pointers).
And what if a failed conversion ought to mean that the entire process can no longer be trusted, that the list you're building is invalid and should be discarded? This has even less context than the previous case, and the meaning is even more muddled and distant. At this point, the error just means to terminate the list building process. Maybe you put together a log entry, but that's about all you're going to do at this point.
The farther you get from where the exception is thrown, the more context about the error is lost, and the more the meaning ultimately drifts from the initial meaning of the error. That's not just about being an implementation detail; it's about the locality of information and the response to that information.
So basically, code close to the source of the error is catching specific exceptions with contextual meaning. The farther the catch gets from the source of the error, the more likely it is that the catching code is going to be very generic, dealing with vague "this didn't work because reasons" kinds of things. This is where vague types like std::logic_error come in.
Indeed, one could imagine that at each step in the process, the exception is reinterpreted (and by "reinterpreted", I mean converting it into a different type via catch/throw). The string-to-float converter throws a meaningful exception: could not convert string to float. The layer trying to build an element from 3 strings converts the exception to something that has value to its caller: string index X is malformed. And at the last phase, the exception is generalized to: couldn't parse the list due to line Y.
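That layered reinterpretation might be sketched like this. ElementError, ListError, parseElement and parseList are hypothetical names, std::stof stands in for the string-to-float converter, and whitespace-separated fields are an assumption about the file format.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception types, one per layer.
struct ElementError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct ListError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

using Element = std::array<float, 3>;

// Middle layer: builds one element and reports which string index was malformed.
Element parseElement(const std::string& line) {
    std::istringstream in(line);
    Element element{};
    for (std::size_t i = 0; i < element.size(); ++i) {
        std::string token;
        in >> token;
        try {
            // Innermost layer: std::stof throws std::invalid_argument or
            // std::out_of_range, both of which derive from std::logic_error.
            element[i] = std::stof(token);
        } catch (const std::logic_error&) {
            throw ElementError("string index " + std::to_string(i) + " is malformed");
        }
    }
    return element;
}

// Outer layer: generalizes the failure to "could not parse the list due to line Y".
std::vector<Element> parseList(const std::vector<std::string>& lines) {
    std::vector<Element> list;
    for (std::size_t y = 0; y < lines.size(); ++y) {
        try {
            list.push_back(parseElement(lines[y]));
        } catch (const ElementError& e) {
            throw ListError("could not parse the list due to line " +
                            std::to_string(y) + ": " + e.what());
        }
    }
    return list;
}
```

Each catch site adds the context that only its own layer knows (the string index, then the line number), which is exactly the information the original exception could not carry.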
The idea that a single exception type can jump through whole libraries of code and designed intent and still retain its initial meaning is a fantasy. Exceptions work great when they have to pass through neutral code, such as throwing an exception from a callback or some other case of indirect function execution. In this case, the code which provoked the execution still has the local context of the process that provoked the exception. But the farther you get from the local context that knows what's going on, the less meaningful a specific exception becomes.
Inheriting from logic_error is wrong for these reasons. Catching a bad_optional_access is ultimately a very local thing. Past a certain point, the meaning of that error changes.
A "logic error" represents a failure of your program to make sense. But an optional which doesn't contain a value does not necessarily represent such a problem. In one piece of code, it could be a perfectly valid thing to have an empty optional, and the exception being thrown is simply how that gets reported to the caller. Another piece of code might treat an optional being empty at a certain point as a user having made some prior mistake in their API usage. One of these is a logic error, and the other isn't.
Ultimately, the right thing to do is make sure that your classes' APIs all emit exceptions which are meaningful to the caller. And it's not clear what bad_optional_access means to the caller of get_placement.
Exposing Implementation Details
If you wish for your user to be entirely unaware of std::optional in your implementation, your interface would either check operator bool or has_value and do one of the following:
return a status code
throw a custom exception type
handle the emptiness in such a way that the client has no knowledge that an internal error ever happened
...or your interface would catch std::bad_optional_access and do one of the above. In either case, your client has no idea you used std::optional.
Note that whether you found out about the emptiness of the optional through an explicit check or an exception is a design choice (but personally I wouldn't catch and re-throw either in most cases).
Logic Error?
Based on the conceptual model for optional in the pre-standardization paper, std::optional is a value wrapper with a valid empty state. Hence, the intent was for emptiness to be intentional in normal usage. There are two general ways of handling emptiness, as I stated in the comments:
use operator bool or has_value, then handle emptiness inline or use the wrapped value through operator* or operator->.
use value and bail out of the scope if the optional is empty
In either case, you should be expecting the optional to potentially be empty, and designed for that to be a valid state in your program.
In other words, when you use operator bool or has_value to check for emptiness, it is not to prevent an exception being thrown. Instead, you are choosing to not use the exception interface of optional at all (usually). And when you use value, you are choosing to accept optional potentially throwing std::bad_optional_access. Hence, the exception should never be a logic error in the intended usage of optional.
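A small sketch of the two styles side by side; describe and doubled are hypothetical functions chosen only to show the contrast.

```cpp
#include <optional>
#include <string>

// Style 1: check for emptiness and handle it inline; no exception is involved.
std::string describe(const std::optional<int>& level) {
    if (level.has_value()) {
        return "level " + std::to_string(*level);
    }
    return "no level";
}

// Style 2: call value() and accept that an empty optional exits this scope
// by throwing std::bad_optional_access.
int doubled(const std::optional<int>& level) {
    return 2 * level.value();
}
```

In both functions an empty optional is an expected input; they differ only in how the emptiness is reported.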
UPDATE
Logic Errors in the Design of C++
You seem to misunderstand the Standard's intended definition of what a logic error is.
In the design of C++ in recent years (not the same in history), a logic error is a programmer error the application shouldn't try to recover from because it can't reasonably recover. This includes things like de-referencing dangling pointers and references, using operator* or operator-> on an empty optional, passing invalid arguments to a function, or otherwise breaking API contracts. Note that dangling pointers' existence is not a logic error, but de-referencing a dangling pointer is a logic error.
In these cases of true logic errors, the Standard purposely chooses not to throw because they are true logic errors on the part of the programmer, and the caller can't be reasonably expected to handle all bugs in the code they call.
When a well-designed (under this philosophy) Standard Library function throws, it's never supposed to be because the function or its caller contains buggy code. For buggy code, the Standard lets you fall flat on your face for writing the bug. For example, many functions in <algorithm> can run infinite loops if you pass them bad begin and end iterators, and never even try to diagnose the fact that you did that. They certainly don't throw std::invalid_argument. "Good" implementations do try to diagnose this in Debug builds though, because those logic errors are bugs. When a well-designed (under this philosophy) Standard Library function throws, it's supposed to be because a truly exceptional and unavoidable event occurred. The filesystem library has many throwing functions, because you can't really ever know for sure what's on some random file system. That's the situation exceptions are supposed to be used for.
In the paper linked below, Herb Sutter speaks against std::logic_error's existence as an exception type for this very reason. Clearly stating the philosophy, catching std::logic_error or any of its children amounts to introducing runtime overhead to fix programmer logic bugs. Any true logic error condition you want to detect should be asserted on, really, so the bug can be reported back to the people who wrote the bug.
In the optional interface, designed with the above in mind, value throws so that you can programmatically deal with it in a sensible way, with the expectation that whoever catches it either doesn't care what bad_optional_access means (catch( ... ) // catch everything) or can specifically deal with bad_optional_access. That exception really isn't meant to propagate far at all. When you purposely call value, you do so because you acknowledge the optional may be empty, and you choose to exit the current scope if it does turn out to be empty.
See the first section of this paper (download) for the philosophical rationale.
First, if you don't want to expose the implementation, then the exceptions shouldn't even cross the border between the implementation and the client code. It is a common idiom that no exception should cross the boundaries of libraries, APIs, etc.
Next, the fact that you store something in an optional is an implementation detail that you should control yourself. That means that you should check that the optional is not empty (at least if you don't want the client to know the details of the implementation).
And finally, answer the question: is it an error that the client code performs an operation on an empty object? If that is something that it is allowed to do, then no exception should be thrown (e.g. an error code may be returned). If that is a real problem that shouldn't happen, throwing an exception is appropriate. You may catch the std::bad_optional_access in your code and throw something else from the catch block.
Another solution to your problem could be nested exceptions. That means that you catch a lower level exception (in your case std::bad_optional_access) and then throw another exception (of any type, in your case you may implement a wrong_state : public std::logic_error) using the std::throw_with_nested function. Using this approach you:
Preserve the information about the lower level exception (it is still stored as a nested exception)
Hide this information about the nested exception from the user
Allow user to catch the exception as a wrong_state or a std::logic_error.
see the example: https://coliru.stacked-crooked.com/view?id=b9bc940f2cc6d8a3
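Independently of the linked example, a minimal sketch of the nested-exception approach could look like this. placementOf and wrapsBadOptionalAccess are hypothetical helpers, and wrong_state is the made-up type named above.

```cpp
#include <exception>
#include <optional>
#include <stdexcept>

// Hypothetical exception type, mirroring the wrong_state named above.
struct wrong_state : std::logic_error {
    using std::logic_error::logic_error;
};

int placementOf(const std::optional<int>& placement) {
    try {
        return placement.value();
    } catch (const std::bad_optional_access&) {
        // The bad_optional_access currently being handled is captured
        // as the nested exception of the wrong_state that is thrown.
        std::throw_with_nested(wrong_state("item cannot be equipped"));
    }
}

// Reports whether an exception carries a nested std::bad_optional_access.
bool wrapsBadOptionalAccess(const std::exception& outer) {
    try {
        std::rethrow_if_nested(outer);
    } catch (const std::bad_optional_access&) {
        return true;
    } catch (...) {
        return false;
    }
    return false;
}
```

Client code can catch wrong_state or std::logic_error without ever mentioning std::optional, while diagnostic code can still dig out the original cause.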
Consider this REAL WORLD example of when NOT to use std::bad_optional_access. It involves the INELEGANT 900 lines of code, wrapped up into one HUGE class, needed just to render a Vulkan triangle in THIS example: https://vulkan-tutorial.com/code/06_swap_chain_creation.cpp
I am in the process of reimplementing the one HUGE HelloTriangleApplication class as multiple smaller classes. The QueueFamilyIndices struct begins as a couple of empty std::optional<uint32_t> values, ergo, EXACTLY the sort of not-yet-a-things which std::optional was invented to handle.
So, obviously, I wanted to test each class before subclassing it into another class. But this involved leaving some not-yet-a-things uninitialized until a parent's subclass was later implemented.
It seemed right, at least to me, NOT to use std::bad_optional_access as a placeholder for future values, but rather to just code a 0 in the parent class as a placeholder for the not-yet-implemented std::optional not-yet-a-things. That was enough to stop my IDE from reporting those annoying "bad optional access" warnings.
This is a good question with good answers. I want to highlight some of the main points more directly, and also add some points that I disagree with from the other answers. I come at this more from the POV of abstract flow of information, with the idea that all infinite variants of specific situations become easier to handle when appropriate information is passed around effectively.
The TL;DR here is:
Using bad_optional_access in a semantically incorrect way is common, but it is the root cause of a lot of the other stuff (like logic_error) appearing not to make sense, and
value() should be used only when you know it has a value; it's not intended as some "exception-y" variation of value_or(). It doesn't make sense when there is no value: A thing with no value does not have a value, and so retrieving its value isn't something you can do. If you call 'value()' when there's no value then you made a mistake somewhere.
Regarding the use of value() itself:
If you cannot guarantee that an optional has a value, you use has_value() or value_or(). The use of value() assumes the optional has a value and by using it, you are stating that assumption as an invariant (i.e. assert(x.has_value()) is expected to pass), and if it doesn't have a value, then the invariant has been violated and an exception is appropriate. value() does not have meaning when the optional doesn't have a value. It is the same reason that you do not compute a / b in situations where b might be 0 -- you either know it isn't 0, or you check first. Likewise dereferencing invalid iterators, accessing invalid pointers, calling front() on an empty container, uh... computing the unbiased variance of a single sample... things like that.
Following that point, if you see a bad_optional_access, then this means there is a bug in your code: one of your assumptions (it had a value) was false. In other words, this is a development error, and in an ideal world a user should never encounter this exception in the same way a user should never encounter an assertion failure or a divide-by-zero or a null pointer access: It does not represent a user-actionable error, it represents code that needs to be fixed. Ideally only you as a developer should encounter this particular exception.
This is specifically why it is a logic_error: You used value() but did not honor its preconditions, and the implied assumption that you made about it having a value was not correct. You made a programming error by using value() in a situation where you could not guarantee that it had a value.
That said, the world isn't ideal. Also, generally speaking, if some exception below some layer of code is meant to represent a more user-appropriate error above that layer of code, then you need to translate that exception. For example:
An exception may expose an implementation detail that you want to abstract away.
An exception may contain information that doesn't make sense to a user, but may represent a larger more generic issue that is important to a user.
And so you need to translate that. For example:
Placement Item::get_placement() const {
// throws if the item cannot be equipped
return this->placement_optional.value();
}
The comment literally says "throws if the item cannot be equipped", but bad_optional_access doesn't mean "an item cannot be equipped". So if you allow it to be thrown out of that function, then you've miscommunicated the conceptual issue by throwing a semantically incorrect exception. Instead:
// elsewhere
class item_equip_exception : ... {
...
};
// then:
Placement Item::get_placement() const {
// throws if the item cannot be equipped
try {
return this->placement_optional.value();
} catch (const std::bad_optional_access &x) {
throw item_equip_exception(...);
}
}
Because that's what you're really trying to communicate.
However, an even more correct version of that would be:
Placement Item::get_placement() const {
// throws if the item cannot be equipped
if (!this->placement_optional.has_value())
throw item_equip_exception(...);
return this->placement_optional.value();
}
The reason this is more correct is because now you are calling value() in a situation where your assumption that it has a value should be guaranteed. And in this case, if you end up with a bad_optional_access, then it's truly a serious logic error. This now means that, as long as you're consistent with this approach, at the very top level of your application you can now actually catch std::logic_error (plus std::bad_optional_access itself, since, as the question notes, the standard type derives directly from std::exception) and it really will mean that some program logic went terribly wrong, and you can inform the user as such.
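Put together as something compilable, the idea might look like the sketch below. The original elides the definition of item_equip_exception, so deriving it from std::runtime_error is an assumption; the cut-down Item and the try_equip handler are hypothetical. Since std::bad_optional_access derives directly from std::exception, the handler catches it by name.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Assumption: the original elides this class; runtime_error is one possible base.
struct item_equip_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical cut-down stand-in for the Item class from the question.
struct Item {
    std::optional<int> placement_optional;

    int get_placement() const {
        if (!placement_optional.has_value()) {
            throw item_equip_exception("item cannot be equipped");
        }
        return placement_optional.value();  // guaranteed non-empty here
    }
};

// Top-level handler that keeps user-facing errors and programmer errors apart.
std::string try_equip(const Item& item) {
    try {
        return "equipped at slot " + std::to_string(item.get_placement());
    } catch (const item_equip_exception& e) {
        return std::string("cannot equip: ") + e.what();
    } catch (const std::bad_optional_access&) {
        return "internal error";  // would indicate a bug, not a user mistake
    }
}
```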
All of the issues in the original post can basically be boiled down to semantics:
If you use value() when there might be no value, that's a programming error, and...
If you then use that "programming error" to signify a general user-facing error, then...
... you've now tangled up all the semantics and nothing makes sense any more, including std::logic_error.
Where on the other hand:
If there reasonably may be no value and that signifies some higher level thing like "the item can't be equipped", and...
You check for that and throw a more appropriate exception and never call value() when there's no value, then...
... you can now communicate that to the user, the logic_error -> bad_optional_access inheritance meaning continues to make sense, and you can still separately catch programmer errors at a higher level.
So yeah; "say what you mean" applies as much to information flow in programming as it does to speaking in real life!

Accessing an object in a moved-from state

Let's say I have a class that manages a resource. Every method has a precondition that must be satisfied: the managed resource must be in a valid state (same way as unique_ptr with its operator* and operator->), and if the object is in a moved-from state, that precondition is not satisfied. For cases where I'm accessing a moved-from object (either accidentally or deliberately for whatever reason), this raises the following questions:
Is it good design to check (assert) if the precondition is satisfied at the beginning of every method?
If yes, how exactly would it be good design to address the issue in case it is not? Perhaps throw an exception or something? (I understand this is an opinion-based question)
Is it good design to check (assert) if the precondition is satisfied at the beginning of every method?
If you can statically assert on it then yes. During runtime no. Consider a caller that has code like this
for (int i=0; i<100000;++i) {
foo.bar();
}
Checking the precondition on every iteration is a waste of resources when the condition does not change.
If yes, how exactly would it be good design to address the issue in case it is not? Perhaps throw an exception or something? (I understand this is an opinion-based question)
That depends on how nice you want to be to the caller. One valid choice is to clearly state the preconditions and if a caller violates them then they are on their own. Compare to eg std::vector::operator[]: You want to pass an out-of-bounds index? No problem (from point of view of the operator), undefined behaviour is what you get (a real problem from your point of view). You cannot guarantee well defined behaviour in all situations, especially when the caller didn't care about your preconditions. In some cases you might be able to throw an exception, but that depends on details.
In C++, it is usually a good idea to use assert from the <cassert> header to make sure some condition is met; the check compiles away when NDEBUG is defined, so it has zero overhead in typical release builds. I don't think throwing is a particularly good idea here, but it is nevertheless an option with its own trade-offs. Throwing makes it harder to write exception-safe code and it makes the condition check mandatory. I'd say assert it, and also document that those preconditions must be met. std::vector::operator[] does that in many implementations to check for out-of-bounds access in debug mode, for instance.
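A minimal sketch of the assert-and-document approach for a move-only resource class; Handle is a hypothetical type whose "resource" is just a heap-allocated int.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource wrapper; the managed resource is just an int here.
class Handle {
public:
    explicit Handle(int value) : resource_(std::make_unique<int>(value)) {}
    Handle(Handle&&) noexcept = default;
    Handle& operator=(Handle&&) noexcept = default;

    // True unless the object has been moved from.
    bool valid() const noexcept { return resource_ != nullptr; }

    // Precondition: valid(). Checked only when NDEBUG is not defined.
    int read() const {
        assert(valid() && "Handle::read called on a moved-from object");
        return *resource_;
    }

private:
    std::unique_ptr<int> resource_;  // null after being moved from
};
```

Exposing valid() lets a caller hoist the check out of a hot loop instead of paying for it on every call.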
I just want to make clear that, "moved from" objects, generally, should have a valid state. Users can't make any assumptions about its state or value, but they should be able to reassign or reuse that object.
Is it good design to check (assert) if the precondition is satisfied at the beginning of every method?
Like most things in programming, it is a tradeoff. What is good depends on whether you value more what you gain compared to what you lose.
What you lose in the trade is performance: Not performing a check is at least as fast, and usually faster than performing it. Whether this loss of efficiency is significant depends on the use case. Within the hot path of a CPU bound algorithm: It may be. In a code dealing with network communication: Probably not. To find out whether the difference is significant, you need to measure.
how exactly would it be good design to address the issue in case it is not?
There are several ways of communicating an error in C++:
Simplest form is to abort the program. The advantage is that there won't be unknown side effects such as security holes due to undefined behaviour. This is a decent choice when the program would have no way of recovering... but how would you as the implementer of the class know whether the user of the class can recover?
The standard assert macro falls into the same category of terminating the program, but the check is typically enabled only in testing and disabled in release mode which may leave the program vulnerable to exploits.
Exceptions are a convenient way to either let the client code choose whether they would like to recover from it (try - catch) or let the program terminate safely. Pre / Post condition violations are typically exceptional cases where an exception is appropriate.
Use a traditional C style error code. This will force the client to write their own error handling code to avoid potential UB.
Each way has its own advantages and disadvantages. You should choose according to your needs.
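To make the comparison concrete, here is a rough sketch of one precondition ("there is at least one sample") reported in three of the styles listed above. The Samples class and the average_or helper are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical container used only for illustration.
class Samples {
public:
    void add(double v) { values_.push_back(v); }

    // Assert style: checked in debug builds, unchecked with NDEBUG.
    double average_unchecked() const {
        assert(!values_.empty());
        return sum() / static_cast<double>(values_.size());
    }

    // Exception style: the caller may catch it or let the program end.
    double average() const {
        if (values_.empty())
            throw std::logic_error("Samples::average: no samples");
        return sum() / static_cast<double>(values_.size());
    }

    // Error-code style: the caller must test the result before reading 'out'.
    bool try_average(double& out) const {
        if (values_.empty())
            return false;
        out = sum() / static_cast<double>(values_.size());
        return true;
    }

private:
    double sum() const {
        double s = 0.0;
        for (double v : values_) s += v;
        return s;
    }
    std::vector<double> values_;
};

// A caller that chooses to recover from the exception.
double average_or(const Samples& s, double fallback) {
    try {
        return s.average();
    } catch (const std::logic_error&) {
        return fallback;
    }
}
```

Which of the three fits best still depends on whether the caller can realistically recover.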
I would also use an assertion-like solution to check preconditions in that code. But instead of using a bare assert, I would use some compiler-specific magic to optimize the likely/unlikely cases, as is done in Microsoft's GSL library.
It provides the GSL_LIKELY, GSL_UNLIKELY and GSL_ASSUME macros and the Expects and Ensures constructs. They are better than a plain assert because the compiler can use them to optimize the generated code for the likely and unlikely branches.
Also, what is very cool, you can tune the GSL configuration and decide what to do on a failed condition: throw, terminate, or do nothing. This way you can easily test different behaviors without modifying the whole codebase. All you need to do is change some compile definitions.
And another useful thing: using them is more concise. In my opinion Expects(m_data != nullptr) is more readable than assert(m_data != nullptr).
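The idea of switching the failure behaviour with a compile definition can be sketched without GSL. Everything below is hypothetical: MY_EXPECTS, MY_CONTRACT_THROWS and MY_CONTRACT_TERMINATES are invented names, and this is not how GSL itself is implemented or configured.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical contract macro; the MY_* names are assumptions for this
// sketch and do not come from GSL.
#if defined(MY_CONTRACT_THROWS)
    #define MY_EXPECTS(cond) \
        do { if (!(cond)) throw std::logic_error("precondition failed: " #cond); } while (0)
#elif defined(MY_CONTRACT_TERMINATES)
    #define MY_EXPECTS(cond) \
        do { if (!(cond)) std::terminate(); } while (0)
#else
    // Default: behave like a plain assert (disabled by NDEBUG).
    #define MY_EXPECTS(cond) assert(cond)
#endif

class Sensor {
public:
    explicit Sensor(const int* data) : m_data(data) {}

    int read() const {
        MY_EXPECTS(m_data != nullptr);
        return *m_data;
    }

private:
    const int* m_data;
};
```

Building with -DMY_CONTRACT_THROWS or -DMY_CONTRACT_TERMINATES would change what a failed check does without touching Sensor.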

How much work should constructor of my class perform?

I have a class that represents a data stream; it basically reads from or writes to a file, but first the data are encrypted/decrypted, and there is also an underlying codec object that handles the media being accessed.
I'm trying to write this class in a RAII way and I'd like a clean, nice, usable design.
What bothers me is that right now there is a lot of work being done in the constructor.
Before the object's I/O routines can be safely used, first of all the codec needs to be initialized (this isn't very demanding), but then a key is taken into account and crypto and other things are initialized - these require some analysis of the media, which takes quite a lot of computation.
Right now I'm doing all this in the constructor, which makes it take a long time. I'm thinking of moving the crypto init stuff (most work) out of the ctor into a separate method (say, Stream::auth(key)), but then again, this would move some responsibility to the user of the class, as they'd be required to run auth() before they call any I/O ops. This also means I'd have to place a check in the I/O calls to verify that auth() had been called.
What do you think is a good design?
P.S. I did read a similar question, but I wasn't really able to apply the answers to this case. They're mostly like "It depends"... :-/
Thanks
The only truly golden unbreakable rule is that the class must be in a valid, consistent, state after the constructor has executed.
You can choose to design the class so that it is in some kind of "empty"/"inactive" state after the constructor has run, or you can put it directly into the "active" state that it is intended to be in.
Generally, it should be preferred to have the constructor construct your class. Usually, you wouldn't consider a class fully "constructed", until it's actually ready to be used, but exceptions do exist.
However, keep in mind that in RAII, one of the key ideas is that the class shouldn't exist unless it is ready, initialized and usable. That's why its destructor does the cleanup, and that's why its constructor should do the setup.
Again, exceptions do exist (for example, some RAII objects allow you to release the resource and perform cleanup early, and then have the destructor do nothing.)
So at the end of the day, it depends, and you'll have to use your own judgment.
Think of it in terms of invariants. What can I rely on if I'm given an instance of your class? The more I can safely assume about it, the easier it is to use. If it might be ready to use, and might be in some "constructed, but not initialized" state, and might be in a "cleaned up but not destroyed" state, then using it quickly becomes painful.
On the other hand, if it guarantees that "if the object exists, it can be used as-is", then I'll know that I can use it without worrying about what was done to it before.
It sounds like your problem is that you're doing too much in the constructor.
What if you split the work up into multiple smaller classes? Have the codec be initialized separately, then I can simply pass the already-initialized codec to your constructor. And all the authentication and cryptography stuff and whatnot could possibly be moved out into separate objects as well, and then simply passed to "this" constructor once they're ready.
Then the remaining constructor doesn't have to do everything from scratch, but can start from a handful of helper objects which are already initialized and ready to be used, so it just has to connect the dots.
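A rough sketch of that split, with hypothetical Codec, Cipher and Stream classes standing in for the asker's real ones. The XOR "cipher" is only a toy placeholder so the example does something observable.

```cpp
#include <string>
#include <utility>

// All names below are hypothetical stand-ins for the asker's classes.
class Codec {
public:
    explicit Codec(std::string format) : format_(std::move(format)) {}
    const std::string& format() const { return format_; }
private:
    std::string format_;
};

class Cipher {
public:
    // The expensive key analysis would happen here, once, and
    // independently of any stream.
    explicit Cipher(const std::string& key)
        : mask_(static_cast<unsigned char>(key.size() & 0x7F)) {}

    // Toy XOR transformation standing in for real encryption.
    char apply(char c) const {
        return static_cast<char>(static_cast<unsigned char>(c) ^ mask_);
    }
private:
    unsigned char mask_;
};

class Stream {
public:
    // Cheap constructor: it only connects already-initialized parts.
    Stream(Codec codec, Cipher cipher)
        : codec_(std::move(codec)), cipher_(std::move(cipher)) {}

    std::string encode(const std::string& plain) const {
        std::string out = plain;
        for (char& c : out) c = cipher_.apply(c);
        return out;
    }
    const Codec& codec() const { return codec_; }
private:
    Codec codec_;
    Cipher cipher_;
};
```

The caller now decides when the costly Cipher is built, and Stream stays usable from the moment it exists.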
You could just place the check in the I/O calls to see whether auth() has been called: if it has, continue; if not, call it.
This removes the burden from the user and delays the expense until it is needed.
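A minimal sketch of such a lazy check, assuming a hypothetical LazyStream class where auth() stands in for the expensive key analysis. The auth_calls() counter is there only to make the behaviour visible.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stream; auth() stands in for the expensive key analysis.
class LazyStream {
public:
    explicit LazyStream(std::string key) : key_(std::move(key)) {}

    // Every I/O call goes through ensure_auth() first.
    std::string read() {
        ensure_auth();
        return "data";
    }

    int auth_calls() const { return auth_calls_; }

private:
    void ensure_auth() {
        if (authenticated_) return;
        auth();                  // may throw on the first I/O call
        authenticated_ = true;
    }

    void auth() {
        ++auth_calls_;
        if (key_.empty()) throw std::invalid_argument("empty key");
    }

    std::string key_;
    bool authenticated_ = false;
    int auth_calls_ = 0;
};
```

Note that a bad key is now reported by the first read() rather than by the constructor, and that this sketch is not thread-safe.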
Basically, this all boils down to which design to choose from the following three:
Designs
Disclaimer: this post is not encouraging the use of exception specifications or exceptions for that matter. The errors may equivalently be reported using error codes if you wish. Exception specifications as used here are just meant to illustrate when different errors can occur using a concise syntax.
Design 1
This is the most recurring design out there, and totally non-RAII. The constructor just puts the object in some stale state and each instance must be initialized manually after construction takes place.
class SecureStream
{
public:
SecureStream();
void initialize(Stream&,const Key&) throw(InvalidKey,AlreadyInitialized);
std::size_t get( void*,std::size_t) throw(NotInitialized,IOError);
std::size_t put(const void*,std::size_t) throw(NotInitialized,IOError);
};
Pros:
Users have control over when to invoke the "heavy" initialization process
The object can be created before the key exists. This is important for frameworks such as COM, where all objects must have a default constructor (CoCreateInstance() does not allow you to forward extra arguments to the object's constructor). Sometimes, there are still workarounds, such as a builder object.
Cons:
Objects must be checked for the stale state before using the object. This may be enforced by the object by returning an error code or throwing an exception. Personally, I hate objects that allow me to use them and just appear to ignore my calls (e.g. a failed std::ostream).
Design 2
This is the RAII approach. Make sure the object is 100% usable with no extra artefacts (e.g. manually calling stream.initialize(...); on each instance).
class SecureStream
{
public:
SecureStream(Stream&,const Key&) throw(InvalidKey);
std::size_t get( void*,std::size_t) throw(IOError);
std::size_t put(const void*,std::size_t) throw(IOError);
};
Pros:
The object can always be assumed to be in a valid state. This is so much simpler to use.
Cons:
Constructor might take a long time to execute.
All required arguments must be available at the instance construction. This has once in a while been a problem for me, especially if most other objects in the code base use design #1.
Design 3
Somewhat of a compromise between the two previous cases. Don't initialize yet, but have the other methods lazily invoke the internal .initialize(...) method when necessary.
class SecureStream
{
public:
SecureStream(Stream&,const Key&);
std::size_t get( void*,std::size_t) throw(InvalidKey,IOError);
std::size_t put(const void*,std::size_t) throw(InvalidKey,IOError);
private:
void initialize() throw(InvalidKey);
};
Pros:
Almost as easy to use as design #1. Almost (see below).
Cons:
If the initialization step may fail, it may now fail anywhere there is a first call to any of the public methods. Proper error handling for this scenario is extremely difficult.
Discussion
If you absolutely must pay for the initialization for every instance, then design #1 is out of the question as it just results in more bugs in the software.
The question is just about when to pay for the initialization cost. Do you prefer paying it upfront, or on first use? In most scenarios, I prefer paying upfront because I don't want to assume users can handle errors later in the program. However, there might be specific threading semantics in your program, and you might not be able to stall threads at creation time (or, conversely, at use time).
In any case, you can still get the benefits of design #3 by using dynamic allocation of the class in design #2.
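One way to read that last remark, as a sketch: keep the class itself in the design #2 style and let its owner decide when to construct it. The Session wrapper and the simplified single-argument SecureStream below are hypothetical.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical RAII class in the style of design #2: construction does
// all the work and throws if the key is rejected.
class SecureStream {
public:
    explicit SecureStream(const std::string& key) {
        if (key.empty()) throw std::invalid_argument("invalid key");
    }
    std::string get() const { return "payload"; }
};

// The owner decides when to pay for construction.
class Session {
public:
    bool is_open() const { return stream_ != nullptr; }

    // If construction throws, stream_ is left unchanged.
    void open(const std::string& key) {
        stream_ = std::make_unique<SecureStream>(key);
    }

    std::string read() const {
        if (!stream_) throw std::logic_error("Session not open");
        return stream_->get();
    }

private:
    std::unique_ptr<SecureStream> stream_;  // empty until open()
};
```

The "not yet initialized" state then lives in the owning pointer instead of inside SecureStream, so every SecureStream that exists is still fully usable.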
Conclusion
Basically, if the only reason you are hesitating is for some philosophical ideal where constructors execute quickly, I would just go with the pure RAII design.
There's no hard and fast rule on this, but in general it's best to avoid heavy constructors for two reasons that come to mind (maybe others as well):
The order in which objects are created in the initializer list can give rise to subtle bugs
What to do with exceptions in the constructor? Will you need to handle partially-constructed objects in your app?

How defensive should you be? [duplicate]

We had a great discussion this morning about the subject of defensive programming. We had a code review where a pointer was passed in and was not checked if it was valid.
Some people felt that only a check for null pointer was needed. I questioned whether it could be checked at a higher level, rather than in every method it is passed through, and argued that checking for null was a very limited check if the object at the other end of the pointer did not meet certain requirements.
I understand and agree that a check for null is better than nothing, but it feels to me that checking only for null provides a false sense of security since it is limited in scope. If you want to ensure that the pointer is usable, check for more than the null.
What are your experiences on the subject? How do you write defenses in to your code for parameters that are passed to subordinate methods?
In Code Complete 2, in the chapter on error handling, I was introduced to the idea of barricades. In essence, a barricade is code which rigorously validates all input coming into it. Code inside the barricade can assume that any invalid input has already been dealt with, and that the inputs that are received are good. Inside the barricade, code only needs to worry about invalid data passed to it by other code within the barricade. Asserting conditions and judicious unit testing can increase your confidence in the barricaded code. In this way, you program very defensively at the barricade, but less so inside the barricade. Another way to think about it is that at the barricade, you always handle errors correctly, and inside the barricade you merely assert conditions in your debug build.
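As an illustration of the barricade idea, here is a small sketch with made-up functions: parse_percent sits at the barricade and validates untrusted text, while percent_of sits inside it and only asserts.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Inside the barricade: inputs are assumed valid, so only assert.
int percent_of(int value, int percent) {
    assert(percent >= 0 && percent <= 100);
    return value * percent / 100;
}

// At the barricade: untrusted text is validated rigorously and
// rejected with an exception before it reaches the inner code.
int parse_percent(const std::string& text) {
    if (text.empty() || text.size() > 3)
        throw std::invalid_argument("percent must be 1 to 3 digits");
    int result = 0;
    for (char c : text) {
        if (c < '0' || c > '9')
            throw std::invalid_argument("percent must be numeric");
        result = result * 10 + (c - '0');
    }
    if (result > 100)
        throw std::out_of_range("percent must be in [0, 100]");
    return result;
}

// Entry point: everything crossing into the inner code is validated first.
int discount(int price, const std::string& percent_text) {
    return percent_of(price, parse_percent(percent_text));
}
```

Only parse_percent pays for full validation in release builds; the assert in percent_of exists to catch mistakes made by code inside the barricade.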
As far as using raw pointers goes, usually the best you can do is assert that the pointer is not null. If you know what is supposed to be in that memory then you could ensure that the contents are consistent in some way. This begs the question of why that memory is not wrapped up in an object which can verify its consistency itself.
So, why are you using a raw pointer in this case? Would it be better to use a reference or a smart pointer? Does the pointer contain numeric data, and if so, would it be better to wrap it up in an object which managed the lifecycle of that pointer?
Answering these questions can help you find a way to be more defensive, in that you'll end up with a design that is easier to defend.
The best way to be defensive is not to check pointers for null at runtime, but to avoid using pointers that may be null to begin with.
If the object being passed in must not be null, use a reference! Or pass it by value! Or use a smart pointer of some sort.
The best way to do defensive programming is to catch your errors at compile-time.
If it is considered an error for an object to be null or point to garbage, then you should make those things compile errors.
Ultimately, you have no way of knowing if a pointer points to a valid object. So rather than checking for one specific corner case (which is far less common than the really dangerous ones, pointers pointing to invalid objects), make the error impossible by using a data type that guarantees validity.
I can't think of another mainstream language that allows you to catch as many errors at compile-time as C++ does. Use that capability.
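A small sketch of the difference, using a hypothetical Account type. The pointer version has to decide what to do about null at run time; the reference version has no such state in its signature.

```cpp
#include <string>

// Hypothetical type used only for illustration.
struct Account {
    std::string owner;
    int balance;
};

// Pointer version: null is a representable argument, so the function
// has to decide what to do about it at run time.
bool deposit_ptr(Account* account, int amount) {
    if (account == nullptr) return false;
    account->balance += amount;
    return true;
}

// Reference version: there is no null state to check for.
void deposit_ref(Account& account, int amount) {
    account.balance += amount;
}
```

A reference can still dangle if the referred-to object has been destroyed, so this removes the null case rather than every lifetime bug.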
There is no way to check if a pointer is valid.
In all seriousness, it depends on how many bugs you'd like to have inflicted upon you.
Checking for a null pointer is definitely something that I would consider necessary but not sufficient. There are plenty of other solid principles you can use starting with entry points of your code (e.g., input validation = does that pointer point to something useful) and exit points (e.g., you thought the pointer pointed to something useful but it happened to cause your code to throw an exception).
In short, if you assume that everyone calling your code is going to do their best to ruin your life, you'll probably find a lot of the worst culprits.
EDIT for clarity: some other answers are talking about unit tests. I firmly believe that test code is sometimes more valuable than the code that it's testing (depending on who's measuring the value). That said, I also think that unit tests are necessary but not sufficient for defensive coding.
Concrete example: consider a 3rd party search method that is documented to return a collection of values that match your request. Unfortunately, what wasn't clear in the documentation for that method is that the original developer decided that it would be better to return a null rather than an empty collection if nothing matched your request.
So now you call your defensive and well unit-tested method (which is sadly lacking an internal null pointer check) and boom! A NullPointerException that, without an internal check, you have no way of dealing with:
defensiveMethod(thirdPartySearch("Nothing matches me"));
// You just passed a null to your own code.
I'm a big fan of the "let it crash" school of design. (Disclaimer: I don't work on medical equipment, avionics, or nuclear power-related software.) If your program blows up, you fire up the debugger and figure out why. In contrast, if your program keeps running after illegal parameters have been detected, by the time it crashes you'll probably have no idea what went wrong.
Good code consists of many small functions/methods, and adding a dozen lines of parameter-checking to every one of those snippets of code makes it harder to read and harder to maintain. Keep it simple.
I may be a bit extreme, but I don't like Defensive Programming, I think it's laziness that has introduced the principle.
For this particular example, there is no sense in asserting that the pointer is not null. If you don't want a null pointer, there is no better way to enforce it (and document it clearly at the same time) than to use a reference instead. And it's documentation that will actually be enforced by the compiler and costs zilch at runtime!!
In general, I tend not to use 'raw' types directly. Let's illustrate:
void myFunction(std::string const& foo, std::string const& bar);
What are the possible values of foo and bar ? Well that's pretty much limited only by what a std::string may contain... which is pretty vague.
On the other hand:
void myFunction(Foo const& foo, Bar const& bar);
is much better!
if people mistakenly reverse the order of the arguments, it's detected by the compiler
each class is solely responsible for checking that the value is right, the users are not burdened.
I have a tendency to favor Strong Typing. If I have an entry that should be composed only of alphabetical characters and be up to 12 characters long, I'd rather create a small class wrapping a std::string, with a simple validate method used internally to check the assignments, and pass that class around instead. This way I know that if I test the validation routine ONCE, I don't have to actually worry about all the paths through which that value can get to me: it will be validated when it reaches me.
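Such a wrapper could look roughly like this. Name, validate and greet are hypothetical names, and throwing std::invalid_argument on bad input is just one possible choice.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical type: at most 12 alphabetic characters, checked once
// at construction.
class Name {
public:
    explicit Name(std::string value) : value_(validate(std::move(value))) {}
    const std::string& str() const { return value_; }

private:
    static std::string validate(std::string v) {
        if (v.size() > 12)
            throw std::invalid_argument("Name must be at most 12 characters");
        for (unsigned char c : v)
            if (!std::isalpha(c))
                throw std::invalid_argument("Name must be alphabetic");
        return v;
    }

    std::string value_;
};

// Callees receiving a Name no longer need to re-validate it.
std::string greet(const Name& n) { return "Hello, " + n.str(); }
```

Because value_ can only be set through the validating constructor, any function taking a Name can rely on the rule without repeating the check.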
Of course, that doesn't mean that the code should not be tested. It's just that I favor strong encapsulation, and validation of an input is part of knowledge encapsulation in my opinion.
And as no rule can come without an exception... exposed interface is necessarily bloated with validation code, because you never know what might come upon you. However with self-validating objects in your BOM it's quite transparent in general.
"Unit tests verifying the code does what it should do" > "production code trying to verify its not doing what its not supposed to do".
I wouldn't even check for null myself, unless it's part of a published API.
It very much depends; is the method in question ever called by code external to your group, or is it an internal method?
For internal methods, you can test enough to make this a moot point, and if you're building code where the goal is highest possible performance, you might not want to spend the time on checking inputs you're pretty darn sure are right.
For externally visible methods - if you have any - you should always double check your inputs. Always.
From debugging point of view, it is most important that your code is fail-fast. The earlier the code fails, the easier to find the point of failure.
For internal methods, we usually stick to asserts for these kinds of checks. That does get errors picked up in unit tests (you have good test coverage, right?) or at least in integration tests that are running with assertions on.
Checking for a null pointer is only half of the story;
you should also assign a null value to every unassigned pointer.
Most responsible APIs will do the same.
Checking for a null pointer comes very cheap in CPU cycles; having an application crash once it's delivered can cost you and your company money and reputation.
You can skip null pointer checks if the code is in a private interface you have complete control of and/or you check for null by running a unit test or some debug build test (e.g. assert).
There are a few things at work here in this question which I would like to address:
Coding guidelines should specify that you either deal with a reference or a value directly instead of using pointers. By definition, pointers are value types that just hold an address in memory -- validity of a pointer is platform specific and means many things (range of addressable memory, platform, etc.)
If you find yourself ever needing a pointer for any reason (like for dynamically generated and polymorphic objects) consider using smart pointers. Smart pointers give you many advantages with the semantics of "normal" pointers.
If a type for instance has an "invalid" state then the type itself should provide for this. More specifically, you can implement the NullObject pattern that specifies how an "ill-defined" or "un-initialized" object behaves (maybe by throwing exceptions or by providing no-op member functions).
You can create a smart pointer that does the NullObject default that looks like this:
#include <memory>

template <class Type, class NullTypeDefault>
struct possibly_null_ptr {
    // Default to the "null object" rather than to an actual null pointer.
    possibly_null_ptr() : p(new NullTypeDefault) {}
    // Takes ownership of p_.
    possibly_null_ptr(Type* p_) : p(p_) {}
    Type* operator->() const { return p.get(); }
    // Dereference the held object.
    Type& operator*() const { return *p; }
private:
    std::shared_ptr<Type> p;
};
Then use the possibly_null_ptr<> template in cases where you support possibly null pointers to types that have a default derived "null behavior". This makes it explicit in the design that there is an acceptable behavior for "null objects", and this makes your defensive practice documented in the code -- and more concrete -- than a general guideline or practice.
Pointers should only be used if you need to do something with the pointer itself, such as pointer arithmetic to traverse some data structure. Then, if possible, that should be encapsulated in a class.
If the pointer is passed into the function to do something with the object to which it points, then pass in a reference instead.
One method for defensive programming is to assert almost everything that you can. At the beginning of the project it is annoying but later it is a good adjunct to unit testing.
A number of answers address the question of how to write defenses in your code, but not much was said about "how defensive should you be?". That's something you have to evaluate based on the criticality of your software components.
We're doing flight software and the impacts of a software error range from a minor annoyance to loss of aircraft/crew. We categorize different pieces of software based on their potential adverse impacts which affects coding standards, testing, etc. You need to evaluate how your software will be used and the impacts of errors and set what level of defensiveness you want (and can afford). The DO-178B standard calls this "Design Assurance Level".

c++: Operator overloading and error handling

I am currently starting to look into operator overloading in C++ for a simple 2D vertex class where the position should be available with the [] operator. That generally works, but I don't really know how to deal with errors, for instance if the index is out of bounds (in the case of a 2D vertex class, which only has x and y values, it is out of bounds if it is bigger than one).
What is the common way to handle errors in cases like that?
Thanks
When you have to throw, you have to throw. There's no other way to diagnose a problem in an overloaded operator unless you can return some sort of magic exploding result value. Define an exception type, throw it on errors, document it.
Error handling is a tricky beast in the best of times. It pretty much boils down to how big a deal the error is, and what if anything is expected to happen with it when it occurs.
There are four basic paths you can follow:
Throw an exception
The sledgehammer of error handling. A great tool, definitely want to use it if you need it, but if you're not careful you'll end up smashing yourself in the foot.
Essentially skips everything between the throw and the catch, leaving nothing but death and destruction in its wake.
If it's not caught, it will abort your program.
Return a value that indicates failure
Leave it to the programmer to check for success and react accordingly.
Failure value would depend on type. Pointers can return NULL or 0, STL containers return object.end(), else otherwise unused values can be used (such as -1 or "").
Process the condition gracefully
Sometimes, an error isn't really an error, just an inconvenience.
If useful results can still be provided, a mistake can easily be swept under the carpet without hurting anyone.
For example, an out-of-range error can just return the last element in an array, without needing to resort to any of that messy exception stuff.
So long as it's predictable and defined, the programmer can make of it what they wish.
Undefined behaviour
Hey, programmers shouldn't be giving you bad input in the first place. Let them suffer.
In general, I would resort to option one only for stuff that's program-breaking, for things that I don't really expect to recover from without concerted effort. Otherwise, using exceptions as a form of flow control is little better than going back to the days of goto.
Option two is probably the most common for non-program-breaking errors, but its effectiveness really depends on the types of return you're dealing with. It is advantageous since it lets the programmer control the flow locally by detecting failures and recovering themselves. When dealing with overloading operators, it is of limited use, but I figured I'd throw it in for the sake of completeness.
Option three is very circumstance-specific. Many errors can't be handled in such a way, and even the ones that can can lead to unintuitive results. Use with caution, and be sure to document thoroughly. Or, don't document it at all, and pretend it's option four.
Now, as to the specific example provided, that being an out of range error on an overloaded operator[], I would personally go for option four. Not because I particularly enjoy watching other programmers suffer when they deal with my code (I do, incidentally, but that's tangential to the discussion), but because it's expected.
Most cases where a programmer would be using operator[], they expect to handle their own bounds checking and don't rely on the type or class to do anything for them. Even in the STL containers, you can see operator[] (no range checking) in parallel with the otherwise redundant object.at() (which does range checking). Reflecting the expected behaviour with your own overloaded operators tends to make for more intuitive code.
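Mirroring that convention in the vertex class might look like the sketch below. Vertex2D is a hypothetical name; operator[] is unchecked apart from a debug-only assert, while at() throws std::out_of_range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical 2D vertex mirroring the standard containers' split
// between unchecked operator[] and checked at().
class Vertex2D {
public:
    Vertex2D(double x, double y) : pos_{x, y} {}

    // Unchecked in release builds; out-of-range use is undefined behaviour.
    double& operator[](std::size_t i) {
        assert(i < 2 && "Vertex2D index out of bounds");
        return pos_[i];
    }

    // Checked: throws std::out_of_range on a bad index.
    double& at(std::size_t i) {
        if (i >= 2)
            throw std::out_of_range("Vertex2D::at: index must be 0 or 1");
        return pos_[i];
    }

private:
    double pos_[2];
};
```

Callers who want bounds checking opt in by calling at(); everyone else gets the behaviour they already expect from operator[].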
According to the C++ language FAQ, operator[] should not be used for matrices or 2D array implementations; use operator() instead (see FAQ #13.10).
The big problem is implementing [] for multiple dimensions.
As for errors, you will have to go the exception route if you don't want to provide any extra parameters to your overloaded operator (another reason to use operator()).
I think an assertion might also be in place. Do you foresee that it would ever be anything else than a (simple?) programmer's error to go out of bounds in a 2d vector?
T& operator[](size_t index)
{
assert(index < 2 && "Vector index out of bounds");
return pos[index];
}
If you are going to throw exceptions, I suppose you could also use std::out_of_range, or a type derived from it.
As others have pointed out, exceptions are the way to go.
But that would seem to be quite an unusual idiom for accessing a point class like that. I would find it much more straightforward for the vertex class to have separate members:
class Vertex {
...
double x;
double y;
};
Then you can operate on them by doing things like vertex1.x - vertex2.x etc, which IMO is more readable than vertex1[0] - vertex2[0]. For an added bonus it avoids your exception problem completely.
You have at least two options other than exceptions for handling indexes out of bounds:
Just trust your input, document that it's undefined behaviour to use an index out of bounds, and rely on your callers to be professionals[*].
Abort if an index is out of bounds, by calling std::terminate(), or abort() directly, or whatever, perhaps after printing an error message.
There's a compromise between the two, which is to use the assert macro. This will do the former in release builds (compiled with NDEBUG), and the latter in debug builds.
Not that exceptions are necessarily a bad idea, but they have their problems. Then again, most of those problems go away if you never catch them.
In this case, the caller has to pass you either 0 or 1. If they sometimes pass you 2, and plan to catch the exception that happens when they do, then there may be no hope for them. Don't spend too much time worrying about it.
Another option would be to accept all inputs, but map them on to one or other of the values. For instance you could bitwise-and the input with 1. This makes your code very simple, with the obvious disadvantage that it obscures other people's bugs.
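The masking variant is short enough to sketch in full, here with a hypothetical Vertex struct.

```cpp
#include <cstddef>

// Hypothetical vertex whose operator[] accepts any index.
struct Vertex {
    double pos[2];

    // Every index is folded onto 0 or 1, so no access is out of bounds,
    // but v[2] silently aliases v[0].
    double& operator[](std::size_t i) { return pos[i & 1]; }
};
```

With this version v[2] quietly returns the x value and v[7] the y value, which is exactly the bug-hiding behaviour mentioned above.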
[*] Not to say that professionals don't make mistakes. They do. They just don't expect you to save them from their mistakes.