I'm teaching myself exception-safe programming techniques ;) and I wonder whether dereferencing a pointer could ever throw an exception. I think it would be helpful for all C++ programmers to know all the operations that are guaranteed not to throw, so I would be very grateful if someone could compose such a list.
Dereferencing a simple pointer (T*) can lead to Undefined Behavior if there is no valid object of the specified type where the pointer points to. It's the nature of UB that the result might be anything, including, but not limited to, a C++ exception. One could indeed imagine an implementation that would check pointers on access and throw an exception. However, I doubt that such an implementation of C++ will ever exist (if you can spare the runtime overhead of doing this, why use C++?) and the common behavior on most platforms is to either muddle on (if the memory in question is allocated to the process) or crash. On some platforms there are ways to intercept such crashes (like Windows' Structured Exceptions).
However, the unary operator*() might be overloaded, and usually is for smart pointers and iterators. Such implementations can certainly do anything their implementors want, including, but not limited to, throwing an exception. But again, due to runtime overhead, common smart pointer implementations only check in debug builds (usually using some form of assertion), but not in release builds. (A notable exception is the iterators in recent Visual C++ implementations, which have taken quite some heat for this unusual behavior.)
There is a very strong tradition in C++ to differentiate between errors the programmer could have prevented (like accessing an array out of bounds) and errors that programmers could not have prevented (like a network connection dying). For raw speed, the former usually lead to UB, because checking them every time would cost performance. It's left to programmers to check where appropriate and necessary.
You can see this distinction in the definition of the standard library's exception hierarchy, which splits into preventable std::logic_error and unpreventable std::runtime_error.
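For example, a minimal sketch of that distinction (the function names here are hypothetical): an out-of-range index is a logic_error the caller could have checked for, while a dropped connection is a runtime_error nobody could have prevented:

#include <cstddef>
#include <stdexcept>
#include <vector>

int element_at(const std::vector<int>& v, std::size_t i) {
    if (i >= v.size())
        throw std::logic_error("index out of range"); // preventable: caller's bug
    return v[i];
}

void send_heartbeat() {
    bool connection_alive = false; // imagine probing the network here
    if (!connection_alive)
        throw std::runtime_error("connection lost"); // unpreventable: environment
}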
Dereferencing an invalid pointer is undefined behavior, which the implementation can then define as a thrown exception. This is very uncommon in C++, although it's the rule in some other languages.
You can catch memory access exceptions using std::signal(SIGSEGV, my_handler);, although that still depends on platform support, and the Standard does not allow you to translate it into a C++ exception.
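For illustration, a minimal sketch of installing such a handler (whether SIGSEGV is actually delivered for a bad dereference is platform-dependent):

#include <csignal>
#include <cstdlib>

extern "C" void my_handler(int /*sig*/) {
    // Only async-signal-safe operations are allowed here; in particular,
    // throwing a C++ exception from a signal handler is not permitted.
    std::_Exit(EXIT_FAILURE);
}

int main() {
    std::signal(SIGSEGV, my_handler); // note the order: signal number, then handler
    int* p = nullptr;
    *p = 42; // may raise SIGSEGV on typical platforms, invoking my_handler
}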
I understand that Microsoft has its own "structured" exceptions or some such, which perhaps could be caught with that mechanism and rethrown in C++ conventions. Or maybe that's forbidden; I don't know.
If it's a simple pointer (and not some auto-pointer object or iterator, etc.), then the act of dereferencing can't throw an exception, because dereferencing on its own doesn't do anything. In the compilation process, dereferencing a pointer is just a way of telling the compiler to compose an instruction that does something with what that pointer points to. If the pointer is invalid and you try to write through the dereferenced expression, then it certainly will (or should) error out.
As an example:
int *p = (int *)0xFFFFFFFF; // Invalid pointer (the cast is needed to compile at all)
*p;                         // Dereferenced, but since it doesn't do anything there's no error
*p = 0;                     // Dereferenced write, so it will halt and catch fire
Dereferencing occurs at runtime and at a very low level (assembly/machine level), thus it cannot throw anything; it will, however, raise an exception, such as EXCEPTION_ACCESS_VIOLATION on Windows or a segfault on Linux/Unix.
(this assumes raw pointers are being used)
Related
Undefined behaviour in C++ can be really hard to debug. Is there a version of C++ and standard library which does not contain any undefined behaviour but rather throws exceptions? I understand that this will be a performance killer, but I only intend to use this version when I am programming, debugging and compiling in debug mode and don't really care about performance. Ideally this version would be portable and you would be able to easily switch on/off the undefined behaviour checks.
For example, you could implement a safe pointer class like so (only check for null pointer, not actually if it points to a valid block of memory):
#include <cassert>

template <typename T>
class MySafePointer {
    T* value;
public:
    auto operator->() {
#ifdef DEBUG_MODE
        assert(value && "Trying to dereference a null pointer");
#endif
        return value;
    }
    /* Other stuff */
};
Here you only need to #undef DEBUG_MODE to get your performance back.
Is there a library / safe version of C++ which does this?
EDIT: Changed the code above so that it actually makes more sense and doesn't throw an exception but asserts value is non-null. The question is simply a matter of having a descriptive error message vs a crash...
Is there a version of C++ and standard library which does not contain any undefined behaviour but rather throws exceptions?
No, there is not. As mentioned in a comment, there are Address Sanitizer and Undefined Behavior Sanitizer and many other tools you can use to hunt for bugs, but there is no "C++ without undefined behavior" implementation.
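For example (a sketch, assuming GCC or Clang), the sanitizers are enabled with compiler flags rather than a language switch:

// build with the sanitizers enabled, e.g.:
//   g++ -g -fsanitize=address,undefined main.cpp
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    return v[3]; // out-of-bounds read; AddressSanitizer reports it at runtime
}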
If you want an inherently safe language, choose one. C++ isn't it.
Undefined behavior
Undefined behavior means that your program has ended up in a state the behavior of which is not defined by the standard.
So what you're really asking is if there's a language the standard of which defines every possible scenario.
And I can't think of one language like this, for the simple reason that programs are run by machines, but programming languages and standards are written by humans.
Is it always unintentional?
Per the reason explained above, the standard can have unintentional "holes", i.e. undefined behavior that was not intentionally allowed, and maybe not even noticed during standardization.
However, as all the "is undefined behavior" sentences in the standard prove, many times UB is intentionally allowed.
But why? Because that means giving fewer guarantees to the programmer, with the benefit of being able to make more optimizations or, equivalently, of not wasting time verifying that the user is sticking to a defined contract.
So, even if the standard had no holes, there would still be a lot of cases where the standard deliberately declares UB, because compilers can take advantage of it to make all sorts of optimizations.²
The impact of preventing it in some trivial case
One trivial case of undefined behavior is when you access an out-of-bound element of a std::vector via operator[]. Exactly like for C-style arrays, v[i] basically gives you back *(v_ + i), where v_ is the pointer wrapped into v. This is fast and not safe.¹
What if you want to access the ith element safely? You would have to change the implementation of std::vector<>::operator[].
So what would the impact be of supporting the DEBUG_MODE flag? Essentially you would have to write two implementations separated by a #ifdef/(#else/)#endif. Obviously the two implementations can have a lot in common, so you could #-branch several times in the code. But... yeah, my bottom line is that your request can be fulfilled by changing the standard in such a way that it forces the implementers to support two different implementations (safe-and-slow/fast-and-unchecked) for everything.
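For instance, a minimal sketch of what that could look like for operator[] (MyVector is a hypothetical name, not the real std::vector):

#include <cassert>
#include <cstddef>

template <typename T>
class MyVector {
    T* data_;
    std::size_t size_;
public:
    T& operator[](std::size_t i) {
#ifdef DEBUG_MODE
        assert(i < size_ && "index out of bounds"); // safe and slow
#endif
        return data_[i]; // fast and unchecked otherwise
    }
    /* ... */
};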
By the way, for this specific case, the standard does define another function, at, which is required to handle the out-of-bound case. But that's the point: it's another function.
Hypothetically, we could rip all undefined behavior out of C++ or even C. We could have everything be a priori well-defined and remove anything from the language whose evaluation could not be definitely determinable from first principles. That thought makes me feel nervous about the answer I've given here.
(¹) This and other examples of UB are listed in this excellent article; search for Out of Bounds for the example I made.
(²) I really recommend reading this answer by Nicol Bolas about UB being absent in constexprs.
Is there a safe version of C++ without undefined behaviour?
No.
For example, you could implement a safe pointer class like so
How is throwing an exception safer than just crashing? You're still trying to find the bug so you can fix it statically, right?
What you wrote allows your buggy program to keep running (unless it just calls terminate, in which case you did some work for no result at all), but that doesn't make it correct, and it hides the error rather than helping you fix it.
Is there a library / safe version of C++ which does this?
Undefined behaviour is only one type of error, and it isn't always wrong. Deliberate use of non-portable platform features may also be undefined by the standard.
Anyway, let's say you catch every uninitialized value and null pointer and signed integer overflow - your program can still produce the wrong result.
If you write code that can't produce the wrong result, it won't have UB either.
There are several special functions which are usually expected never to throw exceptions, e.g.:
Destructors
swap method
Consider the following swap implementation, as stated in this answer:
friend void swap(dumb_array& first, dumb_array& second)
{
    using std::swap;
    swap(first.mSize, second.mSize);
    swap(first.mArray, second.mArray); // What if a stack overflow occurs here?
}
It uses two swap functions, one for the integer and one for the pointer. What if the second call causes a stack overflow? The objects will become corrupted. I guess it is not a std::exception; it is some kind of system exception, like a Win32 exception. But now we cannot guarantee no-throwing, since we're calling a function.
But all authoritative sources just use swap like it's ok, no exceptions will ever be thrown here. Why?
In general you cannot handle running out of stack. The standard doesn't say what happens if you run out of stack, neither does it talk about what the stack is, how much is available, etc. OSes may let you control it at the time the executable is built or when it is run, all of which is fairly irrelevant if you're writing library code, since you have no control of how much stack the process has, or how much has already been used before the user calls into your library.
You can assume that stack overflow results in the OS doing something external to your program. A very simple OS might just let it go weird (undefined behavior), a serious OS might blow the process away, or if you're really unlucky it throws some implementation-defined exception. I actually don't know whether Windows offers an SEH exception for stack overflow, but if it does then it's probably best not to enable it.
If you're concerned, you can mark your swap function noexcept. Then, in a conforming implementation, any exception that tries to leave the function will cause std::terminate() to be called. That is to say, it fulfils the noexcept contract at the cost of taking out your program.
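For reference, that is just the swap from the question with the specifier added (a sketch, reusing the question's hypothetical dumb_array):

friend void swap(dumb_array& first, dumb_array& second) noexcept
{
    using std::swap;
    swap(first.mSize, second.mSize);
    swap(first.mArray, second.mArray);
    // If anything above did throw, the noexcept contract
    // would trigger std::terminate().
}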
What if the second function will cause stack overflow?
Then your program is in an unrecoverable faulted state, and there is no practical way to handle the situation. Hopefully, the overflow has already caused a segmentation fault and terminated the program.
But now we cannot guarantee no-throwing
I've never encountered an implementation that would throw an exception in that state, and I'd be rather scared if it did.
But all authoritative sources just use swap like it's ok, no exceptions will ever be thrown here. Why?
The authoritative sources I've read (like this one, for example) don't "just use it like it's OK"; they say that if you have (for example) a non-throwing swap function, and a non-throwing destructor, then you can provide exception-safety guarantees from functions that use them.
It's useful to categorise functions according to their exception guarantees:
Basic: exceptions leave everything in a valid but unspecified state
Strong: exceptions leave the state unchanged
No-throw: no exceptions will be thrown.
Then a common approach to providing the "strong" guarantee is:
do the work that might throw on a temporary copy of the state
swap that copy with the live state (requiring a non-throwing swap operation)
destroy the old state (requiring a non-throwing destructor)
If you don't have a no-throw guarantee from those operations, then it's more difficult, and perhaps impossible, to provide a strong guarantee.
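A minimal sketch of that copy-and-swap pattern (widget and its member are hypothetical):

#include <utility>
#include <vector>

class widget {
    std::vector<int> data_;
public:
    widget() = default;
    widget(const widget&) = default;

    friend void swap(widget& a, widget& b) noexcept {
        using std::swap;
        swap(a.data_, b.data_); // non-throwing swap
    }

    widget& operator=(const widget& other) {
        widget tmp(other); // 1. the copy may throw, but *this is untouched
        swap(*this, tmp);  // 2. the non-throwing swap commits the new state
        return *this;      // 3. tmp's non-throwing destructor frees the old state
    }
};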
Both Java and C#, and probably many other languages too, have a predefined exception class that is thrown when a null parameter is used where it should not. Is there anything similar in C++? If not, is there another predefined exception I can use or should I define my own one?
Dereferencing a NULL pointer is undefined behaviour in C++, which means the code can appear to work. An exception isn't guaranteed to be thrown. You can use the std::invalid_argument exception (provide a meaningful value to it, e.g. "p is NULL"), but you'll have to do the check yourself.
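A sketch of what doing the check yourself looks like (increment is a hypothetical function name):

#include <stdexcept>

void increment(int* p) {
    if (p == nullptr)
        throw std::invalid_argument("p is NULL"); // the check is manual
    ++*p;
}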
Usually, in C++ (or C for that matter), you never dereference a NULL pointer. Doing this has undefined behavior (likely a segfault on any implementation I know of, but anything could happen according to the standard). It's probably a bad thing in other languages as well, but I don't know those enough to assert that.
It's best to prevent the situation than to try to recover from it (which can't be done in C or C++ anyway).
The usual pattern to prevent some related programmer errors is to use assert() inside function bodies such as:
#include <cassert>

int foo(int* myint)
{
    // Ensure myint is not NULL
    assert(myint);

    // Do something with myint
    (*myint)++;
    return *myint;
}
Such assert() calls are completely ignored in release builds (where NDEBUG is defined) and thus have no cost in production. They just help the development. In debug builds, if the condition is not met, the program aborts immediately with a very explicit error message. Running it through a debugger, you can easily check the call stack to investigate the exact reason.
There is no standard exception in C++ for dereferencing a NULL pointer.
If you want it, you can implement it yourself. On UNIX set up a SIGSEGV signal handler and throw an exception from the handler. On Windows, use the _set_se_translator() API to install a "Structured Exception" handler.
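A sketch of the Windows side (requires MSVC and compiling with /EHa so that structured exceptions can be translated):

#include <windows.h>
#include <eh.h>
#include <stdexcept>

void translator(unsigned int code, EXCEPTION_POINTERS*) {
    // Turn the structured exception into an ordinary C++ exception.
    throw std::runtime_error(code == EXCEPTION_ACCESS_VIOLATION
                                 ? "access violation"
                                 : "structured exception");
}

int main() {
    _set_se_translator(translator);
    try {
        int* p = nullptr;
        *p = 1; // access violation, now catchable as a C++ exception
    } catch (const std::runtime_error&) {
        // handle it
    }
}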
In nearly all cases that involve wrongly using a null pointer (especially dereferencing it), the C++ Standard simply leaves the behaviour undefined. No specific exception type is provided for (and no exception will be thrown).
One possible exception to this rule comes to mind, though. std::function, which is a C++11 standard-library template that can be used to wrap functions, can be assigned a null pointer:
std::function<void(int)> func = nullptr;
And if you then make an attempt to call the function wrapped by it by doing func(arg); for some argument arg, it will throw a std::bad_function_call exception.
This is, of course, not fully equivalent to the null pointer exceptions of other languages, because it is far less generally applicable.
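A complete example of the behaviour described above:

#include <functional>
#include <iostream>

int main() {
    std::function<void(int)> func = nullptr;
    try {
        func(42); // calling an empty std::function...
    } catch (const std::bad_function_call& e) {
        std::cout << e.what() << '\n'; // ...throws instead of being undefined
    }
}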
FTR, in C# you don't use NullReferenceException for anything unless you want to get stabbed by your teammates. There is an ArgumentNullException instead for rejecting null arguments. NREs are meant to be thrown by the runtime, not by you.
But note that there is no actual advantage of this over an assertion, because you should not be catching one of these: if they are thrown, they indicate a bug. These are what Eric Lippert calls boneheaded exceptions; they are your own darn fault, and there is nothing your code should be doing specifically with them.
In C++, dereferencing a null pointer results in undefined behaviour, which mostly ends the application with a segmentation fault. Under Visual Studio you can use extensions like Structured Exception Handling (SEH), which allow you to catch a null pointer dereference.
You can wrap a pointer in a template class which provides a limited interface to the pointer. It can do a nullptr check whenever you access the pointer, and throw an exception.
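A minimal sketch of such a wrapper (checked_ptr is a hypothetical name):

#include <stdexcept>

template <typename T>
class checked_ptr {
    T* p_;
public:
    explicit checked_ptr(T* p = nullptr) : p_(p) {}
    T& operator*() const {
        if (!p_) throw std::runtime_error("null pointer dereference");
        return *p_;
    }
    T* operator->() const {
        if (!p_) throw std::runtime_error("null pointer dereference");
        return p_;
    }
};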
I have found that C++ standard functions show very different behaviours when an error occurs. This seems to contradict the language's touting of "try/throw/catch". Can anyone please briefly explain the C++ designers' reasoning behind these choices?
Do nothing: for example, pop() on an empty stack (instead of throwing a range_error), or sqrt(-1) (instead of throwing a domain_error)
Return a null pointer: for example, an illegal pointer downcast (interestingly, an illegal reference downcast will throw a bad_cast)
Throw an exception, though this appears to apply to only a minority of functions, for example substr()
Give the user a choice of whether to throw an exception: for example, new will throw bad_alloc when out of memory, but you can also choose (nothrow) as an option of new
Most of the behaviour of C++ library functions can be explained by the general C++ philosophy "you don't pay for what you don't use". That means that any particular construction shouldn't incur any unneeded overhead when you use it correctly.
Optionally, more expensive, checked versions may exist, such as std::vector::at(), but it is up to you whether or not to use them.
The example of stack::pop() and sqrt() shows this philosophy in action: In order to throw an exception on error, you would always have to check whether the call is valid. This check is unnecessary if you already know that your call will succeed, so there is no mandatory check built into those functions. If you want a check, you can write one yourself.
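"Write one yourself" can be as simple as a checked wrapper (checked_pop is a hypothetical helper):

#include <stack>
#include <stdexcept>

template <typename T>
T checked_pop(std::stack<T>& s) {
    if (s.empty())
        throw std::range_error("pop() on an empty stack"); // you pay for this check
    T top = s.top();
    s.pop();
    return top;
}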
The default new is slightly different, as it incorporates facilities for calling a new_handler, and so the checking is done anyway. (Recall that an exception is only expensive if you actually throw it, so that aspect isn't so important.) If you wanted to, you could always replace the global operator new() with one which literally just forwards the argument to malloc(). (That would of course make it unsafe to use the default new expression, as you would have no way of checking that you can construct an object at the returned pointer. So you'd end up writing a check yourself and using placement new, which is almost exactly what the nothrow version does.)
Return a zero pointer: for example, when doing illegal pointer downcasting (interesting, doing an illegal reference downcasting will throw a bad_cast)
dynamic_cast provides a way to check the validity of the cast; it is by no means an exception as such. With pointers, the cast returns NULL because throwing an exception would be an overhead and the same can be achieved by returning NULL. Since references cannot be NULL, there was no option but to throw an exception; there was no other way to return the result to the user in this case.
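Both forms side by side (a minimal sketch with hypothetical Base/Derived types):

#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

void inspect(Base& b) {
    // Pointer form: failure is reported through the return value.
    if (Derived* d = dynamic_cast<Derived*>(&b)) {
        (void)d; // use the Derived part here
    }
    // Reference form: there is no null reference, so failure must throw.
    try {
        Derived& d = dynamic_cast<Derived&>(b);
        (void)d;
    } catch (const std::bad_cast&) {
        // the cast was invalid
    }
}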
Give user a choice of whether to throw an exception, for example, new() will throw bad_alloc() when out of memory, but you can also choose (nothrow) as an option of new().
Long ago, new just returned NULL, as in the case of malloc, but later it was standardized to throw a bad_alloc exception. This would have meant that all the code previously written using new would have to be modified to a large extent to handle the exception; to avoid this and maintain compatibility, the nothrow version of new was introduced.
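That compatibility valve looks like this in use:

#include <new>

int main() {
    // The nothrow form reports failure through the return value,
    // as the pre-standard new did.
    int* p = new (std::nothrow) int[1000];
    if (p == nullptr) {
        return 1; // allocation failed; no exception was thrown
    }
    delete[] p;
}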
Pointer downcasts returning null is a simpler and faster way to test whether a given object is of the given subclass. I.e., you can write stuff like if (dynamic_cast<A*>(v) || dynamic_cast<B*>(v)), or if (A* a = dynamic_cast<A*>(v)) doStuffWith(a);, which would've been cumbersome with exceptions. Here you actually expect casts to fail, while exceptions are exceptional by their nature, in that they are supposed to rarely be thrown during the normal execution of your program.
In other cases explicit checks for incorrect values may be omitted purely for performance reasons. C++ is supposed to be efficient, rather than attempting to prevent you from shooting yourself in the foot.
Most of this depends on whether there are enough appropriate return values to express the "failure" conditions and whether those values are convenient or not.
Rationale behind cited examples
std::stack<>::pop() does not throw an error because it doesn't have an error value and it simplifies calling code. It might not be an application logic error to attempt to pop an empty stack.
std::sqrt() has an appropriate error value (NaN, "not a number"), which was included in floating-point representations precisely for this purpose. It also has the benefit of propagating cleanly through other computations.
dynamic_cast<> on pointer types returns a null pointer to indicate a failed cast because the null pointer is already a standard way to represent "points to nothing". However, there is no such equivalent for references, so it must throw an exception as the return value cannot be used to indicate an error.
In contrast, std::string::substr() can't return an empty string to represent failure as an empty string is a valid sub-string of all strings.
new throwing std::bad_alloc or not seems to have historical roots, but, like dynamic_cast<> on pointers, it might be convenient for some code that would try alternatives not to pay for exception handling.
The C++ standard library is designed to be efficient; it avoids unnecessary runtime checks when possible.
1. Checking the preconditions of very small/fast methods would most likely take longer than executing the methods themselves, so violating those preconditions is left undefined (pop is most likely a single decrement on a stack of simple types).
2. dynamic_cast checks and casts a given pointer to a compatible type. There is no separate way of only checking whether a cast is possible, since that would have to do the same work as the cast itself, so we have to expect that it may fail, and it has a good error value to use when it fails (NULL). The reference version has to throw an exception, as it cannot return an error value.
3. substr guarantees an exception, and this may be for two reasons. One: the substr method is quite a bit more complex and slower than the methods mentioned in case 1, and therefore the overhead of checking the precondition is negligible. Two: string processing is one of the biggest contributors to security holes, as you are most likely processing user input; checking for overflows or out-of-bounds access is necessary to keep the process secure/stable. The C library provides fast, unchecked, insecure methods to manipulate strings for those who need the speed.
4. new has to check whether it can return an address either way; since running out of memory is unexpected by most applications, the exception is reasonable. However, you can write C++ using only a small subset of its features, and many projects do not use exceptions at all, as making your code exception-safe is hard (especially if you use third-party libraries which are not); since new is a central part of C++, an exception-free variant becomes necessary.
I thought dereferencing a NULL pointer was dangerous, if so then what about this implementation of an auto_ptr?
http://ootips.org/yonat/4dev/smart-pointers.html
If the default constructor is invoked without a parameter the internal pointer will be NULL, then when operator*() is invoked won't that be dereferencing a null pointer?
Therefore what is the industrial strength implementation of this function?
Yes, dereferencing NULL pointer = bad.
Yes, constructing an auto_ptr with NULL creates a NULL auto_ptr.
Yes, dereferencing a NULL auto_ptr = bad.
Therefore what is the industrial strength implementation of this function?
I don't understand the question. If the definition of the function in question created by the industry itself is not "industrial strength" then I have a very hard time figuring out what might be.
std::auto_ptr is intended to provide essentially the same performance as a "raw" pointer. To that end, it doesn't (necessarily) do any run-time checking that the pointer is valid before being dereferenced.
If you want a pointer that checks validity, it's relatively easy to provide that, but it's not the intended purpose of auto_ptr. In fairness, I should add that the real intent of auto_ptr is rather an interesting question -- its specification was changed several times during the original standardization process, largely because of disagreements over what it should try to accomplish. The version that made it into the standard does have some uses, but quite frankly, not very many. In particular, it has transfer-of-ownership semantics that make it unsuitable for storage in a standard container (among other things), removing one of the obvious purposes for smart pointers in general.
Its purpose is to help prevent memory leaks by ensuring that delete is performed on the underlying pointer whenever the auto_ptr goes out of scope (or is itself deleted).
Just like in higher-level languages such as C#, trying to dereference a null pointer/object will still explode, as it should.
Do what you would do if you dereferenced a NULL pointer. On many platforms, this means throwing an exception.
Well, just like you said: dereferencing null pointer is illegal, leads to undefined behavior. This immediately means that you must not use operator * on a default-constructed auto_ptr. Period.
Where exactly you see a problem with "industrial strength" of this implementation is not clear to me.
@Jerry Coffin: It is naughty of me to answer your answer here rather than the OP's question, but I need more lines than a comment allows.
You are completely right about the ridiculous semantics of the current version; it is so completely rotten that a new feature, "mutable", HAD to be added to the language just to allow these insane semantics to be implemented.
The original purpose of "auto_ptr" was exactly what boost::scoped_ptr does (AFAIK), and I'm happy to see a version of that finally made it into the new Standard. The reason for the name "auto_ptr" is that it should model a first class pointer on the stack, i.e. an automatic variable.
This auto_ptr was a National Body requirement, based on the following logic: if we have catchable exceptions in C++, we MUST have a way to manage pointers which is exception safe IN the Standard. This also applies to non-static class members (although that's a slightly harder problem which required a change to the syntax and semantics of constructors).
In addition, a reference-counting pointer was required, but due to the many different possible implementations with different tradeoffs, one can accept that this be left out of the Standard until a later time.
Have you ever played that game where you pass a message around a ring of people and at the end someone reads out the input and output messages? That's what happened. The original intent got lost because some people thought that the auto_ptr, now that we HAD to have it, could be made to do more... and finally what got put in the standard can't even do what the original simple scoped_ptr-style one did (auto_ptr semantics don't assure the pointed-at object is destroyed, because it could be moved elsewhere).
If I recall, the key problem was returning the value of an auto_ptr: the core design simply doesn't allow that (it's uncopyable). A sane solution like
return ap.swap(NULL)
unfortunately still destroys the intended invariant. The right way is probably closer to:
return ap.clone();
which copies the object and returns the copy, destroying the original: the compiler is then free to optimise away the copy. (As written this is not exception safe: the clone might leak if another exception is thrown before it returns; a ref-counted pointer solves this, of course.)