Why is undefined behavior allowed in the STL? - c++

By default, the "underlying container" of an std::stack is an std::deque. Therefore anything that is undefined behavior for a std::deque is undefined behavior for a std::stack. cppreference and other sites use the terminology "effectively" when describing the behavior of member functions. I take this to mean that it is for all intents and purposes. So therefore, calling top() and pop() is equivalent to calling back() and pop_back(), and calling these on an empty container is undefined behavior.
From my understanding, the reason it's undefined behavior is to preserve the no-throw guarantee. My reasoning is that operator[] for std::vector has a no-throw guarantee and is undefined behavior if the index n is out of bounds (n >= size()), while at() has a strong guarantee and throws std::out_of_range if n is out of bounds.
So my question is: what is the rationale behind some operations having possibly undefined behavior with a no-throw guarantee, versus having a strong guarantee but throwing an exception instead?

When undefined behaviour is allowed, it's usually for reasons of efficiency.
If the standard specified what has to happen when you access an array out of bounds, it would force the implementation to check whether the index is in bounds. Same goes for a vector, which is just a wrapper for a dynamic array.
In other cases the behaviour is allowed to be undefined in order to allow freedom in the implementation. But that, too, is really about efficiency (as some possible implementation strategies could be more efficient on some machines than on others, and C++ leaves it up to the implementer to pick the most efficient strategy, if they so desire.)
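For example, here is a sketch (my own illustration, not the standard's wording) of the branch that a mandated bounds check would force on every access:

#include <cstddef>
#include <stdexcept>

// What operator[] can compile down to: one address computation, no branch.
int get_unchecked(const int* data, std::size_t i) {
    return data[i];
}

// What a mandated check would require: every call pays for the comparison,
// even in programs that never index out of bounds. This is what at() does.
int get_checked(const int* data, std::size_t size, std::size_t i) {
    if (i >= size)
        throw std::out_of_range("index out of bounds");
    return data[i];
}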

According to Herb Sutter, one major reason is efficiency. He states that the standard imposes no requirements on operator[]'s exception specification or on whether it performs bounds checking; this is up to the implementation.
On the other hand, vector<T>::operator[]() is allowed, but not
required, to perform bounds checking. There's not a breath of wording
in the standard's specification for operator[]() that says anything
about bounds checking, but neither is there any requirement that it
have an exception specification, so your standard library implementer
is free to add bounds-checking to operator[](), too. So, if you use
operator[]() to ask for an element that's not in the vector, you're
on your own, and the standard makes no guarantees about what will
happen (although your standard library implementation's documentation
might) -- your program may crash immediately, the call to
operator[]() might throw an exception, or things may seem to work
and occasionally and/or mysteriously fail.
Given that bounds checking protects us against many common problems,
why isn't operator[]() required to perform bounds checking? The
short answer is: Efficiency. Always checking bounds would cause a
(possibly slight) performance overhead on all programs, even ones that
never violate bounds. The spirit of C++ includes the dictum that, by
and large, you shouldn't have to pay for what you don't use, and so
bounds checking isn't required for operator[](). In this case we
have an additional reason to want the efficiency: vectors are intended
to be used instead of built-in arrays, and so should be as efficient
as built-in arrays, which don't do bounds-checking. If you want to be
sure that bounds get checked, use at() instead.
If you're curious about the performance benefits, see these two questions:
::std::vector::at() vs operator[] << surprising results!! 5 to 10 times slower/faster!
vector::at vs. vector::operator[]
The consensus seems to be that operator[] is more efficient (since std::vector is just a wrapper around a dynamic array, operator[] should be just as efficient as subscripting the array directly). And Herb Sutter seems to suggest that whether or not it checks bounds (and thus could throw) is up to the library vendor.
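In practice, the difference looks like this (a small usage sketch; the exact what() message is up to the implementation):

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};

    // v[10];   // undefined behavior: operator[] performs no bounds check

    try {
        v.at(10);  // at() checks the index first
    } catch (const std::out_of_range& e) {
        std::cout << e.what() << '\n';  // implementation-specific message
    }
}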

Related

How do I determine from the documentation what type of exception a function can throw?

I am new to C++ programming but have programmed in higher-level languages enough to find my way around most documentation. I'm learning about exception handling in C++, specifically with this example:
#include <iostream>
#include <new>     // for std::bad_alloc
#include <vector>
using namespace std;

int main()
{
    vector<int> myNums;
    try
    {
        myNums.resize(myNums.max_size() + 1);
    }
    catch (const bad_alloc& err)
    {
        cout << err.what() << endl;
    }
}
This code doesn't catch the exception because the exception thrown by the .resize() method isn't bad_alloc; it's a length_error. So, from this documentation, how do you get to that? Maybe I missed something obvious.
https://cplusplus.com/reference/vector/vector/resize/
The only specific exception mentioned in there is bad_alloc. Can someone walk me through how you'd get to know that length_error is the right exception starting from that page?
This is not uncommon. The complexity of the language has increased so much over the years, across multiple revisions of the C++ standard, that even the standard itself can sometimes be at odds with itself.
Let's just see what the C++ standard itself says about two versions of the overloaded resize() vector method. I happen to have a copy of N4860 handy which is, basically, the C++20 version, and while looking up what the C++ standard itself says about resize()'s exceptions, I found that the two resize() overloads define their exception behavior as follows:
constexpr void resize(size_type sz);
// ...
Remarks: If an exception is thrown other than by the move constructor
of a non-Cpp17CopyInsertable T there are no effects.
// ...
constexpr void resize(size_type sz, const T& c);
// ...
Remarks: If an exception is thrown there are no effects.
That's the only mention of exceptions in resize(). I found nothing more general in the specification of vector itself, nor in the "Container requirements"; there was some discussion of exception guarantees there, but none pertaining to the specific details of vector's resize() or reserve().
This is an obvious oversight. It's fairly obvious that, when it comes to exceptions that might be generated as a result of reallocation, both overloads should have the same exception behavior. The first overload's description is lifted straight from reserve(), which just precedes it. It goes without saying that resize() uses reserve() to grow the vector's capacity when needed, and inherits its exception guarantees/behavior.
But the same thing must be true of the 2nd resize() overload. The only difference between them is that one default-constructs new values when the vector grows and the other copy-constructs them. In terms of exception behavior during reallocation they must be identical. The overall difference between the two overloads, as far as exceptions go, comes down to any differences between the value's default constructor and its copy/move constructors.
My question is, from looking at this documentation, how do you get to that? Maybe I missed something obvious.
No, you did not miss anything. The C++ standard itself has some gaps; not to mention 2nd-hand sources of documentation like the one you're looking at.
You get where you want to go by studying everything about the class, template, or algorithm in question, understanding how it must work -- i.e. the resize()s inheriting certain parts of their behavior from reserve() -- and then drawing the inescapable inferences.
TLDR: it is what it is.
Starting with https://cplusplus.com/reference/vector/vector/resize/, if you've got a case like yours pushing max_size(), you might pay special attention to the case listed on this doc page which states:
If n is also greater than the current container capacity, an automatic reallocation of the allocated storage space takes place.
Since your case is absolutely going to be greater than the current container capacity, this might be worth looking into. Linked in this chunk of text is the doc page for capacity: https://cplusplus.com/reference/vector/vector/capacity/. From the capacity page, you would read that vector::reserve is used for explicitly increasing the capacity of the vector. Since your case with max_size() + 1 is certainly going to involve increasing vector capacity, you might suspect this function is involved. So you might go to the doc page: https://en.cppreference.com/w/cpp/container/vector/reserve
Here you would read that vector::reserve takes a parameter new_cap which determines the new capacity of a vector. It throws length_error when new_cap > max_size().
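Putting that together, a version of the original snippet that catches what actually gets thrown might look like this (a sketch covering both the length_error from the max_size() check and the bad_alloc from a failed allocation):

#include <iostream>
#include <new>        // std::bad_alloc
#include <stdexcept>  // std::length_error
#include <vector>

int main()
{
    std::vector<int> myNums;
    try {
        myNums.resize(myNums.max_size() + 1);
    } catch (const std::length_error& err) {
        std::cout << "length_error: " << err.what() << '\n';
    } catch (const std::bad_alloc& err) {
        std::cout << "bad_alloc: " << err.what() << '\n';
    }
}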
I give this series of steps not because I think anyone would/should be expected to dig this much through docs every time they write code. Only because you were curious what steps might have led you to the exception that was thrown.
I agree it would be much better if the documentation for resize just covered all its bases with regard to which exceptions get thrown in which cases. Unfortunately, gaps like this are all too common in documentation.
You need to dig a bit to track it down.
I'm going to use documentation at cppreference, which does a reasonably decent job of tracking what happens in the standards. (The standards are the authoritative source, but the standards have evolved over time).
According to https://en.cppreference.com/w/cpp/container/vector, the second template argument when instantiating a vector is an allocator type, which defaults to std::allocator<T> (where T is the vector's element type).
Because the allocator is defaulted, it is not often referred to explicitly in user code (most developers do not need to use a non-default allocator).
But
std::vector<int> myNums;
is actually equivalent to
std::vector<int, std::allocator<int> > myNums;
The specification of a vector's resize() member function describes what happens when it throws, but not the circumstances in which it will throw, or what it may throw.
Memory allocation for std::vector is actually handled by its reserve() member function. Documentation for that function at https://en.cppreference.com/w/cpp/container/vector/reserve states it throws std::length_error if new_cap > max_size(), or any exception thrown by Allocator::allocate() (typically std::bad_alloc). Allocator is the name of the second template parameter mentioned above.
That is a hint, but we can get even more specific by digging into documentation for the default allocator at https://en.cppreference.com/w/cpp/memory/allocator and for its allocate() member function at https://en.cppreference.com/w/cpp/memory/allocator/allocate which reveals that function will throw std::bad_alloc if allocation fails.
Rather than read through documentation or code to find an answer that may not be correct, I think the best option is to do something you should be doing anyway: write a test.
If you thoroughly test your code, the exceptions it throws will become naturally apparent and you can handle them.
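As a minimal sketch of such a test, you can catch the common base std::exception and print the dynamic type of whatever actually comes out (the name typeid produces is implementation-specific):

#include <exception>
#include <iostream>
#include <typeinfo>
#include <vector>

int main()
{
    std::vector<int> v;
    try {
        v.resize(v.max_size() + 1);
    } catch (const std::exception& e) {
        // On GCC this prints the mangled name "St12length_error".
        std::cout << typeid(e).name() << ": " << e.what() << '\n';
    }
}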

Is there a safe version of C++ without undefined behaviour?

Undefined behaviour in C++ can be really hard to debug. Is there a version of C++ and standard library which does not contain any undefined behaviour but rather throws exceptions? I understand that this will be a performance killer, but I only intend to use this version when I am programming, debugging and compiling in debug mode and don't really care about performance. Ideally this version would be portable and you would be able to easily switch on/off the undefined behaviour checks.
For example, you could implement a safe pointer class like so (only check for null pointer, not actually if it points to a valid block of memory):
#include <cassert>

template <typename T>
class MySafePointer {
    T* value;
public:
    auto operator->() {
#ifdef DEBUG_MODE
        assert(value && "Trying to dereference a null pointer");
#endif
        return value;
    }
    /* Other stuff */
};
Here the user only needs to leave DEBUG_MODE undefined (or #undef it) to get the performance back.
Is there a library / safe version of C++ which does this?
EDIT: Changed the code above so that it actually makes more sense and doesn't throw an exception but asserts value is non-null. The question is simply a matter of having a descriptive error message vs a crash...
Is there a version of C++ and standard library which does not contain any undefined behaviour but rather throws exceptions?
No, there is not. As mentioned in a comment, there are Address Sanitizer and Undefined Behavior Sanitizer and many other tools you can use to hunt for bugs, but there is no "C++ without undefined behavior" implementation.
If you want an inherently safe language, choose one. C++ isn't it.
Undefined behavior
Undefined behavior means that your program has ended up in a state the behavior of which is not defined by the standard.
So what you're really asking is if there's a language the standard of which defines every possible scenario.
And I can't think of a single language like this, for the simple reason that programs are run by machines, but programming languages and standards are written by humans.
Is it always unintentional?
Per the reason explained above, the standard can have unintentional "holes", i.e. undefined behavior that was not intentionally allowed, and maybe not even noticed during standardization.
However, as all the "is undefined behavior" sentences in the standard prove, many times UB is intentionally allowed.
But why? Because that means giving less guarantees to the programmer, with the benefit of being able to make more optimizations or, equivalently, to not waste time verifying that the user is sticking to a defined contract.
So, even if the standard had no holes, there would still be a lot of cases where the standard explicitly states that behavior is undefined, because compilers can take advantage of it to make all sorts of optimizations.²
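A classic illustration (my own example, not part of the text above): because signed integer overflow is UB, a compiler may assume it never happens and fold this whole function down to return true:

// Signed overflow is undefined behaviour, so the compiler is allowed to
// assume x + 1 never wraps and optimize away both the addition and the
// comparison. GCC and Clang do this with optimizations enabled.
bool always_greater(int x) {
    return x + 1 > x;
}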
The impact of preventing it in some trivial case
One trivial case of undefined behavior is when you access an out-of-bound element of a std::vector via operator[]. Exactly like for C-style arrays, v[i] basically gives you back *(v_ + i), where v_ is the pointer wrapped into v. This is fast and not safe.¹
What if you want to access the ith element safely? You would have to change the implementation of std::vector<>::operator[].
So what would the impact be of supporting the DEBUG_MODE flag? Essentially, you would have to write two implementations separated by #ifdef/(#else/)#endif. Obviously the two implementations can have a lot in common, so you could #-branch several times in the code. But... yeah, my bottom line is that your request could be fulfilled only by changing the standard in such a way that it forces implementers to support two different implementations (safe-and-slow and unsafe-and-fast) of everything.
By the way, for this specific case the standard does define another function, at, which is required to handle the out-of-bounds case. But that's the point: it's another function.
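Going back to the DEBUG_MODE idea, here is a minimal sketch of what such a dual implementation might look like, using a hypothetical my_vector class and SAFE_MODE macro (no real standard library is required to work this way):

#include <cstddef>
#include <stdexcept>

template <typename T>
class my_vector {
    T* v_ = nullptr;       // the wrapped pointer, as in the v_ above
    std::size_t size_ = 0;
public:
    T& operator[](std::size_t i) {
#ifdef SAFE_MODE
        // Safe and slow: a branch on every single access.
        if (i >= size_)
            throw std::out_of_range("my_vector: index out of bounds");
#endif
        // Fast and unsafe: plain pointer arithmetic, *(v_ + i).
        return *(v_ + i);
    }
    // ... the rest of the interface would need the same #ifdef treatment
};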
Hypothetically, we could rip all undefined behavior out of C++ or even C. We could have everything be a priori well-defined and remove from the language anything whose evaluation could not be definitely determined from first principles. That thought makes me feel nervous about the answer I've given here.
(¹) This and other examples of UB are listed in this excellent article; search for Out of Bounds for the example I made.
(²) I really recommend reading this answer by Nicol Bolas about UB being absent in constexprs.
Is there a safe version of C++ without undefined behaviour?
No.
For example, you could implement a safe pointer class like so
How is throwing an exception safer than just crashing? You're still trying to find the bug so you can fix it statically, right?
What you wrote allows your buggy program to keep running (unless it just calls terminate, in which case you did some work for no result at all), but that doesn't make it correct, and it hides the error rather than helping you fix it.
Is there a library / safe version of C++ which does this?
Undefined behaviour is only one type of error, and it isn't always wrong. Deliberate use of non-portable platform features may also be undefined by the standard.
Anyway, let's say you catch every uninitialized value and null pointer and signed integer overflow - your program can still produce the wrong result.
If you write code that can't produce the wrong result, it won't have UB either.

Why is there no throw or SIGSEGV when accessing an empty std::optional?

The example:
#include <optional>
#include <iostream>
using namespace std;

int main()
{
    optional<int> t{}; // nullopt (empty) by default
    cout << *t << endl;
    return 0;
}
Actually this program prints some int (an uninitialized value of type int).
Also, libcxx uses an assert check for access to a non-engaged value.
Why does the Standard not require throwing or a SIGSEGV here?
Why does the Standard not require throwing or a SIGSEGV here?
Because requiring some particular behaviour implicitly imposes the requirement to add a branch to check whether that behaviour - be it throwing or something else - should occur.
By specifying that the behaviour is undefined, the standard allows the implementation to not check whether optional is empty upon every indirection. Branching the execution is potentially slower than not branching.
Rather than mandating safety, the committee let the standard library implementers choose performance (and simplicity). The implementation you tested seems to have chosen not to throw an exception or otherwise inform you of the mistake.
C++ embraces the idea of undefined behavior.
Not all C++ operations have behavior defined by the standard. This permits compilers to assume they never happen, and can result in much faster code in many cases.
Here, by leaving the result of using an unengaged std::optional undefined, the cost of accessing data stored in a std::optional is the same as the cost of accessing data not stored in a std::optional. The only costs are the extra room required, and you as a programmer promising to keep track of whether it is engaged or not.
Now compilers are free to insert checks there, and some do in debug builds.
Note that usually C++ std library types include safe and unsafe methods for accessing data.
The fact that invalid pointers sometimes result in a SIGSEGV is because most OSes protect the addresses around 0 and crash programs that access them. This was low-cost to do, and it catches a bunch of bad behavior from many assembly, C, and C++ programs.
If you want optional to throw when empty, use .value(). If you don't, use operator*. If you want a default value if one isn't there, use .value_or.
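A small usage sketch of those three accessors:

#include <iostream>
#include <optional>

int main()
{
    std::optional<int> t;  // empty

    std::cout << t.value_or(42) << '\n';  // prints 42: fallback when empty

    try {
        t.value();  // checked: throws std::bad_optional_access when empty
    } catch (const std::bad_optional_access& e) {
        std::cout << e.what() << '\n';
    }

    // *t;  // unchecked: undefined behavior when empty
}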
Because it is undefined behavior, section [optional.observe]p5 says:
Requires: *this contains a value.
and violating a requires clause is undefined behavior, from [res.on.required#1]p1 which is under Library-wide requirements:
Violation of any preconditions specified in a function's Requires: element results in undefined behavior unless the function's Throws: element specifies throwing an exception when the precondition is violated.
So you have no expecation as to the result. From the definition of undefined behavior:
behavior for which this document imposes no requirements
Requiring the implementation to check would impose a cost, and not all users would want to pay that cost. So this becomes a quality-of-implementation issue: an implementation is free to perform checks in different modes of operation, for example when assertions are enabled.
The user has the option of taking the cost themselves via has_value or value_or. If the user wants an operation that can throw they can use value.
Note that SIGSEGV, segfaults, etc. are implementation-defined behavior.

Does std::vector::erase() really invalidate the iterator at the point of erase?

I was playing around to understand the iterator invalidation rules. However, when I run the following code with a C++14 compiler, the output really confuses me.
std::vector<int> test = {1, 2, 3};
auto it = test.begin() + 1;
test.erase(it);                 // invalidates 'it' and every iterator after the erase point
std::cout << *it << std::endl;  // undefined behavior: dereferencing an invalidated iterator
output = 3
Shouldn't it be invalidated at this point? Why does it seem to jump to the next position?
Many thanks in advance
Dereferencing an invalidated iterator has undefined results. Your program may crash, it may stop with a runtime error or break in the debugger (if you are running a debug build with a debug version of the STL with iterator debugging/validation) and it may "seem to work", i.e., deliver the value that was erased from the collection.
This is because iterators MAY be implemented as just pointers. This is not necessarily the case, but defining behavior in this situation as undefined allows such an efficient and simple implementation. Invalid iterators implemented as pointers MAY still point to a valid memory location, which MAY still contain the value it previously contained, even though it is logically no longer part of the data structure (collection) it was a part of. There is no validation code which checks if the iterator is valid when it is dereferenced (except sometimes in debug builds).
This is both one of the characteristic strengths and one of the weaknesses of C++: it gives your program better performance at the cost of stability and security when your program does something undefined (due to a bug, or to using unvalidated user input).
When describing iterator invalidation, the C++ standard makes the simplifying assumption that iterators refer to elements, and that a valid iterator value always refers to the same element. Invalidating references, pointers, or iterators to an element all follow the same rules. (The exception is the end iterator.)
Clearly, references or pointers to the erased element are invalidated by a call to erase, so under the standard's simple rules all iterators to it are invalidated as well. The standard could have described which new element gets moved into place and substituted what the iterators refer to, but its writers chose not to go there. They instead simply dictated that the iterator is invalid.
Because it is invalid, dereferencing it, or doing anything with it other than destroying it or assigning another iterator to it, is declared undefined behaviour.
In theory this permits a myriad of optimization opportunities, but I am unaware of any compiler that exploits them. At best compilers add debug checks at this time.
So dereferencing it "works", but being UB the result is inherently fragile. Future compilers could assume you never do this and cause arbitrary side effects, including time travel (I am not joking: current compilers can make UB corrupt program state before the UB even occurs; signed integer overflow optimizations are a well-known example).
Every current compiler implements vector iterators as, at best, thinly wrapped pointers. But relying on implementation quirks the standard does not mandate is a bad plan when doing it correctly only requires a bit more work. If you find a case where assuming that behaviour would be highly useful, I encourage you to write a proposal to define that behaviour, using your use case as motivation.
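For reference, the "bit more work" is simply using the iterator that erase() returns, which refers to the element following the erased one:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> test = {1, 2, 3};
    auto it = test.begin() + 1;
    it = test.erase(it);            // returns a valid iterator to the next element
    std::cout << *it << std::endl;  // prints 3, with well-defined behavior
}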

How can I find out the exact conditions when STL containers throw exceptions?

For STL containers (so far, std::vector<> and std::deque<>), I'm looking for documentation that says exactly when they throw exceptions. Something like, "It throws X in situation A. It throws Y in situation B. It throws no other exceptions under any circumstances."
I'd like to reassure my exception-phobic colleagues that we know exactly what can trigger exceptions in the STL classes we use.
The most accurate information will come from the C++ standard matching your compiler, and compiler documentation. However, the spec costs money. If you're willing to settle for a few typos, the draft C++11 specification can be found here: http://open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf for free, and the latest publicly available draft (preparing for C++14) seems to be http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf.
The number 1 most used container is vector, so let's go over that.
Technically, the only vector member that throws an exception itself is at, when given an out-of-range index. (Tada! We're done!)
Less technically, vector's assign/insert/emplace/reserve/resize/push_back/emplace_back/shrink_to_fit/etc. can cause a reallocation, which uses std::allocator<T>::allocate, which can in theory throw std::bad_alloc. In weird situations with weird allocators, swap can trigger this too. On some systems (Linux), this pretty much never happens, because the allocator only throws if it runs out of virtual memory, and often the program will run out of physical memory first and the OS will simply kill the whole program. That happens regardless of exceptions, though, so it doesn't count against C++ exceptions.
Probably relevant is that the elements in a vector can throw any exception when copied, which affects constructors, assignment, insert/emplace/push_back/emplace_back, reserve, and resize/shrink_to_fit. (If your element has a noexcept move constructor and move assignment, and it really, really should, then this happens only when copying the entire vector.)
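To see that in action, here is a sketch with a hypothetical element type whose copy constructor throws; the exception propagates out of push_back and, thanks to the strong guarantee, the vector is left unchanged:

#include <iostream>
#include <stdexcept>
#include <vector>

struct Flaky {
    Flaky() = default;
    Flaky(const Flaky&) { throw std::runtime_error("copy failed"); }
};

int main()
{
    std::vector<Flaky> v;
    Flaky f;
    try {
        v.push_back(f);  // copy-construction into the vector throws
    } catch (const std::runtime_error& e) {
        std::cout << e.what() << ", size: " << v.size() << '\n';  // size: 0
    }
}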
The spec details exactly what exceptions are thrown and often also specifies under exactly what conditions they're thrown.
The C++ standard documents when exceptions will be thrown and under what circumstances for the standard library containers. There are also general rules about which methods will not throw exceptions for containers.
Alternatively, you can search the headers for throw (or the equivalent macro) to determine under what circumstances exceptions will trigger.