I'm working on some code for an extra credit project and came across something peculiar that I wanted to see if someone could explain for a better understanding. I have a for loop that populates a std::stack and then another that pops the stack the same number of times it was pushed to. I was wondering what would happen if I attempted to pop() when the stack itself is already empty.
If you are using the default std::stack container of std::deque, calling pop() on an empty stack would invoke undefined behavior.
"undefined behavior" means that your program's behavior can no longer be relied upon in any way.
Here's a documentation trail to follow. You are asking about the behavior of std::stack::pop() in a certain situation. So start with the documentation for that function.
std::stack<T,Container>::pop
Effectively calls c.pop_back()
It is not explicitly clear what is meant by c, but further down the page is a mention of Container::pop_back, so it is reasonable to infer that that is the next thing to look up. (Note that Container is the second template parameter.) You might have a difficulty here if you did not specify a second template parameter for your stack. In that case, back up to the documentation for std::stack to see what the default is.
std::stack
By default, if no container class is specified for a particular stack class instantiation, the standard container std::deque is used.
Aha! So we need to look up the pop_back() member of std::deque.
std::deque<T,Allocator>::pop_back
Calling pop_back on an empty container results in undefined behavior.
There's your answer: undefined behavior. Now you might be asking yourself what is undefined behavior in C++? In brief, undefined behavior allows your program's behavior to be whatever is convenient for the compiler. Technically, it allows any behavior whatsoever, but in practice, compilers just do whatever is convenient.
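To stay clear of that undefined behavior, the usual approach is to test empty() before each pop(). Here is a minimal sketch; the helper name pop_up_to is made up for illustration and is not part of the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>

// Hypothetical helper: pops up to `count` elements, but stops as soon as the
// stack is empty, so pop() is never called on an empty stack.
// Returns how many elements were actually removed.
std::size_t pop_up_to(std::stack<int>& s, std::size_t count) {
    std::size_t removed = 0;
    while (removed < count && !s.empty()) {
        s.pop();
        ++removed;
    }
    return removed;
}
```

Asking for more pops than there are elements then simply stops early instead of invoking undefined behavior.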
Related
In his excellent book "C++ Concurrency in Action" (2nd edition including C++17) Anthony Williams discusses the implementation of a thread-safe stack.
In the course of this, he proposes an adapter implementation of std::stack, which, among other things, would combine the calls of top() and pop() into one. The separation into 2 separate functions, however, was done for a reason in std::stack, namely to avoid losing data if the copy made when returning the popped element to the caller throws an exception inside the copy constructor. At that point the element would already have been popped off and is consequently lost.
Instead of having a function T pop(), he proposes other variations of pop that would be able to remove the element off the stack and provide it to the caller in one operation, all of which come with their own problems, though.
The 1st alternative he proposes has the signature void pop(T&). The caller passes in a reference to a T and gets the popped-off object that way. This approach, however, requires that a T be constructed prior to the call to pop, which might be an expensive operation, or might not be possible at all because the necessary data is not yet available at that time. Another problem the author mentions is that T might not be assignable, which this solution requires.
Now my question: Wouldn't all of the mentioned problems be solved if we passed a std::optional<T>& instead of a T&?
In that case, no instance of T would need to be constructed prior to the call to pop. Furthermore, assignability would not be required anymore either, since the object to be returned could be constructed into the std::optional<T> instance directly using its emplace function.
Am I missing something crucial here or am I right? If I am indeed right, I would be curious to know why this was not considered (for a good reason or just plainly an oversight?).
std::optional does solve all of the mentioned problems, and using it to control lifetime can be quite valuable, although it would appear a bit strange in
std::optional<T> o;
st.pop(o);
to have o always engaged.
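A minimal sketch of what such a pop(std::optional<T>&) could look like. The class name optional_pop_stack is hypothetical, the locking that a thread-safe stack would need is left out, and leaving the optional disengaged for an empty stack is an assumption of this sketch rather than something the book or the answer prescribes.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical adapter, single-threaded for brevity (no mutex).
template <typename T>
class optional_pop_stack {
    std::deque<T> c;
public:
    template <typename... Args>
    void emplace(Args&&... args) { c.emplace_back(std::forward<Args>(args)...); }

    // Assumption of this sketch: `out` is left disengaged if the stack is empty.
    void pop(std::optional<T>& out) {
        out.reset();
        if (c.empty()) return;
        // Constructs the T directly inside the optional, so T needs to be
        // neither default-constructible nor assignable. If this construction
        // throws, nothing has been popped yet.
        out.emplace(std::move(c.back()));
        c.pop_back();
    }
};
```

The element is removed only after it has been constructed in the caller's optional, so a throwing move or copy leaves the stack unchanged.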
That said, with a stupid scope-guard trick it's possible in C++17 to safely return T even without requiring no-throw-movability:
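T pop() {
    // Removes the top element on scope exit, but only if no new exception
    // started propagating after the guard was created.
    struct pop_guard {
        C &c;
        int u = std::uncaught_exceptions();  // count of in-flight exceptions on entry
        ~pop_guard() { if (std::uncaught_exceptions() == u) c.pop_back(); }
    } pg{c};
    return std::move(c.back());  // if this move throws, the element stays on the stack
}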
(We could of course test for a throwing move and just move (perhaps twice) in its absence.)
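To see the guard at work, here is a self-contained sketch that wraps the same pop() in a small adapter and exercises it with a type whose move constructor can be made to throw. The names guarded_stack, Fragile and fail_next_move are invented for this illustration, and C++17 is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical adapter around the scope-guard pop().
template <typename T, typename C = std::deque<T>>
class guarded_stack {
    C c;
public:
    template <typename... Args>
    void emplace(Args&&... args) { c.emplace_back(std::forward<Args>(args)...); }
    std::size_t size() const { return c.size(); }

    T pop() {
        struct pop_guard {
            C& c;
            int u = std::uncaught_exceptions();
            // Runs after the return value has been constructed. Pops only if
            // constructing the return value did not throw.
            ~pop_guard() { if (std::uncaught_exceptions() == u) c.pop_back(); }
        } pg{c};
        return std::move(c.back());
    }
};

// Hypothetical test type: its move constructor throws on demand.
struct Fragile {
    int value;
    static inline bool fail_next_move = false;
    explicit Fragile(int v) : value(v) {}
    Fragile(Fragile&& other) : value(other.value) {
        if (fail_next_move) throw std::runtime_error("move failed");
    }
};
```

When the move throws, the guard's destructor runs during stack unwinding, sees one more uncaught exception than on entry, and leaves the element in place.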
However, what wasn't mentioned is that separate top and pop allows a T that isn't even movable, so long as the underlying container supports it. std::stack<std::mutex> works (with emplace, not push!) because std::deque doesn't require movability.
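A short sketch of that last point: elements are constructed in place with emplace(), accessed through top(), and removed with pop(), so the std::mutex objects are never copied or moved. The function name mutex_stack_demo is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stack>

// Hypothetical demo: a stack of non-movable, non-copyable elements.
std::size_t mutex_stack_demo() {
    std::stack<std::mutex> locks;   // default container: std::deque<std::mutex>
    locks.emplace();                // constructs a mutex in place; push() would not compile
    locks.emplace();
    {
        std::lock_guard<std::mutex> hold(locks.top());  // use the top element by reference
    }
    locks.pop();                    // destroys the top mutex without moving it
    return locks.size();
}
```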
Undefined behaviour in C++ can be really hard to debug. Is there a version of C++ and standard library which does not contain any undefined behaviour but rather throws exceptions? I understand that this will be a performance killer, but I only intend to use this version when I am programming, debugging and compiling in debug mode and don't really care about performance. Ideally this version would be portable and you would be able to easily switch on/off the undefined behaviour checks.
For example, you could implement a safe pointer class like so (only check for null pointer, not actually if it points to a valid block of memory):
#include <cassert>

template <typename T>
class MySafePointer {
    T* value;
public:
    auto operator-> () {
#ifdef DEBUG_MODE
        // Only checked in debug builds; a null pointer aborts with this message.
        assert(value && "Trying to dereference a null pointer");
#endif
        return value;
    }
    /* Other Stuff*/
};
Here the user only needs to #undef DEBUG_MODE to get the performance back.
Is there a library / safe version of C++ which does this?
EDIT: Changed the code above so that it actually makes more sense and doesn't throw an exception but asserts value is non-null. The question is simply a matter of having a descriptive error message vs a crash...
Is there a version of c++ and standard library which does not contain any undefined behaviour but rather throws exceptions?
No, there is not. As mentioned in a comment, there are Address Sanitizer and Undefined Behavior Sanitizer and many other tools you can use to hunt for bugs, but there is no "C++ without undefined behavior" implementation.
If you want an inherently safe language, choose one. C++ isn't it.
Undefined behavior
Undefined behavior means that your program has ended up in a state the behavior of which is not defined by the standard.
So what you're really asking is if there's a language the standard of which defines every possible scenario.
And I can't think of one language like this, for the simple reason that programs are run by machines, but programming languages and standards are written by humans.
Is it always unintentional?
Per the reason explained above, the standard can have unintentional "holes", i.e. undefined behavior that was not intentionally allowed, and maybe not even noticed during standardization.
However, as all the "is undefined behavior" sentences in the standard prove, many times UB is intentionally allowed.
But why? Because that means giving fewer guarantees to the programmer, with the benefit of being able to make more optimizations or, equivalently, to not waste time verifying that the user is sticking to a defined contract.
So, even if the standard had no holes, there would still be a lot of cases where UB is stated to happen by the standard, because compilers can take advantage of it to make all sorts of optimizations.²
The impact of preventing it in some trivial case
One trivial case of undefined behavior is when you access an out-of-bound element of a std::vector via operator[]. Exactly like for C-style arrays, v[i] basically gives you back *(v_ + i), where v_ is the pointer wrapped into v. This is fast and not safe.¹
What if you want to access the ith element safely? You would have to change the implementation of std::vector<>::operator[].
So what would the impact be of supporting the DEBUG_MODE flag? Essentially you would have to write two implementations separated by a #ifdef/(#else/)#endif. Obviously the two implementations can have a lot in common, so you could #-branch several times in the code. But... yeah, my bottom line is that your request can be fulfilled by changing the standard in such a way that it forces the implementers to support two different implementations (safe and slow / unsafe and fast) for everything.
By the way, for this specific case, the standard does define another function, at, which is required to handle the out-of-bound case. But that's the point: it's another function.
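The difference between the two accessors can be sketched like this. The helper checked_get is hypothetical; it only relies on the fact that std::vector::at throws std::out_of_range for an invalid index, whereas v[i] with the same index would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the element at index i, or std::nullopt if i
// is out of range. at() performs the bounds check that operator[] skips.
std::optional<int> checked_get(const std::vector<int>& v, std::size_t i) {
    try {
        return v.at(i);
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}
```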
Hypothetically, we could rip all undefined behavior out of C++ or even C. We could have everything be a priori well-defined and remove anything from the language whose evaluation could not be definitely determinable from first principles.
which makes me feel nervous about the answer I've given here.
(¹) This and other examples of UB are listed in this excellent article; search for Out of Bounds for the example I made.
(²) I really recommend reading this answer by Nicol Bolas about UB being absent in constexprs.
Is there a safe version of c++ without undefined behaviour?
No.
For example, you could implement a safe pointer class like so
How is throwing an exception safer than just crashing? You're still trying to find the bug so you can fix it statically, right?
What you wrote allows your buggy program to keep running (unless it just calls terminate, in which case you did some work for no result at all), but that doesn't make it correct, and it hides the error rather than helping you fix it.
Is there a library / safe version of C++ which does this?
Undefined behaviour is only one type of error, and it isn't always wrong. Deliberate use of non-portable platform features may also be undefined by the standard.
Anyway, let's say you catch every uninitialized value and null pointer and signed integer overflow - your program can still produce the wrong result.
If you write code that can't produce the wrong result, it won't have UB either.
I am aware that after using std::move the variable is still valid, but in an unspecified state.
Unfortunately, recently I have come across several bugs in our code base where a function was accessing the moved variable, and weird things were happening. These issues were extremely hard to track down.
Is there any compiler option (in clang) or any way to throw an error either during runtime or compilation?
Some things that may help:
Use a static analyzer. Xcode has it built-in.
https://clang-analyzer.llvm.org/
Use Address Sanitizer and Undefined Behaviour sanitizer
http://clang.llvm.org/docs/AddressSanitizer.html
https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
Code changes that can make such bugs easy to track down:
I'm assuming that if you're using std::move on something, it is usually (though not always) a heavy container.
If so, try to use std::unique_ptr<T> to create it. Calls to movers must explicitly use std::move, which is easy to spot. And other non-owning access functions can just work with .get(). You can also check for nullability and throw if it's nullptr at any point where you need to access it.
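A rough sketch of that suggestion, with made-up names (Payload, checked, consume): the heavy object lives behind a std::unique_ptr, ownership transfers are visible as std::move at the call site, and every access goes through a helper that throws if the pointer is null.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

using Payload = std::vector<int>;  // stand-in for some heavy container

// Hypothetical accessor: throws instead of dereferencing a null pointer.
Payload& checked(const std::unique_ptr<Payload>& p) {
    if (!p) throw std::logic_error("access through an empty (possibly moved-from) unique_ptr");
    return *p;
}

// Takes ownership; callers must write std::move(ptr), which is easy to spot.
std::size_t consume(std::unique_ptr<Payload> p) {
    return checked(p).size();
}
```

Because a moved-from std::unique_ptr is guaranteed to be null, a later access through checked() fails loudly at the point of misuse.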
I am aware that after using std::move the variable is still valid, but in an unspecified state.
This is not a universal truth. More generally, the object that was moved from is in whatever state in which the move constructor / assignment operator left it.
The standard library does have the guarantee that you describe at minimum. But it is also possible to implement a member function for your class which doesn't abide by it, and leaves the moved from object in an invalid state. It is however a good design choice to implement move operations in the way you describe.
How to check if variable is still valid or std::move was used on it?
There is no way to do such check in general within the language.
Is there any compiler option (in clang) or any way to throw an error either during runtime or compilation?
Note that using a variable after a move is not necessarily a bug at all, but can instead be an entirely correct thing to do. Some types specify exactly the state of the moved from object (std::unique_ptr for example) and others which have the validity guarantee can be used in ways that have no pre-conditions (such as calling container.size()).
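Both cases can be illustrated with a small sketch (the function names are invented for this example). A moved-from std::unique_ptr is specified to be null, and a moved-from std::vector may still be used through operations without preconditions, such as clear() followed by push_back().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// The moved-from unique_ptr has a fully specified state: it is null.
bool moved_from_unique_ptr_is_null() {
    std::unique_ptr<int> a = std::make_unique<int>(7);
    std::unique_ptr<int> b = std::move(a);
    return a == nullptr && *b == 7;
}

// The moved-from vector is valid but unspecified; clear() has no
// preconditions and puts it back into a known (empty) state.
std::size_t reuse_after_move() {
    std::vector<int> v{1, 2, 3};
    std::vector<int> w = std::move(v);
    v.clear();
    v.push_back(42);
    return v.size() + w.size();  // 1 + 3
}
```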
As such, using a moved from object is only a problem if that violates a pre-condition, which would result in undefined behaviour. Clang and other compilers have runtime sanitisers that may be able to catch some undefined behaviour. There are also many warning options and static analysers that diagnose cases where bugs are likely.
Using them is a very good idea, but you should not rely solely on them because they won't be able to find all bugs. The programmer still needs to be careful when writing the program, and needs to compare it with the rules of the language. Following common idioms such as RAII, avoiding bare owning pointers (and other resource handles) goes a long way in avoiding typical bugs.
By default, the "underlying container" of an std::stack is an std::deque. Therefore anything that is undefined behavior for a std::deque is undefined behavior for a std::stack. cppreference and other sites use the terminology "effectively" when describing the behavior of member functions. I take this to mean that it is for all intents and purposes. So therefore, calling top() and pop() is equivalent to calling back() and pop_back(), and calling these on an empty container is undefined behavior.
From my understanding, the reason why it's undefined behavior is to preserve the no-throw guarantee. My reasoning is that operator[] for std::vector has a no-throw guarantee and is undefined behavior if the index n is not less than the container size, but at() has a strong guarantee, and throws std::out_of_range if n is out of bounds.
So my question is, what is the rationale behind some things having possibly undefined behavior and having a no throw guarantee versus having a strong guarantee but throwing an exception instead?
When undefined behaviour is allowed, it's usually for reasons of efficiency.
If the standard specified what has to happen when you access an array out of bounds, it would force the implementation to check whether the index is in bounds. Same goes for a vector, which is just a wrapper for a dynamic array.
In other cases the behaviour is allowed to be undefined in order to allow freedom in the implementation. But that, too, is really about efficiency (as some possible implementation strategies could be more efficient on some machines than on others, and C++ leaves it up to the implementer to pick the most efficient strategy, if they so desire.)
According to Herb Sutter one marked reason is efficiency. He states that the standard does not impose any requirements on operator[]'s exception specification or whether or not it requires bound checking. This is up to the implementation.
On the other hand, vector<T>::operator[]() is allowed, but not
required, to perform bounds checking. There's not a breath of wording
in the standard's specification for operator[]() that says anything
about bounds checking, but neither is there any requirement that it
have an exception specification, so your standard library implementer
is free to add bounds-checking to operator[](), too. So, if you use
operator[]() to ask for an element that's not in the vector, you're
on your own, and the standard makes no guarantees about what will
happen (although your standard library implementation's documentation
might) -- your program may crash immediately, the call to
operator[]() might throw an exception, or things may seem to work
and occasionally and/or mysteriously fail.
Given that bounds checking protects us against many common problems,
why isn't operator[]() required to perform bounds checking? The
short answer is: Efficiency. Always checking bounds would cause a
(possibly slight) performance overhead on all programs, even ones that
never violate bounds. The spirit of C++ includes the dictum that, by
and large, you shouldn't have to pay for what you don't use, and so
bounds checking isn't required for operator[](). In this case we
have an additional reason to want the efficiency: vectors are intended
to be used instead of built-in arrays, and so should be as efficient
as built-in arrays, which don't do bounds-checking. If you want to be
sure that bounds get checked, use at() instead.
If you're curious about the performance benefits, see these two questions:
::std::vector::at() vs operator[] << surprising results!! 5 to 10 times slower/faster!
vector::at vs. vector::operator[]
The consensus seems to be that operator[] is more efficient (since std::vector is just a wrapper around a dynamic array, operator[] should be just as efficient as if you would call it on an array.) And Herb Sutter seems to suggest that whether or not it is exception-safe is up to the compiler-vendor.
My C++ knowledge is somewhat piecemeal. I was reworking some code at work. I changed a function to return a reference to a type. Inside, I look up an object based on an identifier passed in, then return a reference to the object if found. Of course I ran into the issue of what to return if I don't find the object, and in looking around the web, many people claim that returning a "null reference" in C++ is impossible. Based on this advice, I tried the trick of returning a success/fail boolean, and making the object reference an out parameter. However, I ran into the roadblock of needing to initialize the references I would pass as actual parameters, and of course there is no way to do this. I retreated to the usual approach of just returning a pointer.
I asked a colleague about it. He uses the following trick quite often, which is accepted by both a recent version of the Sun compiler and by gcc:
MyType& someFunc(int id)
{
// successful case here:
// ...
// fail case:
return *static_cast<MyType*>(0);
}
// Use:
...
MyType& mt = someFunc(myIdNum);
if (&mt) // test for "null reference"
{
// whatever
}
...
I have been maintaining this code base for a while, but I find that I don't have as much time to look up the small details about the language as I would like. I've been digging through my reference book but the answer to this one eludes me.
Now, I had a C++ course a few years ago, and therein we emphasized that in C++ everything is types, so I try to keep that in mind when thinking things through. Deconstructing the expression: "*static_cast<MyType*>(0);", it indeed seems to me that we take a literal zero, cast it to a pointer to MyType (which makes it a null pointer), and then apply the dereferencing operator in the context of assigning to a reference type (the return type), which should give me a reference to the same object pointed to by the pointer. This sure looks like returning a null reference to me.
Any advice in explaining why this works (or why it shouldn't) would be greatly appreciated.
Thanks,
Chuck
This code doesn't work, though it may appear to work. This line dereferences a null pointer:
return *static_cast<MyType*>(0);
The zero, cast to a pointer type, results in a null pointer; this null pointer is then dereferenced using the unary-*.
Dereferencing a null pointer results in undefined behavior, so your program may do anything. In the example you describe, you get a "null reference" (or, it appears you get a null reference), but it would also be reasonable for your program to crash or for anything else to happen.
I agree with other posters that the behaviour of your example is undefined and really shouldn't be used. I offer some alternatives here. Each of them has pros and cons.
If the object can't be found, throw an exception which is caught in the calling layer.
Create a globally accessible instance of MyType which is a simple shell object (i.e. static const MyType BAD_MYTYPE) and can be used to represent a bad object.
If it's likely that the object will not be found often then maybe pass the object in by reference as a parameter and return a bool or other error code indicating success / failure. If it can't find the object, you just don't assign it in the function.
Use pointers instead and check for 0 on return.
Use Boost smart pointers which allow for the validity of the returned object to be checked.
My personal preference would be for one of the first three.
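Two of these alternatives can be sketched side by side. Everything here (the Registry class, its std::map storage and the member names) is invented for illustration and is not the asker's actual code: find() returns a pointer that is null on failure, and get() keeps the return-a-reference interface but throws when the id is unknown.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

struct MyType { std::string name; };

// Hypothetical lookup table keyed by id.
class Registry {
    std::map<int, MyType> items_;
public:
    void add(int id, std::string name) { items_.emplace(id, MyType{std::move(name)}); }

    // Pointer alternative: nullptr signals "not found" and callers test for it.
    MyType* find(int id) {
        auto it = items_.find(id);
        return it == items_.end() ? nullptr : &it->second;
    }

    // Exception alternative: always returns a valid reference or throws.
    MyType& get(int id) {
        if (MyType* p = find(id)) return *p;
        throw std::out_of_range("no object with the given id");
    }
};
```

Neither variant ever binds a reference to a dereferenced null pointer, so both stay within defined behavior.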
That is undefined behavior. Because it is undefined behavior, it may "work" on your current compiler, but it could break if you ever upgrade/change your compiler.
From the C++03 spec:
8.3.2/4 ... A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.
If you are married to that return-a-reference interface, then the Right Thing® would be to throw an exception if you can't find an object for the given ID. At least that way your poor user can trap the condition with a catch.
If you go dereferencing a null pointer on them, they have no defined way to handle the error.
The line return *static_cast<MyType*>(0); is dereferencing the null pointer which causes undefined behaviour.
Well, you said it yourself in the title of your question. The code appears to work with a null reference.
When people say null references do not exist in C++, it does not mean that the compiler will generate an error message if you try to create one. As you've found out, there's nothing to stop you from creating a reference from a null pointer.
It simply means that the C++ standard does not specify what should happen if you do it. Null references are not possible because the C++ standard says that references must not be created from null pointers. (but doesn't say anything about generating a compile error if you try to do it.)
It is undefined behavior. In practice, because references are typically implemented as pointers, it usually seems to work if you try to create a reference from a null pointer. But there's no guarantee that it'll work. Or that it'll keep working tomorrow. It's not possible in C++ because if you do it, the behavior specified by C++ no longer applies. Your program might do anything.
This might all sound a bit hypothetical, because hey, it seems to work just fine. But keep in mind that just because it works, and it makes sense that it works, when naively compiled, it may break when the compiler tries to apply some optimization or other. When the compiler sees a reference, it is guaranteed by the language rules that it is not null. So what happens if the compiler uses this assumption to speed up the code, and you go behind its back creating a "null reference"?
As others have already said, the code unfortunately only appears to work... however, more fundamentally, you are breaking a contract here.
In C++, a reference is meant to be an alias for an object. Based on the signature of your function I would never expect to be passed a 'NULL' reference, and I would certainly NOT test it before using it. As James McNellis said, creating a NULL reference is undefined behavior; however, in most cases this behavior only creeps in when you actually try to use it, and you are now exposing the users of your methods to nasty, tricky-to-nail-down bugs.
I won't go any further on that issue, and will just point you toward Herb Sutter's take on the issue.
Now for the solution to your problem.
The evident solution is of course a plain pointer. People expect a pointer to be possibly null, so they will test it when it's returned to them (and if they don't it's their damn fault for being lazy).
There are other alternatives, but they mainly boil down to having a special value indicating your failure and there is not much point using complicated designs just for the sake of it...
The last alternative is the use of exceptions here, however I am myself partial to the advice. Exceptions are meant for exceptional situations you see, and for a find/search feature it really depends on the expected result:
if you are implementing some internal factory where you register modules and then call them back later on, then not being able to retrieve one module would indicate an error in the program and as such being reported by an exception is fine
if you are implementing a search engine for a database of yours, thus dealing with user input, then not finding a result matching the input criteria is quite likely to occur, and thus I would not use exceptions in this circumstance, for it's not a programming error but a perfectly normal course
Note: other ways include
Boost.Optional, though I find it clumsy to wrap a reference with it
A shared pointer or weak pointer, to control/monitor the object lifetime in case it may be deleted while you still use it... monitoring does not work by itself in Multi-Threaded environment though
A sentinel value (usually declared static const), but it only works if your object has a meaningful "bad" or "null" value. It's certainly not the approach I would recommend since once again you give an object but it blows up in the users hand if they do anything with it
As others have mentioned, your code is erroneous since it dereferences a null pointer.
Secondly, you are using the reference return type incorrectly, returning a reference in your example is usually not good in C/C++ since it violates the memory management model of the language where all objects are referenced to by pointers to a memory address. If you rewrite C/C++ code that was written using pointers into code that uses references you will end up with these problems. The code using pointers could return a NULL pointer without causing problems, but when returning a reference you must return something that can be cast into a 0 or false statement.
Most important of all is the pattern where your erroneous code only gets executed in the case of an exception, in this example your "fail case". Incorrect error handling, buggy error logging, etc. are the most disastrous bugs in computer systems, since they never show up in the happy case but always cause a system breakdown when something doesn't follow the normal flow.
The only way to ensure that your code is correct is to have testcases with 100% coverage, that means also testing the error handling, which in your example probably would cause a segmentation fault on your program.