Why downcast and then assign to a base-class pointer in C++?

I have stumbled upon the following code structure and I'm wondering whether this is intentional or just poor understanding of casting mechanisms:
struct AbstractBase {
    virtual void doThis() {
        // Basic implementation here.
    }
    virtual void doThat() = 0;
};

struct DerivedA : public AbstractBase {
    virtual void doThis() {
        // Other implementation here.
    }
    virtual void doThat() {
        // Some stuff here.
    }
};
// More derived classes with similar structure....
// Dubious stuff happening here:
void strangeStuff(AbstractBase* pAbstract, int switcher) {
    AbstractBase* a = NULL;
    switch (switcher) {
    case TYPE_DERIVED_A:
        // why would someone use the abstract base pointer here???
        a = dynamic_cast<DerivedA*>(pAbstract);
        a->doThis();
        a->doThat();
        break;
    // similar case statements with other derived classes...
    }
}
// "main"
DerivedA* pDerivedA = new DerivedA;
strangeStuff( pDerivedA, TYPE_DERIVED_A );
My guess is that this dynamic_cast statement is just the result of a poor understanding of casting mechanisms and generally bad programming style (the whole way the code works just feels painful to me), and that it doesn't change the behaviour for this specific use case.
However, since I'm not an expert on casting, I'd like to know whether there are any subtle side-effects that I'm not aware of.

[C++11: 5.2.7/9]: The value of a failed cast to pointer type is the null pointer value of the required result type.
The dynamic_cast returns NULL if the runtime type is wrong, making the following lines crash. Hence, this can be either 1. an attempt to make (logical) errors more explicit, or 2. some sort of in-code documentation.
So while it doesn't look like the best design, it is not exactly true that the cast has no effect whatsoever.
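For illustration, here is a minimal sketch (the explicit check and the assert are my own additions, not part of the original code) of how the cast result could be verified before use:

#include <cassert>

void checkedStuff(AbstractBase* pAbstract, int switcher) {
    switch (switcher) {
    case TYPE_DERIVED_A: {
        DerivedA* pA = dynamic_cast<DerivedA*>(pAbstract);
        if (pA == NULL) {
            // Type mismatch: fail loudly instead of crashing on a null pointer.
            assert(false && "checkedStuff: pAbstract is not a DerivedA");
            return;
        }
        pA->doThis();
        pA->doThat();
        break;
    }
    }
}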

My guess would be that the coder screwed up.
A second guess would be that you skipped a check for a being null in your simplification.
A third, and highly unlikely possibility, is that the coder was exploiting undefined behavior to optimize.
With this code:
a = dynamic_cast<DerivedA*>(pAbstract);
a->doThis();
if pAbstract does not actually point to a DerivedA (or something more derived), the dynamic_cast yields a null pointer and a->doThis() is undefined behavior. And if it does, then the dynamic_cast does absolutely nothing (guaranteed).
A compiler can, in theory, optimize any other possibility away, even though the type of a did not change, and remain conforming. Even if someone later checks whether a is null, the execution of undefined behavior on the very next line means that the compiler is free not to set a to null on the dynamic_cast line.
I would doubt that a given compiler would do this, but I could be wrong.
There are compilers that detect that certain paths would cause undefined behavior later in execution, reason backwards to the point where the undefined behavior was set in motion, and then "know" that the program cannot be in the state that would trigger it. They can then use this knowledge to optimize the code in question.
Here is an example:
#include <sstream>
#include <string>

std::string foo( unsigned int x ) {
    std::string r;
    if (x == (unsigned)-1) {
        r = "hello ";
    }
    int y = 100 / (x + 1); // if x == (unsigned)-1, then x + 1 wraps to 0
    std::stringstream ss;
    ss << y;
    r += ss.str();
    return r;
}
The compiler can see the division above. If x equals (unsigned)-1, then x + 1 wraps around to zero (well-defined, because the arithmetic is unsigned), and the division by zero is undefined behavior. That division happens regardless of the result of the first branch.
In short, if the first branch runs, undefined behavior would result. And undefined behavior can do anything, including time travel -- it can go back in time and prevent that branch from being taken.
So the

if (x == (unsigned)-1) {
    r = "hello ";
}

branch can be eliminated by the optimizer, legally in C++.
While the above is just a toy case, gcc really does perform optimizations very much like this; for the signed-overflow variants there is even a flag (-fwrapv) to turn them off.
(Unsigned wrap-around is defined behavior in C++, but integer division by zero, like overflowing a signed int, is not. In practice, this is because there are platforms on which such operations trap, and C++ doesn't want to impose extra costs on them to make a conforming implementation.)
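To make that distinction concrete, here is a minimal sketch (my own illustration, not part of the original answer):

#include <iostream>
#include <limits>

int main() {
    unsigned int u = std::numeric_limits<unsigned int>::max();
    std::cout << u + 1 << "\n"; // well-defined: unsigned arithmetic wraps to 0

    int s = std::numeric_limits<int>::max();
    // Computing s + 1 here would be signed overflow: undefined behavior.
    // Compiling with -fsanitize=undefined (gcc/clang) reports such cases at run time.
    std::cout << s << "\n";
    return 0;
}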

dynamic_cast will confirm that the dynamic type does match the type indicated by the switcher variable, making the code slightly less dangerous. However, it will give a null pointer in the case of a mismatch, and the code neglects to check for that.
But it seems more likely that the author didn't really understand the use of virtual functions (for uniform treatment of polymorphic types) and RTTI (for the rarer cases where you need to distinguish between types), and attempted to invent their own form of manual, error-prone type identification.
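For contrast, a minimal sketch of the idiomatic approach this answer alludes to (my own illustration, reusing the question's names): let virtual dispatch select the right override, with no switcher and no cast:

void idiomaticStuff(AbstractBase* pAbstract) {
    // Virtual dispatch picks the DerivedA (or other derived) override automatically.
    pAbstract->doThis();
    pAbstract->doThat();
}

// usage, as in the question's "main":
DerivedA* pDerivedA = new DerivedA;
idiomaticStuff(pDerivedA); // no switcher argument needed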

Related

Accessing an array out of bounds, but returning earlier - UB?

I have code that calculates an array index, and if it is valid accesses that array item. Something like:
int b = rowCount() - 1;
if (b == -1) return;
const BlockInfo& bi = blockInfo[b];
I am worried that this might be triggering undefined behavior. For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
Under which circumstances is it safe to "access" an array out of bounds, when you do nothing with the invalid result? Does it change if blockInfo is not an actual array, but a container like a vector? If this is unsafe, could I fix it by putting the access in an else clause?
if (b == -1) {
    return;
} else {
    const BlockInfo& bi = blockInfo[b];
}
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
For clarification: My concern is specifically because of a different issue, where you intend to test whether a pointer is non-null before accessing it. The compiler turns this around and reasons that, since you are dereferencing it, it cannot have been null! Something like this (untested):
void someFunc(struct MyStruct *s) {
    if (s != NULL) {
        cout << s->someField << endl;
        delete s;
    }
}
I recall hearing that simply forming an out-of-bounds array access is UB in C++. Thus the compiler could legally assume the array index is not out of bounds, and remove checks to the contrary.
There is no access to blockInfo[-1] in your program. Your code specifically prohibits that.
For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.
No, it cannot do that, precisely because an access to index -1 (or, rather, (std::size_t)-1) may or may not be a valid index. The language does let you pass -1 as an index; it'll just be converted first to a std::size_t with the well-defined unsigned wrap-around logic that comes with doing so. So there is not, and cannot be, any rule whereby the compiler is permitted to assume that you will never pass int -1 as an index.
Even if there were, it'd still make no sense to let the compiler completely ignore the if statement. If it could, if our if statements were not reliable, every program in the world would be unsafe! There'd be no way to enforce any of your operations' preconditions.
The compiler may only skip or re-order things when it can prove that doing so results in a well-defined program with the same behaviour as your original instructions, given any possible input.
In fact, this is where UB comes from: where proving correctness is really difficult, that's usually where the standard throws compilers a bone and says something is "undefined" and the compiler can just do whatever it likes.
One interesting example of this is kind of the opposite of your case, where a check is [erroneously] placed after the access, and the compiler therefore assumes the check passes, whether it actually did or not:
void bar();
void baz();

void foo(char* ptr)
{
    char x = *ptr; // the dereference happens before the null check
    if (ptr)
        bar();
    else
        baz();
}
The function foo may call bar() even if ptr is null! That might sound unlikely to you, but it actually does happen (e.g. this crash in a widely-used library).
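A sketch of the repaired version (my own addition): do the check before any dereference, so the compiler can no longer assume the pointer is non-null:

void foo(char* ptr)
{
    if (ptr) {
        char x = *ptr; // dereference only on the non-null path
        (void)x;
        bar();
    } else {
        baz();
    }
}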
could I fix it by putting the access in an else clause?
Those two pieces of code are semantically equivalent; it's the same program.
Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?
The compiler already does the obvious thing, as long as "obvious" is "according to the C++ standard".
the compiler might assume
If the compiler proceeds from a wrong assumption, then it's wrong and defective.
Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result?
It is never safe to access an array out of bounds, because that produces UB before you have a chance to use or not-use the result. However, an untaken branch in the code doesn't count as an access, as in your first or second examples. So, if I understand your last question, there's no need for a special flag.
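As an aside (my own addition, not from the answers above): if you want the bounds check enforced for you, std::vector::at performs a checked access and throws std::out_of_range instead of producing undefined behavior; gcc and clang also offer -fsanitize=undefined to catch genuine out-of-bounds accesses on plain arrays at run time.

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> blockInfo{1, 2, 3};
    try {
        int x = blockInfo.at(10); // checked access: throws instead of UB
        std::cout << x << "\n";
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << "\n";
    }
    return 0;
}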

Function returning reference to local variable: why do compilers only give a warning if it is undefined behaviour? [duplicate]

The C++ standard states that returning reference to a local variable (on the stack) is undefined behaviour, so why do many (if not all) of the current compilers only give a warning for doing so?
struct A {
};

A& foo()
{
    A a;
    return a; // gcc and VS2008 both give this a warning, but not a compiler error
}
Would it not be better if compilers give a error instead of warning for this code?
Are there any great advantages to allowing this code to compile with just a warning?
Please note that this is not about a const reference which could lengthen the lifetime of the temporary to the lifetime of the reference itself.
It is almost impossible to verify from a compiler point of view whether you are returning a reference to a temporary. If the standard dictated that to be diagnosed as an error, writing a compiler would be almost impossible. Consider:
bool not_so_random() { return true; }

int& foo( int x ) {
    static int s = 10;
    int *p = &s;
    if ( !not_so_random() ) {
        p = &x;
    }
    return *p;
}
The above program is correct and safe to run: with this implementation it is guaranteed that foo will return a reference to a static variable, which is safe. But from a compiler perspective (and with separate compilation in place, where the implementation of not_so_random() is not accessible), the compiler cannot know that the program is safe.
This is a toy example, but you can imagine similar code, with different return paths, where p might refer to different long-lived objects in all paths that return *p.
Undefined behaviour is not a compilation error; a program containing it can still be perfectly well-formed as far as the compiler is concerned, it's just unpredictable at run time. I'd wager a bet that it's not even possible in principle for a computer to decide whether a given program text can exhibit undefined behaviour.
You can always add -Werror to gcc to make warnings terminate compilation with an error!
To add another favourite SO topic: Would you like ++i++ to cause a compile error, too?
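For instance, to act on the earlier point about -Werror (my own example; return-local-addr is GCC's name for the relevant warning and may vary by version):

g++ -Wall -Werror main.cpp                 # every warning becomes a hard error
g++ -Werror=return-local-addr main.cpp     # promote just this warning to an error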
If you return a pointer/reference to a local variable from a function, the behavior is well defined as long as you do not dereference the returned pointer/reference.
It is undefined behavior only when the returned pointer is actually dereferenced.
So whether the behavior is undefined or well defined depends on the code calling the function, not on the function itself.
Hence, while compiling just the function, the compiler cannot determine whether the behavior will be undefined or well defined. The best it can do is warn you of a potential problem, and it does!
A code sample:
#include <iostream>

struct A
{
    int m_i;
    A() : m_i(10)
    {
    }
};

A& foo()
{
    A a;
    a.m_i = 20;
    return a;
}

int main()
{
    foo();                // not undefined behavior: the return value is never used
    A& ref = foo();       // still not undefined behavior: the dangling reference is bound but not yet used
    std::cout << ref.m_i; // undefined behavior: the dead object is actually read
    return 0;
}
Reference to the C++ Standard, section 3.8 [basic.life]:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. Such a pointer refers to allocated storage (3.7.3.2), and using the pointer as if the pointer were of type void*, is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited ways, as described below. If the object will be or was of a class type with a non-trivial destructor, and the pointer is used as the operand of a delete-expression, the program has undefined behavior. If the object will be or was of a non-POD class type, the program has undefined behavior if:
— ...
Because the standard does not restrict us.
If you want to shoot yourself in the foot, you can do it!
However, let's see an example where it can be useful:
int& foo()
{
    int y;
    return y; // deliberately returns a reference to a local
}

bool stack_grows_forward()
{
    int& p = foo();
    int my_p;
    // compare addresses of stack slots from two different frames
    return &my_p < &p;
}
Compilers should not refuse to compile programs unless the standard says they are allowed to do so. Otherwise it would be much harder to port programs, since they might not compile with a different compiler, even though they comply with the standard.
Consider the following function:
int foobar() {
    int a = 1, b = 0;
    return a / b;
}
Any decent compiler will detect that I am dividing by zero, but it should not reject the code, since I might actually want to trigger a SIGFPE signal.
As David Rodríguez has pointed out, there are some cases which are undecidable but there are also some which are not. Some new version of the standard might describe some cases where the compiler must/is allowed to reject programs. That would require the standard to be very specific about the static analysis which is to be performed.
The Java standard actually specifies some rules for checking that non-void methods always return a value. Unfortunately I haven't read enough of the C++ standard to know what the compiler is allowed to do.
You could also return a reference to a static variable, which would be valid code so the code must be able to compile.
It's very bad practice to rely on this, but in many cases (and it's never a good wager) the referenced memory would still be valid if no functions are called between the time foo() returns and the time the caller uses its return value. In that case, that area of the stack has no opportunity to be overwritten.
In C and C++ you can choose to access arbitrary sections of memory anyway (within the process's memory space, of course) via pointer arithmetic, so why not allow the possibility of constructing a reference to wherever one so chooses?

Segmentation fault during a function call from a non-reference std::vector parameter [closed]

At a certain point in my program, I am getting a segmentation fault at a line like the one in g() below:
// a data member type in MyType_t below
enum class DataType {
    BYTE, SHORT, INT, VCHAR, FLOAT, DOUBLE, BOOL, UNKNOWN
};

// a data member type in MyType_t below
enum class TriStateBool {
    crNULL, crTRUE, crFALSE
};

// this is the element type of the std::vector used when
// calling function f() from g(), which causes the segmentation fault
struct MyType_t {
    DataType Type;
    bool isNull;
    std::string sVal;
    union {
        TriStateBool bVal;
        unsigned int uVal;
        int nVal;
        double dVal;
    };

    // ctors
    explicit MyType_t(DataType type = DataType::UNKNOWN) :
        Type{type}, isNull{true}, dVal{0}
    { }
    explicit MyType_t(std::string sVal_) :
        Type{DataType::VCHAR}, isNull{sVal.empty()}, sVal{sVal_}, dVal{0}
    { }
    explicit MyType_t(const char* psz) :
        Type{DataType::VCHAR}, isNull{psz ? true : false}, sVal{psz}, dVal{0}
    { }
    explicit MyType_t(TriStateBool bVal_) :
        Type{DataType::BOOL}, isNull{false}, bVal{bVal_}
    { }
    explicit MyType_t(bool bVal_) :
        Type{DataType::BOOL}, isNull{false}, bVal{bVal_ ? TriStateBool::crTRUE : TriStateBool::crFALSE}
    { }
    MyType_t(double dVal_, DataType type, bool bAllowTruncate = false) {
        // sets data members in a switch-block, no function calls
        //...
    }
};
void f(std::vector<MyType_t> v) { /* intentionally empty */ } // parameter by value

void g(const std::vector<MyType_t>& v) {
    //...
    f(v); // <-- this line raises segmentation fault!!!
          // (works fine if the parameter of f() is by reference instead of by value)
    //...
}
When I inspect it in the debugger, I see that the fault originates from the sVal data member of MyType_t.
The code above does not reproduce the problem, so I don't expect any specific answers. I do however appreciate suggestions to get me closer to finding the source of it.
Here is a bit of context:
I have a logical expression parser. The expressions can contain arithmetical expressions, late-bind variables and function calls. Given such an expression, parser creates a tree with nodes of several types. Parse stage is always successful. I have the problem in evaluation stage.
The evaluation stage is destructive to the tree structure, so its nodes are "restored" from a backup copy before each evaluation. The first evaluation is also always successful. I'm having the problem on subsequent evaluations, with certain expressions. Even then, the error is not consistent.
Please assume that I have no memory leaks and no double releases. (I am using a global overload of operator new to track allocations/deallocations.)
Any ideas on how I should tackle this?
f(v); // <-- this line raises segmentation fault!!!
There are only two ways this line could trigger SIGSEGV:
1. You've run out of stack (this is somewhat common for deeply recursive procedures). You can do ulimit -s unlimited and see if the problem goes away.
2. You have corrupted v, so its copy constructor triggers SIGSEGV.
Please assume that I have no memory leaks and double releases.
There are many more ways to corrupt the heap (e.g. by overflowing a heap block) than double release. It is very likely that (if you do not have a stack overflow) valgrind or AddressSanitizer will point you straight at the problem.
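For example (my own sketch; the program name is hypothetical), the two tools are typically invoked like this:

g++ -g -fsanitize=address -fno-omit-frame-pointer myprog.cpp -o myprog
./myprog   # AddressSanitizer aborts at the first invalid access with a stack trace

g++ -g myprog.cpp -o myprog
valgrind --tool=memcheck ./myprog   # memcheck flags invalid reads/writes and their origin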
When I inspect the debugger, I see that it [the segfault] originates from the sVal data member of MyType_t.
Well, now that you've actually added most of the relevant code, we see why asking for help with no/insufficient code, because you just know the rest is A-OK, is a terrible idea:
Problem 1
explicit MyType_t(std::string sVal_) :
    Type{DataType::VCHAR}, isNull{sVal.empty()}, sVal{sVal_}, dVal{0}
{ }
AWOOGA! Your initialiser for isNull is checking whether the member, sVal, is empty(), before you have initialised sVal from the parameter, sVal_. (N.B. Although they're the same in this case, the real order is that of declaration, not of initialisation.) You might think isNull will always be set true as it's always checking the initially default-constructed, empty member sVal, rather than the input sVal_...
But it's worse than that! At that point sVal hasn't been constructed at all: members are initialised in declaration order, and sVal's own initialiser hasn't run yet. So, at the point of calling empty(), you are acting on an object whose lifetime hasn't begun, and that's undefined behaviour.
(Aside: This vindicates my preferred naming convention, m_sVal. It's much more difficult to accidentally type 2 characters than to forget a trailing underscore. But if anything, people normally include the trailing underscore in the member, not the argument. Anyway!)
But wait!
Problem 2
While your first constructor for the VCHAR variant of MyType_t guarantees UB, your second strongly invites it:
explicit MyType_t(const char* psz) :
    Type{DataType::VCHAR}, isNull{psz ? true : false}, sVal{psz}, dVal{0}
{ }
In this version, you allow the possibility that sVal will be constructed from a null pointer. But let's have a look at http://en.cppreference.com/w/cpp/string/basic_string/basic_string - where the std::string constructors taking char const *, (4) and (5) with the latter being the one called here, are annotated as follows:
5) Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by s. The length of the string is determined by the first null character. The behavior is undefined if s does not point at an array of at least Traits::length(s)+1 elements of CharT, including the case when s is a null pointer.
This is formalised in the C++11 Standard at 21.4.2.9: https://stackoverflow.com/a/10771938/2757035
It's worth noting that GCC's libstdc++ as used by g++ does throw an exception when passed a nullptr: "what(): basic_string::_S_construct null not valid". I might infer from your mention of gdb that you're using g++ - in which case, this probably isn't your problem, unless you're using an older version than me. Still, the mere possibility should still be avoided. So, guard your initialisation as follows:
sVal{psz ? psz : ""}
Result
From this point on, due to UB, anything can happen - at any point in your program. But a common 'default behaviour' for UB is a segfault.
With the constructor taking std::string, even if the invalid call to .empty() completes, isNull won't just get a 'random' value based on the uninitialised .size(): rather, because it's constructed from UB, isNull is undefined, and tests of it might be removed or inlined as some arbitrary constant, potentially leading to wrong code paths being taken, or right ones not. So although sVal gets a valid value eventually, isNull doesn't.
With the constructor taking char const *, your member std::string sVal itself will be followed everywhere by UB if the passed pointer is null. If so, isNull would be OK, but any attempts to use sVal would be undefined.
In both cases, any number of unknowable things might happen, because UB is involved. If you want to know exactly what's happened in this case, disassemble your program.
The reason for the segfault not occurring when the vector is passed by reference is nebulous and of little discursive value; in either case, you're reading or copy-constructing from MyType_t elements whose construction involved UB in one way or another.
Solution
The point is that you need to fix both of these erroneous, UB-generating pieces of code and determine whether that resolves the error. It's very likely that it will, because what the first and possibly the second constructor are currently doing is such blatant UB that, in a practical sense, your program is nearly guaranteed to crash somewhere.
You must now comb over your program for any such coding errors and eliminate every single one, or they will catch you out eventually, "on a long enough timeline".
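Putting both fixes together, a sketch of repaired VCHAR constructors (my own rendering of the advice above, not the asker's actual fix; note that isNull now tests the parameter, never the member):

explicit MyType_t(std::string sVal_) :
    Type{DataType::VCHAR}, isNull{sVal_.empty()}, sVal{sVal_}, dVal{0}
{ } // test the parameter sVal_, not the not-yet-constructed member sVal

explicit MyType_t(const char* psz) :
    Type{DataType::VCHAR}, isNull{psz == nullptr || *psz == '\0'}, sVal{psz ? psz : ""}, dVal{0}
{ } // never hand a null pointer to std::string's constructor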

const casting an int in a class vs outside a class

I read on the wikipedia page for Null_pointer that Bjarne Stroustrup suggested defining NULL as
const int NULL = 0;
if "you feel you must define NULL." I instantly thought, hey.. wait a minute, what about const_cast?
After some experimenting, I found that
#include <cstdio>

int main() {
    const int MyNull = 0;
    const int* ToNull = &MyNull;
    int* myptr = const_cast<int*>(ToNull);
    *myptr = 5;
    printf("MyNull is %d\n", MyNull);
    return 0;
}
would print "MyNull is 0", but if I make the const int belong to a class:
class test {
public:
    test() : p(0) { }
    const int p;
};

int main() {
    test t;
    const int* pptr = &(t.p);
    int* myptr = const_cast<int*>(pptr);
    *myptr = 5;
    printf("t.p is %d\n", t.p);
    return 0;
}
then it prints "t.p is 5"!
Why is there a difference between the two? Why is "*myptr = 5;" silently failing in my first example, and what action is it performing, if any?
First of all, you're invoking undefined behavior in both cases by trying to modify a constant variable.
In the first case the compiler sees that MyNull is declared as a constant and replaces all references to it within main() with a 0.
In the second case, since p is within a class the compiler is unable to determine that it can just replace all classInstance.p with 0, so you see the result of the modification.
Firstly, what happens in the first case is that the compiler most likely translates your
printf("MyNull is %d\n", MyNull);
into the immediate
printf("MyNull is %d\n", 0);
because it knows that const objects never change in a valid program. Your attempt to change a const object leads to undefined behavior, which is exactly what you observe. So, ignoring the undefined behavior for a second: from a practical point of view it is quite possible that your *myptr = 5 successfully modified MyNull. It is just that your program no longer cares what MyNull holds. It knows that MyNull is zero and will always be zero, and acts accordingly.
Secondly, in order to define NULL per the recommendation you were referring to, you have to define it specifically as an Integral Constant Expression (ICE). Your first variant is indeed an ICE. Your second variant is not: class member access is not allowed in an ICE, which makes your second variant significantly different from the first. The second variant does not produce a viable definition for NULL, and you will not be able to initialize pointers with your test::p even though it is declared as const int and set to zero:
SomeType *ptr1 = MyNull; // OK: an integral constant expression that evaluates to 0

test t;
SomeType *ptr2 = t.p; // ERROR: cannot use an `int` lvalue to initialize a pointer
As for the different output in the second case... undefined behavior is undefined behavior. It is unpredictable. From a practical point of view, your second context is more complicated, so the compiler was unable to perform the above optimization; i.e. you have indeed succeeded in breaking through the language-level restrictions and modifying a const-qualified variable. The language specification does not make it easy (or possible) for compilers to optimize out const members of a class, so at the physical level that p is just another member of the class that resides in memory in each object of that class. Your hack simply modifies that memory. It doesn't make it legal, though: the behavior is still undefined.
This all, of course, is a rather pointless exercise. It looks like it all began with the "what about const_cast" question. So, what about it? const_cast was never intended to be used for that purpose. You are not allowed to modify const objects, with const_cast or without; it doesn't matter.
Your code is modifying a variable declared constant, so anything can happen. Discussing why one particular thing happens instead of another is completely pointless unless you want to discuss unportable compiler internals; from a C++ point of view that code simply doesn't make sense.
One important thing to understand about const_cast is that it is not about messing with variables declared constant, but with references and pointers declared constant.
In C++ a const int * is often understood to be a "pointer to a constant integer", but this description is completely wrong. For the compiler it's instead something quite different: a "pointer that cannot be used for writing to an integer object".
This may seem a minor difference, but it is a huge one, because:
The "constness" is a property of the pointer, not of the pointed-to object.
Nothing is said about whether the pointed-to object itself is constant or not.
The word "constant" has little to do with the actual meaning (which is why I think const was a bad naming choice): const int * says nothing about the constness of anything; it only distinguishes "read only" from "read/write" access.
const_cast allows you to convert between pointers and references that can be used for writing and pointers or references that cannot, because they are "read only". The pointed-to object is never part of this process, and the standard simply says that it's legal to take a const pointer and use it for writing after "casting away" const-ness, but only if the pointed-to object has not been declared constant.
Constness of a pointer or a reference never affects the machine code that a compiler generates (another common misconception is that a compiler can produce better code if const references and pointers are used, but this is totally bogus... to the optimizer a const reference and a const pointer are just a reference and a pointer).
Constness of pointers and references was introduced to help programmers, not optimizers (by the way, I think that this alleged help for programmers is also quite questionable, but that's another story).
const_cast is a weapon that helps programmers fight broken const-ness declarations of pointers and references (e.g. in libraries) and the sometimes-broken concept of constness of references and pointers itself (before mutable existed, for example, casting away constness was the only reasonable solution in many real-life programs).
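To illustrate the distinction (my own sketch, not part of the answer): casting away constness and writing is fine when the underlying object is not const, and undefined behavior when it is:

#include <cstdio>

void increment(const int* p) {
    // Legal only if the caller's object was not declared const.
    ++*const_cast<int*>(p);
}

int main() {
    int x = 1;              // not declared const
    increment(&x);          // OK: writes through const_cast to a non-const object
    std::printf("%d\n", x); // prints 2

    const int y = 1;        // declared const
    // increment(&y);       // would compile, but the write would be undefined behavior
    return 0;
}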
Misunderstanding of what is a const reference is also at the base of a very common C++ antipattern (used even in the standard library) that says that passing a const reference is a smart way to pass a value. See this answer for more details.
