The C++ standard states that returning a reference to a local variable (on the stack) is undefined behaviour, so why do many (if not all) of the current compilers only give a warning for doing so?
struct A {
};

A& foo()
{
    A a;
    return a; // gcc and VS2008 both give this a warning, but not a compiler error
}
Would it not be better if compilers gave an error instead of a warning for this code?
Are there any great advantages to allowing this code to compile with just a warning?
Please note that this is not about a const reference, which could lengthen the lifetime of the temporary to the lifetime of the reference itself.
It is almost impossible for a compiler to verify whether you are returning a reference to a temporary. If the standard required that to be diagnosed as an error, writing a compiler would be almost impossible. Consider:
bool not_so_random() { return true; }

int& foo( int x ) {
    static int s = 10;
    int *p = &s;
    if ( !not_so_random() ) {
        p = &x;
    }
    return *p;
}
The above program is correct and safe to run: in our current implementation it is guaranteed that foo will return a reference to a static variable, which is safe. But from a compiler perspective (and with separate compilation in place, where the implementation of not_so_random() is not accessible), the compiler cannot know that the program is free of undefined behaviour.
This is a toy example, but you can imagine similar code, with different return paths, where p might refer to different long-lived objects in all paths that return *p.
Undefined behaviour is not a compilation error; a program containing it is simply not a correct C++ program. Not every incorrect program fails to compile; its behaviour is just unpredictable. I'd wager that it is not even possible in principle for a computer to decide whether a given program text is a correct C++ program.
You can always add -Werror to gcc to make warnings terminate compilation with an error!
To add another favourite SO topic: Would you like ++i++ to cause a compile error, too?
If you return a pointer or reference to a local variable from a function, the behaviour is well defined as long as you do not dereference the returned pointer or reference.
It is undefined behaviour only when the returned pointer is dereferenced.
So whether the behaviour is undefined depends on the code calling the function, not on the function itself.
While compiling just the function, then, the compiler cannot determine whether the behaviour will be undefined or well defined. The best it can do is warn you of a potential problem, and it does!
A code sample:
#include <iostream>

struct A
{
    int m_i;
    A() : m_i(10)
    {
    }
};

A& foo()
{
    A a;
    a.m_i = 20;
    return a;
}

int main()
{
    foo();           // This is not undefined behaviour: the return value was never used.
    A& ref = foo();  // Still not undefined behaviour: the return value is bound but not yet used.
    std::cout << ref.m_i; // Undefined behaviour: the returned (dangling) reference is used.
    return 0;
}
Reference to the C++ Standard, section 3.8:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. Such a pointer refers to allocated storage (3.7.3.2), and using the pointer as if the pointer were of type void*, is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited ways, as described below. If the object will be or was of a class type with a non-trivial destructor, and the pointer is used as the operand of a delete-expression, the program has undefined behavior. If the object will be or was of a non-POD class type, the program has undefined behavior if: [...]
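To make the distinction concrete, here is a minimal sketch (my own illustration, following the reasoning of this answer) of the difference between merely binding the dangling reference and actually reading through it:

A& r = foo();  // binds the dangling reference: not undefined behaviour by itself
A  c = foo();  // copies the dead object, i.e. reads it: undefined behaviour
int i = r.m_i; // reads through the dangling reference: undefined behaviour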
Because the standard does not restrict us.
If you want to shoot yourself in the foot, you can do it!
However, let's see an example where it can be useful:
int &foo()
{
    int y;
    return y; // returns a reference to a local; the compiler will warn here
}

bool stack_grows_forward()
{
    int &p = foo();
    int my_p;
    return &my_p < &p;
}
Compilers should not refuse to compile programs unless the standard says they are allowed to do so. Otherwise it would be much harder to port programs, since they might not compile with a different compiler, even though they comply with the standard.
Consider the following function:
int foobar() {
    int a = 1, b = 0;
    return a/b;
}
Any decent compiler will detect that I am dividing by zero, but it should not reject the code, since I might actually want to trigger a SIGFPE signal.
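For illustration, here is a minimal sketch of deliberately trapping that signal (my own addition; it assumes a typical x86 Linux environment where integer division by zero raises SIGFPE, while the division itself remains undefined behaviour per the standard):

#include <csignal>
#include <cstdio>
#include <cstdlib>

extern "C" void on_fpe(int)
{
    std::puts("caught SIGFPE"); // not strictly signal-safe; fine for a demo
    std::_Exit(EXIT_FAILURE);   // returning into the faulting instruction would loop
}

int main()
{
    std::signal(SIGFPE, on_fpe);
    volatile int a = 1, b = 0; // volatile discourages constant folding
    return a / b;              // typically traps on x86
}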
As David Rodríguez has pointed out, there are some cases which are undecidable but there are also some which are not. Some new version of the standard might describe some cases where the compiler must/is allowed to reject programs. That would require the standard to be very specific about the static analysis which is to be performed.
The Java standard actually specifies some rules for checking that non-void methods always return a value. Unfortunately I haven't read enough of the C++ standard to know what the compiler is allowed to do.
You could also return a reference to a static variable, which would be valid code so the code must be able to compile.
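A quick sketch of that case (my own illustration):

int& counter()
{
    static int n = 0; // lives for the whole program, not on the stack
    return n;         // valid: the referent outlives the call
}

int main()
{
    counter() = 41; // write through the returned reference
    ++counter();    // fine: n is now 42
}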
It's pretty much super-bad practice to rely on this, but I do believe that in many cases (and that's never a good wager) the memory reference would still be valid if no functions are called between the time foo() returns and the time the calling function uses its return value. In that case, that area of the stack would not have had an opportunity to be overwritten.
In C and C++ you can choose to access arbitrary sections of memory anyway (within the process's memory space, of course) via pointer arithmetic, so why not allow the possibility of constructing a reference to wherever one so chooses?
Related
Does the following program have undefined behavior in C++17 and later?
struct A {
    void f(int) { /* Assume there is no access to *this here */ }
};

int main() {
    auto a = new A;
    a->f((a->~A(), 0));
}
C++17 guarantees that a->f is evaluated to the member function of the A object before the call's argument is evaluated. Therefore the indirection from -> is well-defined. But before the function call is entered, the argument is evaluated and ends the lifetime of the A object (see however the edits below). Does the call still have undefined behavior? Is it possible to call a member function of an object outside its lifetime in this way?
The value category of a->f is prvalue by [expr.ref]/6.3.2, and [basic.life]/7 only disallows non-static member function calls on glvalues referring to the after-lifetime object. Does this imply the call is valid? (Edit: As discussed in the comments, I am likely misunderstanding [basic.life]/7 and it probably does apply here.)
Does the answer change if I replace the destructor call a->~A() with delete a or new(a) A (with #include<new>)?
Some elaborating edits and clarifications on my question:
If I were to separate the member function call and the destructor/delete/placement-new into two statements, I think the answers are clear:
a->~A(); a->f(0): UB, because of a non-static member call on a outside its lifetime (see edit below, though)
delete a; a->f(0): same as above
new(a) A; a->f(0): well-defined, call on the new object
However, in all these cases a->f is sequenced after the first respective statement, while this order is reversed in my initial example. My question is whether this reversal allows the answers to change.
For standards before C++17, I initially thought that all three cases cause undefined behavior, already because the evaluation of a->f depends on the value of a but is unsequenced relative to the evaluation of the argument, which causes a side effect on a. However, that is undefined behavior only if there is an actual side effect on a scalar value, e.g. writing to a scalar object, and no scalar object is written to here because A is trivial. Therefore I would also be interested in what constraint exactly is violated in the case of standards before C++17. In particular, the case with placement-new seems unclear to me now.
I just realized that the wording about the lifetime of objects changed between C++17 and the current draft. In n4659 (C++17 draft) [basic.life]/1 says:
The lifetime of an object o of type T ends when:
if T is a class type with a non-trivial destructor (15.4), the destructor call starts
[...]
while the current draft says:
The lifetime of an object o of type T ends when:
[...]
if T is a class type, the destructor call starts, or
[...]
Therefore, I suppose my example does have well-defined behavior in C++17, but not in the current (C++20) draft, because the destructor call is trivial and the lifetime of the A object isn't ended. I would appreciate clarification on that as well. My original question still stands even for C++17 for the case of replacing the destructor call with a delete or placement-new expression.
If f accesses *this in its body, then there may be undefined behavior for the cases of destructor call and delete expression, however in this question I want to focus on whether the call in itself is valid or not.
Note, however, that the variation of my question with placement-new would potentially not have an issue with member access in f, depending on whether the call itself is undefined behavior or not. But in that case there might be a follow-up question, especially for the case of placement-new, because it is unclear to me whether this in the function would then always automatically refer to the new object or whether it might potentially need to be std::laundered (depending on what members A has).
While A does have a trivial destructor, the more interesting case is probably where it has some side effect about which the compiler may want to make assumptions for optimization purposes. (I don't know whether any compiler uses something like this.) Therefore, I welcome answers for the case where A has a non-trivial destructor as well, especially if the answer differs between the two cases.
Also, from a practical perspective, a trivial destructor call probably does not affect the generated code and (unlikely?) optimizations based on undefined behavior assumptions aside, all code examples will most likely generate code that runs as expected on most compilers. I am more interested in the theoretical, rather than this practical perspective.
This question intends to get a better understanding of the details of the language. I do not encourage anyone to write code like that.
It’s true that trivial destructors do nothing at all, not even end the lifetime of the object, prior to (the plans for) C++20. So the question is, er, trivial unless we suppose a non-trivial destructor or something stronger like delete.
In that case, C++17’s ordering doesn’t help: the call (not the class member access) uses a pointer to the object (to initialize this), in violation of the rules for out-of-lifetime pointers.
Side note: if just one order were undefined, so would be the “unspecified order” prior to C++17: if any of the possibilities for unspecified behavior are undefined behavior, the behavior is undefined. (How would you tell the well-defined option was chosen? The undefined one could emulate it and then release the nasal demons.)
The postfix expression a->f is sequenced before the evaluation of any arguments (which are indeterminately sequenced relative to one another). (See [expr.call])
The evaluation of the arguments is sequenced before the body of the function (even inline functions, see [intro.execution])
The implication, then, is that calling the function itself is not undefined behavior. However, accessing any member variables or calling other member functions within it would be UB per [basic.life].
So the conclusion is that this specific instance is safe per the wording, but a dangerous technique in general.
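A sketch of that distinction (my own illustration, following this answer's reading of the standard; the names are hypothetical):

struct B {
    int m = 1;
    void untouched(int) { }        // never touches *this
    void touches(int)   { m = 2; } // reads/writes a member
};

int main() {
    auto b = new B;              // storage intentionally leaked for the demo
    b->untouched((b->~B(), 0));  // the call itself: arguably OK per C++17 sequencing
    // b->touches((b->~B(), 0)); // would be UB: accesses a member of the dead object
}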
You seem to assume that a->f(0) has these steps (in that order for the most recent C++ standard, in some logical order for previous versions):
evaluating *a
evaluating a->f (a so-called bound member function)
evaluating 0
calling the bound member function a->f on the argument list (0)
But a->f has neither a value nor a type. It's essentially a non-thing, a meaningless syntax element needed only because the grammar decomposes member access and function call, even in a member function call, which by definition combines member access and function call.
So asking when a->f is "evaluated" is a meaningless question: there is no distinct evaluation step for the value-less, type-less expression a->f.
So any reasoning based on such discussions of the order of evaluation of a non-entity is also null and void.
EDIT:
Actually it's worse than what I wrote: the expression a->f has a phony "type":
E1.E2 is “function of parameter-type-list cv returning T”.
"function of parameter-type-list cv" isn't even something that would be a valid declarator outside a class: one cannot have f() const as a declarator as in a global declaration:
int ::f() const; // meaningless
And inside a class, f() const doesn't mean "function of parameter-type-list=() with cv=const"; it means member function (of parameter-type-list=() with cv=const). There is no proper declarator for a plain "function of parameter-type-list cv". It can only exist inside a class; there is no type "function of parameter-type-list cv returning T" that can be declared or that real, computable expressions can have.
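By contrast, the closest real entity you can form, a pointer to member function, does have a complete, nameable type; a quick sketch (my own illustration):

struct C {
    int g(int) const { return 0; }
};

// A pointer to member function has a complete, nameable type:
int (C::*pmf)(int) const = &C::g;

int use(const C& c) {
    return (c.*pmf)(1); // the bound expression (c.*pmf), like c.g, exists only to be called
}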
In addition to what others said:
a->~A() leaves the program with a memory leak, which by itself is technically not undefined behavior.
However, if you then called delete a; to prevent the leak, that would be undefined behavior, because delete would call a->~A() a second time [Section 12.4/14].
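The well-defined way to release the storage after an explicit destructor call is to deallocate it without running the destructor again; a sketch (my addition):

A* a = new A;
a->~A();            // end the object's lifetime explicitly
operator delete(a); // release the raw storage; no second destructor call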
a->~A()
Otherwise, in reality this is as others suggested: the compiler generates machine code along the lines of A* a = malloc(sizeof(A)); a->A(); a->~A(); a->f(0);.
Since there are no member variables or virtual functions, all three member functions are empty ({ return; }) and do nothing. The pointer a even still points to valid memory.
It will run, but a debugger may complain of a memory leak.
However, using any non-static member variables inside f() could have been undefined behavior, because you would be accessing them after they are (implicitly) destroyed by the compiler-generated ~A(). That would likely result in a runtime error if they were something like std::string or std::vector.
delete a
If you replaced a->~A() with an expression that invoked delete a; instead, then I believe this would have been undefined behavior, because the pointer a is no longer valid at that point.
Despite that, the code would likely still run without errors, because the function f() is empty. If it accessed any member variables it might have crashed or produced random results, because the memory for a has been deallocated.
new(a) A
auto a = new A; new(a) A; is itself undefined behavior, because you are calling A() a second time on the same memory.
In that case calling f() by itself would be valid, because a exists, but constructing a twice is UB.
It will run fine if A does not contain any objects whose constructors allocate memory and such. Otherwise it could lead to memory leaks, etc., but f() would access the "second" copy of them just fine.
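For contrast, the conventional well-defined placement-new cycle ends the old lifetime before constructing the new object; a sketch (my addition, reusing the questioner's struct A):

#include <new>

int main() {
    A* a = new A;
    a->~A();   // end the first object's lifetime
    new (a) A; // construct a fresh A in the same storage
    a->f(0);   // OK: the call is on the new, live object
    delete a;  // destroys the second object and frees the storage
}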
I'm not a language lawyer, but I took your code snippet and modified it slightly. I wouldn't use this in production code, but it seems to produce valid, defined results...
#include <iostream>
#include <exception>
#include <cstdlib>

struct A {
    int x{5};
    void f(int) {}
    int g() { std::cout << x << '\n'; return x; }
};

int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g()));
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
I'm running Visual Studio 2017 CE with the compiler language flag set to /std:c++latest; my IDE's version is 15.9.16, and I get the following console output and program exit status:
console output
5
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
So this does seem to be defined in the case of Visual Studio; I'm not sure how other compilers will treat this. The destructor is being invoked, but the variable a is still in dynamic heap memory.
Let's try another slight modification:
#include <iostream>
#include <exception>
#include <cstdlib>

struct A {
    int x{5};
    void f(int) {}
    int g(int y) { x += y; std::cout << x << '\n'; return x; }
};

int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g(3)));
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
console output
8
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
This time let's not change the class any more, but let's make a call on a's member afterwards...
int main() {
    try {
        auto a = new A;
        a->f((a->~A(), a->g(3)));
        a->g(2);
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
console output
8
10
IDE exit status output
The program '[4128] Test.exe' has exited with code 0 (0x0).
Here it appears that a->x is maintaining its value after a->~A() is called, since new was called on A and delete has not yet been called.
Even more, if I remove the new and use a pointer to a stack variable instead of dynamically allocated heap memory:
int main() {
    try {
        A b;
        A* a = &b;
        a->f((a->~A(), a->g(3)));
        a->g(2);
    } catch (const std::exception& e) {
        std::cerr << e.what();
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
I'm still getting:
console output
8
10
IDE exit status output
When I change my compiler's language flag setting from /std:c++latest to /std:c++17 I get exactly the same results.
From what I'm seeing in Visual Studio, this appears to be well defined, without producing any UB, within the contexts of what I've shown. However, from a language perspective, where the standard is concerned, I wouldn't rely on this type of code either. The above also doesn't consider a class with internal pointers, in both stack-automatic storage and dynamic heap allocation, where the constructor calls new on those internal objects and the destructor calls delete on them.
There are also a bunch of factors other than the compiler's language setting, such as optimizations, calling convention, and various other compiler flags. It is hard to say, and I don't have an available copy of the full latest draft standard to investigate this any deeper. Maybe this can help you, others who are able to answer your question more thoroughly, and other readers to visualize this type of behavior in action.
It has, as far as I know, been a good rule that a pointer-like argument to a function should be a pointer if the argument can sensibly be null, and a reference if the argument should never be null.
Based on that "rule", I had naively expected that doing something like
someMethodTakingAnIntReference(*aNullPointer) would fail when trying to make the call, but to my surprise the following code runs just fine, which kinda makes "the rule" less useful. A developer can still read meaning into the argument type being a reference, but the compiler doesn't help, and the location of the runtime error doesn't either.
Am I misunderstanding the point of this rule, or is this undefined behavior, or...?
int test(int& something1, int& something2)
{
    return something2;
}

int main()
{
    int* i1 = nullptr;
    int* i2 = new int{ 7 };
    // This compiles and runs fine, returning 7.
    // I expected the *i1 to cause an error here where test is called.
    return test(*i1, *i2);
}
While the above works, the following obviously does not; but the same would be true if the references were just pointers, meaning that neither the rule nor the compiler is really helping.
int test(int& something1, int& something2)
{
    return something1 + something2;
}

int main()
{
    int* i1 = nullptr;
    int* i2 = new int{ 7 };
    // This compiles, but crashes at runtime when something1 is read.
    // I expected the *i1 to cause an error here where test is called.
    return test(*i1, *i2);
}
Writing test(*i1, *i2) causes undefined behaviour; specifically the part *i1. This is covered in the C++ Standard by [expr.unary.op]/1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
This defines the behaviour of *X only for the case where X points to an object or function. Since i1 does not point to an object or function, the standard does not define the behaviour of *i1, therefore it is undefined behaviour. (This is sometimes known as "undefined by omission", and this same practice handles many other uses of lvalues that don't designate objects).
As described in the linked page, undefined behaviour does not necessitate any sort of diagnostic message. The runtime behaviour could literally be anything. The compiler could, but is not required to, generate a compilation warning or error. In general, it's up to the programmer to comply with the rules of the language. The compiler helps out to some extent but it cannot cover all cases.
You're better off thinking of references as little more than a handy notation for pointers.
They are still pointers, and the runtime error occurs when you use (dereference) a null pointer, not when you pass it to a function.
(An added advantage of references is that they can not be changed to reference something else, once initialized.)
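In practice the rule still pays off at the boundary where a pointer is turned into a reference; a sketch of the idiom (my own illustration; call_test and the choice of exception are hypothetical):

#include <stdexcept>

int test(int& something1, int& something2); // references: never null by contract

int call_test(int* i1, int* i2)
{
    if (!i1 || !i2)
        throw std::invalid_argument("null argument"); // validate once, at the boundary
    return test(*i1, *i2); // both dereferences are now known to be safe
}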
#include <iostream>
using namespace std;

int main()
{
    const int maxint = 100; // The program will crash if this line is put outside main
    int& msg = const_cast<int&>(maxint);
    msg = 200;
    cout << "max:" << msg << endl;
    return 0;
}
The program runs fine if the const int maxint=100; definition is put inside the main function, but crashes with an "Access Violation" error message if it is put outside.
Someone says it's some kind of "undefined behavior"; I want to know the exact answer and how I can use const_cast safely.
They are correct; it is undefined behaviour. You're not allowed to modify the value of a const variable, which is the danger of casting away the constness of something: you'd better know it's not really const.
The compiler, seeing that maxint is const and should never be modified, doesn't even have to give it an address: it can just replace all the uses of maxint with 100 if it sees fit. Alternatively, it might put the constant into a portion of memory that is read-only, as Matteo Italia points out, which is probably what's happening for you. That's why modifying it produces undefined behaviour.
The only way you can safely cast away the constness of a variable is if the variable is not actually const, but the const qualifier was added on a reference or pointer to a non-const variable, like this:
void f(const int& x) {
    int& nonconst = const_cast<int&>(x);
    ++nonconst;
}

int blah = 523;
f(blah); // this is safe

const int constblah = 123;
f(constblah); // this is undefined behaviour
Think about this example, which compiles perfectly:
void f(const int& x) {
    int& nonconst = const_cast<int&>(x);
    ++nonconst;
}

int main() {
    f(4); // incrementing a number literal???
}
You can see how using const_cast is pretty dangerous because there's no way to actually tell whether a variable was originally const or not. You should avoid using const_cast when possible (with functions, by just not accepting const reference parameters you intend to write through).
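Where the goal is to mutate bookkeeping state behind a logically-const interface, the language-sanctioned alternative is mutable; a sketch (my own illustration; Cache and lookup are hypothetical names):

struct Cache {
    mutable int hits = 0; // may be modified even through a const Cache&

    int lookup(int key) const {
        ++hits;         // OK: mutable members are exempt from the const contract
        return key * 2; // stand-in for real work
    }
};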
Modifying an object that is const (with the exception of mutable members) results in undefined behavior (from the C++03 standard):
7.1.5.1/4 "The cv-qualifiers"
Except that any class member declared mutable (7.1.1) can be modified,
any attempt to modify a const object during its lifetime (3.8) results
in undefined behavior.
The above undefined behavior is specifically called out in the standard's section on const_cast:
5.2.11/7 "Const cast"
[Note: Depending on the type of the object, a write operation through the pointer, lvalue or pointer to data member resulting from a const_cast that casts away a const-qualifier may produce undefined behavior (7.1.5.1). ]
So, if you have a const pointer or reference to an object that isn't actually const, you're allowed to write to that object (by casting away the constness), but not if the object really is const.
The compiler is permitted to place const objects in read-only storage, for example. It doesn't have to though, and apparently doesn't for your test code that doesn't crash.
You are only allowed to cast away the constness of an object which is known not to be const. For example, an interface may pass objects through a const pointer or a const reference, but you passed in an object which isn't const and want/need to modify it. In this case it may be the right thing to cast away the constness.
On the other hand, casting away the constness of an object which was const all the way can land you in deep trouble: when accessing this object, in particular when writing to it, the system may do all kinds of strange things. The behavior is not defined (by the C++ standard), and on a particular system it may cause e.g. an access violation (because the address of the object is arranged to be in a read-only area).
Note that, despite another response I saw, const objects do need to get an address assigned if the address is ever taken and used in some way. In your code the const_cast<int&>(maxint) expression essentially obtains the address of your constant int, which is apparently stored in a memory area marked read-only. The interesting aspect of your code snippet is that it is likely to appear to work, especially with optimization turned on: the code is simple enough that the compiler can tell the changed location isn't really used, and doesn't actually attempt to change the memory location! In that case, no access violation is reported. This is apparently what happens when the constant is declared inside the function (although the constant may also be located on the stack, which typically can't be marked read-only). Another potential outcome of your code (whether the constant is declared inside the function or not) is that it is actually changed, and sometimes read as 100 and in other contexts (which in some way or another involve the address) as 200.
I read on the wikipedia page for Null_pointer that Bjarne Stroustrup suggested defining NULL as
const int NULL = 0;
if "you feel you must define NULL." I instantly thought, hey.. wait a minute, what about const_cast?
After some experimenting, I found that
#include <cstdio>

int main() {
    const int MyNull = 0;
    const int* ToNull = &MyNull;
    int* myptr = const_cast<int*>(ToNull);
    *myptr = 5;
    printf("MyNull is %d\n", MyNull);
    return 0;
}
would print "MyNull is 0", but if I make the const int belong to a class:
#include <cstdio>

class test {
public:
    test() : p(0) { }
    const int p;
};

int main() {
    test t;
    const int* pptr = &(t.p);
    int* myptr = const_cast<int*>(pptr);
    *myptr = 5;
    printf("t.p is %d\n", t.p);
    return 0;
}
then it prints "t.p is 5"!
Why is there a difference between the two? Why is "*myptr = 5;" silently failing in my first example, and what action is it performing, if any?
First of all, you're invoking undefined behavior in both cases by trying to modify a constant variable.
In the first case the compiler sees that MyNull is declared as a constant and replaces all references to it within main() with a 0.
In the second case, since p is within a class the compiler is unable to determine that it can just replace all classInstance.p with 0, so you see the result of the modification.
Firstly, what happens in the first case is that the compiler most likely translates your
printf("MyNull is %d\n", MyNull);
into the immediate
printf("MyNull is %d\n", 0);
because it knows that const objects never change in a valid program. Your attempt to change a const object leads to undefined behavior, which is exactly what you observe. So, ignoring the undefined behavior for a second: from a practical point of view it is quite possible that your *myptr = 5 successfully modified your MyNull. It is just that your program doesn't really care what you have in your MyNull now. It knows that MyNull is zero and will always be zero, and acts accordingly.
Secondly, in order to define NULL per the recommendation you were referring to, you have to define it specifically as an integral constant expression (ICE). Your first variant is indeed an ICE. Your second variant is not. Class member access is not allowed in an ICE, meaning that your second variant is significantly different from the first. The second variant does not produce a viable definition for NULL, and you will not be able to initialize pointers with your test::p even though it is declared as const int and set to zero:
SomeType *ptr1 = MyNull; // OK: MyNull is an integral constant expression equal to 0
test t;
SomeType *ptr2 = t.p;    // ERROR: cannot use an `int` value to initialize a pointer
As for the different output in the second case... undefined behavior is undefined behavior. It is unpredictable. From the practical point of view, your second context is more complicated, so the compiler was unable to perform the above optimization; i.e. you have indeed succeeded in breaking through the language-level restrictions and modifying a const-qualified variable. The language specification does not make it easy (or possible) for compilers to optimize out const members of a class, so at the physical level that p is just another member of the class that resides in memory, in each object of that class. Your hack simply modifies that memory. It doesn't make it legal, though. The behavior is still undefined.
This all, of course, is a rather pointless exercise. It looks like it all began with the "what about const_cast" question. So, what about it? const_cast was never intended to be used for that purpose. You are not allowed to modify const objects, with const_cast or without const_cast; it doesn't matter.
Your code is modifying a variable declared constant, so anything can happen. Discussing why a certain thing happens instead of another is completely pointless unless you are discussing unportable compiler internals; from a C++ point of view that code simply doesn't have any meaning.
About const_cast, one important thing to understand is that it is not for messing with variables declared constant, but with references and pointers declared constant.
In C++ a const int * is often understood to be a "pointer to a constant integer", but this description is completely wrong. For the compiler it's instead something quite different: a "pointer that cannot be used for writing to an integer object".
This may seem a minor difference, but it is in fact a huge one, because:
The "constness" is a property of the pointer, not of the pointed-to object.
Nothing is said about whether the pointed-to object is constant or not.
The word "constant" has nothing to do with the meaning (this is why I think that using const it was a bad naming choice). const int * is not talking about constness of anything but only about "read only" or "read/write".
const_cast allows you to convert between pointers and references that can be used for writing and pointers or references that cannot, because they are "read only". The pointed-to object is never part of this process; the standard simply says that it's legal to take a const pointer and use it for writing after "casting away" the const-ness, but only if the pointed-to object has not been declared constant.
Constness of a pointer or a reference never affects the machine code that a compiler generates (another common misconception is that a compiler can produce better code if const references and pointers are used, but this is total bogus: for the optimizer a const reference and a const pointer are just a reference and a pointer).
Constness of pointers and references was introduced to help programmers, not optimizers (by the way, I think even that alleged help for programmers is quite questionable, but that's another story).
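A quick sketch of why the optimizer gains nothing (my own illustration): a const reference does not promise that the object stays unchanged, only that this particular reference won't be used to change it:

int g(const int& x, int& y) {
    int before = x;
    y = 42;            // y may alias x, so x can legally change here
    return x - before; // the compiler must re-read x; it cannot assume the result is 0
}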
const_cast is a weapon that helps programmers deal with broken const-ness declarations of pointers and references (e.g. in libraries) and with the broken concept of constness of references and pointers itself (before mutable existed, for example, casting away constness was the only reasonable solution in many real-life programs).
Misunderstanding what a const reference really is also lies at the base of a very common C++ antipattern (used even in the standard library) that says that passing a const reference is a smart way to pass a value. See this answer for more details.