I have a bad function that returns a reference to a dead string:
std::string& ffff()
{
    std::string j = "12346";
    return j;
}
And if I call std::string ii = ffff(); I get a crash. That is correct and understandable.
But now I have a function that returns a reference to an int:
int& ff()
{
    int g = 1;
    return g;
}
And I can't understand why there is no crash when I assign the reference to the "dead" int g to the variable i:
int i = ff();
The C++ standard does not contain a notion of "crash". There is no standard-conforming, deterministic mechanism by which you can "cause a crash".
The behaviour of your program is undefined, so the language standard does not describe what will or should happen.
As has already been said, you are in the land of "undefined behaviour", and you should expect exactly that: behaviour that may not be "correct and understandable".
If you want to understand what the compiler is doing, look at the generated assembler. There you will be able to see what is happening, and why there is (or is not) a crash.
My guess (although I may be wrong) is that the string owns heap storage that is released when its destructor runs on function exit, so it is clearly a "dead string" (using your own term, not a very rigorous one). Returning a reference to an int is different: an int has no destructor to run, so nothing actively destroys it; its bytes simply remain on the stack until something overwrites them. The compiler probably reduces the whole thing to a pass by value anyway: you return a reference only to immediately copy it, and the compiler should see through that. Even if you encapsulated the int in an object, its storage would likely still be intact, and probably still readable, right after the function returns.
Anyway, I see that you have been downvoted, as it is not a very interesting question by itself. You are asking why something doesn't crash while using bad programming practices, and that is not constructive. I will not downvote, though, as I too am a genuinely curious programmer ;)
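For completeness, a minimal sketch of the safe alternative: return by value and let the compiler elide the copy. (The _fixed names below are mine, just for illustration.)
#include <string>

std::string ffff_fixed()
{
    std::string j = "12346";
    return j; // returned by value; the copy is elided (RVO)
}

int ff_fixed()
{
    int g = 1;
    return g; // the caller receives a copy of g, valid after the return
}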
Related
The C++ standard states that returning a reference to a local variable (on the stack) is undefined behaviour, so why do many (if not all) current compilers give only a warning for doing so?
struct A {
};

A& foo()
{
    A a;
    return a; // gcc and VS2008 both give this a warning, but not a compiler error
}
Would it not be better if compilers gave an error instead of a warning for this code?
Are there any great advantages to allowing this code to compile with just a warning?
Please note that this is not about a const reference which could lengthen the lifetime of the temporary to the lifetime of the reference itself.
It is almost impossible for a compiler to verify whether you are returning a reference to a temporary. If the standard required that to be diagnosed as an error, writing a compiler would be almost impossible. Consider:
bool not_so_random() { return true; }

int& foo( int x ) {
    static int s = 10;
    int *p = &s;
    if ( !not_so_random() ) {
        p = &x;
    }
    return *p;
}
The above program is correct and safe to run: in this particular implementation it is guaranteed that foo will return a reference to a static variable, which is safe. But from a compiler perspective (and with separate compilation in place, where the implementation of not_so_random() is not accessible), the compiler cannot know that the program is well-behaved.
This is a toy example, but you can imagine similar code, with different return paths, where p might refer to different long-lived objects in all paths that return *p.
Undefined behaviour is not a compilation error; it just means the program is not a correct C++ program. Not every incorrect program fails to compile; its behaviour is simply unpredictable. I'd wager that it is not even possible in principle for a computer to decide whether a given program text is a correct C++ program.
You can always add -Werror to gcc to make warnings terminate compilation with an error!
To add another favourite SO topic: Would you like ++i++ to cause a compile error, too?
If you return a pointer/reference to a local from a function, the behavior is well defined as long as you do not dereference the returned pointer/reference.
It is undefined behavior only when the returned pointer/reference is dereferenced.
So whether the behavior is undefined depends on the code calling the function, not on the function itself.
While compiling just the function, the compiler therefore cannot determine whether the behavior will be undefined or well defined. The best it can do is warn you of a potential problem, and it does!
A code sample:
#include <iostream>

struct A
{
    int m_i;
    A() : m_i(10)
    {
    }
};

A& foo()
{
    A a;
    a.m_i = 20;
    return a;
}

int main()
{
    foo();                // Not undefined behavior: the return value is never used.
    A& ref = foo();       // Still not undefined behavior: the return value is not used yet.
    std::cout << ref.m_i; // Undefined behavior: the dead object is actually read.
    return 0;
}
Reference to the C++ standard, section 3.8:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. Such a pointer refers to allocated storage (3.7.3.2), and using the pointer as if the pointer were of type void*, is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited ways, as described below. If the object will be or was of a class type with a non-trivial destructor, and the pointer is used as the operand of a delete-expression, the program has undefined behavior. If the object will be or was of a non-POD class type, the program has undefined behavior if:
— .......
Because the standard does not restrict us.
If you want to shoot yourself in the foot, you can do it!
However, let's look at an example where it can even be useful:
int& foo()
{
    int y;
    return y; // reference to a local: the caller only ever takes its address
}

bool stack_grows_forward()
{
    int& p = foo();
    int my_p;
    return &my_p < &p; // compares addresses only; the dead value is never read
}
Compilers should not refuse to compile programs unless the standard says they are allowed to do so. Otherwise it would be much harder to port programs, since they might not compile with a different compiler, even though they comply with the standard.
Consider the following function:
int foobar() {
    int a = 1, b = 0;
    return a / b;
}
Any decent compiler will detect that I am dividing by zero, but it should not reject the code, since I might actually want to trigger a SIGFPE signal.
As David Rodríguez has pointed out, some cases are undecidable, but others are not. Some future version of the standard might describe cases where the compiler must, or is allowed to, reject programs. That would require the standard to be very specific about the static analysis to be performed.
The Java specification actually contains rules for checking that non-void methods always return a value. Unfortunately I haven't read enough of the C++ standard to know what the compiler is allowed to do.
You could also return a reference to a static variable, which would be valid code, so the code must be able to compile.
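For instance, a sketch of that valid pattern (the name counter is mine, just for illustration):
int& counter()
{
    static int value = 0; // lives for the whole program, not in a stack frame
    return value;         // perfectly safe for the caller to use
}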
It's pretty much super-bad practice to rely on this, but I do believe that in many cases (and that's never a good wager) the memory would still hold the right value if no functions are called between the time foo() returns and the time the calling function uses its return value. In that case, that area of the stack would have no opportunity to get overwritten (see the sketch below).
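A deliberately non-portable sketch of that stack-reuse effect (this is undefined behavior; what it prints, if anything, depends entirely on the compiler, the optimization level, and the platform; the names dangling and clobber are mine):
#include <cstdio>

int& dangling()
{
    int local = 42;
    return local; // dangling: the caller must not rely on reading through this
}

void clobber()
{
    int junk[64] = {0}; // likely reuses the stack area that held `local`
    (void)junk;
}

int main()
{
    int& r = dangling();
    std::printf("%d\n", r); // may still print 42 on some implementations
    clobber();
    std::printf("%d\n", r); // may now print garbage, or anything at all
}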
In C and C++ you can choose to access arbitrary sections of memory anyway (within the process's memory space, of course) via pointer arithmetic, so why not allow the possibility of constructing a reference to wherever one so chooses?
I have stumbled upon the following code structure and I'm wondering whether this is intentional or just a poor understanding of the casting mechanisms:
struct AbstractBase {
    virtual void doThis() {
        // Basic implementation here.
    }
    virtual void doThat() = 0;
};

struct DerivedA : public AbstractBase {
    virtual void doThis() {
        // Other implementation here.
    }
    virtual void doThat() {
        // Some stuff here.
    }
};

// More derived classes with similar structure....

// Dubious stuff happening here:
void strangeStuff(AbstractBase* pAbstract, int switcher) {
    AbstractBase* a = NULL;
    switch (switcher) {
    case TYPE_DERIVED_A:
        // why would someone use the abstract base pointer here???
        a = dynamic_cast<DerivedA*>(pAbstract);
        a->doThis();
        a->doThat();
        break;
    // similar case statements with other derived classes...
    }
}

// "main"
DerivedA* pDerivedA = new DerivedA;
strangeStuff(pDerivedA, TYPE_DERIVED_A);
My guess is that this dynamic_cast is just the result of poor understanding and generally bad programming style (the whole way the code works just feels painful to me), and that it doesn't change the behaviour for this specific use case.
However, since I'm not an expert on casting, I'd like to know whether there are any subtle side-effects that I'm not aware of.
[C++11: 5.2.7/9]: The value of a failed cast to pointer type is the null pointer value of the required result type.
dynamic_cast can return NULL if the type is wrong, making the following lines crash. Hence, this can be either (1) an attempt to make logical errors more explicit, or (2) some sort of in-code documentation.
So while it doesn't look like the best design, it is not exactly true that the cast has no effect whatsoever.
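A sketch of what a defensive version might look like, actually checking the result of the cast (reusing the question's AbstractBase and DerivedA; saferStuff is a hypothetical name):
void saferStuff(AbstractBase* pAbstract) {
    if (DerivedA* d = dynamic_cast<DerivedA*>(pAbstract)) {
        d->doThis(); // the cast succeeded: d really points to a DerivedA
        d->doThat();
    } else {
        // handle the mismatch instead of crashing on a null pointer
    }
}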
My guess would be that the coder screwed up.
A second guess would be that you dropped a check for a being null when simplifying the code for the question.
A third, and highly unlikely, possibility is that the coder was exploiting undefined behavior to optimize.
With this code:
a = dynamic_cast<DerivedA*>(pAbstract);
a->doThis();
if pAbstract does not actually point to a DerivedA (or something more derived), a is null and a->doThis() is undefined behavior. And if it does point to a DerivedA, then the dynamic_cast does absolutely nothing (guaranteed).
A compiler can, in theory, optimize every other possibility away, even if you did not change the type of a, and still be conforming. Even if someone later checks whether a is null, the undefined behavior on the very next line means the compiler is free not to set a to null on the dynamic_cast line.
I would doubt that a given compiler would do this, but I could be wrong.
There are compilers that detect that certain paths would cause undefined behavior later in execution, propagate that knowledge backwards to the point where the undefined behavior was set in motion, and then "know" that the code in question cannot be in the state that would trigger it. They can then use this knowledge to optimize the code.
Here is an example:
#include <sstream>
#include <string>

std::string foo( unsigned int x ) {
    std::string r;
    if (x == (unsigned)-1) {
        r = "hello ";
    }
    int y = x;
    std::stringstream ss;
    ss << y;
    r += ss.str();
    return r;
}
The compiler can see the y = x line above. If x would overflow an int, the conversion y = x is undefined behavior (strictly, the standard makes out-of-range unsigned-to-signed conversion implementation-defined, but the argument is the same for genuine undefined behavior such as signed arithmetic overflow). It happens regardless of the result of the first branch.
In short, if the first branch runs, undefined behavior would result. And undefined behavior can do anything, including time travel -- it can go back in time and prevent that branch from being taken.
So the
if (x == (unsigned)-1) {
r = "hello ";
}
branch can legally be eliminated by the optimizer in C++.
While the above is just a toy case, gcc does perform optimizations very much like this. There is a flag to tell it not to.
(Converting -1 to unsigned is well-defined behavior, but overflowing an int is not, in C++. In practice this is because there are platforms on which signed int overflow causes problems, and C++ does not want to impose extra costs on them to make a conforming implementation.)
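A minimal sketch of that distinction (the overflowing line is commented out precisely because executing it would be undefined behavior):
#include <climits>

int main()
{
    unsigned int u = (unsigned)-1; // well-defined: wraps to UINT_MAX
    int i = INT_MAX;
    // ++i;                        // undefined behavior: signed overflow
    (void)u;
    (void)i;
}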
dynamic_cast will confirm that the dynamic type matches the type indicated by the switcher variable, making the code slightly less dangerous. However, it will yield a null pointer on a mismatch, and the code neglects to check for that.
But it seems more likely that the author didn't really understand the use of virtual functions (for uniform treatment of polymorphic types) and RTTI (for the rarer cases where you need to distinguish between types), and attempted to invent their own form of manual, error-prone type identification.
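For contrast, a sketch of the idiomatic version, reusing the question's AbstractBase: virtual dispatch makes the switch and the casts unnecessary (idiomaticStuff is a hypothetical name).
void idiomaticStuff(AbstractBase* pAbstract) {
    // The virtual call mechanism already selects the right overrides.
    pAbstract->doThis();
    pAbstract->doThat();
}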
#include <cstdio>

int func() {
    int a;
    ++a; // is this safe?
    printf("%d\n", a);
    return a;
}
I know that when I printf a I get undefined behavior, but is ++a itself safe according to the C++ standard? Will it just assign "another" uninitialized value to a without side effects (throwing exceptions or crashing the program)?
Using an uninitialized variable in any way gives you undefined behavior. So,
no, incrementing an uninitialized int is not safe in C++.
Your program might not crash, but it is certainly not safe. You should always initialize your variables. The worst that can happen is that your program appears to work but crashes at random times without you knowing the cause, or simply behaves in a strange way.
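The fix is trivial; a corrected version of the function from the question, for completeness:
#include <cstdio>

int func() {
    int a = 0; // initialized: every use below is now well-defined
    ++a;
    std::printf("%d\n", a); // prints 1
    return a;
}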
What do you mean by "undefined behaviour"? In my opinion your program should print some int without any exceptions and so on.
++(uninitialized int) is absolutely legal, I guess. It will just increment whatever value happens to be there, no matter whether the variable was initialized or not.
But anyway, uninitialized vars are EVIL.
I read on the wikipedia page for Null_pointer that Bjarne Stroustrup suggested defining NULL as
const int NULL = 0;
if "you feel you must define NULL." I instantly thought, hey.. wait a minute, what about const_cast?
After some experimenting, I found that
#include <cstdio>

int main() {
    const int MyNull = 0;
    const int* ToNull = &MyNull;
    int* myptr = const_cast<int*>(ToNull);
    *myptr = 5;
    printf("MyNull is %d\n", MyNull);
    return 0;
}
would print "MyNull is 0", but if I make the const int belong to a class:
class test {
public:
    test() : p(0) { }
    const int p;
};

int main() {
    test t;
    const int* pptr = &(t.p);
    int* myptr = const_cast<int*>(pptr);
    *myptr = 5;
    printf("t.p is %d\n", t.p);
    return 0;
}
then it prints "t.p is 5"!
Why is there a difference between the two? Why is "*myptr = 5;" silently failing in my first example, and what action is it performing, if any?
First of all, you're invoking undefined behavior in both cases by trying to modify a constant variable.
In the first case the compiler sees that MyNull is declared as a constant and replaces all references to it within main() with a 0.
In the second case, since p is a class member, the compiler cannot simply replace every classInstance.p with 0, so you see the result of the modification.
Firstly, what happens in the first case is that the compiler most likely translates your
printf("MyNull is %d\n", MyNull);
into the immediate
printf("MyNull is %d\n", 0);
because it knows that const objects never change in a valid program. Your attempt to change a const object leads to undefined behavior, which is exactly what you observe. Ignoring the undefined behavior for a second: from a practical point of view, it is quite possible that your *myptr = 5 did successfully modify MyNull. It is just that your program no longer cares what MyNull actually contains. It knows that MyNull is zero, will always be zero, and acts accordingly.
Secondly, in order to define NULL per the recommendation you are referring to, you have to define it specifically as an integral constant expression (ICE). Your first variant is indeed an ICE. Your second variant is not: class member access is not allowed in an ICE, which makes the second variant significantly different from the first. It does not produce a viable definition for NULL, and you will not be able to initialize pointers with your test::p even though it is declared const int and set to zero:
SomeType *ptr1 = MyNull; // OK: an integral constant expression that evaluates to 0
test t;
SomeType *ptr2 = t.p;    // ERROR: cannot use an `int` value to initialize a pointer
As for the different output in the second case: undefined behavior is undefined behavior, and it is unpredictable. From a practical point of view, your second context is more complicated, so the compiler was unable to perform the above optimization; you have indeed succeeded in breaking through the language-level restrictions and modifying a const-qualified variable. The language specification does not make it easy (or possible) for compilers to optimize away const members of a class, so at the physical level that p is just another member that resides in memory in each object of the class. Your hack simply modifies that memory. That doesn't make it legal, though. The behavior is still undefined.
All of this, of course, is a rather pointless exercise. It looks like it all began with the "what about const_cast" question. So, what about it? const_cast was never intended to be used for this purpose. You are not allowed to modify const objects, with const_cast or without it.
Your code modifies a variable declared constant, so anything can happen. Discussing why one particular thing happens instead of another is completely pointless unless you want to discuss non-portable compiler internals; from a C++ point of view that code simply makes no sense.
One important thing to understand about const_cast is that it is not for messing with variables declared constant, but with references and pointers declared constant.
In C++, a const int * is often read as "pointer to a constant integer", but this description is completely wrong. For the compiler it is something quite different: a "pointer that cannot be used for writing to an integer object".
This may seem like a minor difference, but it is in fact a huge one, because:
The "constness" is a property of the pointer, not of the pointed-to object.
Nothing is said about whether the pointed-to object itself is constant.
The word "constant" has little to do with the actual meaning (which is why I think const was a bad naming choice). const int * is not talking about the constness of anything; it only distinguishes "read only" from "read/write" access.
const_cast lets you convert between pointers and references that can be used for writing and pointers or references that cannot because they are "read only". The pointed-to object is never part of this process, and the standard simply says that it is legal to cast away the const-ness of a pointer and use it for writing, but only if the pointed-to object was not itself declared constant.
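A minimal sketch of that rule:
int main()
{
    int x = 0;                    // the object itself is not const
    const int* rp = &x;           // merely a read-only view of a writable object
    *const_cast<int*>(rp) = 1;    // OK: only the pointer was const-qualified

    const int cx = 0;             // the object itself IS const
    const int* cp = &cx;
    // *const_cast<int*>(cp) = 1; // compiles, but undefined behavior if executed
}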
The constness of a pointer or a reference never affects the machine code a compiler generates (another common misconception is that a compiler can produce better code if const references and pointers are used, but that is bogus: to the optimizer, a const reference and a const pointer are just a reference and a pointer).
Constness of pointers and references was introduced to help programmers, not optimizers (by the way, I think even this alleged help for programmers is quite questionable, but that's another story).
const_cast is a weapon that helps programmers fight broken const-ness declarations of pointers and references (e.g. in libraries) and the broken corners of the very concept of constness of references and pointers (before mutable existed, for example, casting away constness was the only reasonable solution in many real-life programs).
Misunderstanding what a const reference is also underlies a very common C++ antipattern (used even in the standard library) that treats passing a const reference as a smart way to pass a value. See this answer for more details.