The following C++ program compiles just fine (though g++ 5.4 at least gives a warning when invoked with -Wall):
int main(int argc, char *argv[])
{
    int i = i; // !
    return 0;
}
Even something like
int& p = p;
is swallowed by the compiler.
Now my question is: Why is such an initialization legal? Is there any actual use-case or is it just a consequence of the general design of the language?
This is a side effect of the rule that a name is in scope immediately after it is declared. There's no need to complicate this simple rule just to prevent people from writing code that is obviously nonsense.
Just because the compiler accepts the code (i.e. it is syntactically valid) does not mean that it has well-defined behaviour.
The compiler is not required to diagnose all cases of Undefined Behaviour or other classes of problems.
The standard gives it a pretty free hand to accept and translate broken code, on the assumption that if the results were to be undefined or nonsensical, the programmer would not have written that code.
So the absence of warnings or errors from your compiler does not in any way prove that your program has well-defined behaviour.
It is your responsibility to follow the rules of the language.
The compiler usually tries to help you by pointing out obvious flaws, but in the end it's on you to make sure your program makes sense.
And something like int i = i; does not make sense but is syntactically correct, so the compiler may or may not warn you; in any case it is within its rights to just generate garbage (and not tell you about it), because you broke the rules and invoked Undefined Behaviour.
I guess the gist of your question is why the second identifier is recognized as naming the same object as the first in int i = i; or int &p = p;.
This is defined in [basic.scope.pdecl]/1 of the C++14 standard:
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below. [Example:
unsigned char x = 12;
{ unsigned char x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
The semantics of these statements are covered by other threads:
Is int x = x; UB?
Why can a Class& be initialized to itself?
Note - the quoted example differs in semantics from int i = i; because it is not UB to evaluate an uninitialized unsigned char, but it is UB to evaluate an uninitialized int.
As noted on the linked thread, g++ and clang can give warnings when they detect this.
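For reference, these warnings can be surfaced explicitly. With g++, something like the following works (-Winit-self is a real gcc option, though exactly what gets flagged varies by compiler version and language mode):

g++ -Wall -Winit-self test.cpp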
Regarding the rationale for the scope rule: I don't know for sure, but the rule existed in C, so perhaps it simply made its way into C++, and now it would be confusing to change it.
If we instead said that the declared variable is not in scope for its own initializer, then in int i = i; the second i might find an i from an outer scope, which would also be confusing.
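A sketch of the confusion that alternative rule would create (hypothetical example, not from the standard):

int i = 42; // outer i

int main()
{
    // Under the actual rule, the inner i is already in scope in its own
    // initializer, so this reads its own indeterminate value (UB for int).
    // Under the hypothetical alternative rule, the initializer would instead
    // find the outer i and copy 42, silently changing the program's meaning.
    int i = i;
    return 0;
}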
Related
The following code compiles successfully in g++ version 10.1.0:
int main()
{
    int& x = x;
    return x;
}
The compiler even has a warning defined for this, suggesting it's intentional that it isn't an error. When compiled with -Wall, g++ produces the following output:
<stdin>: In function ‘int main()’:
<stdin>:3:14: warning: reference ‘x’ is initialized with itself [-Winit-self]
3 | int& x = x;
| ^
<stdin>:3:14: warning: ‘x’ is used uninitialized in this function [-Wuninitialized]
3 | int& x = x;
|
My testing has shown that this results in a reference to a value at address 0, same as if it were declared as int& x = *(int*)nullptr;. Naturally, this results in a segfault if the value referenced by x is ever used.
I can think of one (questionable) case where this might be useful: if you want to call a method on a class with no accessible constructor, when that method isn't static but nonetheless doesn't make any use of the class's data:
struct S {
    S() = delete;
    int value() { return 69; }
};

int main()
{
    S& s = s;
    return s.value();
}
But then there are other ways to do that, which don't even give warnings with -Wall and are less likely to be undefined behavior. (For example, ((S*)0)->value().)
So what I'm asking is: is this in fact undefined behavior? It certainly seems like something that would be. Is there any conceivable situation in which a self-reference is the best solution? And if not, why is the warning suppressed by default?
This code is accepted because the C++ grammar allows for it, not because there is any practical use for it.
In the C++ grammar, the first part of the definition introduces the name identifier, which may be used in the rest of the scope, including in the same statement. This is what allows you to have code like:
int x = 1, &y = x;
The first int x produces the name x, which may be used anywhere after it is seen. This has the side effect of also allowing self-initialization, such as:
int x = x;
This initialization is, in fact, undefined behavior, because x is not yet an initialized object; the initializer reads from uninitialized data (which is UB for int).
The same is true of the self-initialized reference: int& x introduces the name x, and = x then binds the reference to something a reference can bind to, which happens to be x itself. The reference was not yet initialized at the point it was bound, so using it is undefined behavior.
The compiler is free to treat undefined behavior however it chooses, which is what you are observing when gcc treats the referenced address as 0x0. This is not guaranteed behavior and may vary with compiler version, optimization level, etc.
There is no real practical purpose for this, but it is easy to guard against using modern practices like auto, since code using auto in a self-construction will be unable to deduce the underlying type:
auto x = x; // error, cannot deduce x
auto& y = y; // error, cannot deduce y
why is the warning suppressed by default?
The C++ standard requires diagnostic messages only for certain classes of errors. Many problems fall under "no diagnostic required", and compilers may choose to add warnings for them through optional flags. This is at the implementation's discretion and would be a question for the compiler's authors.
Most uninitialized access errors don't require diagnostics, which I am assuming this falls under.
As for your questionable reason for using a self-reference: this would be undefined behavior, as is your example ((S*)0)->value().
Since the reference is constructed from UB, accessing any members from it is also UB.
For your other example: any form of member access on a null pointer is undefined behavior, regardless of whether the member function touches any member state or whether it gets inlined. It may happen to work with the right flags on the right compiler, but it is not a portable, reliable, or safe solution.
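A well-defined way to get the same effect, for comparison (a minimal sketch: it simply makes the member static, so no object is ever needed):

struct S {
    S() = delete;
    static int value() { return 69; } // static: callable without an instance
};

int main()
{
    return S::value(); // no S object ever exists; fully defined behavior
}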
The C++ standard states that returning reference to a local variable (on the stack) is undefined behaviour, so why do many (if not all) of the current compilers only give a warning for doing so?
struct A {
};

A& foo()
{
    A a;
    return a; // gcc and VS2008 both give this a warning, but not a compiler error
}
Would it not be better if compilers gave an error instead of a warning for this code?
Are there any great advantages to allowing this code to compile with just a warning?
Please note that this is not about a const reference which could lengthen the lifetime of the temporary to the lifetime of the reference itself.
It is almost impossible to verify, from the compiler's point of view, whether you are returning a reference to a local. If the standard required that to be diagnosed as an error, writing a compiler would be almost impossible. Consider:
bool not_so_random() { return true; }

int& foo( int x ) {
    static int s = 10;
    int *p = &s;

    if ( !not_so_random() ) {
        p = &x;
    }
    return *p;
}
The above program is correct and safe to run: with the current implementation it is guaranteed that foo will return a reference to a static variable, which is safe. But from the compiler's perspective (and with separate compilation in place, where the implementation of not_so_random() is not accessible), it cannot know that the program is free of undefined behaviour.
This is a toy example, but you can imagine similar code, with different return paths, where p might refer to different long-lived objects in all paths that return *p.
Undefined behaviour is not a compilation error; the program still compiles, its behaviour is just unpredictable at run time. I'd wager that it's not even possible in principle for a computer to decide whether an arbitrary program text can exhibit undefined behaviour.
You can always add -Werror to gcc to make warnings terminate compilation with an error!
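For instance (the -Wreturn-local-addr name is gcc's spelling of this particular diagnostic; availability varies by gcc version):

g++ -Werror=return-local-addr file.cpp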
To add another favourite SO topic: would you like i = i++ + ++i; to cause a compile error, too?
If you return a pointer/reference to a local from inside a function, the behavior is well defined as long as you do not dereference the pointer/reference returned from the function.
It is Undefined Behavior only when one dereferences the returned pointer.
Whether it is Undefined Behavior or not depends on the code calling the function, not on the function itself.
So while compiling just the function, the compiler cannot determine whether the behavior is Undefined or Well Defined. The best it can do is warn you of a potential problem, and it does!
A code sample:
#include <iostream>

struct A
{
    int m_i;
    A() : m_i(10)
    {
    }
};

A& foo()
{
    A a;
    a.m_i = 20;
    return a;
}

int main()
{
    foo();                // This is not Undefined Behavior: the return value is never used.
    A& ref = foo();       // Still not Undefined Behavior: the dangling reference is only bound, not yet used.
    std::cout << ref.m_i; // Undefined Behavior: the returned (dead) value is used.
    return 0;
}
Reference to the C++ Standard, section 3.8:

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. Such a pointer refers to allocated storage (3.7.3.2), and using the pointer as if the pointer were of type void*, is well-defined. Such a pointer may be dereferenced but the resulting lvalue may only be used in limited ways, as described below. If the object will be or was of a class type with a non-trivial destructor, and the pointer is used as the operand of a delete-expression, the program has undefined behavior. If the object will be or was of a non-POD class type, the program has undefined behavior if:
— .......
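A minimal sketch of the situation that wording is about (hypothetical example): the storage outlives the object, so the pointer value stays meaningful, but what you may do through it is limited.

#include <new>

struct T { int v; };

int main()
{
    alignas(T) unsigned char buf[sizeof(T)]; // storage allocated, no T object yet
    T* p = new (buf) T{42}; // lifetime of the T object begins
    p->~T();                // lifetime ends; the storage buf is still allocated
    // p->v = 1;            // would be UB: member access on an out-of-lifetime object
    p = new (buf) T{7};     // reusing the storage begins a new object's lifetime
    p->~T();
    return 0;
}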
Because the standard does not restrict us.
If you want to shoot yourself in the foot, you can do it!
However, let's look at an example where this could arguably be useful:
int &foo()
{
    int y;
    return y; // deliberately returns a reference to a local
}

bool stack_grows_forward()
{
    int &p = foo();
    int my_p;
    return &my_p < &p;
}
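Of course, that trick is itself undefined behavior once the dangling reference is used. A sketch of a variant that at least avoids the dangling reference (the address comparison is still formally unspecified, and inlining can defeat it):

#include <cstdint>

// Return the address of a local at the given recursion depth.
std::uintptr_t local_addr(int depth)
{
    int local;
    return depth == 0 ? reinterpret_cast<std::uintptr_t>(&local)
                      : local_addr(depth - 1);
}

bool stack_grows_forward()
{
    // Deeper frames sit at higher addresses exactly when the stack grows upward.
    return local_addr(4) > local_addr(0);
}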
Compilers should not refuse to compile programs unless the standard says they are allowed to do so. Otherwise it would be much harder to port programs, since they might not compile with a different compiler, even though they comply with the standard.
Consider the following function:
int foobar() {
    int a = 1, b = 0;
    return a / b;
}
Any decent compiler will detect that I am dividing by zero, but it should not reject the code, since I might actually want to trigger a SIGFPE signal.
As David Rodríguez has pointed out, there are some cases which are undecidable but there are also some which are not. Some new version of the standard might describe some cases where the compiler must/is allowed to reject programs. That would require the standard to be very specific about the static analysis which is to be performed.
The Java standard actually specifies some rules for checking that non-void methods always return a value. Unfortunately I haven't read enough of the C++ standard to know what the compiler is allowed to do.
You could also return a reference to a static variable, which is perfectly valid code, so the construct must be allowed to compile.
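For instance (a minimal sketch):

int& counter()
{
    static int n = 0; // lives for the whole program, so the returned reference stays valid
    return n;
}

int main()
{
    counter() = 5; // assigning through the reference is fine
    return counter();
}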
It's pretty much super-bad practice to rely on this (and never a good wager), but I do believe that in many cases the referenced memory would still hold its value if no functions are called between the time foo() returns and the time the calling function uses its return value; in that case, that area of the stack has had no opportunity to be overwritten.
In C and C++ you can choose to access arbitrary sections of memory anyway (within the process's memory space, of course) via pointer arithmetic, so why not allow the possibility of constructing a reference to wherever one so chooses?
All other declaration syntaxes in C++ make a lot of sense, for example:
int i;
i is an int
int *i;
when i is dereferenced, the result is an int
int i[];
when i is subscripted, the result is an int
int *i[];
when i is subscripted and the result is then dereferenced, the final result is an int
But when you look at the syntax for reference variables, this otherwise consistent reasoning falls apart.
int &i = x;
“when the address of i is taken, the result is an int” makes no sense.
Am I missing something, or is this truly an exception to the apparent reasoning behind the other syntaxes? If it is an exception, why was this syntax chosen?
Edit:
This question addresses why the & symbol may have been chosen for this purpose, but not whether or not there is a universally consistent way to read declarations different from the way described above.
Once bound, a reference becomes an alias for its referent, and cannot be distinguished from it (except by decltype). Since int& is used exactly as int is, a declaration-follows-usage syntax could not work for declaring references.
The syntax for declaring references is still pretty straightforward: just write down a declaration for the corresponding pointer type, then replace the * used for the initial dereference with & (or && for an rvalue reference).
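A quick sketch of that recipe (hypothetical example):

int main()
{
    int x = 1;
    int* p = &x;   // pointer declaration: *p yields an int
    int& r = x;    // same shape with the initial * replaced by &: reference to int
    int** q = &p;  // pointer to pointer to int
    int*& s = p;   // first * replaced by &: reference to pointer to int
    return r + *s - **q; // all three expressions name x, so this returns 1
}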
Consider the following code snippet:
union
{
    int a;
    float b;
};
a = /* ... */;
b = a; // is this UB?
b = b + something;
Is the assignment of one union member to another valid?
Unfortunately, I believe the answer to this question is that this operation on unions is underspecified in C++, although self-assignment is perfectly OK.
Self-assignment is well-defined behavior; if we look at the draft C++ standard, section 1.9 Program execution, paragraph 15 has the following examples:
void f(int, int);

void g(int i, int* v) {
    i = v[i++];        // the behavior is undefined
    i = 7, i++, i++;   // i becomes 9
    i = i++ + 1;       // the behavior is undefined
    i = i + 1;         // the value of i is incremented
    f(i = -1, i = -1); // the behavior is undefined
}
and self-assignment is covered by the i = i + 1 example.
The problem here is that, unlike C (which from C89 onward supports type-punning), in C++ the situation is not clear. We only know that:
In a union, at most one of the non-static data members can be active at any time
but as this discussion in the WG21 UB study group mailing list shows, this concept is not well understood; we have the following comments:
While the standard uses the term "active field", it does not define it
and points out this non-normative note:
Note: In general, one must use explicit destructor calls and placement new operators to change the active member of a union. — end note
so we have to wonder whether:
b = a;
makes b the active member or not. I don't know, and I don't see a way to prove it with any of the current versions of the draft standard.
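For members with non-trivial special member functions, at least, the pattern the quoted note describes is unambiguous (a minimal sketch):

#include <new>
#include <string>

union U
{
    std::string s; // non-trivial member: the union needs user-provided special functions
    int i;
    U() : i(0) {}  // i starts out as the active member
    ~U() {}        // the active member must be destroyed manually
};

int main()
{
    U u;
    new (&u.s) std::string("hello"); // placement new makes s the active member
    u.s += " world";
    u.s.~basic_string();             // explicit destructor call ends s's lifetime
    return 0;
}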
Although, as a practical matter, most modern compilers (gcc, for example) support type-punning in C++, which means that the whole concept of the active member is bypassed.
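A sketch of what such punning looks like (this relies on gcc's documented extension plus a 32-bit int and IEEE-754 float; ISO C++ does not bless it):

union Pun
{
    int a;
    float b;
};

int main()
{
    Pun p;
    p.a = 0x3f800000;  // bit pattern of 1.0f under IEEE-754
    float f = p.b;     // reading the other member: defined by gcc, not by ISO C++
    return f == 1.0f ? 0 : 1;
}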
I would expect that unless the source and destination variables have the same type, such a thing would be Undefined Behavior in C, and I see no reason to expect C++ to handle it differently. Given long long *x, *y;, some compilers might process a statement like *x = *y >> 8; by generating code that reads all of *y, computes the result, and stores it to *x, but a compiler might perfectly legitimately emit code that copies parts of *y to *x individually. The standard makes clear that if *x and *y point to the same object of the same type, the compiler must ensure that no part of the value gets overwritten while that part is still needed in the computation, but compilers are not required to deal with aliasing in other situations.