I just stumbled upon a behavior which surprised me:
When writing:
int x = x+1;
in a C/C++ program (or an even more complex expression involving the newly created variable x), my gcc/g++ compiles it without errors. In the above case, x is 1 afterwards. Note that no variable x is in scope from a previous declaration.
So I'd like to know whether this is correct behaviour (and might even be useful in some situation) or just a parser peculiarity of my gcc version or of gcc in general.
BTW: The following does not work:
int x++;
With the expression:
int x = x + 1;
the variable x comes into existence at the = sign, which is why you can use it on the right hand side. By "comes into existence", I mean the variable exists but has yet to be assigned a value by the initialiser part.
However, unless you're initialising a variable with static storage duration (e.g., outside of a function), it's undefined behaviour, since the x that comes into existence still has an indeterminate value when the initialiser reads it.
C++03 has this to say:
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any) ...
Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
That second case there is pretty much what you have in your question.
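The point-of-declaration rule can be observed without reading an indeterminate value by using the name only in an unevaluated operand. The following is a minimal sketch, not code from the question; the function name inner_size is made up for illustration.

```cpp
#include <cassert>   // only needed for the check shown in the usage note

// Hypothetical helper: shows which `x` the initializer sees,
// without ever reading an uninitialized value.
int inner_size()
{
    char x = 'a';    // outer x, type char
    (void)x;         // silence unused-variable warnings
    {
        // The inner x is already in scope inside its own initializer.
        // sizeof does not evaluate its operand, so nothing indeterminate
        // is read; the operand names the inner int, not the outer char.
        int x = static_cast<int>(sizeof(x));
        return x;
    }
}
```

Calling it as assert(inner_size() == static_cast<int>(sizeof(int))); should hold on a conforming compiler, because the name in the initializer resolves to the inner int rather than the outer char.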
It's not, it's undefined behavior.
You're using an uninitialized variable - x. You get 1 out of pure luck, anything could happen.
FYI, in MSVS I get a warning:
Warning 1 warning C4700: uninitialized local variable 'i' used
Also, at run-time, I get an exception, so it's definitely not safe.
int x = x + 1;
is basically
int x;
x = x + 1;
You have just been lucky to have 0 in x.
int x++;
however is not possible in C++ at a parser level! The previous could be parsed but was semantically wrong. The second one can't even be parsed.
In the first case you simply use the value already at the place in memory where the variable is. In your case this seems to be zero, but it can be anything. Using such a construct is a recipe for disaster and for hard-to-find bugs in the future.
For the second case, it's simply a syntax error. You can not mix an expression with a variable declaration like that.
The variable is defined from the "=" on, so the code is valid. When x is defined globally, it is first initialized to zero, so in that case the behavior is defined. In other cases the variable was uninitialized, and as such it still is uninitialized (but increased by 1).
Remark that it still is not very sane or useful code.
3.3.1 Point of declaration
1 The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any), except as noted below. [ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
The above states that x is already in scope in its own initializer and has an indeterminate value there. You got lucky with 1.
Your code has two possibilities:
If x is a local variable, you have undefined behavior, since you use the value of an object before its lifetime begins.
If x has static or thread-local lifetime, it is pre-initialized to zero, and your static initialization will reliably set it to 1. This is well-defined.
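The well-defined case can be sketched as follows. This is an illustrative example under the assumption that the variable lives at namespace scope; the names counter and read_counter are hypothetical and do not appear in the question.

```cpp
#include <cassert>   // only needed for the check shown in the usage note

// Namespace-scope object, so it has static storage duration. It is
// zero-initialized before its dynamic initializer runs, which means
// the initializer below reads 0 and stores 1.
int counter = counter + 1;

// Hypothetical accessor, added only for illustration.
int read_counter()
{
    return counter;
}
```

A check such as assert(read_counter() == 1); should pass. Moving the same declaration inside a function (without static) turns it into the undefined case described above, so that variant is deliberately not shown as runnable code.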
You may also wish to read my answer that covers related cases, including variables of other types, and variables which are written to before their initialization is completed.
This is undefined behaviour, and the compiler should at least issue a warning. Try to compile using g++ -ansi .... The second example is just a syntax error.
The following code compiles successfully in g++ version 10.1.0:
int main()
{
int& x = x;
return x;
}
The compiler even has a warning defined for this, suggesting it's intentional that it isn't an error. When compiled with -Wall, it returns the following output:
<stdin>: In function ‘int main()’:
<stdin>:3:14: warning: reference ‘x’ is initialized with itself [-Winit-self]
3 | int& x = x;
| ^
<stdin>:3:14: warning: ‘x’ is used uninitialized in this function [-Wuninitialized]
3 | int& x = x;
|
My testing has shown that this results in a reference to a value at address 0, the same as if it were declared as int& x = *(int*)nullptr;. Naturally, this results in a segfault if the value referenced by x is ever used.
I can think of one (questionable) case where this might be useful: if you want to call a method on a class with no accessible constructor, when that method isn't static but nonetheless doesn't make any use of the class's data:
struct S {
S() = delete;
int value() { return 69; }
};
int main()
{
S& s = s;
return s.value();
}
But then there are other ways to do that, which don't even give warnings with -Wall and are less likely to be undefined behavior. (For example, ((S*)0)->value().)
So what I'm asking is: is this in fact undefined behavior? It certainly seems like something that would be. Is there any conceivable situation in which a self-reference is the best solution? And if not, why is the warning suppressed by default?
The reason that this code is accepted is because the C++ grammar allows for it; not because there is any practical reason for it.
In the C++ grammar, the first part of the definition introduces the identifier's name, which may be used in the rest of the scope, including in the same statement. This is what allows you to have code like:
int x = 1, &y = x;
The first int x produces the name x which may be used anywhere after it is seen. This has the side-effect of also allowing self-assignment, such as:
int x = x;
This initialization is, in fact, undefined behavior because x is not yet an initialized object, so the initializer reads uninitialized data (which is UB).
The same is true with a self-assigned reference; the int& x introduces the name x, whereas = x binds to a type that is bindable to a reference (which also happens to be int& x). This reference was not yet initialized by the time it was assigned, so this access is undefined behavior.
The compiler is free to treat undefined behavior however it chooses; what you appear to be observing is gcc treating the referenced address as 0x0. This is not guaranteed behavior, and it may vary with compiler version, optimization level, etc.
There is no real practical purpose for this; but it is easy to combat with using modern practices like auto, since code using auto in a self-construction will be unable to deduce the underlying type:
auto x = x; // error, cannot deduce x
auto& y = y; // error, cannot deduce y
why is the warning suppressed by default?
The C++ standard only indicates a handful of errors that require diagnostic messages. Many errors fall under "no diagnostic required", which compilers may choose to add warnings for through optional flags. This is up to the implementation's discretion, and would be a question for the compiler's authors.
Most uninitialized access errors don't require diagnostics, which I am assuming this falls under.
As for your questionable reason for using a self-reference: this would be undefined behavior, as is your example ((S*)0)->value().
Since the reference is constructed from UB, accessing any members from it is also UB.
For your other example, any form of member access on a null pointer is undefined behavior regardless of whether that member function accesses any member state, or whether it gets inlined. This doesn't mean it may not work with the right compiler flags on the right compiler; but it is not a portable, reliable, or safe solution.
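If the member function really uses no object state, one portable option is to make it static, so that no object, reference, or pointer is needed at all. This is only a sketch and assumes S can be modified, which the question does not state; the wrapper call_value is a hypothetical name.

```cpp
#include <cassert>   // only needed for the check shown in the usage note

struct S {
    S() = delete;                       // no accessible constructor, as in the question
    static int value() { return 69; }   // static: callable without any S object
};

// Hypothetical wrapper for illustration.
int call_value()
{
    // No reference or pointer to an S is ever formed, so there is
    // no self-reference and no null pointer involved.
    return S::value();
}
```

With this change, assert(call_value() == 69); holds without relying on any particular compiler or flags.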
The following C++ program compiles just fine (g++ 5.4 at least gives a warning when invoked with -Wall):
int main(int argc, char *argv[])
{
int i = i; // !
return 0;
}
Even something like
int& p = p;
is swallowed by the compiler.
Now my question is: Why is such an initialization legal? Is there any actual use-case or is it just a consequence of the general design of the language?
This is a side effect of the rule that a name is in scope immediately after it is declared. There's no need to complicate this simple rule just to prevent writing code that's obvious nonsense.
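The rule does have well-defined uses when the initializer needs only the object's address rather than its value. A minimal sketch, with a hypothetical Node type and function name that are not from the question:

```cpp
#include <cassert>   // only needed for the check shown in the usage note

// Hypothetical node type, for illustration only.
struct Node {
    Node* next;
    int   value;
};

bool is_self_linked()
{
    // `n` is in scope inside its own initializer, so its address can be
    // taken there. Only the address is used; the value of n is never
    // read before its initialization completes.
    Node n = { &n, 7 };
    return n.next == &n && n.value == 7;
}
```

Here assert(is_self_linked()); should pass. This is the same pattern as a circular list head that initially points at itself, and it would not compile if the name came into scope only after the initializer.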
Just because the compiler accepts it (syntactically valid code) does not mean that it has well defined behaviour.
The compiler is not required to diagnose all cases of Undefined Behaviour or other classes of problems.
The standard gives it pretty free hands to accept and translate broken code, on the assumption that if the results were to be undefined or nonsensical the programmer would not have written that code.
So, the absence of warnings or errors from your compiler does not in any way prove that your program has well defined behaviour.
It is your responsibility to follow the rules of the language.
The compiler usually tries to help you by pointing out obvious flaws, but in the end it's on you to make sure your program makes sense.
And something like int i = i; does not make sense but is syntactically correct, so the compiler may or may not warn you, but in any case is within its rights to just generate garbage (and not tell you about it) because you broke the rules and invoked Undefined Behaviour.
I guess the gist of your question is about why the second identifier is recognized as identifying the same object as the first, in int i = i; or int &p = p;
This is defined in [basic.scope.pdecl]/1 of the C++14 standard:
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below. [Example:
unsigned char x = 12;
{ unsigned char x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
The semantics of these statements are covered by other threads:
Is int x = x; UB?
Why can a Class& be initialized to itself?
Note - the quoted example differs in semantics from int i = i; because it is not UB to evaluate an uninitialized unsigned char, but is UB to evaluate an uninitialized int.
As noted on the linked thread, g++ and clang can give warnings when they detect this.
Regarding rationale for the scope rule: I don't know for sure, but the scope rule existed in C so perhaps it just made its way into C++ and now it would be confusing to change it.
If we did say that the declared variable is not in scope for its initializer, then int i = i; might make the second i find an i from an outer scope, which would also be confusing.
Out of curiosity, I've tried this code, resulting from an interview question[*]
#include <cstdio>
int main(int argc, char *argv[])
{
int a = 1234;
printf("Outer: %d\n", a);
{
int a(a);
printf("Inner: %d\n", a);
}
}
When compiled on Linux (both g++ 4.6.3 and clang++ 3.0) it outputs:
Outer: 1234
Inner: -1217375632
However on Windows (VS2010) it prints:
Outer: 1234
Inner: 1234
The rationale would be that, until the copy-constructor of the second 'a' variable has finished, the first 'a' variable is still accessible. However I'm not sure if this is standard behaviour, or just a(nother) Microsoft quirk.
Any idea?
[*] The actual question was:
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
{
// Not at global scope here
int a = 1234;
{
int a;
// how do you set this a to the value of the containing scope a ?
}
}
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
Unless the outer scope can be explicitly named you cannot do this. You can explicitly name the global scope, namespace scopes, and class scopes, but not function or block statement scopes.
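For the scopes that can be named, a qualified name sidesteps the shadowing. The sketch below only illustrates that point; it does not solve the interview question, which explicitly excludes global scope. The namespace config and both function names are hypothetical.

```cpp
#include <cassert>   // only needed for the checks shown in the usage note

int a = 1234;        // global scope, can be named as ::a

namespace config {   // hypothetical namespace for illustration
    int a = 42;      // can be named as config::a
}

int copy_global()
{
    int a = ::a;         // qualified name: refers to the global a, not the local one
    return a;
}

int copy_from_namespace()
{
    int a = config::a;   // qualified name: refers to config::a
    return a;
}
```

Both assert(copy_global() == 1234); and assert(copy_from_namespace() == 42); should hold. There is no comparable syntax for naming an enclosing function or block scope, which is why the block-scope version of the question has no direct answer.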
C++11 [basic.scope.pdecl] 3.3.2 p1 states:
The point of declaration for a name is immediately after its complete declarator (Clause 8) and before its initializer (if any), except as noted below. [ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
MSVC correctly implements this example; however, it does not correctly implement the rule when the initializer uses parentheses instead of assignment syntax. There's a bug filed about this on Microsoft Connect.
Here's an example program with incorrect behavior in VS as a result of this bug.
#include <iostream>
int foo(char) { return 0; }
int foo(int) { return 1; }
int main()
{
char x = 'a';
{
int x = foo(static_cast<decltype(x)>(0));
std::cout << "'=' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
}
{
int x(foo(static_cast<decltype(x)>(0)));
std::cout << "'()' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
}
}
C++ includes the following note.
[ Note: Operations involving indeterminate values may cause undefined behavior. —end note ]
However, this note indicates that operations may cause undefined behavior, not that they necessarily do. The above linked bug report includes an acknowledgement from Microsoft that this is a bug and not that the program triggers undefined behavior.
Edit: And now I've changed the example so that the object with indeterminate value is only 'used' in an unevaluated context, and I believe that this absolutely rules out the possibility of undefined behavior on any platform, while still demonstrating the bug in Visual Studio.
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
If you want to get technical about the wording, it's pretty easy. A "temporary" has a specific meaning in C++ (see §12.2); any named variable you create is not a temporary. As such, you can just create a local variable (which is not a temporary) initialized with the correct value:
int a = 1234;
{
int b = a;
int a = b;
}
An even more defensible possibility would be to use a reference to the variable in the outer scope:
int a = 1234;
{
int &ref_a = a;
int a = ref_a;
}
This doesn't create an extra variable at all -- it just creates an alias to the variable at the outer scope. Since the alias has a different name, we retain access to the variable at the outer scope, without defining a variable (temporary or otherwise) to do so. Many references are implemented as pointers internally, but in this case (at least with a modern compiler and optimization turned on) I'd expect it not to be -- that the alias really would just be treated as a different name referring to the variable at the outer scope (and a quick test with VC++ shows that it works this way -- the generated assembly language doesn't use ref_a at all).
Another possibility along the same lines would be like this:
const int a = 10;
{
enum { a_val = a };
int a = a_val;
}
This is somewhat similar to the reference, except that in this case there's not even room for argument about whether a_val could be called a variable -- it absolutely is not a variable. The problem is that an enumeration can only be initialized with a constant expression, so we have to define the outer variable as const for it to work.
I doubt any of these is what the interviewer really intended, but all of them answer the question as stated. The first is (admittedly) a pure technicality about definitions of terms. The second might still be open to some argument (many people think of references as variables). Though it restricts the scope, there's no room for question or argument about the third.
What you are doing, initializing a variable with itself, is undefined behavior. All your test cases got it right, this is not a quirk. An implementation could also initialize a to 123456789 and it would still be standard.
Update: The comments on this answer point out that initializing a variable with itself is not undefined behavior, but trying to read such a variable is.
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
You can't. As soon as the identical name is declared, the outer name is inaccessible for the rest of the scope. You'd need a copy or an alias of the outer variable, which means you'd need a temporary variable.
I'm surprised that, even with the warning level cranked up, VC++ doesn't complain on this line:
int a(a);
Visual C++ will sometimes warn you about hiding a variable (maybe that's only for members of derived classes). It's also usually pretty good about telling you you're using a value before it has been initialized, which is the case here.
Looking at the code generated, it happens to initialize the inner a to the same value of the outer a because that's what's left behind in a register.
I had a look at the standard, it's actually a grey area but here's my 2 cents...
3.1 Declarations and definitions [basic.def]
A declaration introduces names into a translation unit or redeclares names introduced by previous declarations.
A declaration is a definition unless... [non relevant cases follow]
3.3.1 Point of declaration
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below [self-assignment example].
A nonlocal name remains visible up to the point of declaration of the local name that hides it.
Now, if we assume that this is the point of declaration of the inner 'a' (3.3.1/1)
int a (a);
^
then the outer 'a' should be visible up to that point (3.3.1/2), where the inner 'a' is defined.
Problem is that in this case, according to 3.1/2, a declaration IS a definition. This means the inner 'a' should be created. Until then, I can't understand from the standard whether the outer 'a' is still visible or not. VS2010 assumes that it is, and all that falls within the parentheses refers to the outer scope. However clang++ and g++ treat that line as a case of self-assignment, which results in undefined behaviour.
I'm not sure which approach is correct, but I find VS2010 to be more consistent: the outer scope is still visible until the inner 'a' is fully created.
I have this question, which I thought about earlier, but figured it's not trivial to answer.
int x = x + 1;
int main() {
return x;
}
My question is whether the behavior of the program is defined or undefined if it's valid at all. If it's defined, is the value of x known in main?
I'm pretty sure it's defined, and x should have the value 1. §3.6.2/1 says: "Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place."
After that, I think it's all pretty straightforward.
My question is whether the behavior of the program is defined or undefined if it's valid at all. If it's defined, is the value of x known in main?
This code is definitely not clean, but to me it should work predictably.
int x puts the variable into the data segment which is defined to be zero at the program start. Before main(), static initializers are called. For x that is the code x = x + 1. x = 0 + 1 = 1. Thus the main() would return 1.
The code would definitely work in unpredictable fashion if x is a local variable, allocated on stack. State of stack, unlike the data segment, is pretty much guaranteed to contain undefined garbage.
The 'x' variable is stored in the .bss, which is filled with 0s when you load the program. Consequently, the value of 'x' is 0 when the program gets loaded in memory.
Then before main is called, "x = x + 1" is executed.
I don't know if it's valid or not, but the behavior is not undefined.
Before the main call, x must be initialized to 0; therefore its value must be 1 once you enter main, and you will return 1. It's defined behavior.