In my mind, always, definition means storage allocation.
In the following code, int i allocates (typically) 4 bytes of storage on the program stack and binds it to i, and i = 3 assigns 3 to that storage. But because of the goto, the definition is bypassed, which means no storage is allocated for i.
I heard that local variables are allocated either at the entry of the function (f() in this case) where they reside, or at the point of definition.
But either way, how can i be used when it hasn't been defined yet (no storage at all)? Where is the value 3 stored when i = 3 executes?
void f()
{
    goto label;
    int i;
label:
    i = 3;
    cout << i << endl; //prints 3 successfully
}
Long story short: goto results in a runtime jump, while a variable definition/declaration results in storage being laid out at compile time.
The compiler sees the definition and decides how much storage to reserve for an int; it also arranges for that storage to be set to 3 when execution "hits" i = 3;.
That memory location will be there even if there is a goto at the start of your function, before the declaration/definition, just as in your example.
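A minimal sketch of that point, assuming a hypothetical function name assign_after_jump (not from the question). The jump skips the declaration of an uninitialized int, which the language permits, and the later assignment still has storage to write to:

```cpp
#include <cassert>

// Hypothetical helper: jumps over the declaration of `i`, then assigns to it.
int assign_after_jump()
{
    goto label;   // skips the declaration statement below
    int i;        // scalar without initializer: jumping past it is well-formed
label:
    i = 3;        // the name `i` is in scope here and refers to real storage
    assert(i == 3);
    return i;
}
```

Calling assign_after_jump() should behave exactly as if the goto were not there, because only an (empty) initialization was skipped.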
Very silly simile
If I place a log on the ground and my friend runs (with his eyes closed) and jumps over it, the log will still be there - even if he hasn't seen or felt it.
It's realistic to say that he could turn around (at a later time) and set it on fire, if he wanted to. His jump doesn't make the log magically disappear.
Your code is fine. The variable lives wherever it would live had the goto not been there.
Note that there are situations where you can't jump over a declaration:
C++11 6.7 Declaration statement [stmt.dcl]
3 It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A
program that jumps from a point where a variable with automatic storage duration is not in scope to a
point where it is in scope is ill-formed unless the variable has scalar type, class type with a trivial default
constructor and a trivial destructor, a cv-qualified version of one of these types, or an array of one of the
preceding types and is declared without an initializer (8.5). [ Example:
void f()
{
    // ...
    goto lx;    // ill-formed: jump into scope of `a'
    // ...
ly:
    X a = 1;
    // ...
lx:
    goto ly;    // ok, jump implies destructor
                // call for `a' followed by construction
                // again immediately following label ly
}
—end example ]
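One common way to stay within this rule is to close the variable's scope before the label, so the jump never enters it. The sketch below is only an illustration; the function name length_unless_skipped and the use of std::string are assumptions, not part of the quoted standard text:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical example: the inner braces end the scope of `s` before `done:`,
// so the goto never jumps into the scope of an initialized variable.
std::size_t length_unless_skipped(bool skip)
{
    std::size_t result = 0;        // declared before any jump
    if (skip)
        goto done;
    {
        std::string s = "hello";   // this initialization is never bypassed
        assert(!s.empty());
        result = s.size();
    }
done:
    return result;
}
```

Without the inner braces, s would still be in scope at done: and the goto would be rejected as crossing its initialization.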
Definitions are not executable code. They are just instructions to the compiler, letting it know the size and the type of the variable. In this sense, the definition is not bypassed by the goto statement.
If you use a class with a constructor instead of an int, the call of the constructor would be bypassed by the goto, but the storage would be allocated anyway. The class instance would remain uninitialized, however, so using it before its definition/initialization line gets the control is an error.
In my mind, always, definition means storage allocation.
This is not correct. The storage for the variable is reserved by the compiler when it creates the stack-layout for the function. The goto just bypasses the initialization. Since you assign a value before printing, everything is fine.
Control flow has nothing to do with the variable's storage, which is reserved at compile time by the compiler.
The goto statement only affects the dynamic initialization of the object. For built-in types and POD types this doesn't matter, because they can remain uninitialized. For non-POD types, however, it results in a compilation error. For example:
struct A{ A(){} }; //it is a non-POD type
void f()
{
    goto label;
    A a; //error - you cannot skip this!
label:
    return;
}
Error:
prog.cpp: In function ‘void f()’:
prog.cpp:8: error: jump to label ‘label’
prog.cpp:5: error: from here
prog.cpp:6: error: crosses initialization of ‘A a’
See here : http://ideone.com/p6kau
In this example A is a non-POD type because it has a user-defined constructor, which means the object needs to be dynamically initialized. Since the goto statement attempts to skip this, the compiler generates an error, as it should.
Please note that objects of only built-in types and POD types can remain uninitialized.
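As a rough illustration of which types can be skipped, the standard type traits can approximate the rule. The helper can_jump_over and the struct names below are hypothetical, and the check is only an approximation of the wording in [stmt.dcl] (it says nothing about whether the declaration has an initializer):

```cpp
#include <type_traits>

struct Plain { int x; };             // trivial default constructor and destructor
struct WithCtor { WithCtor() {} };   // user-provided constructor, like A above

// Hypothetical helper: a declaration without an initializer may be jumped
// over only if the type needs no construction or destruction code.
template <typename T>
constexpr bool can_jump_over()
{
    return std::is_trivially_default_constructible<T>::value
        && std::is_trivially_destructible<T>::value;
}

static_assert(can_jump_over<int>(), "scalars can be skipped");
static_assert(can_jump_over<Plain>(), "trivial class types can be skipped");
static_assert(!can_jump_over<WithCtor>(), "a user-provided constructor cannot");
```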
To make it short, variable declaration is lexical, i.e. pertaining to the lexical {}-enclosed blocks. The binding is valid from the line it is declared to the end of the block. It is unaffected by flow control (goto).
Assignment to local (stack) variables, on the other hand, is a runtime operation performed when the control flow gets there. So goto has an influence on that.
Things get a bit more tricky when object construction becomes involved, but that's not your case here.
The position of the declaration of i is irrelevant to the compiler. You can prove this to yourself by compiling your code with int i before the goto and then after and comparing the generated assembly:
g++ -S test_with_i_before_goto.cpp -o test1.asm
g++ -S test_with_i_after_goto.cpp -o test2.asm
diff -u test1.asm test2.asm
The only difference in this case is the source file name (.file) reference.
The definition of a variable DOES NOT allocate memory for the variable. It does tell the compiler to prepare appropriate memory space to store the variable, but the memory is not allocated at the moment control passes the definition.
What really matters here is initialization.
Related
void f()
{
    auto x = func();
    int y = func();
    auto z = f1() * f2() + static_cast<int>(f3());
}
I believe it should be defined that the call to func always happens first, before memory allocation for x, in the case with auto, but I couldn't find any information about it.
Is it so?
And is it defined for the case when type is explicitly written?
Evaluation of the initialization expression (func() or f1() * f2() + static_cast<int>(f3())) definitely happens only when the particular line of code is reached. [1]
Memory for the variable may be obtained (aka "allocation") at any time earlier... however there's no way to use that memory prior to the definition because until the definition is reached there's no way to name that memory. The variable name is introduced into scope by the definition.
The lifetime of the object living in the variable doesn't begin until the initializer is fully evaluated and placed [2] into the new object. See [basic.life]:
The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
its initialization (if any) is complete (including vacuous initialization)
If the address of the variable is never taken, the variable might not need any memory at all (it could fit in a CPU register for its entire lifetime).
[1] Well, under the as-if rule, the compiler can move it around so long as you can't tell the difference. Unless you have undefined behavior such as a data race, it will always act exactly like the computation is done when reaching that line of code.
[2] For the copy-initialization syntax used in the question, old versions of C++ generally required a copy or move, while newer versions mandate in-place construction via copy-elision.
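The observable ordering described in this answer can be checked with a small trace. This is only a sketch: logged_value stands in for func() from the question, and both function names are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical logging helper standing in for func() in the question.
int logged_value(std::vector<std::string>& log)
{
    log.push_back("initializer evaluated");
    return 42;
}

// Records what happens before, during, and after the definition of x.
std::vector<std::string> trace_initialization()
{
    std::vector<std::string> log;
    log.push_back("before definition");
    auto x = logged_value(log);   // evaluated only when control reaches this line
    log.push_back("after definition");
    assert(x == 42);
    return log;
}
```

Whether and when the compiler reserves memory for x is not observable here; only the position of the initializer's side effect in the log is.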
I have encountered a problem in my learning of C++, where a local variable in a function is being passed to the local variable with the same name in another function, both of these functions run in main().
When this is run,
#include <iostream>
using namespace std;
void next();
void again();
int main()
{
    int a = 2;
    cout << a << endl;
    next();
    again();
    return 0;
}
void next()
{
    int a = 5;
    cout << a << endl;
}
void again()
{
    int a;
    cout << a << endl;
}
it outputs:
2
5
5
I expected that again() would say null or 0 since 'a' is declared again there, and yet it seems to use the value that 'a' was assigned in next().
Why does next() pass the value of local variable 'a' to again() if 'a' is declared another time in again()?
http://en.cppreference.com/w/cpp/language/ub
You're correct, an uninitialized variable is a no-no. However, you are allowed to declare a variable and not initialize it until later. Memory is set aside to hold the integer, but what value happens to be in that memory until you do so can be anything at all. Some compilers will auto-initialize variables to junk values (to help you catch bugs), some will auto-initialize to default values, and some do nothing at all. C++ itself promises nothing, hence it's undefined behavior. In your case, with your simple program, it's easy enough to imagine how the compiler created assembly code that reused that exact same piece of memory without altering it. However, that's blind luck, and even in your simple program isn't guaranteed to happen. These types of bugs can actually be fairly insidious, so make it a rule: Be vigilant about uninitialized variables.
An uninitialized non-static local variable of *built-in type (phew! that was a mouthful) has an indeterminate value. Except for the char types, using that value yields formally Undefined Behavior, a.k.a. UB. Anything can happen, including the behavior that you see.
Apparently with your compiler and options, the stack area that was used for a in the call of next, was not used for something else until the call of again, where it was reused for the a in again, now with the same value as before.
But you cannot rely on that. With UB anything, or nothing, can happen.
* Or more generally of POD type, Plain Old Data. The standard's specification of this is somewhat complicated. In C++11 it starts with §8.5/11, “If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value.”. Where “automatic … storage duration” includes the case of local non-static variable. And where the “no initialization” can occur in two ways via §8.5/6 that defines default initialization, namely either via a do-nothing default constructor, or via the object not being of class or array type.
This is completely coincidental and undefined behavior.
What's happened is that you have two functions called immediately after one another. Both will have more or less identical function prologs and both reserve a variable of exactly the same size on the stack.
Since there are no other variables in play and the stack is not modified between the calls, you just happen to end up with the local variable in the second function "landing" in the same place as the previous function's local variable.
Clearly, this is not good to rely upon. In fact, it's a perfect example of why you should always initialize variables!
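A minimal sketch of the fix, assuming hypothetical function names that mirror next() and again() from the question but return their value instead of printing it. The empty braces value-initialize a, so it is guaranteed to be zero:

```cpp
#include <cassert>

// Mirrors next(): the local is explicitly initialized to 5.
int next_value()
{
    int a = 5;
    return a;
}

// Mirrors again(), but with `a` value-initialized instead of left indeterminate.
int again_value()
{
    int a{};   // guaranteed to be 0, whatever the stack held before
    return a;
}

// Calls both in the same order as main() in the question.
int run_both()
{
    const int first = next_value();
    assert(first == 5);
    return again_value();   // no stale 5 can leak through
}
```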
I have a program with a switch statement similar to this:
switch(n)
{
case 0:
    /* stuff */
    break;
    int foo;
case 1:
    foo = 5;
    break;
case 2:
    foo = 6;
    break;
}
Notice the int foo; between case 0 and case 1. This statement is unreachable: if you walk through the program, you'll never step over it.
This compiles without warnings or errors with Clang, but it seemed to be jacked up when I ran it (though that could be due to other causes).
Is it well-defined behavior to declare a variable in an unreachable statement and use it in reachable statements, and is it going to work?
It is well-defined behavior as long as the variable has trivial construction, and has (approximately) the same effect as if the variable was declared in a larger scope.
If any initialization is needed, you'll get an error.
section 6.7 says
It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps from a point where a variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has scalar type, class type with a trivial default constructor and a trivial destructor, a cv-qualified version of one of these types, or an array of one of the preceding types and is declared without an initializer.
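If the unreachable declaration feels too fragile, a less surprising layout is to declare the variable before the switch. The sketch below assumes a hypothetical wrapper function pick and a default value of 0, neither of which is in the question:

```cpp
#include <cassert>

// Hypothetical rewrite of the switch from the question: `foo` is declared
// and initialized before the switch, so no case label jumps past it.
int pick(int n)
{
    int foo = 0;   // assumed default for case 0 and for unmatched values
    switch (n)
    {
    case 0:
        /* stuff */
        break;
    case 1:
        foo = 5;
        break;
    case 2:
        foo = 6;
        break;
    }
    assert(foo == 0 || foo == 5 || foo == 6);
    return foo;
}
```

This also keeps working if foo later gains an initializer or a non-trivial type, which the original layout would not.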
Out of curiosity, I've tried this code, resulting from an interview question[*]
int main(int argc, char *argv[])
{
    int a = 1234;
    printf("Outer: %d\n", a);
    {
        int a(a);
        printf("Inner: %d\n", a);
    }
}
When compiled on Linux (both g++ 4.6.3 and clang++ 3.0) it outputs:
Outer: 1234
Inner: -1217375632
However on Windows (VS2010) it prints:
Outer: 1234
Inner: 1234
The rationale would be that, until the copy-constructor of the second 'a' variable has finished, the first 'a' variable is still accessible. However I'm not sure if this is standard behaviour, or just a(nother) Microsoft quirk.
Any idea?
[*] The actual question was:
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
{
    // Not at global scope here
    int a = 1234;
    {
        int a;
        // how do you set this a to the value of the containing scope a ?
    }
}
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
Unless the outer scope can be explicitly named you cannot do this. You can explicitly name the global scope, namespace scopes, and class scopes, but not function or block statement scopes.
C++11 [basic.scope.pdecl] 3.3.2 p1 states:
The point of declaration for a name is immediately after its complete declarator (Clause 8) and before its initializer (if any), except as noted below. [ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
MSVC correctly implements this example; however, it does not correctly implement it when the initializer uses parentheses instead of assignment syntax. There's a bug filed about this on Microsoft Connect.
Here's an example program with incorrect behavior in VS as a result of this bug.
#include <iostream>
int foo(char) { return 0; }
int foo(int) { return 1; }
int main()
{
    char x = 'a';
    {
        int x = foo(static_cast<decltype(x)>(0));
        std::cout << "'=' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
    }
    {
        int x(foo(static_cast<decltype(x)>(0)));
        std::cout << "'()' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
    }
}
C++ includes the following note.
[ Note: Operations involving indeterminate values may cause undefined behavior. —end note ]
However, this note indicates that operations may cause undefined behavior, not that they necessarily do. The above linked bug report includes an acknowledgement from Microsoft that this is a bug and not that the program triggers undefined behavior.
Edit: And now I've changed the example so that the object with indeterminate value is only 'used' in an unevaluated context, and I believe that this absolutely rules out the possibility of undefined behavior on any platform, while still demonstrating the bug in Visual Studio.
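The same point-of-declaration rule can be observed with sizeof, which is also an unevaluated context. This is a sketch with a hypothetical function name, and it assumes sizeof(long long) differs from sizeof(char), which holds on mainstream platforms:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: inside the initializer, `x` already names the inner
// variable, so sizeof(x) is the size of long long, not the size of char.
std::size_t size_seen_by_initializer()
{
    char x = 'a';
    assert(sizeof(x) == sizeof(char));   // here x is still the outer variable
    {
        long long x = sizeof(x);   // unevaluated operand: no indeterminate value is read
        return static_cast<std::size_t>(x);
    }
}
```

On a conforming compiler the function returns sizeof(long long); a result of 1 would mean the initializer still saw the outer x.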
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
If you want to get technical about the wording, it's pretty easy. A "temporary" has a specific meaning in C++ (see §12.2); any named variable you create is not a temporary. As such, you can just create a local variable (which is not a temporary) initialized with the correct value:
int a = 1234;
{
    int b = a;
    int a = b;
}
An even more defensible possibility would be to use a reference to the variable in the outer scope:
int a = 1234;
{
    int &ref_a = a;
    int a = ref_a;
}
This doesn't create an extra variable at all -- it just creates an alias to the variable at the outer scope. Since the alias has a different name, we retain access to the variable at the outer scope, without defining a variable (temporary or otherwise) to do so. Many references are implemented as pointers internally, but in this case (at least with a modern compiler and optimization turned on) I'd expect it not to be -- that the alias really would just be treated as a different name referring to the variable at the outer scope (and a quick test with VC++ shows that it works this way -- the generated assembly language doesn't use ref_a at all).
Another possibility along the same lines would be like this:
const int a = 10;
{
    enum { a_val = a };
    int a = a_val;
}
This is somewhat similar to the reference, except that in this case there's not even room for argument about whether a_val could be called a variable -- it absolutely is not a variable. The problem is that an enumeration can only be initialized with a constant expression, so we have to define the outer variable as const for it to work.
I doubt any of these is what the interviewer really intended, but all of them answer the question as stated. The first is (admittedly) a pure technicality about definitions of terms. The second might still be open to some argument (many people think of references as variables). Though it restricts the scope, there's no room for question or argument about the third.
What you are doing, initializing a variable with itself, is undefined behavior. All your test cases got it right, this is not a quirk. An implementation could also initialize a to 123456789 and it would still be standard.
Update: The comments on this answer point out that initializing a variable with itself is not undefined behavior, but trying to read such a variable is.
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
You can't. As soon as the identical name is declared, the outer name is inaccessible for the rest of the scope. You'd need a copy or an alias of the outer variable, which means you'd need a temporary variable.
I'm surprised that, even with the warning level cranked up, VC++ doesn't complain on this line:
int a(a);
Visual C++ will sometimes warn you about hiding a variable (maybe that's only for members of derived classes). It's also usually pretty good about telling you you're using a value before it has been initialized, which is the case here.
Looking at the code generated, it happens to initialize the inner a to the same value of the outer a because that's what's left behind in a register.
I had a look at the standard, it's actually a grey area but here's my 2 cents...
3.1 Declarations and definitions [basic.def]
A declaration introduces names into a translation unit or redeclares names introduced by previous declarations.
A declaration is a definition unless... [non relevant cases follow]
3.3.1 Point of declaration
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below [self-assignment example].
A nonlocal name remains visible up to the point of declaration of the local name that hides it.
Now, if we assume that this is the point of declaration of the inner 'a' (3.3.1/1)
int a (a);
^
then the outer 'a' should be visible up to that point (3.3.1/2), where the inner 'a' is defined.
Problem is that in this case, according to 3.1/2, a declaration IS a definition. This means the inner 'a' should be created. Until then, I can't understand from the standard whether the outer 'a' is still visible or not. VS2010 assumes that it is, and all that falls within the parentheses refers to the outer scope. However clang++ and g++ treat that line as a case of self-assignment, which results in undefined behaviour.
I'm not sure which approach is correct, but I find VS2010 to be more consistent: the outer scope is still visible until the inner 'a' is fully created.
I just realised that this program compiles and runs (gcc version 4.4.5 / Ubuntu):
#include <iostream>
using namespace std;
class Test
{
public:
    // copyconstructor
    Test(const Test& other);
};
Test::Test(const Test& other)
{
    if (this == &other)
        cout << "copying myself" << endl;
    else
        cout << "copying something else" << endl;
}
int main(int argv, char** argc)
{
    Test a(a); // compiles, runs and prints "copying myself"
    Test *b = new Test(*b); // compiles, runs and prints "copying something else"
}
I wonder why on earth this even compiles. I assume that (just as in Java) arguments are evaluated before the method / constructor is called, so I suspect that this case must be covered by some "special case" in the language specification?
Questions:
Could someone explain this (preferably by referring to the specification)?
What is the rationale for allowing this?
Is it standard C++ or is it gcc-specific?
EDIT 1: I just realised that I can even write int i = i;
EDIT 2: Even with -Wall and -pedantic the compiler doesn't complain about Test a(a);.
EDIT 3: If I add a method
Test method(Test& t)
{
    cout << "in some" << endl;
    return t;
}
I can even do Test a(method(a)); without any warnings.
The reason this "is allowed" is that the rules say an identifier's scope starts immediately after the identifier. In the case
int i = i;
the RHS i is "after" the LHS i so i is in scope. This is not always bad:
void *p = (void*)&p; // p contains its own address
because a variable can be addressed without its value being used. In the case of the OP's copy constructor no error can be given easily, since binding a reference to a variable does not require the variable to be initialised: it is equivalent to taking the address of a variable. A legitimate constructor could be:
struct List { List *next; List(List &n) { next = &n; } };
where you see the argument is merely addressed, its value isn't used. In this case a self-reference could actually make sense: the tail of a list is given by a self-reference. Indeed, if you change the type of "next" to a reference, there's little choice since you can't easily use NULL as you might for a pointer.
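A runnable sketch of the self-referencing tail, using a pointer parameter instead of the reference in List above so that only the address of the variable is ever taken. The names Node, is_tail and self_reference_marks_tail are hypothetical:

```cpp
#include <cassert>

// Hypothetical pointer-based variant of the List idea above: the constructor
// stores the address it is given and never reads through it.
struct Node
{
    Node* next;
    explicit Node(Node* n) : next(n) {}
    bool is_tail() const { return next == this; }
};

bool self_reference_marks_tail()
{
    Node tail(&tail);   // only the address of `tail` is used in its initializer
    Node head(&tail);   // an ordinary node pointing at the tail
    assert(!head.is_tail());
    return tail.is_tail();
}
```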
As usual, the question is backwards. The question is not why an initialisation of a variable can refer to itself, the question is why it can't refer forward. [In Felix, this is possible]. In particular, for types as opposed to variables, the lack of ability to forward reference is extremely broken, since it prevents recursive types being defined other than by using incomplete types, which is enough in C, but not in C++ due to the existence of templates.
I have no idea how this relates to the specification, but this is how I see it:
When you do Test a(a); it allocates space for a on the stack. Therefore the location of a in memory is known to the compiler at the start of main. When the constructor is called (the memory is of course allocated before that), the correct this pointer is passed to it because it's known.
When you do Test *b = new Test(*b);, you need to think of it as two steps. First the object is allocated and constructed, and then the pointer to it is assigned to b. The reason you get the message you get is that you're essentially passing in an uninitialized pointer to the constructor, and then comparing it with the actual this pointer of the object (which will eventually get assigned to b, but not before the constructor exits).
The second one where you use new is actually easier to understand; what you're invoking there is exactly the same as:
Test *b;
b = new Test(*b);
and you're actually performing an invalid dereference. Try to add a << &other << to your cout lines in the constructor, and make that
Test *b = (Test *)0xF00D1E44BADD1E5;
to see that you're passing through whatever value a pointer on the stack has been given. If not explicitly initialized, that's undefined. But even if you don't initialize it with some sort of (in)sane default, it'll be different from the return value of new, as you found out.
For the first, think of it as an in-place new. Test a is a local variable not a pointer, it lives on the stack and therefore its memory location is always well defined - this is very much unlike a pointer, Test *b which, unless explicitly initialized to some valid location, will be dangling.
If you write your first instantiation like:
Test a(*(&a));
it becomes clearer what you're invoking there.
I don't know a way to make the compiler disallow (or even warn) about this sort of self-initialization-from-nowhere through the copy constructor.
The first case is (perhaps) covered by 3.8/6:
before the lifetime of an object has
started but after the storage which
the object will occupy has been
allocated or, after the lifetime of an
object has ended and before the
storage which the object occupied is
reused or released, any lvalue which
refers to the original object may be
used but only in limited ways. Such an
lvalue refers to allocated storage
(3.7.3.2), and using the properties of
the lvalue which do not depend on its
value is well-defined.
Since all you're using of a (and other, which is bound to a) before the start of its lifetime is the address, I think you're good: read the rest of that paragraph for the detailed rules.
Beware though that 8.3.2/4 says, "A reference shall be initialized to refer to a valid object or function." There is some question (as a defect report on the standard) what "valid" means in this context, so possibly you can't bind the parameter other to the unconstructed (and hence, "invalid"?) a.
So, I'm uncertain what the standard actually says here - I can use an lvalue, but not bind it to a reference, perhaps, in which case a isn't good, while passing a pointer to a would be OK as long as it's only used in the ways permitted by 3.8/5.
In the case of b, you're using the value before it's initialized (because you dereference it, and also because even if you got that far, &other would be the value of b). This clearly is not good.
As ever in C++, it compiles because it's not a breach of language constraints, and the standard doesn't explicitly require a diagnostic. Imagine the contortions the spec would have to go through in order to mandate a diagnostic when an object is invalidly used in its own initialization, and imagine the data flow analysis that a compiler might have to do to identify complex cases (it may not even be possible at compile time, if the pointer is smuggled through an externally-defined function). Easier to leave it as undefined behavior, unless anyone has any really good suggestions for new spec language ;-)
If you crank your warning levels up, your compiler will probably warn you about using uninitialized stuff. UB doesn't require a diagnostic, so many things that are "obviously" wrong may compile.
I don't know the spec reference, but I do know that accessing an uninitialized pointer always results in undefined behaviour.
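For contrast, a well-defined version of the copy constructs the source object first and only then dereferences a pointer to it. This is a sketch: Tracked is a hypothetical stand-in for Test that records the comparison instead of printing it:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for Test: records whether it was copied from itself.
struct Tracked
{
    bool copied_from_self;
    Tracked() : copied_from_self(false) {}
    Tracked(const Tracked& other) : copied_from_self(this == &other) {}
};

bool copy_from_valid_source()
{
    std::unique_ptr<Tracked> source(new Tracked());     // fully constructed first
    std::unique_ptr<Tracked> b(new Tracked(*source));   // dereferences a valid pointer
    assert(source && b);
    return b->copied_from_self;   // false: the copy came from a different object
}
```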
When I compile your code in Visual C++ I get:
test.cpp(20): warning C4700:
uninitialized local variable 'b' used