Why special rules for `for` statement scope? - c++

I recently stumbled on this problem:
for(int i=0,n=v.size(); i<n; i++) {
...
P2d n = ... <<<--- error here
}
The compiler complained that the local variable n had already been defined, even though the opening brace looks like it should start a new scope.
Indeed the standard has special wording for this, and while the code compiled fine with g++ 4.6.3, more recent versions and other compilers reject it.
What is the rationale (if there is any) behind this special rule?
To be more clear: the standard explains that this is not permitted and I've no questions about the technical reason for which that's an error: I was just wondering why the committee decided to use special extra rules instead of just creating another nested scope when seeing the opening brace (like it happens in other places).
For example to make the code legal you can just wrap the body with two brace pairs instead of one...
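A minimal sketch of that workaround, assuming a std::vector<int> input and a hypothetical function name sum_of_squares (neither comes from the original code). The extra brace pair opens a nested block, so redeclaring n inside it hides the loop's n instead of clashing with it:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: sums the squares of the elements of v.
int sum_of_squares(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0, n = v.size(); i < n; ++i) {{
        // The inner brace pair is a nested block, so this n hides the
        // loop's n until the block ends; the loop condition still sees
        // the n declared in the for header.
        int n = v[i] * v[i];
        total += n;
    }}
    return total;
}
```

With a single brace pair the declaration of the inner n is rejected by current compilers, which is the behaviour the question is about.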
Please also note that braces after for/while/if, while considered good practice, are not mandatory and not part of the syntax, but still a scope containing the loop variables exists (therefore using function definition as another example where the scope of the locals is the body of the function is not relevant: a function body is not a statement and braces are mandatory).
In the C++ syntax the body of a for is just a statement; however if this statement happens to be a braced group then it gets a special handling in for/while/if (that doesn't happen when you use a braced group as statement elsewhere in the language).
What is the reason for adding this extra complication to the language? It's apparently not needed and just treating the braces as another inner scope seems (to me) simpler.
Are there use cases in which this simpler and more regular approach doesn't work?
Note that I'm not asking opinions. Either you know why the committee took this decision (requiring also a quite elaborate wording in the standard instead of just having the body as a regular statement with the regular handling of a brace enclosed block when used as statement) or you don't.
EDIT
The "single scope" view for the syntax is for me unnatural but technically possible for the for statement that can be rationalized as a single block with a backward goto statement, but it's hard to defend in a very similar case for the if statement:
if (int x = whatever()) {
int x = 3; // Illegal
} else {
int x = 4; // Illegal here too
}
but this is instead legal
if (int x = whatever()) {
int z = foo();
} else {
int z = bar();
}
So are the condition, the then part and the else part of an if statement the same scope? No because you can declare two z variables. Are they separate scopes? No because you cannot declare x.
The only rationalization I can see is that the then and else part are indeed separate scopes, but with the added (strange) rule that the variable declared in the condition cannot be declared in the scope. Why this extra strange limitation rule is present is what I'm asking about.

int i = 0;
for (MyObject o1; i<10; i++) {
MyObject o2;
}
From the point of view of recent compilers, this can be translated into:
int i = 0;
{
MyObject o1;
Label0:
MyObject o2; //o2 will be destroyed and reconstructed 10 times, while being with the same scope as o1
i++;
if (i < 10)
goto Label0;
}
This answers the question at the end of your post: they didn't add anything complicated. The loop simply uses a goto to a label inside the same scope, rather than leaving the scope and entering it again. I don't see a clear reason why that is better (and it does create some incompatibility with older code).

The semantics are not special for the for loop! if (bool b = foo()) { } works the same. The odd one out is really a { } block on its own. That would be rather useless if it didn't introduce a new scope. So the apparent inconsistency is due to a misplaced generalization from an exceptional case.
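As a rough illustration of the if case (the function name classify and the modulo test are made up for this sketch), a variable declared in the condition is visible in both branches, which is why neither branch may redeclare it in its outermost block:

```cpp
#include <string>

// Hypothetical example: rem is declared in the condition and stays in
// scope for both the then-branch and the else-branch.
std::string classify(int value) {
    if (int rem = value % 3) {
        // rem is nonzero here
        return "remainder " + std::to_string(rem);
    } else {
        // rem is still in scope here, and it is 0
        return "divisible, rem=" + std::to_string(rem);
    }
}
```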
[edit]
An alternative view would be to consider a hypothetical, optional keyword:
// Not a _conditional_ statement theoretically, but grammatically identical
always()
{
Foo();
}
This unifies the rules, and you wouldn't expect three scopes (inside, intermediate, outside) here either.
[edit 2] (please don't make this a moving target to answer)
You wonder about lifetime and scopes (two different things) in
int i = 0;
for (MyObject o1; i<10; i++) {
MyObject o2;
}
Let's generalize that:
MyObject o2; // Outer scope
int i = 0;
for (MyObject o1; i<o1.fooCount(); i++) {
std::cout << o2.asString();
MyObject o2;
}
Clearly the call to o2.asString() refers to the outer o2, in all iterations. It's not as if the inner o2 survives the loop iteration. Name lookup will use names from the outer scope when the names aren't yet defined in the inner scope, and "not yet defined" is a compile-time thing. The repeated construction and destruction of the inner o2 is a runtime thing.
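A small sketch of that lookup rule, using std::string in place of MyObject and a hypothetical function lookup_trace that records which variable each use resolves to:

```cpp
#include <string>
#include <vector>

// Hypothetical example: records what `label` refers to before and after
// the inner declaration, on each of two iterations.
std::vector<std::string> lookup_trace() {
    std::vector<std::string> seen;
    std::string label = "outer";
    for (int i = 0; i < 2; ++i) {
        seen.push_back(label);        // inner label not declared yet: outer one
        std::string label = "inner";  // hides the outer label from here on
        seen.push_back(label);        // now refers to the inner one
    }
    return seen;
}
```

The inner label is destroyed at the end of each iteration, so the first use in the next iteration again resolves to the outer one.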

Look at it this way:
A pair of braces allows you to hide variables visible inside an enclosing pair of braces (or globally):
void foo(int n)
{
// the containing block
for (int i = 0; i < n; ++i)
{
int n = 5; // allowed: n is visible inside the containing { }
int i = 5; // not allowed: i is NOT visible inside the containing { }
}
}
If you think about it this way you realize there are no special rules here.

The brackets ({}) delimit a section of code as a block. Everything in this block is within its own local scope:
#include <iostream>
int main(int argc, char** argv)
{
int a = 5;
std::cout<<a<<std::endl; // 5
{
int a = 10; // hides the outer a until the closing bracket
std::cout<<a<<std::endl; // 10
}
std::cout<<a<<std::endl; // 5
}
But wait, there is something else in that code...
int main(int argc, char** argv)
{
}
This is similar to the structure of a for loop:
for (int i = 0 ; i < 5; i++)
{
}
The function definition has code outside the {...} block too!
in this case, argc and argv are defined, and they are local to the scope of the function just like the definition of i in the above for loop.
In fact you can generalise the syntax to:
definition { expression }
Where the entirety of the above is within the scope.
In this case, the 'raw' brackets ({}) form the same structure but with an empty definition statement.
edit:
to answer your edit, in:
int i = 0;
for (MyObject o1; i<10; i++) {
MyObject o2;
}
the constructor for o2 runs on every iteration, while the constructor for o1 runs only once.
The for loop executes as follows (where XXX marks the part currently being executed):
1. init
for(XXX; ; ){ }
2. test loop exp
for( ;XXX; ){ }
3. execute block
for( ; ; ){XXX}
4. final operation
for( ; ;XXX){ }
5. Back to 2.
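The steps above can be checked with a small sketch. Tracker is a hypothetical stand-in for MyObject that bumps a counter each time it is constructed; count_constructions is likewise an invented name:

```cpp
#include <utility>

// Hypothetical stand-in for MyObject: counts how often it is constructed.
struct Tracker {
    explicit Tracker(int& counter) { ++counter; }
};

// Returns {number of o1 constructions, number of o2 constructions}.
std::pair<int, int> count_constructions() {
    int o1_count = 0;
    int o2_count = 0;
    int i = 0;
    for (Tracker o1(o1_count); i < 10; i++) {  // init step runs once
        Tracker o2(o2_count);                  // body runs on every iteration
    }
    return {o1_count, o2_count};
}
```

Under these assumptions o1 is constructed once and o2 ten times.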

As there was a c tag, I will answer from that perspective. Here is an example:
#include <stdio.h>
int main(void) {
int a[] = {1, 2, 3, 4, 5, 6, 7, 8};
for (int i = 0, n = 8; i < n; i++) {
int n = 100;
printf("%d %d\n", n, a[i]);
}
return 0;
}
It compiles without issues, see it working at ideone (C99 strict mode, 4.8.1).
The C standard is clear that the two scopes are considered separate, N1570 6.8.5/p5 (emphasis mine):
An iteration statement is a block whose scope is a strict subset of
the scope of its enclosing block. The loop body is also a block whose
scope is a strict subset of the scope of the iteration statement.
There is a warning, but only with the -Wshadow option, as expected:
$ gcc -std=c99 -pedantic -Wall -Wextra -Wshadow check.c
check.c: In function ‘main’:
check.c:7: warning: declaration of ‘n’ shadows a previous local
check.c:6: warning: shadowed declaration is here

The loop control variables (i and n in this case) are considered part of the for loop.
And since they are already declared in the loop's initialization statement, most attempts (other than re-defining by using nested braces) to re-define them within the loop result in an error!

I cannot tell you why there is just one scope opened by the for loop, not a second one due to the braces. But I can say what was given back then as the reason for changing where that single scope is: Locality. Take this kind of pretty standard code:
void foo(int n) {
int s=0;
for (int i=0; i<n; ++i) {
s += global[i];
}
// ... more code ...
for (int i=0; i<n; ++i) {
global[i]--;
}
}
Under the old rules, that would have been illegal code, defining i twice in the same scope, the function. (In C back then, it was even illegal because you had to declare variables at the beginning of the block.)
That usually meant you’d leave out the declaration in the second loop – and run into problems if the code with the first loop was removed. And whatever you did, you had variables with a long time to live, which as usual makes reasoning about your code harder. (That was before everyone and their brother started to consider ten lines a long function.) Changing for to start its own scope before the variable declaration here made code much easier to maintain.

Your problem is that the definition part of the for is considered to be inside the scope of the for.
// V one definition
for(int i=0,n=v.size(); i<n; i++) {
...
// V second definition
P2d n = ... <<<--- error here
}

Related

What happens if the initialization part of a for-loop is missing?

While trying to understand the code below I've encountered the following question(s): What happens, as shown in the example below, if the initialization part of a for loop is missing?
using namespace std;
void print(int a) {
int b;
for (; a ; a = a/10)
for (b = a+2; b>a; b=b-1)
cout << b;
}
int main()
{
int a{10};
print(a);
cout << a;
return 0;
}
Each of the three parts of the for loop header is optional. This came from C and makes the for loop structure remarkably powerful.
If the initialisation part is missing then nothing is initialised.
In your particular case a is passed to the function (and therefore needs to be initialised before it's passed), so it makes no sense for the for loop to perform further initialisation.
Note that if the condition check is missing then you need to control the loop exit condition yourself in the loop body. In your program the condition is the expression a, in other words your loop terminates when a is zero.
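As a rough sketch of the "control the exit yourself" case (count_digits is a hypothetical function, not part of the question's program), here is a loop with all three parts omitted and the exit test moved into the body:

```cpp
// Hypothetical example: counts the decimal digits of a non-negative number.
// With no condition in the header the loop would run forever, so the body
// has to break out explicitly.
int count_digits(int a) {
    int digits = 0;
    for (;;) {
        if (a == 0) break;  // manual exit test replaces the condition
        a = a / 10;         // manual update replaces the third part
        ++digits;
    }
    return digits;
}
```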

Variables names ambiguity in C++

I'm a bit confused at this situation:
#include <iostream>
void function(int origin)
{
if (origin < 0)
{
double origin = 0.3;
std::cout << origin << std::endl;
}
}
int main()
{
function(-4);
}
where it gets compiled and run successfully using VS2013 under v120 ToolSet. Isn't it wrong C++? 'Cause doing the same but just in the beginning of function it gives a compile-time error.
This is legal according to C++ standard, section 3.3.3.1:
A name declared in a block is local to that block; it has block scope. Its potential scope begins at its point of declaration and ends at the end of its block. A variable declared at block scope is a local variable.
Such redeclaration hides the origin parameter.
Cause doing the same but just in the beginning of function it gives a compile-time error.
You get an error because C++ standard explicitly disallows such redeclaration in section 3.3.3.2:
A parameter name shall not be redeclared in the outermost block of the function definition nor in the outermost block of any handler associated with a function-try-block.
The reason for this exclusion is that function parameters are local to the outer scope of the function, so having a redeclaration without another layer of braces would introduce a duplicate identifier into the same scope.
I don't mind this C++ behaviour. Sure, it can lead to bugs/oversights as you have demonstrated. But you can do this in C++
for (int i = 0; i < 4; ++i) {
for (int i = 0; i < 5; ++i) {
cout << i;
}
cout << i;
}
cout << endl;
for (int i = 0; i < 4; ++i) {
for (int j = 0; j < 5; ++j) {
cout << j;
}
cout << i;
}
and the results are identical because i is redefined in the scope of the inner for loop to be a different variable.
In other languages like C# you can't do this. The compiler will tell you that you have tried to redeclare a variable of the same name in an inner scope.
I find this over-protective. When I'm cutting and pasting code with loops, it is irritating to have to redeclare i, which we all tend to use as the loop variable, to be i1, i2 etc. I invariably miss one, with cut-and-paste code, so I'm using arr[i] in an i3 loop, when I meant arr[i3] (whoops).
In production code, I agree that defensive coding means you should use different names for loop variables in the same function.
But it's nice to be able to reuse variable names in nested for loops when you're experimenting. C++ gives you that choice.
Isn't it wrong C++?
No. It's perfectly legal to redeclare an identifier, as long as it's in a different scope. In this case, the scope is the then-body of the if-statement.
It isn't ambiguous. The nearest preceding declaration will be used.
No, it is not wrong. It is a perfectly valid behavior according to the standard.
void function(int origin)
{
if (origin < 0) // scope of the first 'origin'
{
double origin = 0.3; // scope of the second 'origin' begins
// scope of the first 'origin' is interrupted
std::cout << origin << std::endl;
} //block ends, scope of the second 'origin' ends
//scope of the first 'origin' resumes
}
As tupe_cat said, it is always valid to redeclare a name if the scopes differ. In such cases the variable belonging to the inner scope takes precedence over the one in the outer scope.

Using declaration in condition part of FOR-loop statement

Reading http://en.cppreference.com/w/cpp/language/for I find that the condition part of a for loop can be either an expression contextually convertible to bool, or a single-variable declaration with a mandatory brace-or-equal initializer (syntax specification in 6.4/1):
condition:
expression
type-specifier-seq declarator = assignment-expression
But I have never seen the latter used in source code.
What is a worthwhile use (in terms of brevity, expressiveness, readability) of a variable declaration in the condition part of a for loop statement?
for (int i = 0; bool b = i < 5; ++i) {
// `b` is always convertible to `true` until the end of its scope
} // scope of `i` and `b` ends here
A variable declared in the condition can only be convertible to true for its whole lifetime (scope), unless side effects influence the result of the conversion to bool.
I can imagine only a couple of use cases:
declaring a variable of class type that has a user-defined operator bool.
some kind of change to the cv-ref-qualifiers of the loop variable:
for (int i = 5; int const & j = i; --i) {
// using of j
}
But both of them are very artificial.
All three statements if, for and while can be used in a similar way. Why is this useful? Sometimes it just is. Consider this:
Record * GetNextRecord(); // returns null when no more records
Record * GetNextRecordEx(int *); // as above, but also store signal number
// Process one
if (Record * r = GetNextRecord()) { process(*r); }
// Process all
while (Record * r = GetNextRecord()) { process(*r); }
// Process all and also keep some loop-local state
for (int n; Record * r = GetNextRecordEx(&n); )
{
process(*r);
notify(n);
}
These statements keep all the variables they need at the minimal possible scope. If the declaration form wasn't allowed inside the statement, you would need to declare a variable outside the statement but you would only need it for the duration of the statement. That means you would either leak into too large a scope, or you would need unsightly extra scopes. Allowing the declaration inside the statement offers a convenient syntax, which although rarely useful is very nice to have when it is useful.
Perhaps the most common-place use case is in a multiple-dispatch cast situation like this:
if (Der1 * p = dynamic_cast<Der1 *>(target)) visit(*p);
else if (Der2 * p = dynamic_cast<Der2 *>(target)) visit(*p);
else if (Der3 * p = dynamic_cast<Der3 *>(target)) visit(*p);
else throw BadDispatch();
As an aside: only the if statement admits a code path for the case where the condition is false, given in an optional else block. Neither while nor for allow you to consume the result of the boolean check in this way, i.e. there is no for ... else or while ... else construction in the language.
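A self-contained sketch of the dispatch pattern above. The Shape hierarchy and the describe function are invented for illustration and stand in for Der1, Der2 and visit:

```cpp
#include <string>

// Hypothetical class hierarchy, used only for illustration.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { int radius = 1; };
struct Square : Shape { int side = 2; };
struct Triangle : Shape {};

// Each p lives only as long as the if statement that declares it.
std::string describe(Shape& target) {
    if (Circle* p = dynamic_cast<Circle*>(&target))
        return "circle " + std::to_string(p->radius);
    else if (Square* p = dynamic_cast<Square*>(&target))
        return "square " + std::to_string(p->side);
    else
        return "unknown";
}
```

Each else if is a new if statement nested in the previous else branch, so its p hides the earlier (null) p rather than conflicting with it.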

Why can't we declare two variable in a for loop?

The code below generates an error when I run it, but if I declare at least one variable outside the loop the code works fine. Why can't I declare both variables in the loop itself?
Error:
#include<iostream>
#include<conio.h>
using namespace std ;
int main()
{
for(int j=0,int i=0;i<4&&j<2;i++,j++)
{
cout<<"Hello"<<endl ;
}
getch() ;
return 0 ;
}
Works Fine:
#include<iostream>
#include<conio.h>
using namespace std ;
int main()
{
int i ;
for(int j=0,i=0;i<4&&j<2;i++,j++)
{
cout<<"Hello"<<endl ;
}
getch() ;
return 0 ;
}
You can, but the notation for declaring two variables in a single declaration is like this:
int j=0, i=0;
with no second int.
(This is actually what your second version is doing; you might think it's assigning the already-declared i, but actually it's declaring a new one, whose scope is the for-loop.)
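A quick way to see this, sketched with a hypothetical function outer_after_loop: give the outer i a recognisable value and check it after the loop.

```cpp
// Hypothetical example: the i declared in the for header is a new variable,
// so the outer i is never touched by the loop.
int outer_after_loop() {
    int i = 42;
    for (int j = 0, i = 0; i < 4 && j < 2; i++, j++) {
        // this i is the loop's own variable and hides the outer one
    }
    return i;  // still the outer i
}
```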
Because that's how the Standard defines the syntax. There's nothing "wrong" in particular with the idea, but apparently it was decided that you can only have one declaration in the initialization part.
If you want to declare multiple variables, use a comma to enumerate them (but this way, you can only declare variables of the same type):
for (int i = 0, j = 10; i < 10; i++, j--)
However, I'm not sure you should be doing this. After a certain point, this evolves into an unreadable mess.
Note that the given answers "only" handle declaring multiple variables of the same type.
If, for some bizarre reason, you would need to do multiple types, this is valid (though awful):
for(struct {int a; double b;} loop = {0, 1.5}; loop.a < loop.b; loop.a++)
{
// Awful hacks
}

In C++ why can't I write a for() loop like this: for( int i = 1, double i2 = 0;

or, "Declaring multiple variables in a for loop ist verboten" ?!
My original code was
for( int i = 1, int i2 = 1;
i2 < mid;
i++, i2 = i * i ) {
I wanted to loop through the first so-many squares, and wanted both the number and its square, and the stop condition depended on the square. This code seems to be the cleanest expression of intent, but it's invalid. I can think of a dozen ways to work around this, so I'm not looking for the best alternative, but for a deeper understanding of why this is invalid. A bit of language lawyering, if you will.
I'm old enough to remember when you had to declare all your variables at the start of the function, so I appreciate
the
for( int i = 0; ....
syntax. Reading around it looks like you can only have one type declaration in the first section of a for() statement. So you can do
for( int i=0, j=0; ...
or even the slightly baroque
for( int i=0, *j=&i; ...
but not the to-me-sensible
for( int i=0, double x=0.0; ...
Does anyone know why? Is this a limitation of for()? Or a restriction on comma lists, like "the first element of a comma list may declare a type, but not the others"? Are the following uses of commas distinct syntactical elements of C++?
(A)
for( int i=0, j=0; ...
(B)
int i = 0, j = 0;
(C)
int z;
z = 1, 3, 4;
Any gurus out there?
====================================================
Based on the good responses I've gotten, I think I can sharpen the question:
In a for statement
for( X; Y; Z ) {..... }
what are X, Y and Z?
My question was about C++, but I don't have a great C++ reference. In my C reference (Harbison and Steele 4th ed, 1995), they are all three expressions, and my gcc requires C99 mode to use for( int i = 0;
In Stroustrup, sec 6.3, the for statement syntax is given as
for( for-init-statement; condition; expression ) statements
So C++ has a special syntactic statement dedicated to the first clause in for(), and we can assume they have special rules beyond those for an expression. Does this sound valid?
If you need to use several variables of different type in for-loop then you could use structures as follows:
for( struct {int i; long i2;} x = {1, 1}; x.i2 < mid; x.i++, x.i2 = x.i * x.i )
{
cout << x.i2 << endl;
}
so this is not a limitation, just use a little different syntax.
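A variation on the same idea, sketched with std::pair instead of an unnamed struct (this is a substitute technique, not what the answer above uses). The function name squares_below and the vector return type are assumptions made so the result can be inspected:

```cpp
#include <utility>
#include <vector>

// Hypothetical example: collects the squares 1, 4, 9, ... that are below mid.
// s.first plays the role of i and s.second the role of i2.
std::vector<long> squares_below(long mid) {
    std::vector<long> out;
    for (std::pair<int, long> s(1, 1); s.second < mid;
         ++s.first, s.second = static_cast<long>(s.first) * s.first) {
        out.push_back(s.second);
    }
    return out;
}
```

The single declaration in the header still has one type (the pair), which is why this form is accepted where int i = 1, long i2 = 1 is not.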
int i = 1, double i2 = 0; is not a valid declaration statement, so it cannot be used inside the for statement. If the statement can't stand alone outside the for, then it can't be used inside the for statement.
Edit:
Regarding your questions about comma operators, options 'A' and 'B' are identical and are both valid. Option 'C' is also valid, but will probably not do what you would expect. z will be assigned 1, and the expressions 3 and 4 don't actually do anything (your compiler will probably warn you about "statements with no effect" and optimize them away).
Update:
To address the questions in your edit, here is how the C++ spec (Sec 6.5) defines for:
for ( for-init-statement condition(opt) ; expression(opt) ) statement
It further defines for-init-statement as either expression-statement or simple-declaration. Both condition and expression are optional.
The for-init-statement can be anything that is a valid expression-statement (such as i = 0;) or simple-declaration (such as int i = 0;). The statement int i = 1, double i2 = 0; is not a valid simple-declaration according to the spec, so it is not valid to use with for. For reference, a simple-declaration is defined (in Section 7) as:
attribute-specifier(opt) decl-specifier-seq(opt) init-declarator-list(opt) ;
where decl-specifier-seq would be the data type plus keywords like static or extern and init-declarator-list would be a comma-separated list of declarators and their optional initializers. Attempting to put more than one data type in the same simple-declaration essentially places a decl-specifier-seq where the compiler expects an init-declarator-list. Seeing this element out of place causes the compiler to treat the line as ill-formed.
The spec also notes that the for loop is equivalent to:
{
for-init-statement
while ( condition ) {
statement
expression ;
}
}
where condition defaults to "true" if it is omitted. Thinking about this "expanded" form may be helpful in determining whether a given syntax may be used with a for loop.
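A minimal sketch of that equivalence, with two hypothetical functions that compute the same sum, one written as a for loop and one written in the expanded form:

```cpp
// Hypothetical example: sum of 1..n written with a for loop.
int sum_with_for(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}

// The same loop in the expanded form from the spec.
int sum_with_while(int n) {
    int total = 0;
    {
        int i = 1;        // for-init-statement
        while (i <= n) {  // condition
            total += i;   // statement
            ++i;          // expression
        }
    }
    return total;
}
```

One caveat: a continue inside the body of a for loop still runs the expression before the condition is tested again, so a body that uses continue cannot be pasted into the while form unchanged.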
It's actually a limitation of declaration statements:
int i=0, j=0, *k=&i; // legal
int i=0, double x=0.0; // illegal
So, basically, the answer to your final question is: (A) & (B) are the same. (C) is different.
As bta points out:
z = 1,3,4;
is the same as
z = 1;
However, that is because = has a higher precedence than ,. If it were written as:
z = (1,3,4);
then that would be the same as:
z = 4;
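The difference can be observed in a small sketch. To avoid "no effect" warnings it uses a hypothetical helper next() that returns 1, 2, 3, ... in place of the literals; comma_demo is likewise an invented name:

```cpp
#include <utility>

// Hypothetical example: returns {result without parentheses, result with}.
std::pair<int, int> comma_demo() {
    int counter = 0;
    auto next = [&counter] { return ++counter; };  // yields 1, 2, 3, ...

    int a;
    a = next(), next(), next();    // parsed as (a = next()), next(), next()

    counter = 0;
    int b;
    b = (next(), next(), next());  // the comma expression yields its last operand

    return {a, b};
}
```

Here a ends up holding the first value and b the last, matching z = 1 and z = 4 above.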
As long as you can write a valid statement with the comma , operator, it's acceptable.
C++ (like C and Java) does not permit declaring variables of more than one type in the header of a for loop. In the grammar, that is because the comma does not start a new statement in that context. Effectively, only one declaration is allowed inside the for(;;) statement. The rationale is that the requirement is fairly unusual, and you can meet it with a slightly more verbose construct.
Well, I did some more googling, and I think the answer for C++ is "for() statements are very special places" Ick.
Excerpting from an ISO spec:
for ( for-init-statement conditionopt ; expressionopt ) statement
where
for-init-statement:
expression-statement
simple-declaration
and they have to specify that
[Note: a for-init-statement ends with a semicolon. ]
So the C++ syntax spec. is specifically hacked so that only one decl-spec (i.e. type) is allowed in the first slot. Looks like our attempts to argue from basic principles were doomed. Thanks for all the responses.
I can see why you hope that would work, but---given that even using the rather simple minded teaching tool that
for (i=0; i<max; i++){
...
}
is equivalent to
i=0;
while (i<max){
...
i++;
}
your syntax doesn't work---I can't see why you expected that it would. Each of the bits needs to be valid.