how is an "incomplete" initializer list parsed? [duplicate] - c++

Maybe I am not from this planet, but it would seem to me that the following should be a syntax error:
int a[] = {1,2,}; //extra comma in the end
But it's not. I was surprised when this code compiled on Visual Studio, but I have learnt not to trust MSVC compiler as far as C++ rules are concerned, so I checked the standard and it is allowed by the standard as well. You can see 8.5.1 for the grammar rules if you don't believe me.
Why is this allowed? This may be a stupid, useless question, but I want you to understand why I am asking. If it were a sub-case of a general grammar rule, I would understand: they decided not to make the general grammar any more difficult just to disallow a redundant comma at the end of an initializer list. But no, the additional comma is explicitly allowed. For example, a redundant comma is not allowed at the end of a function-call argument list (when the function takes ...), which is normal.
So, again, is there any particular reason this redundant comma is explicitly allowed?
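One way to confirm that the trailing comma is purely syntactic is to check the array bound the compiler deduces. The following is a minimal sketch assuming a C++11 compiler; the names with_comma, without_comma and element_count are made up for illustration.

```cpp
#include <cstddef>

// The trailing comma does not add an element: both arrays have bound 2.
int with_comma[] = {1, 2,};
int without_comma[] = {1, 2};

// Hypothetical helper: number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(T (&)[N]) { return N; }

// Same element type and same bound, hence the same size in bytes.
static_assert(sizeof(with_comma) == sizeof(without_comma),
              "a trailing comma must not change the array size");
```

If the comma introduced an extra, uninitialized element, the static_assert would fail at compile time.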

It makes it easier to generate source code, and also to write code which can be easily extended at a later date. Consider what's required to add an extra entry to:
int a[] = {
1,
2,
3
};
... you have to add the comma to the existing line and add a new line. Compare that with the case where the three already has a comma after it, where you just have to add a line. Likewise if you want to remove a line you can do so without worrying about whether it's the last line or not, and you can reorder lines without fiddling about with commas. Basically it means there's a uniformity in how you treat the lines.
Now think about generating code. Something like (pseudo-code):
output("int a[] = {");
for (int i = 0; i < items.length; i++) {
output("%s, ", items[i]);
}
output("};");
No need to worry about whether the current item you're writing out is the first or the last. Much simpler.
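A runnable C++ take on that pseudo-code might look like the sketch below. The function name emit_initializer and the choice to return a std::string are assumptions made for the example, not part of the original.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Emits every item followed by ", " with no first/last special case.
// Note: an empty input yields "int a[] = {};", which a compiler would
// reject for an array of unknown bound, so callers should pass items.
std::string emit_initializer(const std::vector<int>& items) {
    std::ostringstream out;
    out << "int a[] = {";
    for (int item : items) {
        out << item << ", ";
    }
    out << "};";
    return out.str();
}
```

Calling emit_initializer({1, 2, 3}) produces a declaration whose list ends in a trailing comma, which the grammar accepts as-is.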

It's useful if you do something like this:
int a[] = {
1,
2,
3, //You can delete this line and it's still valid
};

Ease of use for the developer, I would think.
int a[] = {
1,
2,
2,
2,
2,
2, /*line I could comment out easily without having to remove the previous comma*/
};
Additionally, if for whatever reason you had a tool that generated code for you, the tool doesn't have to care about whether it's the last item in the initializer or not.

I've always assumed it makes it easier to append extra elements:
int a[] = {
5,
6,
};
simply becomes:
int a[] = {
5,
6,
7,
};
at a later date.

Everything everyone is saying about the ease of adding/removing/generating lines is correct, but the real place this syntax shines is when merging source files together. Imagine you've got this array:
int ints[] = {
3,
9
};
And assume you've checked this code into a repository.
Then your buddy edits it, adding to the end:
int ints[] = {
3,
9,
12
};
And you simultaneously edit it, adding to the beginning:
int ints[] = {
1,
3,
9
};
Semantically these sorts of operations (adding to the beginning, adding to the end) should be entirely merge safe and your versioning software (hopefully git) should be able to automerge. Sadly, this isn't the case because your version has no comma after the 9 and your buddy's does. Whereas, if the original version had a trailing comma after the 9, they would have automerged.
So, my rule of thumb is: use the trailing comma if the list spans multiple lines, don't use it if the list is on a single line.

I am surprised that after all this time no one has quoted the Annotated C++ Reference Manual (ARM). It says the following about [dcl.init] (emphasis mine):
There are clearly too many notations for initializations, but each seems to serve a particular style of use well. The ={initializer_list,opt} notation was inherited from C and serves well for the initialization of data structures and arrays. [...]
Although the grammar has evolved since the ARM was written, the origin remains.
We can go to the C99 rationale to see why this was allowed in C. It says:
K&R allows a trailing comma in an initializer at the end of an
initializer-list. The Standard has retained this syntax, since it
provides flexibility in adding or deleting members from an initializer
list, and simplifies machine generation of such lists.

I believe the trailing comma is allowed for backward-compatibility reasons. There is a lot of existing code, primarily auto-generated, that uses a trailing comma. It makes it easier to write a loop without a special condition at the end.
e.g.
std::for_each(my_inits.begin(), my_inits.end(),
[](const std::string& value) { std::cout << value << ",\n"; });
There isn't really any advantage for the programmer.
P.S. Though it is easier to autogenerate the code this way, I have always taken care not to put the trailing comma in. The effort is minimal, readability is improved, and that's more important: you write code once, but you read it many times.

I see one use case that was not mentioned in other answers,
our favorite Macros:
int a [] = {
#ifdef A
1, //this can be last if B and C are undefined
#endif
#ifdef B
2,
#endif
#ifdef C
3,
#endif
};
Adding macros to handle the last comma would be a big pain. With this small allowance in the syntax it is trivial to manage. And this matters more than it does for machine-generated code, because special-casing the last element is usually a lot easier in a Turing-complete language than in the very limited preprocessor.
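To see the conditional-compilation case end to end, here is a small sketch. The macro names HAVE_A, HAVE_B and HAVE_C are hypothetical stand-ins for the A, B and C above (chosen only to avoid defining one-letter macros), and a_count is a helper constant introduced for the check.

```cpp
#include <cstddef>

#define HAVE_A  // hypothetical feature switch, enabled
#define HAVE_C  // hypothetical feature switch, enabled; HAVE_B stays undefined

int a[] = {
#ifdef HAVE_A
    1,
#endif
#ifdef HAVE_B
    2,
#endif
#ifdef HAVE_C
    3,
#endif
};

// Only the HAVE_A and HAVE_C entries survive preprocessing.
constexpr std::size_t a_count = sizeof(a) / sizeof(a[0]);
static_assert(a_count == 2, "only the HAVE_A and HAVE_C entries are compiled in");
```

Whichever switch happens to be the last one enabled, the list still ends in a comma and still compiles, as long as at least one entry is present.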

One of the reasons this is allowed as far as I know is that it should be simple to automatically generate code; you don't need any special handling for the last element.

It makes code generators that spit out arrays or enumerations easier.
Imagine:
std::cout << "enum Items {\n";
for(Items::iterator i(items.begin()), j(items.end()); i != j; ++i)
std::cout << *i << ",\n";
std::cout << "};\n";
I.e., no need to do special handling of the first or last item to avoid spitting the trailing comma.
If the code generator is written in Python, for example, it is easy to avoid emitting the trailing comma by using the str.join() function:
print("enum Items {")
print(",\n".join(items))
print("};")

The reason is trivial: ease of adding/removing lines.
Imagine the following code:
int a[] = {
1,
2,
//3, // - not needed any more
};
Now, you can easily add/remove items to the list without having to add/remove the trailing comma sometimes.
In contrast to other answers, I don't really think that ease of generating the list is a valid reason: after all, it's trivial for the code to special-case the last (or first) line. Code-generators are written once and used many times.
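For what it's worth, the special-casing mentioned above can be sketched in C++ by writing the separator before every item except the first. The function name emit_without_trailing_comma and the enum name Items in the output are assumptions for illustration only.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Writes a separator before every item except the first,
// so the generated list never ends in a comma.
std::string emit_without_trailing_comma(const std::vector<std::string>& items) {
    std::ostringstream out;
    out << "enum Items {\n";
    const char* separator = "";  // empty for the first item only
    for (const std::string& item : items) {
        out << separator << item;
        separator = ",\n";
    }
    out << "\n};\n";
    return out.str();
}
```

The extra state is a single pointer, which supports the point that the generator-side cost is small either way.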

It allows every line to follow the same form. Firstly this makes it easier to add new rows and have a version control system track the change meaningfully and it also allows you to analyze the code more easily. I can't think of a technical reason.

The only language where it's - in practice* - not allowed is JavaScript, and it causes an innumerable amount of problems. For example, if you copy a line from the middle of the array, paste it at the end, and forget to remove the comma, then your site will be totally broken for your IE visitors.
*In theory it is allowed but Internet Explorer doesn't follow the standard and treats it as an error

It's easier for machines, i.e. parsing and generation of code.
It's also easier for humans, i.e. modification, commenting-out, and visual-elegance via consistency.
Assuming C, would you write the following?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
puts("Line 1");
puts("Line 2");
puts("Line 3");
return EXIT_SUCCESS
}
No. Not only because the final statement is an error, but also because it's inconsistent. So why do the same to collections? Even in languages that allow you to omit last semicolons and commas, the community usually doesn't like it. The Perl community, for example, doesn't seem to like omitting semicolons, bar one-liners. They apply that to commas too.
Don't omit commas in multiline collections for the same reason you don't omit semicolons for multiline blocks of code. I mean, you wouldn't do it even if the language allowed it, right? Right?

This is allowed to protect from mistakes caused by moving elements around in a long list.
For example, let's assume we have code that looks like this.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Super User",
"Server Fault"
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
And it's great, as it shows the original trilogy of Stack Exchange sites.
Stack Overflow
Super User
Server Fault
But there is one problem with it. You see, the footer on this website shows Server Fault before Super User. Better fix that before anyone notices.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Server Fault"
"Super User",
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
After all, moving lines around couldn't be that hard, could it be?
Stack Overflow
Server FaultSuper User
I know, there is no website called "Server FaultSuper User", but our compiler claims it exists. The issue is that C and C++ concatenate adjacent string literals: two double-quoted strings written next to each other, with nothing between them, are joined into one. (A similar issue can also happen with integers, as the - sign has multiple meanings.)
Now what if the original array had a useless comma at the end? Well, the lines would be moved around, but such a bug wouldn't have happened. It's easy to miss something as small as a comma. If you remember to put a comma after every array element, such a bug just cannot happen. You wouldn't want to waste four hours debugging something, only to find that a comma is the cause of your problems.
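The difference between the two layouts can be checked directly. This is a sketch with made-up names (broken, fixed, element_count); it only assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <string>

// Missing comma: the two adjacent literals merge into one element.
const std::string broken[] = {
    "Stack Overflow",
    "Server Fault"
    "Super User",
};

// Comma after every element: lines can be reordered freely.
const std::string fixed[] = {
    "Stack Overflow",
    "Server Fault",
    "Super User",
};

// Hypothetical helper: number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }
```

Here broken silently ends up with two elements while fixed has three, and neither declaration draws a compile error.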

Like many things, the trailing comma in an array initializer is something C++ inherited from C (and will have to support forever). A view totally different from those given here appears in the book "Deep C Secrets".
There, after an example with more than one "comma paradox":
char *available_resources[] = {
"color monitor" ,
"big disk" ,
"Cray" /* whoa! no comma! */
"on-line drawing routines",
"mouse" ,
"keyboard" ,
"power cables" , /* and what's this extra comma? */
};
we read :
...that trailing comma after the final initializer is not a typo, but a blip in the syntax carried over from aboriginal C. Its presence or absence is allowed but has no significance. The justification claimed in the ANSI C rationale is that it makes automated generation of C easier. The claim would be more credible if trailing commas were permitted in every comma-separated list, such as in enum declarations, or multiple variable declarators in a single declaration. They are not.
... to me this makes more sense
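It may be worth noting that the quoted passage predates later standards: as far as I know, C99 and C++11 both accept a trailing comma in an enumerator list, while declarator lists and function-call arguments still do not. A small sketch, with an enum named Resource invented for the example:

```cpp
// C++11 and later: a trailing comma after the last enumerator is accepted.
enum Resource {
    ColorMonitor,
    BigDisk,
    Mouse,
};

// The enumerators are still numbered 0, 1, 2; the comma adds nothing.
static_assert(Mouse == 2, "trailing comma does not create an enumerator");

// By contrast, `int x, y,;` and `f(1, 2,)` remain syntax errors.
```

So the inconsistency the book complains about has been narrowed since it was written, though not removed.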

In addition to easing code generation and editing, this type of grammar is simpler to implement if you are writing a parser. C# follows this rule in several places where there is a list of comma-separated items, such as the items in an enum definition.

It makes generating code easier as you only need to add one line and don't need to treat adding the last entry as if it's a special case. This is especially true when using macros to generate code. There's a push to try to eliminate the need for macros from the language, but a lot of the language did evolve hand in hand with macros being available. The extra comma allows macros such as the following to be defined and used:
#define LIST_BEGIN int a[] = {
#define LIST_ENTRY(x) x,
#define LIST_END };
Usage:
LIST_BEGIN
LIST_ENTRY(1)
LIST_ENTRY(2)
LIST_END
That's a very simplified example, but often this pattern is used by macros for defining things such as dispatch, message, event or translation maps and tables. If a comma wasn't allowed at the end, we'd need a special:
#define LIST_LAST_ENTRY(x) x
and that would be very awkward to use.
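A slightly fuller version of the same idea is the so-called X-macro pattern, sketched below. The names EVENT_LIST, AS_INITIALIZER, event_codes and event_count are hypothetical and not taken from any real framework.

```cpp
#include <cstddef>

// Hypothetical event list; the caller decides what each ENTRY expands to.
#define EVENT_LIST(ENTRY) \
    ENTRY(10)             \
    ENTRY(20)             \
    ENTRY(30)

// Every entry expands to `value,` with no special case for the last one.
#define AS_INITIALIZER(x) x,

// Expands to { 10, 20, 30, }, which is valid only because the grammar
// allows the trailing comma.
const int event_codes[] = { EVENT_LIST(AS_INITIALIZER) };

constexpr std::size_t event_count = sizeof(event_codes) / sizeof(event_codes[0]);
static_assert(event_count == 3, "one element per ENTRY");
```

Adding an event means adding one ENTRY line, and no macro needs to know which entry comes last.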

So that when two people add a new item in a list on separate branches, Git can properly merge the changes, because Git works on a line basis.

It makes editing the code a lot easier.
I'm comparing editing C/C++ array elements with editing JSON documents: if you forget to remove the last comma, the JSON will not parse. (Yes, I know JSON is not meant to be edited manually.)

If you use an array without a specified length, VC++ 6.0 can automatically identify its length, so if you use "int a[]={1,2,};" the length of a is 3, but the last one hasn't been initialized; you can use "cout<

Related

why does C++ allow a declaration with no space between the type and a parenthesized variable name? [duplicate]

A previous C++ question asked why int (x) = 0; is allowed. However, I noticed that even int(x) = 0; is allowed, i.e. without a space before the (x). I find the latter quite strange, because it causes things like this:
using Oit = std::ostream_iterator<int>;
Oit bar(std::cout);
*bar = 6; // * is optional
*Oit(bar) = 7; // * is NOT optional!
where the final line is because omitting the * makes the compiler think we are declaring bar again and initializing to 7.
Am I interpreting this correctly, that int(x) = 0; is indeed equivalent to int x = 0, and Oit(bar) = 7; is indeed equivalent to Oit bar = 7;? If yes, why specifically does C++ allow omitting the space before the parentheses in such a declaration + initialization?
(my guess is because the C++ compiler does not care about any space before a left paren, since it treats that parenthesized expression as its own "token" [excuse me if I'm butchering the terminology], i.e. in all cases, qux(baz) is equivalent to qux (baz))
It is allowed in C++ because it is allowed in C and requiring the space would be an unnecessary C-compatibility breaking change. Even setting that aside, it would be surprising to have int (x) and int(x) behave differently, since generally (with few minor exceptions) C++ is agnostic to additional white-space as long as tokens are properly separated. And ( (outside a string/character literal) is always a token on its own. It can't be part of a token starting with int(.
In C int(x) has no other potential meaning for which it could be confused, so there is no reason to require white-space separation at all. C also is generally agnostic to white-space, so it would be surprising there as well to have different behavior with and without it.
One requirement when defining the syntax of a language is that elements of the language can be separated. According to the C++ syntax rules, a space separates things. But also according to the C++ syntax rules, parentheses also separate things.
When C++ is compiled, the first step is the parsing. And one of the first steps of the parsing is separating all the elements of the language. Often this step is called tokenizing or lexing. But this is just the technical background. The user does not have to know this. He or she only has to know that things in C++ must be clearly separated from each other, so that there is a sequence "*", "Oit", "(", "bar", ")", "=", "7", ";".
As explained, the rule that the parenthesis always separates is established on a very low level of the compiler. The compiler determines even before knowing what the purpose of the parenthesis is, that a parenthesis separates things. And therefore an extra space would be redundant.
If you ever use parser generators, you will see that most of them just ignore spaces. That means that once the lexer has produced the list of tokens, the spaces do not exist any more. See the list above: there are no spaces in it. So you have no chance to specify something that explicitly requires a space.
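A tiny sketch of the point about parenthesized declarators, with a function name (declare_with_parentheses) made up for the example:

```cpp
// Both statements below are declarations: parentheses around a declarator
// are redundant, and whitespace before '(' does not matter to the lexer.
int declare_with_parentheses() {
    int(x) = 3;    // same as: int x = 3;
    int (y) = 4;   // same as: int y = 4;
    return x + y;  // both names are ordinary local variables
}
```

Some compilers may warn about the redundant parentheses, but the code is well-formed either way.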


Why initializer list with dangle comma is allowed in C++11? [duplicate]

Maybe I am not from this planet, but it would seem to me that the following should be a syntax error:
int a[] = {1,2,}; //extra comma in the end
But it's not. I was surprised when this code compiled on Visual Studio, but I have learnt not to trust MSVC compiler as far as C++ rules are concerned, so I checked the standard and it is allowed by the standard as well. You can see 8.5.1 for the grammar rules if you don't believe me.
Why is this allowed? This may be a stupid useless question but I want you to understand why I am asking. If it were a sub-case of a general grammar rule, I would understand - they decided not to make the general grammar any more difficult just to disallow a redundant comma at the end of an initializer list. But no, the additional comma is explicitly allowed. For example, it isn't allowed to have a redundant comma in the end of a function-call argument list (when the function takes ...), which is normal.
So, again, is there any particular reason this redundant comma is explicitly allowed?
It makes it easier to generate source code, and also to write code which can be easily extended at a later date. Consider what's required to add an extra entry to:
int a[] = {
1,
2,
3
};
... you have to add the comma to the existing line and add a new line. Compare that with the case where the three already has a comma after it, where you just have to add a line. Likewise if you want to remove a line you can do so without worrying about whether it's the last line or not, and you can reorder lines without fiddling about with commas. Basically it means there's a uniformity in how you treat the lines.
Now think about generating code. Something like (pseudo-code):
output("int a[] = {");
for (int i = 0; i < items.length; i++) {
output("%s, ", items[i]);
}
output("};");
No need to worry about whether the current item you're writing out is the first or the last. Much simpler.
It's useful if you do something like this:
int a[] = {
1,
2,
3, //You can delete this line and it's still valid
};
Ease of use for the developer, I would think.
int a[] = {
1,
2,
2,
2,
2,
2, /*line I could comment out easily without having to remove the previous comma*/
}
Additionally, if for whatever reason you had a tool that generated code for you; the tool doesn't have to care about whether it's the last item in the initialize or not.
I've always assumed it makes it easier to append extra elements:
int a[] = {
5,
6,
};
simply becomes:
int a[] = {
5,
6,
7,
};
at a later date.
Everything everyone is saying about the ease of adding/removing/generating lines is correct, but the real place this syntax shines is when merging source files together. Imagine you've got this array:
int ints[] = {
3,
9
};
And assume you've checked this code into a repository.
Then your buddy edits it, adding to the end:
int ints[] = {
3,
9,
12
};
And you simultaneously edit it, adding to the beginning:
int ints[] = {
1,
3,
9
};
Semantically these sorts of operations (adding to the beginning, adding to the end) should be entirely merge safe and your versioning software (hopefully git) should be able to automerge. Sadly, this isn't the case because your version has no comma after the 9 and your buddy's does. Whereas, if the original version had the trailing 9, they would have automerged.
So, my rule of thumb is: use the trailing comma if the list spans multiple lines, don't use it if the list is on a single line.
I am surprised that after all this time no one has quoted the Annotated C++ Reference Manual (ARM). It says the following about [dcl.init], with emphasis mine:
There are clearly too many notations for initializations, but each seems to serve a particular style of use well. The ={initializer_list,opt} notation was inherited from C and serves well for the initialization of data structures and arrays. [...]
Although the grammar has evolved since the ARM was written, the origin remains.
We can go to the C99 rationale to see why this was allowed in C. It says:
K&R allows a trailing comma in an initializer at the end of an
initializer-list. The Standard has retained this syntax, since it
provides flexibility in adding or deleting members from an initializer
list, and simplifies machine generation of such lists.
The trailing comma, I believe, is allowed for backward compatibility reasons. There is a lot of existing code, primarily auto-generated, which puts a trailing comma. It makes it easier to write a loop without a special condition at the end.
e.g.
std::for_each(my_inits.begin(), my_inits.end(),
[](const std::string& value) { std::cout << value << ",\n"; });
There isn't really any advantage for the programmer.
P.S. Though it is easier to autogenerate the code this way, I actually always took care not to put the trailing comma. The effort is minimal, readability is improved, and that's more important. You write code once, you read it many times.
I see one use case that was not mentioned in other answers,
our favorite Macros:
int a [] = {
#ifdef A
1, //this can be last if B and C are undefined
#endif
#ifdef B
2,
#endif
#ifdef C
3,
#endif
};
Adding macros to handle the last comma would be a big pain. With this small change in syntax it is trivial to manage. And this is more important than machine-generated code, because generation is usually a lot easier to do in a Turing-complete language than in a very limited preprocessor.
One of the reasons this is allowed as far as I know is that it should be simple to automatically generate code; you don't need any special handling for the last element.
It makes code generators that spit out arrays or enumerations easier.
Imagine:
std::cout << "enum Items {\n";
for(Items::iterator i(items.begin()), j(items.end()); i != j; ++i)
std::cout << *i << ",\n";
std::cout << "};\n";
I.e., no need to do special handling of the first or last item to avoid spitting the trailing comma.
If the code generator is written in Python, for example, it is easy to avoid spitting the trailing comma by using the str.join() function:
print("enum Items {")
print(",\n".join(items))
print("};")
The reason is trivial: ease of adding/removing lines.
Imagine the following code:
int a[] = {
1,
2,
//3, // - not needed any more
};
Now, you can easily add/remove items to the list without having to add/remove the trailing comma sometimes.
In contrast to other answers, I don't really think that ease of generating the list is a valid reason: after all, it's trivial for the code to special-case the last (or first) line. Code-generators are written once and used many times.
It allows every line to follow the same form. Firstly this makes it easier to add new rows and have a version control system track the change meaningfully and it also allows you to analyze the code more easily. I can't think of a technical reason.
The only language where it's - in practice* - not allowed is JavaScript, and it causes an innumerable amount of problems. For example, if you copy & paste a line from the middle of the array, paste it at the end, and forget to remove the comma, then your site will be totally broken for your IE visitors.
*In theory it is allowed, but Internet Explorer doesn't follow the standard and treats it as an error.
It's easier for machines, i.e. parsing and generation of code.
It's also easier for humans, i.e. modification, commenting-out, and visual-elegance via consistency.
Assuming C, would you write the following?
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
puts("Line 1");
puts("Line 2");
puts("Line 3");
return EXIT_SUCCESS
}
No. Not only because the final statement is an error, but also because it's inconsistent. So why do the same to collections? Even in languages that allow you to omit last semicolons and commas, the community usually doesn't like it. The Perl community, for example, doesn't seem to like omitting semicolons, bar one-liners. They apply that to commas too.
Don't omit commas in multiline collections for the same reason you don't omit semicolons for multiline blocks of code. I mean, you wouldn't do it even if the language allowed it, right? Right?
This is allowed to protect from mistakes caused by moving elements around in a long list.
For example, let's assume we have code that looks like this.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Super User",
"Server Fault"
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
And it's great, as it shows the original trilogy of Stack Exchange sites.
Stack Overflow
Super User
Server Fault
But there is one problem with it. You see, the footer on this website shows Server Fault before Super User. Better fix that before anyone notices.
#include <iostream>
#include <string>
#include <cstddef>
#define ARRAY_SIZE(array) (sizeof(array) / sizeof *(array))
int main() {
std::string messages[] = {
"Stack Overflow",
"Server Fault"
"Super User",
};
size_t i;
for (i = 0; i < ARRAY_SIZE(messages); i++) {
std::cout << messages[i] << std::endl;
}
}
After all, moving lines around couldn't be that hard, could it be?
Stack Overflow
Server FaultSuper User
I know, there is no website called "Server FaultSuper User", but our compiler claims it exists. Now, the issue is that C has a string concatenation feature, which allows you to write two double-quoted strings next to each other and have them concatenated with nothing in between (a similar issue can also happen with integers, as the - sign has multiple meanings).
Now what if the original array had a useless comma at the end? Well, the lines would be moved around, but such a bug wouldn't have happened. It's easy to miss something as small as a comma. If you remember to put a comma after every array element, such a bug just cannot happen. You wouldn't want to waste four hours debugging something, only to find that a comma is the cause of your problems.
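The effect is easy to confirm by counting elements. Below is a small sketch; the helper length and the array names broken and safe are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Element count of a built-in array, deduced from its type.
template <typename T, std::size_t N>
constexpr std::size_t length(const T (&)[N]) { return N; }

// Lines reordered by hand; the comma stayed behind on the wrong line.
const std::string broken[] = {
    "Stack Overflow",
    "Server Fault"   // no comma: merges with the next literal
    "Super User",
};

// The same reordering when every line already ended with a comma.
const std::string safe[] = {
    "Stack Overflow",
    "Server Fault",
    "Super User",
};
```

Here broken silently ends up with two elements while safe keeps all three, and the compiler is not required to say anything about it.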
Like many things, the trailing comma in an array initializer is one of the things C++ inherited from C (and will have to support forever). A view totally different from those placed here is mentioned in the book "Deep C Secrets".
Therein, after an example with more than one "comma paradox":
char *available_resources[] = {
"color monitor" ,
"big disk" ,
"Cray" /* whoa! no comma! */
"on-line drawing routines",
"mouse" ,
"keyboard" ,
"power cables" , /* and what's this extra comma? */
};
we read :
...that trailing comma after the final initializer is not a typo, but a blip in the syntax carried over from aboriginal C. Its presence or absence is allowed but has no significance. The justification claimed in the ANSI C rationale is that it makes automated generation of C easier. The claim would be more credible if trailing commas were permitted in every comma-separated list, such as in enum declarations, or multiple variable declarators in a single declaration. They are not.
... to me this makes more sense
In addition to code generation and editing ease, if you want to implement a parser, this type of grammar is simpler and easier to implement. C# follows this rule in several places where there is a list of comma-separated items, like the items in an enum definition.
It makes generating code easier as you only need to add one line and don't need to treat adding the last entry as if it's a special case. This is especially true when using macros to generate code. There's a push to try to eliminate the need for macros from the language, but a lot of the language did evolve hand in hand with macros being available. The extra comma allows macros such as the following to be defined and used:
#define LIST_BEGIN int a[] = {
#define LIST_ENTRY(x) x,
#define LIST_END };
Usage:
LIST_BEGIN
LIST_ENTRY(1)
LIST_ENTRY(2)
LIST_END
That's a very simplified example, but often this pattern is used by macros for defining things such as dispatch, message, event or translation maps and tables. If a comma wasn't allowed at the end, we'd need a special:
#define LIST_LAST_ENTRY(x) x
and that would be very awkward to use.
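For a slightly more realistic flavour of the same pattern, here is a sketch of a translation table built with such macros. All the names (Translation, TRANSLATION_BEGIN, TRANSLATION_ENTRY, TRANSLATION_END, kGreetings, translate) are hypothetical and only illustrate the idea.

```cpp
#include <cstring>

struct Translation {
    const char* key;
    const char* text;
};

// Illustrative macros: every entry expands to an initializer followed by a
// comma, so the last entry needs no special form.
#define TRANSLATION_BEGIN(name) static const Translation name[] = {
#define TRANSLATION_ENTRY(key, text) {key, text},
#define TRANSLATION_END };

TRANSLATION_BEGIN(kGreetings)
    TRANSLATION_ENTRY("hello", "bonjour")
    TRANSLATION_ENTRY("bye", "au revoir")
TRANSLATION_END

// Linear lookup; returns nullptr when the key is absent.
const char* translate(const char* key) {
    for (const Translation& entry : kGreetings) {
        if (std::strcmp(entry.key, key) == 0) return entry.text;
    }
    return nullptr;
}
```

Entries can be added, removed or reordered one line at a time without touching their neighbours.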
So that when two people add a new item in a list on separate branches, Git can properly merge the changes, because Git works on a line basis.
It makes editing the code a lot easier.
I'm comparing editing C/C++ array elements with editing JSON documents - if you forget to remove the last comma, the JSON will not parse. (Yes, I know JSON is not meant to be edited manually)
If you use an array without a specified length, VC++ 6.0 can automatically identify its length, so if you use "int a[]={1,2,};" the length of a is 3, but the last one hasn't been initialized; you can use "cout<

Semicolon in C++?

Is the "missing semicolon" error really required? Why not treat it as a warning?
When I compile this code
int f = 1
int h=2;
the compiler intelligently tells me where I am missing it. But to me it's like - "If you know it, just treat it as if it's there and go ahead." (Later I can fix the warning.)
int sdf = 1, df=2;
sdf=1 df =2
Even for this code, it behaves the same. That is, even if multiple statements (without ;) are in the same line, the compiler knows.
So, why not just remove this requirement? Why not behave like Python, Visual Basic, etc.?
Summary of discussion
Two examples came up in which automatically inserting a semicolon would actually cause a problem.
1.
return
(a+b)
This was presented as one of the worst aspects of JavaScript. But in this scenario, semicolon insertion is a problem for JavaScript, not for C++. In C++, you will get another error if ; insertion is done after return: a missing return value.
2.
int *y;
int f = 1
*y = 2;
For this, I guess there is no better way than to introduce a statement separator, that is, a semicolon.
It's very good that the C++ compiler doesn't do this. One of the worst aspects of JavaScript is the semicolon insertion. Picture this:
return
(a + b);
The C++ compiler will happily continue on the next line as expected, while a language that "inserts" semicolons, like JavaScript, will treat it as "return;" and miss out the "(a + b);".
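A tiny sketch of the C++ side of this (the function name sum is just for illustration): the statement ends at the semicolon, not at the line break.

```cpp
// The return statement runs up to the semicolon, so the expression on the
// following line is the value that gets returned.
int sum(int a, int b) {
    return
        (a + b);
}
```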
Instead of relying on compiler error-fixing, make it a habit to use semicolons.
There are many cases where a semicolon is needed.
What if you had:
int *y;
int f = 1
*y = 2;
This would be parsed as
int *y;
int f = 1 * y = 2;
So without the semicolons it is ambiguous.
First, this is only a small example; are you sure the compiler can intelligently tell you what's wrong for more complex code? For any piece of code? Could all compilers intelligently recognize this in the same way, so that a piece of C++ code could be guaranteed portable with missing semicolons?
Second, C++ was created decades ago, when computing resources weren't nearly what they are now. Even today, builds can take a considerable amount of time. Semicolons help to clearly demarcate different commands (for the user and for the compiler!) and assist both the programmer and the compiler in understanding what's happening.
; is for the programmer's convenience. If a line of code is very long, we can press enter and continue on the second line, because ; is the separator rather than the line break. It is a programming convention. There must be some separator.
Having semi-colons (or line breaks, pick one) makes the compiler vastly simpler and error messages more readable.
But contrary to what other people have said, neither form of delimiters (as an absolute) is strictly necessary.
Consider, for example, Haskell, which doesn’t have either. Even the current version of VB allows line breaks in many places inside a statement, as does Python. Neither requires line continuations in many places.
For example, VB now allows the following code:
Dim result = From element in collection
Where element < threshold
Select element
No statement delimiters, no line continuations, and yet no ambiguities whatsoever.
Theoretically, this could be driven much further. All ambiguities can be eliminated (again, look at Haskell) by introducing some rules. But again, this makes the parser much more complicated (it has to be context sensitive in a lot of places, e.g. your return example, which cannot be resolved without first knowing the return type of the function). And again, it makes it much harder to output meaningful diagnostics since an erroneous line break could mean any of several things so the compiler cannot know which error the user has made, and not even where the error was made.
In C programs semicolons are statement terminators, not separators. You might want to read this fun article.
+1 to you both.
The semi-colon is a command line delimiter, unlike in VB, Python etc. C and C++ ignore white space within lines of code, including carriage returns! This was originally because at the inception of C computer monitors could only cope with 80 characters of text, and as C++ is based on the C specification it followed suit.
I could post up the question "Why must I keep getting errors about missing \ characters in VB when I try and write code over several lines, surely if VB knows of the problem it can insert it?"
Auto insertion as has already been pointed out could be a nightmare, especially on code that wraps onto a second line.
I won't extend much of the need for semi-colon vs line continuation characters, both have advantages and disadvantages and in the end it's a simple language design choice (even though it affects all the users).
I am more worried about the suggestion for the compiler to fix the code.
If you have ever seen a marvelous tool (such as... hum let's pick up a merge tool) and the way it does its automated work, you would be very glad indeed that the compiler did not modify the code. Ultimately if the compiler knew how to fix the code, then it would mean it knew your intent, and thought transmission has not been implemented yet.
As for the warning? Any programmer worth their salt knows that warnings should be treated as errors (and compilation stopped), so what would be the advantage?
int sdf = 1,df=2;
sdf=1 df =2
I think the general problem is that without the semicolon there's no telling what the programmer could actually have meant (e.g. maybe the second line was intended as sdf = 1 + df - 2; with serious typos). Something like this might well result from completely arbitrary typos and have any intended meaning, so it might not be such a good idea after all to have the compiler silently "correct" errors.
You may also have noticed that you often get "expected semicolon" where the real problem is not a lack of a semicolon but something completely different instead. Imagine a malformed expression that the compiler could make sense out of by silently going and inserting semicolons.
The semicolon may seem redundant but it is a simple way for the programmer to confirm "yes, that was my intention".
Also, warnings instead of compiler errors are too weak. People compile code with warnings off, ignore warnings they get, and AFAIK the standard never prescribes what the compiler must warn about.