Why can my comment consist of so many forward slashes (/)? - c++

I know that there are several kinds of comment; here are the ones relevant to this question:
// - Normal comment
/// - This makes the comment bold (in my IDE)
Surprisingly, the IDE does not raise an error for the following code (even though a comment is never executed, I would expect the language to restrict the programmer somewhere):
/////////////////// HI!
Why would the standard allow this to happen?
BTW, my IDE is Code::Blocks 20.03 if it matters.

As per the C++ standard ([lex.comment]):
The characters // start a comment, which terminates immediately before the next new-line character.
From the above, you can infer that every character (other than newline) which follows the first two / characters is part of the comment.
If that wasn't already clear enough, it goes on to note:
The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters.
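To see this in practice, here is a minimal sketch (the function name answer is made up for illustration). A line made only of slashes, and a // comment that itself contains //, /* and */, are both just comment text, so the file compiles and the function behaves as if the comments were not there.

```cpp
#include <cassert>

/////////////////// HI!
// Everything after the first two slashes is comment text, so the
// remaining slashes above are ordinary characters inside the comment.

int answer() {
    int x = 40; //// x is 40 // still the same comment /* not a block comment */
    return x + 2;  // returns 42
}
```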


GCC's implementation of angle-bracket includes: why does it have to be as described below?

Section 2.6, Computed Includes, of this document has the following paragraph:
If the line expands to a token stream beginning with a < token and
including a > token, then the tokens between the < and the
first > are combined to form the filename to be included. Any
whitespace between tokens is reduced to a single space; then any space
after the initial < is retained, but a trailing space before the
closing > is ignored. CPP searches for the file according to the rules
for angle-bracket includes.
I know this is implementation-defined, but why does GCC have to do it this way? I'm referring specifically to the sentence saying that a space after the initial < is retained, but a trailing space before the closing > is ignored.
EDIT
I have just noticed that the third paragraph before the one quoted above says the following:
You must be careful when you define the macro. #define saves tokens,
not text. The preprocessor has no way of knowing that the macro will
be used as the argument of #include, so it generates ordinary
tokens, not a header name. This is unlikely to cause problems if you
use double-quote includes, which are close enough to string constants.
If you use angle brackets, however, you may have trouble.
Does anyone know what kind of trouble is being pointed out here?
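For reference, this is what a computed include looks like. In this minimal sketch the macro name CONTAINER_HEADER is made up; the macro expands to the ordinary tokens <, vector and >, which the preprocessor then glues back together into a header name. Going by the rules quoted above, writing the macro as < vector > (with a space after <) would, as I read them, make GCC look for a file whose name starts with a space, which seems to be the kind of trouble the manual is warning about.

```cpp
#include <cassert>

// Hypothetical macro: expands to the token sequence '<' 'vector' '>',
// which the preprocessor reassembles into the header name <vector>.
#define CONTAINER_HEADER <vector>
#include CONTAINER_HEADER

int sum_of_two() {
    std::vector<int> v{1, 2};
    return v[0] + v[1];
}
```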
I guess the implementor chose the simplest way when they implemented this functionality, without giving it much thought.
It seems that the initial implementation landed on 2000-07-03 (two decades ago!). The relevant part looks like this (source):
for (;;)
  {
    t = cpp_get_token (pfile);
    if (t->type == CPP_GREATER || t->type == CPP_EOF)
      break;
    CPP_RESERVE (pfile, TOKEN_LEN (t));
    if (t->flags & PREV_WHITE)
      CPP_PUTC_Q (pfile, ' ');
    pfile->limit = spell_token (pfile, t, pfile->limit);
  }
Notably, it breaks out when it sees the CPP_GREATER token (i.e. >), before reserving memory for the token. This makes sense, since there's no need to allocate memory when the token will not be written to the buffer.
Only after memory is reserved does the preprocessor check whether the token has preceding whitespace (t->flags & PREV_WHITE) and, if it does, write a single space character to the buffer. Since the loop has already exited by the time it reaches >, the whitespace before the closing > is never examined.
As a result, in < foo / bar >, only the whitespace before foo (that is, after the initial <), before /, and before bar is kept.
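The effect of that loop ordering is easy to reproduce outside GCC. The sketch below is not GCC code: Token and header_name_from_tokens are hypothetical names, the tokens are assumed to be already lexed, and prev_white stands in for PREV_WHITE. Testing for > before looking at the whitespace flag is what drops the trailing space.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token: its spelling plus a flag recording whether whitespace
// preceded it (the role PREV_WHITE plays in the quoted loop).
struct Token {
    std::string spelling;
    bool prev_white;
};

// Builds the file name from the tokens that follow the opening '<'.
// Like the quoted loop, it stops on '>' *before* checking the whitespace
// flag, so a space in front of the closing '>' is never emitted, while a
// space in front of the first token (right after '<') is.
std::string header_name_from_tokens(const std::vector<Token>& after_open_angle) {
    std::string name;
    for (const Token& t : after_open_angle) {
        if (t.spelling == ">")
            break;
        if (t.prev_white)
            name += ' ';  // any run of whitespace becomes one space
        name += t.spelling;
    }
    return name;
}
```

For < foo / bar >, every token after < carries the whitespace flag, so the result is " foo / bar": the leading space survives and the trailing one does not.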

Single line comment continuation

From the C++ standard (going back to at least C++98), § 2.2, translation phase 2, states:
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.
And § 2.7 states:
The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates with the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear between it and the new-line that terminates the comment; no diagnostic is required. [Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment. ]
Taken together, I read these two to mean that the following:
// My comment \
is valid
// My comment \ still valid \
is valid
are legal in C++98. In GCC 4.9.2, these both compile without any diagnostic messages. In MSVC 2013, these both produce the following:
warning C4010: single-line comment contains line-continuation character
If you have warnings-as-errors enabled (which I do), the program fails to compile (without warnings-as-errors, it compiles just fine). Is there something in the standard that disallows single-line comment continuations, or is this a case of MSVC non-compliance with the standard?
It's not a question of compliance. You've specifically asked the compiler to treat a valid construct as an error, so that's what it does.
GCC will give the same warning (or error, if requested) if you specify -Wcomment or -Wall.
I'd say it's MS being sensitive to the fact that if you do something like:
#define macro() \
some stuff \
// Intended as comment \
more stuff
then you get VERY interesting errors when you use macro() in the code.
Or simply accidentally typing a comment like this:
// The files for foo-project are in c:\projects\foo\
int blah;
(you then get strange "undefined variable blah" errors)
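Here is a compilable sketch of that accidental case (the function name and the path are made up). Line splicing happens in translation phase 2, before comments are recognized in phase 3, so the assignment on the following line becomes part of the comment and the function returns 1 rather than 2. GCC only mentions this with -Wcomment or -Wall; MSVC reports C4010.

```cpp
#include <cassert>

int last_assigned_value() {
    int x = 1;
    // The files for foo-project are in c:\projects\foo\
    x = 2;  // spliced into the comment above, so never executed
    return x;
}
```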
I would NEVER use line continuation in a single-line comment, but if you have some good reason to, just turn THAT warning off in MSVC.
Also, as Mike says: warnings are not even covered by the standard; it only says what needs to be an error. If you enable "warnings are errors", you will have to either be selective about which warnings you enable, or accept that some constructs that are technically valid (but dubious) will be unacceptable in the build, because the compiler maker has decided to warn about them. Try writing if (c = getchar()) in gcc or clang and see how far you get with -Werror and warnings turned up high. Yet it is perfectly valid according to the standard.

C++ language symbol separator

I need to parse some C++ files to extract some information from them. One use case: I have an enum value "ID_XYZ", and I want to find out how many times it appears in a source file. So my question is: what are the separator symbols that divide tokens in C++?
You can't really tokenize C or C++ source code based purely on separator characters -- you pretty much need to read in a character at a time, and figure out whether that character can be part of the current token or not.
Just for a couple of examples, when you see a C-style begin-comment token, you need to look at characters until you encounter a close-comment token. Likewise for strings and preprocessor directives (e.g., #if 0 ... #endif sequences). To do it truly correctly, you also need to deal with trigraphs (at least for pre-C++17 code; trigraphs were removed in C++17). For example, consider something like this:
// Why doesn't this work??/
ID_XYZ = 1;
If the lexer doesn't handle trigraphs correctly, it will probably identify this as an instance of your ID_XYZ -- but in reality, it's not -- the ??/ at the end of the previous line is really a trigraph that resolves to \, which means the "single-line" comment actually extends to the end of the next line, and the apparent instance of ID_XYZ is really part of the comment.
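A rough sketch of the character-at-a-time approach described above, assuming plain C++ source: it splices continued lines, skips comments and string/character literals, and counts whole identifiers. splice_lines and count_identifier are hypothetical helpers, the trigraphs flag models pre-C++17 compilers that honor ??/, and raw string literals, digit separators, #if 0 blocks and other preprocessor logic are deliberately not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Approximates translation phases 1-2: optionally maps the ??/ trigraph to
// a backslash, then deletes every backslash-newline pair so continued
// lines are joined.
std::string splice_lines(const std::string& src, bool trigraphs) {
    std::string mapped;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (trigraphs && src.compare(i, 3, "?\?/") == 0) {  // "?\?/" avoids a real trigraph
            mapped += '\\';
            i += 2;  // skip the rest of the trigraph
        } else {
            mapped += src[i];
        }
    }
    std::string out;
    for (std::size_t i = 0; i < mapped.size(); ++i) {
        if (mapped[i] == '\\' && i + 1 < mapped.size() && mapped[i + 1] == '\n')
            ++i;  // drop both the backslash and the newline
        else
            out += mapped[i];
    }
    return out;
}

static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Counts occurrences of `name` as a whole identifier outside comments
// and string/character literals.
int count_identifier(const std::string& source, const std::string& name,
                     bool trigraphs = false) {
    const std::string s = splice_lines(source, trigraphs);
    int count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s.compare(i, 2, "//") == 0) {
            i = s.find('\n', i);  // comment runs to the end of the logical line
            if (i == std::string::npos) break;
        } else if (s.compare(i, 2, "/*") == 0) {
            const std::size_t end = s.find("*/", i + 2);
            if (end == std::string::npos) break;  // unterminated comment
            i = end + 2;
        } else if (s[i] == '"' || s[i] == '\'') {
            const char quote = s[i++];
            while (i < s.size() && s[i] != quote) {
                if (s[i] == '\\') ++i;  // skip the escaped character
                ++i;
            }
            ++i;  // step past the closing quote
        } else if (is_ident_char(s[i])) {
            const std::size_t start = i;
            while (i < s.size() && is_ident_char(s[i])) ++i;
            if (s.compare(start, i - start, name) == 0) ++count;
        } else {
            ++i;
        }
    }
    return count;
}
```

With the trigraph example above, the ID_XYZ on the second line is only counted when trigraphs is false; with trigraphs enabled, it is swallowed by the comment.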

C++ is a whitespace-independent language: exceptions to the rule

This Wikipedia page defines C++ as a "white space independent language". While this is mostly true, as with all languages there are exceptions to the rule. The only one I can think of at the moment is this:
vector<vector<double> >
must have a space; otherwise the compiler interprets the >> as a stream operator. What other exceptions are there? It would be interesting to compile a list of them.
Following that logic, you can use any two-character lexeme to produce such "exceptions" to the rule. For example, += and + = would be interpreted differently. I wouldn't call them exceptions though. In C++ in many contexts "no space at all" is quite different from "one or more spaces". When someone says that C++ is space-independent they usually mean that "one space" in C++ is typically the same as "more than one space".
This is reflected in the language specification, which states (see 2.1/1) that at phase 3 of translation the implementation is allowed to replace sequences of multiple whitespace characters with one space character.
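A small sketch of this "no space" versus "one space" difference (the function names are made up for illustration). The lexer always takes the longest token it can, so a+++b is read as a ++ + b, while a+ ++b is read as a + ++b.

```cpp
#include <cassert>

// The lexer takes the longest possible token ("maximal munch"), so the
// presence or absence of a space changes how the '+' characters group.
int no_space() {
    int a = 1, b = 2;
    return a+++b;   // lexed as a ++ + b: (a++) + b == 1 + 2
}

int with_space() {
    int a = 1, b = 2;
    return a+ ++b;  // lexed as a + ++b: 1 + (++b) == 1 + 3
}
```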
The syntax and semantic rules for parsing C++ are indeed quite complex (I'm trying to be nice; I think one is authorized to say "a mess"). Proof of this is that for YEARS different compiler authors were arguing over what was legal C++ and what was not.
In C++ for example you may need to parse an unbounded number of tokens before deciding what is the semantic meaning of the first of them (the dreaded "most vexing parse rule" that also often bites newcomers).
Your objection IMO however doesn't really make sense... for example ++ has a different meaning from + +, and in Pascal begin is not the same as beg in. Does this make Pascal a space-dependent language? Is there any space-independent language (except brainf*ck)?
The only problem with C++03's >> versus > > is that this typing mistake was very common, so the committee decided to add even more complexity to the language definition to solve it in C++11.
The cases in which one whitespace character instead of several can make a difference (something that differentiates space-dependent languages, and that plays no role in the > > / >> case) are indeed few:
inside double-quoted strings (but everyone wants that and every language that supports string literals that I know does the same)
inside single quotes (the same, even if something that not many C++ programmers know is that there can be more than one char inside single quotes)
in the preprocessor directives because they work on a line basis (newline is a whitespace and it makes a difference there)
in line continuation, as noticed by stefanv: to continue a single line you can put a backslash right before a newline, and in that case the language will ignore both characters (you can do this even in the middle of an identifier, even if the typical use is just to make long preprocessor macros readable). If you put other whitespace characters after the backslash and before the newline, however, the line continuation is not recognized (some compilers accept it anyway and simply check whether the last non-whitespace character of a line is a backslash). Line continuation can also be specified using the trigraph equivalent ??/ of backslash (any reasonable compiler should IMO emit a warning when finding a trigraph, as they were most probably not intended by the programmer).
inside single-line comments //, because there too a newline differs from other whitespace in the middle of a comment: it ends the comment
Like it or not, macros are also part of C++, and the lines of a multi-line macro must be joined with a backslash immediately followed by the end of line; no whitespace may appear between the backslash and the EOL.
Not a big issue, but still a whitespace exception.
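Both points (the backslash having to be the very last character of each macro line, and splicing working even in the middle of an identifier) show up in this small sketch; SUM_OF_SQUARES and spliced_identifier are made-up names for illustration.

```cpp
#include <cassert>

// A multi-line macro: each backslash must be the very last character on
// its line (no trailing spaces), otherwise the lines are not joined.
#define SUM_OF_SQUARES(a, b) \
    ((a) * (a) +             \
     (b) * (b))

int spliced_identifier() {
    // The backslash-newline is removed in translation phase 2, so the next
    // two lines declare a single variable named long_name.
    int long_\
name = SUM_OF_SQUARES(2, 3);
    return long_name;
}
```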
This was because of limitations in the parser before C++11; this is no longer the case.
The reason was that it was hard to distinguish a >> that ends two template argument lists from operator >>.
While C++03 did interpret >> as the shift operator in all cases (which was overloaded for use with streams, but it's still the shift operator), the language parser in C++11 will now attempt to close a template argument list when reasonable.
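A minimal sketch of the C++11 behaviour (Matrix, make_matrix and shift_right are illustrative names). Compiled as C++11 or later, the alias needs no space between the two > characters, while >> inside an ordinary expression is still the right-shift operator.

```cpp
#include <cassert>
#include <vector>

// Since C++11, a '>>' that would close two template argument lists is
// treated as two '>' tokens, so no space is needed here.
using Matrix = std::vector<std::vector<double>>;

Matrix make_matrix(int rows, int cols) {
    return Matrix(rows, std::vector<double>(cols, 0.0));
}

// Outside a template argument list, '>>' is still the shift operator.
int shift_right(int value, int bits) {
    return value >> bits;
}
```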
Nested template parameters: set<set<int> >.
Character literals: ' '.
String literals: " ".
Juxtaposition of keywords and identifiers: else return x;, void foo(){}, etc.

"No newline at end of file" compiler warning

What is the reason for the following warning in some C++ compilers?
No newline at end of file
Why should I have an empty line at the end of a source/header file?
Think of some of the problems that can occur if there is no newline. According to the ANSI standard, an #include <foo.h> at the beginning of foo.cpp inserts foo.h exactly as it is at the front of the file, and does not insert the newline that ended the #include line after the contents of foo.h. So if you include a file that has no newline at the end, the parser will see the last line of foo.h as if it were on the same line as the first line of foo.cpp. What if the last line of foo.h was a comment without a newline? Now the first line of foo.cpp is commented out. These are just a couple of examples of the types of problems that can creep up.
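A toy model of that merge, assuming a deliberately naive preprocessor that pastes the header text in place of the #include line and appends nothing (naive_include and line_of are hypothetical helpers; real compilers do not behave exactly like this). If the header's last line is a comment with no newline, the first line after the #include lands on the comment's line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive textual inclusion: the header text replaces the #include line
// (including that line's newline), and nothing is appended after it.
std::string naive_include(const std::string& header, const std::string& rest_of_file) {
    return header + rest_of_file;
}

// Returns the 0-based line of `text` on which `needle` first appears, or -1.
int line_of(const std::string& text, const std::string& needle) {
    const std::size_t pos = text.find(needle);
    if (pos == std::string::npos)
        return -1;
    int line = 0;
    for (std::size_t i = 0; i < pos; ++i)
        if (text[i] == '\n')
            ++line;
    return line;
}
```

With a header ending in "#endif // MY_HEADER_H" and no newline, "int blah;" ends up on the same line as the comment, so it would be commented out.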
Just wanted to point any interested parties to James' answer below. While the above answer is still correct for C, the new C++ standard (C++11) has been changed so that this warning should no longer be issued if using C++ and a compiler conforming to C++11.
The requirement that every source file end with a non-escaped newline was removed in C++11. The specification now reads:
A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file (C++11 §2.2/1).
A conforming compiler should no longer issue this warning (at least not when compiling in C++11 mode, if the compiler has modes for different revisions of the language specification).
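The C++11 rule quoted above is simple enough to express directly. This is only a sketch of the described behaviour (as_processed is a hypothetical name; real compilers apply the rule inside their lexers): a non-empty file that does not end in a newline, or ends in a backslash-newline, is treated as if one more newline were appended.

```cpp
#include <cassert>
#include <string>

// Models the C++11 rule from section 2.2/1: a non-empty source file that
// does not end in a new-line, or ends in a new-line preceded by a
// backslash, is processed as if an extra new-line were appended.
std::string as_processed(const std::string& file) {
    if (file.empty())
        return file;
    const bool ends_in_newline = file.back() == '\n';
    const bool ends_in_escaped_newline =
        file.size() >= 2 && file.compare(file.size() - 2, 2, "\\\n") == 0;
    if (!ends_in_newline || ends_in_escaped_newline)
        return file + '\n';
    return file;
}
```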
C++03 Standard [2.1.1.2] declares:
... If a source file that is not empty does not end in a new-line character, or ends in a new-line character
immediately preceded by a backslash character before any such splicing takes place, the behavior is undefined.
The answer for the "obedient" is "because the C++03 Standard says the behavior of a program not ending in newline is undefined" (paraphrased).
The answer for the curious is here: http://gcc.gnu.org/ml/gcc/2001-07/msg01120.html.
It isn't referring to a blank line; it's about whether the last line (which can have content in it) is terminated with a newline.
Most text editors will put a newline at the end of the last line of a file, so if the last line doesn't have one, there is a risk that the file has been truncated. However, there are valid reasons why you might not want the newline so it is only a warning, not an error.
#include will replace its line with the literal contents of the file. If the file does not end with a newline, the line containing the #include that pulled it in will merge with the next line.
Of course in practice every compiler adds a new line after the #include. Thankfully. – #mxcl
Not specific to C/C++, but a C dialect: when using the GL_ARB_shading_language_include extension, the GLSL compiler on OS X does NOT warn you about a missing newline. So if you write a MyHeader.h file with a header guard that ends with #endif // __MY_HEADER_H__ and no trailing newline, you will lose the line after #include "MyHeader.h" for sure.
I am using the C-Free IDE version 5.0, and I was getting the same problem in programs written in either C++ or C. Just go to the end of the program, i.e. its last line (after the closing brace of the last function, whether main or any other), and press Enter so that the line number increases by 1. Then build and run the same program again; it will run without error.
Because the behavior differs between C/C++ versions if a file does not end with a newline. Older C++ versions are especially nasty; for example, in C++03 the standard says (translation phases):
If a source file that is not empty does not end in a new-line
character, or ends in a new-line character immediately preceded by a
backslash character, the behavior is undefined.
Undefined behavior is bad: a standard-conforming compiler could do more or less what it wants here (insert malicious code or whatever), which is clearly a reason for a warning.
While the situation is better in C++11, it is a good idea to avoid situations where the behavior is undefined in earlier versions. The C++03 specification is worse than C99, which outright prohibits such files (the behavior is then defined).
This warning might also help to indicate that a file could have been truncated somehow. It's true that the compiler will probably throw a compiler error anyway - especially if it's in the middle of a function - or perhaps a linker error, but these could be more cryptic, and aren't guaranteed to occur.
Of course this warning also isn't guaranteed if the file is truncated immediately after a newline, but it could still catch some cases that other errors might miss, and gives a stronger hint to the problem.
In my case, I use Kotlin, compiled in IntelliJ. I am also using a Docker container with a linter to fix possible issues with typos, imports, code usage, etc. This error almost certainly comes from those lint checks.
In short, the error says "Add a new line at the end of the file". That is it.
Before there was NO extra empty line: