The second part of translation phase 2 (section 2.2.2 in N3485) basically says that if a source file does not end in a newline character, the compiler should treat it as if it did.
However, if I'm reading it correctly it makes an explicit exception for empty source files, which remain empty.
The exact text (with added emphasis) is:
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. If, as a result, a character sequence that matches the syntax of a universal-character-name is produced, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.
I haven't been able to figure out any situations in which it would make a difference whether a source file was empty or consisted of only a newline character.
I'm hoping someone can shed some light on the reasoning behind this requirement.
This is to specifically support the 1994 winning entry in the international obfuscated C code contest in the category "worst abuse of rules": The world's smallest self-replicating program. Guaranteed.
I think the idea is that a source file normally consists of zero or more lines, and each line consists of a sequence of non-new-line characters followed by a new-line. Any source file not meeting that requirement needs special handling (so you don't get lines composed of text from two different source files).
An empty C++ source file is not particularly useful, but there's no point in forbidding it. The quoted clause isn't about distinguishing between an empty file and a file consisting of just one new-line (there should be no real difference between them).
I guess this means that every line ends with \n, while an empty file has no lines.
The preprocessor can be used to construct things besides program source, and a blank line can be significant -- it's often used to separate paragraphs in text, for instance.
"A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file."
The second part of translation phase 2 (section 2.2.2 in N3485) basically says that if a source file does not end in a newline character, the compiler should treat it as if it did.
No - it says that if the file "is not empty" AND does not end in a newline, then a newline is added.
However, if I'm reading it correctly it makes an explicit exception for empty source files, which remain empty.
Agreed.
I haven't been able to figure out any situations in which it would make a difference whether a source file was empty or consisted of only a newline character. I'm hoping someone can shed some light on the reasoning behind this requirement.
Consider a header file called "header.h" with last line as below with no trailing newline:
#endif // #ifndef INCLUDED_HEADER_H
Say another.cc includes it as follows:
#include "header.h"
#include "another.h"
When another.cc is parsed, the text from header.h is substituted for the line specifying its inclusion. Done naively, that would result in:
#endif // #ifndef INCLUDED_HEADER_H#include "another.h"
Obviously, the compiler would then fail to act on #include "another.h", considering it part of the comment begun in header.h.
So, the rule for incomplete lines avoids these problems (which could be terribly hard to spot).
If the file was empty anyway, this problem doesn't manifest: there's nothing like the #endif to be prepended to the next line in the including file....
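To make the difference concrete, here is a minimal sketch of the end-of-file rule quoted in the question. The names normalize_end_of_file and naive_include are hypothetical helpers invented for illustration; a real preprocessor works on tokens rather than raw strings, so treat this as a model of the rule, not of any compiler.

```cpp
#include <string>

// Hypothetical helper: applies the C++11 phase 2 end-of-file rule to the
// text of one source file. An empty file is left empty; a non-empty file
// that lacks a final new-line, or whose final new-line is immediately
// preceded by a backslash, gets one extra new-line appended.
std::string normalize_end_of_file(std::string text) {
    if (text.empty()) {
        return text;  // the explicit exception: empty stays empty
    }
    const bool ends_in_newline = text.back() == '\n';
    const bool ends_in_splice =
        text.size() >= 2 && ends_in_newline && text[text.size() - 2] == '\\';
    if (!ends_in_newline || ends_in_splice) {
        text += '\n';
    }
    return text;
}

// Hypothetical helper: models purely textual inclusion by pasting the
// header's text in front of whatever follows the #include line.
std::string naive_include(const std::string& header_text,
                          const std::string& rest_of_includer) {
    return header_text + rest_of_includer;
}
```

With the header text from the example above, naive_include on its own reproduces the merged #endif/#include line, while normalizing the header first keeps the second #include on a line of its own. For an empty header there is nothing to merge, so leaving it empty is harmless.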
I am referring to: Why should text files end with a newline?
One of the answers quotes the C89 standard, which, in brief, dictates that a file must end with a new-line that is not immediately preceded by a backslash.
Does that apply to the most recent C++ standard?
#include <iostream>
using namespace std;
int main()
{
    cout << "Hello World!" << endl;
    return 0;
}
//\
Is the above valid? (Assuming there is a newline after //\, which I've been unable to display)
The given code is legal in the case of C++, but not for C.
Indeed, the C (N1570) standard says:
Each instance of a backslash character (\) immediately followed by a new-line
character is deleted, splicing physical source lines to form logical source lines.
Only the last backslash on any physical source line shall be eligible for being part
of such a splice. A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character before any such
splicing takes place.
The C++ standard (N3797) formulates it a bit differently (emphasis mine):
Each instance of a backslash character (\) immediately followed by a new-line character is deleted,
splicing physical source lines to form logical source lines. Only the last backslash on any physical
source line shall be eligible for being part of such a splice. If, as a result, a character sequence that
matches the syntax of a universal-character-name is produced, the behavior is undefined. A source file
that is not empty and that does not end in a new-line character, or that ends in a new-line character
immediately preceded by a backslash character before any such splicing takes place, shall be processed
as if an additional new-line character were appended to the file.
As per [lex.phases] p2 and p3, your particular case is also ill-formed according to the C++ standard.
[lex.phases] p2 says
Each sequence of a backslash character (\) immediately followed by zero or more whitespace characters other than new-line followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a splice, shall be processed as if an additional new-line character were appended to the file.
Since you said
Assuming there is a newline after //\, which I've been unable to display
Hence, the last visible \ is eligible for a splice. So the sequence consisting of \ and the new-line character is deleted. This means the last character in this source file is /, with no new-line following it. // starts a comment according to [lex.comment] p1
The characters // start a comment, which terminates immediately before the next new-line character.
As per [lex.phases] p3
The source file is decomposed into preprocessing tokens ([lex.pptoken]) and sequences of whitespace characters (including comments). A source file shall not end in a partial preprocessing token or in a partial comment.
In your case, the characters // start a comment but have no new line to terminate it. Hence, it's a partial comment. The program is ill-formed.
I have an array defined as follows:
extern const char config_reg[] = {
0x05, //comment
0x00, //comment
0x00, // \\ <-- double backslash
0x01, //comment
0x03
};
As you can see, there is a double backslash inside a comment (the <-- double backslash and preceding spaces do not appear in the actual source file). When I compile this code (minus the "<-- double backslash") it acts as if the following line is non existent - i.e. equivalent to writing:
extern const char config_reg[] = {
0x05, //comment
0x00, //comment
0x00, //
0x03
};
Is this intended C++ behaviour? If so, what is its intended purpose?
I am compiling using the Parallax Propeller Simple IDE to compile my code - not a particularly good compiler, by all accounts. Is it likely that the compiler implementation is causing this behaviour?
That's correct, assuming that the <-- double backslash and preceding spaces aren't actually in the code.
A single backslash would also produce the same effect.
The newline splicing for backslash-newline occurs before comment analysis, so the 0x01 line is part of the same line as the // \\ comment, so it isn't seen when the comment analysis is done.
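The ordering can be modelled with a small sketch of the splicing step on its own. The function name splice_lines is a hypothetical helper, and the model only covers the "backslash immediately followed by new-line" rule, not the rest of phase 2.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the line-splicing step of translation phase 2:
// every backslash immediately followed by a new-line is deleted, joining
// the two physical lines into one logical line.
std::string splice_lines(const std::string& source) {
    std::string result;
    result.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (source[i] == '\\' && i + 1 < source.size() && source[i + 1] == '\n') {
            ++i;  // skip the backslash and the new-line together
            continue;
        }
        result += source[i];
    }
    return result;
}
```

Applied to the array from the question, the 0x00 and 0x01 lines come out as a single logical line. Only the last backslash before the new-line is removed, so the first one stays in the comment, and the later comment-handling step then discards the 0x01 entry along with the rest of that comment.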
The ISO/IEC 14882:2011 (C++11) standard says:
2.2 Phases of translation [lex.phases]
¶1 The precedence among the syntax rules of translation is specified by the following phases. [11]
1. Physical source file characters are mapped, in an implementation-defined manner, to the basic source
character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical
source file characters accepted is implementation-defined. Trigraph sequences (2.4) are replaced
by corresponding single-character internal representations. Any source file character not in the basic
source character set (2.3) is replaced by the universal-character-name that designates that character.
(An implementation may use any internal encoding, so long as an actual extended character
encountered in the source file, and the same extended character expressed in the source file as a
universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this
replacement is reverted in a raw string literal.)
2. Each instance of a backslash character (\) immediately followed by a new-line character is deleted,
splicing physical source lines to form logical source lines. Only the last backslash on any physical
source line shall be eligible for being part of such a splice. If, as a result, a character sequence that
matches the syntax of a universal-character-name is produced, the behavior is undefined. A source file
that is not empty and that does not end in a new-line character, or that ends in a new-line character
immediately preceded by a backslash character before any such splicing takes place, shall be processed
as if an additional new-line character were appended to the file.
3. The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters
(including comments). A source file shall not end in a partial preprocessing token or in a partial comment. [12] Each comment is replaced by one space character. New-line characters are retained. Whether
each nonempty sequence of white-space characters other than new-line is retained or replaced by one
space character is unspecified. The process of dividing a source file’s characters into preprocessing tokens
is context-dependent. [ Example: see the handling of < within a #include preprocessing directive.
—end example ]
[11] Implementations must behave as if these separate phases occur, although in practice different phases might be folded
together.
[12] A partial preprocessing token would arise from a source file ending in the first portion of a multi-character token that
requires a terminating sequence of characters, such as a header-name that is missing the closing " or >. A partial comment
would arise from a source file ending with an unclosed /* comment.
Yes, the second phase of translation involves "splicing physical source lines to form logical source lines"; if a line ends with a backslash, the following line is considered to be a continuation of that line. This is the standard behaviour. This occurs before the removal of comments in the third phase, so the fact that the backslash occurs in a comment doesn't change anything.
Line splicing is used quite frequently in C to split macros over multiple lines, since a preprocessor directive extends to the end of the line. It is much rarer in C++, which relies much less on macros than C.
I believe the original purpose in C was to work around restrictions on line length that existed on some now-archaic systems.
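As an illustration of the macro use case, here is a minimal sketch of a definition split over several physical lines. SWAP_INTS and sort_pair are made-up names for this example only.

```cpp
// Each backslash-new-line pair is deleted in phase 2, so the preprocessor
// sees the whole definition as one logical line. The backslash must be the
// very last character on its physical line.
#define SWAP_INTS(a, b) \
    do {                \
        int tmp = (a);  \
        (a) = (b);      \
        (b) = tmp;      \
    } while (0)

// Orders two ints so that x <= y afterwards, using the macro above.
void sort_pair(int& x, int& y) {
    if (y < x) {
        SWAP_INTS(x, y);
    }
}
```

Without the backslashes the #define directive would end at the first new-line, and the remaining lines would be parsed as ordinary code at namespace scope.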
A \ at the end of a line escapes the newline character.
Thus in your example, it will extend the comment to the next line. The writer of the snippet probably used \\ instead of just \ for aesthetic purposes. But it doesn't only work with comments. For example this is allowed (but redundant):
int a; \
int b;
Some compilers allow whitespace between the \ and the newline but may issue a warning.
Is it possible in C++ to write a macro, which AFTER expansion will output a backslash sign?
Right now I'm using code like this:
#define SOME_ENUM(XX) \
XX(FirstValue,) \
XX(SecondValue,) \
XX(SomeOtherValue,=50) \
XX(OneMoreValue,=100) \
but I want to write a macro, which will generate the code above, so I want to be able to write:
ENUM_BEGIN(name) // it should output: #define SOME_ENUM(XX) \
ENUM(ONE) // it should output: XX(ONE,) \
//...
But I was not able to write a macro like ENUM_BEGIN, because it should expand to something with backslash on the end.
Is it possible in C++?
No, it is not possible. Relevant to this would be §2.2.1, translation phase 2 described in ISO/IEC 14882:2011(E):
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to
form logical source lines. Only the last backslash on any physical
source line shall be eligible for being part of such a splice. If, as
a result, a character sequence that matches the syntax of a
universal-character-name is produced, the behavior is undefined. A
source file that is not empty and that does not end in a new-line
character, or that ends in a new-line character immediately preceded
by a backslash character before any such splicing takes place, shall
be processed as if an additional new-line character were appended to
the file.
Basically what will happen is that the \\\n (where the \n is physically in the source, not an escape) will be treated as a \ character followed by a line splice. The remaining \ will most likely result in a syntax error (there may be situations where it is legal, but I don't currently see any), and it will not be treated as a line splice during subsequent translation phases (line splicing occurs only in phase 2).
I haven't found any documentation for it, but I would've thought that you could just do \\ and you'll generate a backslash.
However, in my research, I see that may not be the biggest thing you'll have to deal with. As millsj just commented, you'll have issues outputting the # in your ENUM_BEGIN. See Escaping a # symbol in a #define macro? .
The latest draft of C++0x, n3126, says:
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines.
...
Within the r-char-sequence of a raw string literal, any transformations performed in
phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted.
Technically this means that the C++ preprocessor only recognizes a backslash followed by the newline character, but I know that some C++ implementations also allow Windows- or classic Mac-style line endings as well.
Will conforming implementations of C++0x be required to preserve the newline sequence that immediately followed a backslash character \ within the r-char-sequence of a raw string? Maybe a better question is: would it be expected of a Windows C++0x compiler to undo each line splice with "\\\r\n" instead of "\\\n"?
Translation phase 1 starts with
Physical source file characters are
mapped, in an implementation-defined
manner, to the basic source character
set (introducing newline characters
for end-of-line indicators) if
necessary. Trigraph
sequences (2.3) are replaced [...]
I'd interpret the requirement "any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing)" as explicitly not reverting the transformation from source file characters to the basic source character set. Instead, source characters are later converted to the execution character set, and you get newline characters there.
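The effect of the reverting rule can be observed directly. The following is a minimal sketch, assuming a compiler that implements C++11 raw string literals; the variable names are illustrative only.

```cpp
#include <string>

// In an ordinary literal the backslash-new-line is spliced away in
// phase 2, so the two physical lines form one string with no break.
const std::string ordinary = "first \
second";

// Inside a raw string literal the splice is reverted, so both the
// backslash and a new-line character remain part of the string.
const std::string raw = R"(first \
second)";
```

Under the interpretation above, the character after the backslash in the raw string is a single new-line character regardless of how the line ending was stored in the physical source file, since that mapping happens in phase 1 and is not among the transformations that get reverted.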
If you need a specific line ending sequence, you can insert it explicitly, and use string literal concatenation:
const char* nitpicky = "I must have a \\r\\n line ending!\r\n"
"Otherwise, some other piece of code will misinterpret this line!";
What is the reason for the following warning in some C++ compilers?
No newline at end of file
Why should I have an empty line at the end of a source/header file?
Think of some of the problems that can occur if there is no newline. According to the ANSI standard, #include inserts the contents of the file exactly as they are and does not add a new line after them. So if you include a file with no newline at the end, the parser will see the last line of foo.h as being on the same line as the line of foo.cpp that follows the #include <foo.h>. What if the last line of foo.h was a comment without a new line? Now that line of foo.cpp is commented out. These are just a couple of examples of the types of problems that can creep up.
Just wanted to point any interested parties to James' answer below. While the above answer is still correct for C, the new C++ standard (C++11) has been changed so that this warning should no longer be issued if using C++ and a compiler conforming to C++11.
From C++11 standard via James' post:
A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file (C++11 §2.2/1).
The requirement that every source file end with a non-escaped newline was removed in C++11. The specification now reads:
A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file (C++11 §2.2/1).
A conforming compiler should no longer issue this warning (at least not when compiling in C++11 mode, if the compiler has modes for different revisions of the language specification).
C++03 Standard [2.1.1.2] declares:
... If a source file that is not empty does not end in a new-line character, or ends in a new-line character
immediately preceded by a backslash character before any such splicing takes place, the behavior is undefined.
The answer for the "obedient" is "because the C++03 Standard says the behavior of a program not ending in newline is undefined" (paraphrased).
The answer for the curious is here: http://gcc.gnu.org/ml/gcc/2001-07/msg01120.html.
It isn't referring to a blank line, it's whether the last line (which can have content in it) is terminated with a newline.
Most text editors will put a newline at the end of the last line of a file, so if the last line doesn't have one, there is a risk that the file has been truncated. However, there are valid reasons why you might not want the newline so it is only a warning, not an error.
#include will replace its line with the literal contents of the file. If the file does not end with a newline, the line containing the #include that pulled it in will merge with the next line.
Of course in practice every compiler adds a new line after the #include. Thankfully. – #mxcl
Not specific to C/C++, but to a C dialect: when using the GL_ARB_shading_language_include extension, the GLSL compiler on OS X does NOT warn you about a missing newline. So you can write a MyHeader.h file with a header guard which ends with #endif // __MY_HEADER_H__, and you will lose the line after the #include "MyHeader.h" for sure.
I am using C-Free IDE version 5.0, and I was getting the same problem in my programs in both C and C++. At the very end of the program, i.e. on the last line (after the closing brace of the last function, whether main or any other), press Enter so that the line count increases by 1. Then build the same program again, and it will run without error.
Because the behavior differs between C/C++ versions if a file does not end with a new-line. Older C++ versions are especially nasty; e.g. in C++03 the standard says (translation phases):
If a source file that is not empty does not end in a new-line
character, or ends in a new-line character immediately preceded by a
backslash character, the behavior is undefined.
Undefined behavior is bad: a standard-conforming compiler could do more or less what it wants here (insert malicious code or whatever) - clearly a reason for a warning.
While the situation is better in C++11 it is a good idea to avoid situations where the behavior is undefined in earlier versions. The C++03 specification is worse than C99 which outright prohibits such files (behavior is then defined).
This warning might also help to indicate that a file could have been truncated somehow. It's true that the compiler will probably throw a compiler error anyway - especially if it's in the middle of a function - or perhaps a linker error, but these could be more cryptic, and aren't guaranteed to occur.
Of course this warning also isn't guaranteed if the file is truncated immediately after a newline, but it could still catch some cases that other errors might miss, and gives a stronger hint to the problem.
In my case, I use the Kotlin language with IntelliJ as the IDE. I am also using a Docker container with a linter to fix possible issues with typos, imports, code usage, etc. This error is coming from these lint fixes, most probably - I mean surely.
In short, the error says, 'Add a new line at the end of the file' That is it.
Before there was NO extra empty line: