Preprocessing multiline comments and their embedded newlines at the end of file - c++

This is a question about the C99/C11 (and possibly C++) preprocessor and its standard compliance.
Let's consider two source files:
/* I'm
* multiline
* comment
*/
and
/* I'm
* multiline
* comment
*/
i_am_a_token;
If we preprocess both files with gcc or clang (several versions were tested), the results differ. In the first case the preprocessor does not keep the newlines from the multiline comment. In the second case all newlines are kept.
All of the mentioned standards say (somewhere inside "Translation phases"):
Each comment is replaced by one space character. New-line characters are retained.
Why is there a difference in how multiline comments are handled at the end of a file? And is this behaviour standard-compliant?

The reason is simple: line numbers and error reporting. Since the compiler reports errors with line numbers, it is convenient for line numbers in the preprocessed file to correspond to line numbers in the original file. That is why the lines occupied by a comment are preserved when they are followed by code, whereas they don't have to be preserved at the end of a file.
As for the standards. The standards
C99: ISO/IEC 9899:1999
C11: ISO/IEC 9899:2011
specify the language, preprocessing macros etc., but they don't specify how the language should be processed. You can see it in the scope definition of C11:
ISO/IEC 9899:2011 does not specify
the mechanism by which C programs are transformed for use by a data-processing system;
which means that preprocessor output is pretty much an internal issue, outside the scope of the standard.
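The line-number argument can be sketched in a few lines of C++. This is only an illustration, not how gcc or clang are implemented: `strip_block_comments` and `line_of` are hypothetical helpers, and the sketch ignores string literals and `//` comments. It replaces each comment with one space and optionally re-emits the new-lines the comment contained, which is what keeps `i_am_a_token` on its original line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces each /* ... */ comment with one space.
// If keep_newlines is true, the new-lines that were inside the comment are
// re-emitted after that space so later tokens keep their line numbers.
// Simplification: string and character literals are not recognised.
std::string strip_block_comments(const std::string& src, bool keep_newlines) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src.compare(i, 2, "/*") == 0) {
            std::size_t end = src.find("*/", i + 2);
            // An unterminated comment swallows the rest of the input.
            std::size_t stop = (end == std::string::npos) ? src.size() : end + 2;
            out += ' ';
            if (keep_newlines) {
                for (std::size_t k = i; k < stop; ++k)
                    if (src[k] == '\n') out += '\n';
            }
            i = stop;
        } else {
            out += src[i++];
        }
    }
    return out;
}

// Hypothetical helper: 1-based line on which `needle` first appears, or -1.
int line_of(const std::string& text, const std::string& needle) {
    std::size_t pos = text.find(needle);
    if (pos == std::string::npos) return -1;
    int line = 1;
    for (std::size_t k = 0; k < pos; ++k)
        if (text[k] == '\n') ++line;
    return line;
}
```

For the second file from the question, `line_of` still reports line 5 for `i_am_a_token` when `keep_newlines` is set. Without it the token moves up to line 2, and diagnostics would point at the wrong line.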

Related

In Fortran can I rely in reading tabs-delimited ascii files with list-directed "read"

Is it Fortran standard-compliant to read a tab-delimited ASCII file like this one:
0.11 0.12 0.45
(where the space is actually a tab) with list-directed input like this:
read(11,*) real1,real2,real3
A more interesting question than a first glance might suggest ...
The standard stipulates that blanks are to be recognised as value separators for list-directed input. In Table 3.1 'Special Characters' of (my version of the draft of) the standard a space is denoted Blank character, but there is no further explanation or definition of blank. So a space is definitely a blank inside the source of a Fortran program.
It is well known (??) that the tab character is not part of the Fortran character set, and some compilers will, by default, object to its presence in source files (outside character variable contexts). But I can't see anyone writing a compiler that would fail to recognise a tab character as a blank for list-directed input of a list of numbers.
I think the answer to the question is
Whether or not a tab character is a value separator for an input list
is processor dependent (i.e. it's left up to the compiler writer) so the standard doesn't stipulate that you can absolutely rely on this behaviour.
but I'll be interested to see what the language lawyers have to contribute on this one.

C++: Is there a standard definition for end-of-line in a multi-line string constant?

If I have a multi-line C++11 string constant such as
R"(line 1
line 2
line3)"
Is it defined what character(s) the line terminator/separator consist of?
The intent is that a newline in a raw string literal maps to a single
'\n' character. This intent is not expressed as clearly as it
should be, which has led to some confusion.
Citations are to the 2011 ISO C++ standard.
First, here's the evidence that it maps to a single '\n' character.
A note in section 2.14.5 [lex.string] paragraph 4 says:
[ Note: A source-file new-line in a raw string literal results in a
new-line in the resulting execution string-literal. Assuming no
whitespace at the beginning of lines in the following example, the
assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note ]
This clearly states that a newline is mapped to a single '\n'
character. It also matches the observed behavior of g++ 6.2.0 and
clang++ 3.8.1 (tests done on a Linux system using source files with
Unix-style and Windows-style line endings).
Given the clearly stated intent in the note and the behavior of two
popular compilers, I'd say it's safe to rely on this -- though it
would be interesting to see how other compilers actually handle this.
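A small check along the lines of the standard's note, applied to the string from the question (written here with the `R"( ... )"` delimiter syntax). It is a sketch that assumes the compiler maps each source line ending inside the raw literal to a single '\n', which is the behaviour reported above for g++ and clang++. `kRaw` and `raw_newlines_are_lf` are names made up for the example.

```cpp
#include <cstring>

// The new-lines inside the literal come straight from the source file.
const char* const kRaw = R"(line 1
line 2
line3)";

// True if every source new-line in kRaw became exactly one '\n'.
bool raw_newlines_are_lf() {
    return std::strcmp(kRaw, "line 1\nline 2\nline3") == 0;
}
```

Compiling the same file once with Unix-style and once with Windows-style line endings is a quick way to see what a particular compiler does.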
However, a literal reading of the normative wording of the
standard could easily lead to a different conclusion, or at least
to some uncertainty.
Section 2.5 [lex.pptoken] paragraph 3 says (emphasis added):
Between the initial and final double quote characters of the
raw string, any transformations performed in phases 1 and 2
(trigraphs, universal-character-names, and line splicing)
are reverted; this reversion shall apply before any d-char,
r-char, or delimiting parenthesis is identified.
The phases of translation are specified in 2.2 [lex.phases]. In phase 1:
Physical source file characters are mapped, in an
implementation-defined manner, to the basic source character set
(introducing new-line characters for end-of-line indicators) if
necessary.
If we assume that the mapping of physical source file characters to the
basic character set and the introduction of new-line characters are
"transformations", we might reasonably conclude that, for example,
a newline in the middle of a raw string literal in a Windows-format
source file should be equivalent to a \r\n sequence. (I can imagine
that being useful for Windows-specific code.)
(This interpretation does lead to problems with systems where the
end-of-line indicator is not a sequence of characters, for example
where each line is a fixed-width record. Such systems are rare
these days.)
As "Cheers and hth. - Alf"'s answer points out, there is an open
Defect Report for this issue. It was submitted in 2013 and has not
yet been resolved.
Personally, I think the root of the confusion is the word "any"
(emphasis added as before):
Between the initial and final double quote characters of the raw
string, any transformations performed in phases 1 and 2 (trigraphs,
universal-character-names, and line splicing) are reverted; this
reversion shall apply before any d-char, r-char, or delimiting
parenthesis is identified.
Surely the mapping of physical source file characters to
the basic source character set can reasonably be thought of
as a transformation. The parenthesized clause "(trigraphs,
universal-character-names, and line splicing)" seems to be intended
to specify which transformations are to be reverted, but that
either attempts to change the meaning of the word "transformations"
(which the standard does not formally define) or contradicts the use
of the word "any".
I suggest that changing the word "any" to "certain" would express
the apparent intent much more clearly:
Between the initial and final double quote characters of the raw
string, certain transformations performed in phases 1 and 2 (trigraphs,
universal-character-names, and line splicing) are reverted; this
reversion shall apply before any d-char, r-char, or delimiting
parenthesis is identified.
This wording would make it much clearer that "trigraphs,
universal-character-names, and line splicing" are the only
transformations that are to be reverted. (Not everything done
in translation phases 1 and 2 is reverted, just those specific
listed transformations.)
The standard seems to indicate that:
R"(line 1
line 2
line3)"
is equivalent to:
"line 1\nline 2\nline3"
From 2.14.5 String literals of the C++11 standard:
4 [ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
—end note ]
5 [ Example: The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
Note: the question has changed substantially since the answers were posted. Only half of it remains, namely the pure C++ aspect. The network focus in this answer addresses the original question's “sending a multi-line string to a server with well-defined end-of-line requirements”. I do not chase question evolution in general.
Internally in the program, the C++ standard for newline is \n. This is used also for newline in a raw literal. There is no special convention for raw literals.
Usually \n maps to ASCII linefeed, which is the value 10.
I'm not sure what it maps to in EBCDIC, but you can check that if needed.
On the wire, however, it's my impression that most protocols use ASCII carriage return plus linefeed, i.e. 13 followed by 10. This is sometimes called CRLF, after the ASCII abbreviations CR for carriage return and LF for linefeed. When the C++ escapes are mapped to ASCII this is simply \r\n in C++.
You need to abide by the requirements of the protocol you're using.
For ordinary file/stream i/o the C++ standard library takes care of mapping the internal \n to whatever convention the host environment uses. This is called text mode, as opposed to binary mode where no mapping is performed.
For network i/o, which is not covered by the standard library, the application code must do this itself, either directly or via some library functions.
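For the network case, the conversion the application has to do itself might look like the following. It is a minimal sketch assuming the protocol wants CRLF line endings. `to_crlf` is a hypothetical helper, not a standard library function.

```cpp
#include <string>

// Hypothetical helper: turns every bare '\n' into "\r\n" for the wire.
// A '\n' that already follows '\r' is left alone, so calling it twice
// gives the same result as calling it once.
std::string to_crlf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    char prev = '\0';
    for (char c : text) {
        if (c == '\n' && prev != '\r') out += '\r';
        out += c;
        prev = c;
    }
    return out;
}
```

A raw string literal would be passed through `to_crlf` just before it is written to the socket. Inside the program the text keeps using `\n`.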
There is an active issue about this, core language defect report #1655 “Line endings in raw string literals”, submitted by Mike Miller 2013-04-26, where he asks,
is it intended that, for example, a CRLF in the source of a raw string literal is to be represented as a newline character or as the original characters?
Since line ending values differ depending on the encoding of the original file, and considering that in some file systems there is not an encoding of line endings, but instead lines as records, it's clear that the intention is not to represent the file contents as-is – since that's impossible to do in all cases. But as far as I can see this DR is not yet resolved.

Why do preprocessor commands have to start as first nonwhite space

I am trying to do a #ifndef part way through a setter line and I received this error
"Error 20 error C2014: preprocessor command must start as first nonwhite space"
I am aware of what the error means; I am just curious why it is like that. Is it a compiler choice? What is the reasoning behind it? Is it so that it is easier for the user to notice?
Here is the code if someone is wondering:
inline void SetSomething(int value) { #ifndef XBOX ASSERT(value <= 1); #endif test = value; };
At first C did not have any standard preprocessor. Then people started using preprocessing as an external tool. You might note that the # is the same as with comments in general in Unix-land shell scripts.
As the language evolved the preprocessor became integrated with the compiler and more a part of the language proper, but it kept its totally different structure: in particular, it's line oriented while the C and C++ core languages are free form.
After that the lines have blurred a bit more. Now the preprocessing typically adds #line directives or the equivalent for use by the core language compiler, also #pragma directives are for the core language compiler, and in the other direction we now have _Pragma (IIRC). Still the structure is mostly as it was originally. C and C++ are evolved languages, not designed languages.
Taking a look into the standard (section 16 "Preprocessing Directives"), starting with # as the first non-whitespace character is what makes a preprocessing directive by definition.
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the following constraints:
The first token in the sequence is a # preprocessing token that (at the start of translation phase 4) is either
the first character in the source file (optionally after white space containing no new-line characters) or that
follows white space containing at least one new-line character.
If you want the most important reason, it's because the standard says so.
If you want to know why the standard says so, it's the easiest way to get the necessary functionality.
Remember that preprocessing and compiling are two potentially completely separate tasks, and the preprocessor has no idea at all about the language of its output.
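In practice the fix for the setter is to give each directive its own line. Below is a sketch of the function from the question rewritten that way. `Widget` is a made-up enclosing class, and the standard `assert` stands in for the project's `ASSERT` macro, whose definition is not shown in the question.

```cpp
#include <cassert>

struct Widget {
    int test = 0;

    // Each # directive starts its own line; only white space may precede it.
    void SetSomething(int value) {
#ifndef XBOX
        assert(value <= 1);  // stand-in for the project's ASSERT macro
#endif
        test = value;
    }
};
```

The directives do not have to sit in column one; indenting them with spaces or tabs is also accepted, as long as nothing else precedes the # on that line.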

Source line length limit

What's the maximum length of a source line all compilers are required to accept? Did it change in C++11? If so, what was the old value?
I'm asking this question because I'm doing some heavy preprocessor voodoo (unfortunately, templates won't cut it), and doing so has a tendency to make the lines big very quickly. I want to stay on the safe side, so I won't have to worry about the possibility of compiler X on platform Y rejecting my code because of too long lines.
C++2003, Annex B, (informative)
Implementation quantities (sorry, don't have C++2011 handy)
2) The limits may constrain quantities that include those described below or others. The bracketed number
following each quantity is recommended as the minimum for that quantity. However, these quantities are
only guidelines and do not determine compliance.
…
Characters in one logical source line [65 536].
You didn't ask about these, but they might be useful, also:
Nesting levels of parenthesized expressions within a full expression [256].
Macro identifiers simultaneously defined in one translation unit [65 536].
Arguments in one macro invocation [256].
Number of characters in an internal identifier or macro name [1 024].
Parameters in one macro definition [256].
Postscript: It is worth noting what "one logical source line" is. A logical source line is what you have after:
Physical source file characters are mapped to the basic source character set
Trigraph sequences (2.3) are replaced by corresponding single-character internal representations
Each instance of a new-line character and an immediately preceding backslash character is deleted
The logical source line is what you have before:
The source file is decomposed into preprocessing tokens
Preprocessing directives are executed and macro invocations are expanded.
[quotes from C++ 2003, 2.1 Phases of Translation]
So, if the OP's concern is that the macros expand to beyond a reasonable line length, my answer is irrelevant. If the OP's concern is that his source code (after dealing with \, \n) might be too long, my answer stands.
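To see how long the logical source lines of a file are, the splicing step can be approximated as below. This is a rough sketch: `longest_logical_line` is a hypothetical helper, it only performs backslash-newline splicing, and it ignores trigraphs and character-set mapping.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest logical line in `src`, where a
// backslash immediately followed by a new-line joins two physical lines.
std::size_t longest_logical_line(const std::string& src) {
    std::size_t longest = 0, current = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // splice: drop both the backslash and the new-line
        } else if (src[i] == '\n') {
            longest = std::max(longest, current);
            current = 0;
        } else {
            ++current;
        }
    }
    return std::max(longest, current);  // the last line may lack a new-line
}
```

The result can be compared against the 65 536 guideline quoted above. Keep in mind that it measures the source before macro expansion, not the expanded output.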

"No newline at end of file" compiler warning

What is the reason for the following warning in some C++ compilers?
No newline at end of file
Why should I have an empty line at the end of a source/header file?
Think of some of the problems that can occur if there is no newline. According to the ANSI standard, an #include <foo.h> at the beginning of foo.cpp inserts foo.h exactly as it is at the front of foo.cpp and does not insert a newline after its contents. So if you include a file with no newline at the end, the parser sees the last line of foo.h as being on the same line as the line of foo.cpp that follows the #include. What if the last line of foo.h was a comment without a newline? Now that line of foo.cpp is commented out. These are just a couple of examples of the types of problems that can creep up.
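The merging problem can be simulated with plain string concatenation. This sketch models only the naive textual inclusion described above, not what any real preprocessor does. `naive_include` and `first_line` are hypothetical helpers.

```cpp
#include <string>

// Hypothetical model of a purely textual #include: the header's characters
// are pasted in front of the rest of the including file, nothing added.
std::string naive_include(const std::string& header, const std::string& rest) {
    return header + rest;
}

// Hypothetical helper: the text up to (not including) the first new-line.
std::string first_line(const std::string& text) {
    return text.substr(0, text.find('\n'));
}
```

With a header whose only line is `// end of foo.h` and no trailing new-line, followed by `int x;` in the including file, the first line of the combined text becomes `// end of foo.hint x;`, so the declaration disappears into the comment.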
Just wanted to point any interested parties to James' answer below. While the above answer is still correct for C, the new C++ standard (C++11) has been changed so that this warning should no longer be issued if using C++ and a compiler conforming to C++11.
From C++11 standard via James' post:
A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file (C++11 §2.2/1).
The requirement that every source file end with a non-escaped newline was removed in C++11. The specification now reads:
A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file (C++11 §2.2/1).
A conforming compiler should no longer issue this warning (at least not when compiling in C++11 mode, if the compiler has modes for different revisions of the language specification).
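The C++11 rule quoted above is simple enough to write down directly. This is a sketch, assuming the source file is already in memory as a string with `\n` line endings. `append_missing_newline` is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper applying the C++11 rule: a non-empty file that does not
// end in a new-line, or that ends in backslash + new-line, is treated as if
// one more new-line had been appended.
std::string append_missing_newline(std::string src) {
    if (src.empty()) return src;
    const bool ends_with_newline = src.back() == '\n';
    const bool ends_with_spliced_newline =
        ends_with_newline && src.size() >= 2 && src[src.size() - 2] == '\\';
    if (!ends_with_newline || ends_with_spliced_newline) src += '\n';
    return src;
}
```

The inputs this function changes are the same ones the warning is about.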
C++03 Standard [2.1.1.2] declares:
... If a source file that is not empty does not end in a new-line character, or ends in a new-line character
immediately preceded by a backslash character before any such splicing takes place, the behavior is undefined.
The answer for the "obedient" is "because the C++03 Standard says the behavior of a program not ending in newline is undefined" (paraphrased).
The answer for the curious is here: http://gcc.gnu.org/ml/gcc/2001-07/msg01120.html.
It isn't referring to a blank line, it's whether the last line (which can have content in it) is terminated with a newline.
Most text editors will put a newline at the end of the last line of a file, so if the last line doesn't have one, there is a risk that the file has been truncated. However, there are valid reasons why you might not want the newline so it is only a warning, not an error.
#include will replace its line with the literal contents of the file. If the file does not end with a newline, the line containing the #include that pulled it in will merge with the next line.
Of course in practice every compiler adds a new line after the #include. Thankfully. – #mxcl
Not specific to C/C++, but to a C dialect: when using the GL_ARB_shading_language_include extension, the GLSL compiler on OS X does NOT warn you about a missing newline. So you can write a MyHeader.h file with a header guard which ends with #endif // __MY_HEADER_H__ and you will lose the line after the #include "MyHeader.h" for sure.
I am using C-Free IDE version 5.0, and I was getting the same problem in my programs in both C++ and C. At the very end of the program, i.e. on the last line (after the closing brace of the last function, whether main or any other), press Enter so that the line count increases by 1. Then build the same program again, and it will run without the error.
Because the behavior differs between C/C++ versions if a file does not end with a new-line. Older C++ versions are especially nasty; for example, in C++03 the standard says (translation phases):
If a source file that is not empty does not end in a new-line
character, or ends in a new-line character immediately preceded by a
backslash character, the behavior is undefined.
Undefined behavior is bad: a standard-conforming compiler could do more or less what it wants here (insert malicious code or whatever), which is clearly a reason for a warning.
While the situation is better in C++11 it is a good idea to avoid situations where the behavior is undefined in earlier versions. The C++03 specification is worse than C99 which outright prohibits such files (behavior is then defined).
This warning might also help to indicate that a file could have been truncated somehow. It's true that the compiler will probably throw a compiler error anyway - especially if it's in the middle of a function - or perhaps a linker error, but these could be more cryptic, and aren't guaranteed to occur.
Of course this warning also isn't guaranteed if the file is truncated immediately after a newline, but it could still catch some cases that other errors might miss, and gives a stronger hint to the problem.
In my case, I use the Kotlin language and the compiler is the one in IntelliJ. I am also using a Docker container with a linter to fix possible issues with typos, imports, code usage, etc. This error most probably comes from these lint fixes.
In short, the error says 'Add a new line at the end of the file'. That is it.
Before there was NO extra empty line: