Unknown meta-character in C/C++ string literal?

I created a new project with the following code segment:
char* strange = "(Strange??)";
cout << strange << endl;
resulting in the following output:
(Strange]
Thus translating '??)' -> ']'
Debugging it shows that my char* string literal is actually that value and it's not a stream translation. This is obviously not a meta-character sequence I've ever seen.
Some sort of Unicode or wide char sequence perhaps? I don't think so however... I've tried disabling all related project settings to no avail.
Anyone have an explanation?
search : 'question mark, question mark, close brace' c c++ string literal

What you're seeing is called a trigraph.
In written language by grown-ups, one question mark is sufficient for any situation. Don't use more than one at a time and you'll never see this again.
GCC ignores trigraphs by default because hardly anyone uses them intentionally. Enable them with the -trigraphs option, or tell the compiler to warn you about them with the -Wtrigraphs option.
Visual C++ 2010 also disables them by default and offers /Zc:trigraphs to enable them. I can't find anything about ways to enable or disable them in prior versions.
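If you want to see the replacement (or be warned about it) for yourself, here is a minimal test program; the compiler invocations in the comments just use the options mentioned above, so check them against your compiler's documentation:
#include <iostream>

int main() {
    // With trigraph replacement enabled, "??)" is rewritten to "]" before
    // the string literal is even tokenized.
    const char* strange = "(Strange??)";
    std::cout << strange << std::endl;  // "(Strange]" with trigraphs on,
                                        // "(Strange??)" with trigraphs off
    return 0;
}

// g++ -trigraphs  demo.cpp    -> enables the replacement
// g++ -Wtrigraphs demo.cpp    -> warns when a trigraph is encountered
// cl  /Zc:trigraphs demo.cpp  -> Visual C++ 2010: enables the replacement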

Easy way to avoid the trigraph surprise: split a "??" string literal in two:
char* strange = "(Strange??)";
char* strange2 = "(Strange?" "?)";
/* ^^^ no punctuation */
Edit
gcc has an option to warn about trigraphs: -Wtrigraphs (enabled with -Wall also)
end edit
Quotes from the Standard
5.2.1.1 Trigraph sequences
1 Before any other processing takes place, each occurrence of one of the
following sequences of three characters (called trigraph sequences13))
is replaced with the corresponding single character.
??= #      ??) ]      ??! |
??( [      ??' ^      ??> }
??/ \      ??< {      ??- ~
No other trigraph sequences exist. Each ? that does not begin one of
the trigraphs listed above is not changed.
5.1.1.2 Translation phases
1 The precedence among the syntax rules of translation is specified by
the following phases.
1. Physical source file multibyte characters are mapped, in an
implementation-defined manner, to the source character set
(introducing new-line characters for end-of-line indicators)
if necessary. Trigraph sequences are replaced by corresponding
single-character internal representations.
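For reference, here is the table above as a small runnable check; each trigraph is spelled with an escaped ? so that this snippet itself contains no trigraph sequences:
#include <iostream>

int main() {
    // The nine trigraphs and their single-character replacements,
    // in the same order as the table quoted from the standard.
    const char* trigraphs[] = { "?\?=", "?\?)", "?\?!",
                                "?\?(", "?\?'", "?\?>",
                                "?\?/", "?\?<", "?\?-" };
    const char replacements[] = { '#', ']', '|',
                                  '[', '^', '}',
                                  '\\', '{', '~' };

    for (int i = 0; i < 9; ++i)
        std::cout << trigraphs[i] << " -> " << replacements[i] << '\n';
    return 0;
}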

It's a Trigraph!

??) is a trigraph.

That's trigraph support. You can prevent trigraph interpretation by escaping any of the characters:
char* strange = "(Strange?\?)";

It's a trigraph.

Trigraphs are the reason. The discussion of C in the linked article applies to C++ as well.

As mentioned several times, you're being bitten by a trigraph. See this previous SO question for more information:
Purpose of Trigraph sequences in C++?
You can fix the problem by using the '\?' escape sequence for the '?' character:
char* strange = "(Strange\?\?)";
In fact, this is the reason for that escape sequence, which is somewhat mysterious if you're unaware of those damn trigraphs.
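Putting the workarounds from these answers side by side, here is a minimal sketch showing that all three spellings print the literal question marks regardless of trigraph settings:
#include <iostream>

int main() {
    const char* a = "(Strange?\?)";    // escape the second '?' with "\?"
    const char* b = "(Strange\?\?)";   // or escape both
    const char* c = "(Strange?" "?)";  // or split the literal; the pieces are
                                       // concatenated in a later translation phase,
                                       // after trigraph replacement has already run
    std::cout << a << '\n' << b << '\n' << c << '\n';  // each prints "(Strange??)"
    return 0;
}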

While trying to cross-compile with GCC, it picked my sequence up as a trigraph.
So all I need to do now is figure out how to disable this in projects by default, since I can only see it creating problems for me. (I'm using a US keyboard layout anyway.)
The default behavior on GCC is to ignore trigraphs but warn about them, which is much saner, and as far as I know is the behavior Visual Studio 2010 adopts as well.

Related

C++: Is there a standard definition for end-of-line in a multi-line string constant?

If I have a multi-line string C++11 string constant such as
R"""line 1
line 2
line3"""
Is it defined what character(s) the line terminator/separator consist of?
The intent is that a newline in a raw string literal maps to a single
'\n' character. This intent is not expressed as clearly as it
should be, which has led to some confusion.
Citations are to the 2011 ISO C++ standard.
First, here's the evidence that it maps to a single '\n' character.
A note in section 2.14.5 [lex.string] paragraph 4 says:
[ Note: A source-file new-line in a raw string literal results in a
new-line in the resulting execution string-literal. Assuming no
whitespace at the beginning of lines in the following example, the
assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note ]
This clearly states that a newline is mapped to a single '\n'
character. It also matches the observed behavior of g++ 6.2.0 and
clang++ 3.8.1 (tests done on a Linux system using source files with
Unix-style and Windows-style line endings).
Given the clearly stated intent in the note and the behavior of two
popular compilers, I'd say it's safe to rely on this -- though it
would be interesting to see how other compilers actually handle this.
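One quick way to test a given compiler is essentially the standard's note turned into a program (a minimal sketch; save it once with Unix-style and once with Windows-style line endings and see whether the assert holds both times):
#include <cassert>
#include <cstring>

int main() {
    // If the source-file newline inside a raw string literal maps to a
    // single '\n', this assert succeeds regardless of whether the file
    // was saved with LF or CRLF line endings.
    const char* p = R"(line 1
line 2)";
    assert(std::strcmp(p, "line 1\nline 2") == 0);
    return 0;
}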
However, a literal reading of the normative wording of the
standard could easily lead to a different conclusion, or at least
to some uncertainty.
Section 2.5 [lex.pptoken] paragraph 3 says (emphasis added):
Between the initial and final double quote characters of the
raw string, any transformations performed in phases 1 and 2
(trigraphs, universal-character-names, and line splicing)
are reverted; this reversion shall apply before any d-char,
r-char, or delimiting parenthesis is identified.
The phases of translation are specified in 2.2 [lex.phases]. In phase 1:
Physical source file characters are mapped, in an
implementation-defined manner, to the basic source character set
(introducing new-line characters for end-of-line indicators) if
necessary.
If we assume that the mapping of physical source file characters to the
basic character set and the introduction of new-line characters are
"tranformations", we might reasonably conclude that, for example,
a newline in the middle of a raw string literal in a Windows-format
source file should be equivalent to a \r\n sequence. (I can imagine
that being useful for Windows-specific code.)
(This interpretation does lead to problems with systems where the
end-of-line indicator is not a sequence of characters, for example
where each line is a fixed-width record. Such systems are rare
these days.)
As "Cheers and hth. - Alf"'s answer
points out, there is an open
Defect Report
for this issue. It was submitted in 2013 and has not yet been
resolved.
Personally, I think the root of the confusion is the word "any"
(emphasis added as before):
Between the initial and final double quote characters of the raw
string, any transformations performed in phases 1 and 2 (trigraphs,
universal-character-names, and line splicing) are reverted; this
reversion shall apply before any d-char, r-char, or delimiting
parenthesis is identified.
Surely the mapping of physical source file characters to
the basic source character set can reasonably be thought of
as a transformation. The parenthesized clause "(trigraphs,
universal-character-names, and line splicing)" seems to be intended
to specify which transformations are to be reverted, but that
either attempts to change the meaning of the word "transformations"
(which the standard does not formally define) or contradicts the use
of the word "any".
I suggest that changing the word "any" to "certain" would express
the apparent intent much more clearly:
Between the initial and final double quote characters of the raw
string, certain transformations performed in phases 1 and 2 (trigraphs,
universal-character-names, and line splicing) are reverted; this
reversion shall apply before any d-char, r-char, or delimiting
parenthesis is identified.
This wording would make it much clearer that "trigraphs,
universal-character-names, and line splicing" are the only
transformations that are to be reverted. (Not everything done
in translation phases 1 and 2 is reverted, just those specific
listed transformations.)
The standard seems to indicate that:
R"""line 1
line 2
line3"""
is equivalent to:
"line 1\nline 2\nline3"
From 2.14.5 String literals of the C++11 standard:
4 [ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
—end note ]
5 [ Example: The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n".
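The standard's example can likewise be checked directly; this small sketch just restates the quoted example as an assert:
#include <cassert>
#include <cstring>

int main() {
    // The "a" delimiter allows the raw string to contain )\ and " freely.
    const char* p = R"a(
)\
a"
)a";
    assert(std::strcmp(p, "\n)\\\na\"\n") == 0);  // as stated in the standard
    return 0;
}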
Note: the question has changed substantially since the answers were posted. Only half of it remains, namely the pure C++ aspect. The network focus in this answer addresses the original question's “sending a multi-line string to a server with well-defined end-of-line requirements”. I do not chase question evolution in general.
Internally in the program, the C++ standard for newline is \n. This is used also for newline in a raw literal. There is no special convention for raw literals.
Usually \n maps to ASCII linefeed, which is the value 10.
I'm not sure what it maps to in EBCDIC, but you can check that if needed.
On the wire, however, it's my impression that most protocols use ASCII carriage return plus linefeed, i.e. 13 followed by 10. This is sometimes called CRLF, after the ASCII abbreviations CR for carriage return and LF for linefeed. When the C++ escapes are mapped to ASCII this is simply \r\n in C++.
You need to abide by the requirements of the protocol you're using.
For ordinary file/stream i/o the C++ standard library takes care of mapping the internal \n to whatever convention the host environment uses. This is called text mode, as opposed to binary mode where no mapping is performed.
For network i/o, which is not covered by the standard library, the application code must do this itself, either directly or via some library functions.
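For example, here is a hedged sketch of the kind of conversion application code might do before writing to a CRLF-based wire protocol; the helper name to_crlf is made up for illustration, and it assumes the input uses bare '\n' line endings:
#include <iostream>
#include <string>

// Hypothetical helper: expand each bare '\n' into "\r\n" for protocols
// that require CRLF line endings on the wire.
std::string to_crlf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == '\n')
            out += "\r\n";
        else
            out += c;
    }
    return out;
}

int main() {
    const char* body = R"(line 1
line 2
line3)";                               // internally the newlines are single '\n'
    std::string wire = to_crlf(body);  // each '\n' becomes "\r\n" for the wire
    std::cout << wire.size() << '\n';  // 21: two bytes longer than the original 19
    return 0;
}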
There is an active issue about this, core language defect report #1655 “Line endings in raw string literals”, submitted by Mike Miller 2013-04-26, where he asks,
“is it intended that, for example, a CRLF in the source of a raw string literal is to be represented as a newline character or as the original characters?”
Since line ending values differ depending on the encoding of the original file, and considering that in some file systems there is not an encoding of line endings, but instead lines as records, it's clear that the intention is not to represent the file contents as-is – since that's impossible to do in all cases. But as far as I can see this DR is not yet resolved.

What determines the normalized form of a Unicode string in C++?

When creating string literals in C++, I would like to know how the strings are encoded -- I can specify the encoding form (UTF-8, 16, or 32), but I want to know how the compiler determines the unspecified parts of the encoding.
For UTF-8 the byte-ordering is not relevant, and I would assume the byte ordering of UTF-16 and UTF-32 is, by default, the system byte-ordering. This leaves the normalization. As an example:
std::string u8foo = u8"Föo";
std::u16string u16foo = u"Föo";
std::u32string u32foo = U"Föo";
In all three cases, there are at least two possible encodings -- decomposed or composed. For more complex characters there might be multiple possible encodings, but I would assume that the compiler would generate one of the normalized forms.
Is this a safe assumption? Can I know in advance in what normalization the text in u8foo and u16foo is stored? Can I specify it somehow?
I am of the impression this is not defined by the standard, and that it is implementation specific. How does GCC handle it? Other compilers?
The interpretation of character strings outside of the basic source character set is implementation-dependent. (Standard quote below.) So there is no definitive answer; an implementation is not even obliged to accept source characters outside of the basic set.
Normalisation involves a mapping of possibly multiple source codepoints to possibly multiple internal codepoints, including the possibility of reordering the source character sequence (if, for example, diacritics are not in the canonical order). Such transformations are more complex than the source→internal transformation anticipated by the standard, and I suspect that a compiler which attempted them would not be completely conformant. In any event, I know of no compiler which does so.
So, in general, you should ensure that the source code you provide to the compiler is normalized as per your desired normalization form, if that matters to you.
In the particular case of GCC, the compiler interprets the source according to the default locale's encoding, unless told otherwise (with the -finput-charset command-line option). It will recode if necessary to Unicode codepoints. But it does not alter the sequence of codepoints. So if you give it a normalized UTF-8 string, that's what you get. And if you give it an unnormalized string, that's also what you get.
In this example on coliru, the first string is composed and the second one decomposed (although they are both in some normalization form). (The rendering of the second example string in coliru seems to be browser-dependent. On my machine, chrome renders them correctly, while firefox shifts the diacritics one position to the left. YMMV.)
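As a small sketch of that point, both literals below are spelled with universal-character-names, so the source file's own encoding doesn't matter, and the compiler stores exactly the codepoint sequence written (this assumes C++11/14/17; in C++20 the type of u8 literals changes to char8_t):
#include <iostream>
#include <string>

int main() {
    // Composed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE (2 UTF-8 bytes).
    std::string composed = u8"\u00e9";
    // Decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT (1 + 2 bytes).
    std::string decomposed = u8"e\u0301";

    std::cout << composed.size() << ' ' << decomposed.size() << '\n';  // 2 3
    std::cout << (composed == decomposed) << '\n';  // 0: the compiler does not normalize
    return 0;
}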
The C++ standard defines the basic source character set (in §2.3/1) to be letters, digits, five whitespace characters (space, newline, tab, vertical tab and formfeed) and the symbols:
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
It gives the compiler a lot of latitude as to how it interprets the input, and how it handles characters outside of the basic source character set. §2.2 paragraph 1 (from C++14 draft n4527):
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (e.g., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)
It's worth adding that diacritics are characters, from the perspective of the C++ standard. So the composed ñ (\u00f1) is one character and the decomposed ñ (\u006e \u0303) is two characters, regardless of how it looks to you.
A close reading of the above paragraph from the standard suggests that normalization or other transformations which are not strictly 1-1 are not permitted, although the compiler may be able to reject an input which contains characters outside the basic source character set.
Microsoft Visual C++ will keep the normalization used in the source file.
The main problem you have when doing this cross-platform is making sure the compilers are using the right encodings. Here is how MSVC handles it:
Source file encoding
The compiler has to read your source file with the right encoding.
MSVC doesn't have an option to specify the encoding on the command line but relies on the BOM to detect encoding, so it can read the following encodings:
UTF-16 with BOM, if the file starts with that BOM
UTF-8, if the file starts with "\xef\xbb\xbf" (the UTF-8 "BOM")
in all other cases, the file is read using an ANSI code page dependent on your system language setting. In practice this means you can only use ASCII characters in your source files.
Output encoding
Your Unicode strings will be encoded with some encoding before being written to your executable as byte strings.
Wide literals (L"...") are always written as UTF-16.
In MSVC 2010 you can use #pragma execution_character_set("utf-8") to have char strings encoded as UTF-8. By default they are encoded in your local code page. That pragma is apparently missing from MSVC 2012, but it's back in MSVC 2013.
#pragma execution_character_set("utf-8")
const char a[] = "ŦεŞŧ";
Support for the Unicode literals (u"..." and friends) was only just now introduced with MSVC 2015.
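One way to sidestep the source/execution-encoding question entirely, on compilers that have C++11 u8 literals (MSVC 2015 and later, per the above), is to spell the text with universal-character-names; a hedged sketch:
#include <cstdio>

int main() {
    // Spelled with universal-character-names, so the bytes do not depend on the
    // source file's encoding or BOM; the u8 prefix makes the stored bytes UTF-8.
    // (Shown for C++11/14/17; in C++20 the literal's type becomes const char8_t*.)
    const char* test = u8"\u0166\u03b5\u015e\u0167";  // "ŦεŞŧ"
    std::puts(test);  // prints the UTF-8 bytes; whether they render correctly
                      // depends on the console's encoding
    return 0;
}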

Single line comment continuation

From the C++ standard (going back to at least C++98) § 2.2, note 2 states:
Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. Except for splices reverted in a raw string literal, if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined. A source file that is not empty and that does not end in a new-line character, or that ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, shall be processed as if an additional new-line character were appended to the file.
And, section § 2.7 states:
The characters /* start a comment, which terminates with the characters */. These comments do not nest. The characters // start a comment, which terminates with the next new-line character. If there is a form-feed or a vertical-tab character in such a comment, only white-space characters shall appear between it and the new-line that terminates the comment; no diagnostic is required. [Note: The comment characters //, /*, and */ have no special meaning within a // comment and are treated just like other characters. Similarly, the comment characters // and /* have no special meaning within a /* comment. ]
I would take these two together to mean that the following:
// My comment \
is valid
// My comment \ still valid \
is valid
are legal in C++98. In GCC 4.9.2, these both compile without any diagnostic messages. In MSVC 2013, these both produce the following:
warning C4010: single-line comment contains line-continuation character
If you have warnings as errors enabled (which, I do), this causes the program to not compile successfully (without warnings-as-errors, it works just fine). Is there something in the standard that disallows single-line comment continuations, or is this a case of MSVC non-compliance with the standard?
It's not a question of compliance. You've specifically asked the compiler to treat a valid construct as an error, so that's what it does.
GCC will give the same warning (or error, if requested) if you specify -Wcomment or -Wall.
I'd say it's MS being sensitive to the fact that if you do something like:
#define macro() \
some stuff \
// Intended as comment \
more stuff
then you get VERY interesting errors when you use macro() in the code.
Or someone simply accidentally typing a comment like this:
// The files for foo-project are in c:\projects\foo\
int blah;
(Strange errors for "undefined variable blah" occur)
I would NEVER use line continuation in a single-line comment, but if you have some good reason to, just turn THAT warning off in MSVC.
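If you do want to keep warnings-as-errors and silence only this one warning, here is a hedged sketch for MSVC (C4010 is the number quoted above; the command-line equivalent would be /wd4010, and GCC's corresponding switch is -Wno-comment):
// Suppress only "single-line comment contains line-continuation character"
// around the code that uses it deliberately, then restore the previous state.
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable: 4010)
#endif

// this comment intentionally continues onto the next line \
   and this line is still part of the comment

#ifdef _MSC_VER
#pragma warning(pop)
#endif

int main() { return 0; }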
Also, as Mike says: warnings are not even covered by the standard - it only says what needs to be an error. If you enable "warnings are errors", you will have to either be selective about which warnings you enable, or accept that some constructs that are technically valid (but dubious) will be unacceptable in the build, because the compiler maker has decided to warn about them. Try writing if (c = getchar()) in gcc or clang and see how far you get with -Werror and warnings turned up high. Yet it is perfectly valid according to the standard.

C11 Compilation. Phase of translation #1 and #5. Universal character names

I'm trying to understand Universal Character Names in the C11 standard and found that the N1570 draft of the C11 standard has much less detail than the C++11 standard with respect to Translation Phases 1 and 5 and the formation and handling of UCNs within them. This is what each has to say:
Translation Phase 1
N1570 Draft C11 5.1.1.2p1.1:
Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
C++11 2.2p1.1:
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Trigraph sequences (2.4) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.)
Translation Phase 5
N1570 Draft C11 5.1.1.2p1.5:
Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; [...]
C++ 2.2p1.5:
Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set; [...]
(emphasis was added on differences)
The Questions
1. In the C++11 standard, it is very clear that source file characters not in the basic source character set are converted to UCNs, and that they are treated exactly as would have been a UCN in that same place, with the sole exception of raw-strings. Is the same true of C11? When a C11 compiler sees a multi-byte UTF-8 character such as °, does it too translate this to \u00b0 in phase 1, and treat it just as if \u00b0 had appeared there instead?
2. To put it in a different way: at the end of which translation phase, if any, are the following snippets of code transformed into textually equivalent forms for the first time in C11?
const char* hell° = "hell°";
and
const char* hell\u00b0 = "hell\u00b0";
3. If, in 2., the answer is "in none", then during which translation phase are those two identifiers first understood to refer to the same thing, despite being textually different?
4. In C11, are UCNs in character/string literals also converted in phase 5? If so, why omit this from the draft standard?
5. How are UCNs in identifiers (as opposed to in character/string literals as already mentioned) handled in both C11 and C++11? Are they also converted in phase 5? Or is this something implementation-defined? Does GCC for instance print out such identifiers in UCN-coded form, or in actual UTF-8?
Comments turned into an answer
Interesting question!
The C standard can leave more of the conversions unstated because they are implementation-defined (and C has no raw strings to confuse the issue).
What it says in the C standard is sufficient — except that it leaves your question 1 unanswerable.
Q2 has to be 'Phase 5', I think, with caveats about it being 'the token stream is equivalent'.
Q3 is strictly N/A, but Phase 7 is probably the answer.
Q4 is 'yes', and it says so because it mentions 'escape sequences' and UCNs are escape sequences.
Q5 is 'Phase 5' too.
Can the C++11-mandated processes in Phase 1 and 5 be taken as compliant within the wording of C11 (putting aside raw strings)?
I think they are effectively the same; the difference arises primarily from the raw literal issue, which is specific to C++. Generally, the C and C++ standards try not to make things gratuitously different, and in particular try to keep the workings of the preprocessor and the low-level character parsing the same in both (which has been easier since C99 added support for C++ // comments, but which evidently got harder with the addition of raw literals to C++11).
One day, I'll have to study the raw literal notations and their implications more thoroughly.
First, please note that these distinctions have existed since 1998; UCNs were first introduced in C++98, then a new standard (ISO/IEC 14882, 1st edition: 1998), and then made their way into the C99 revision of the C standard; but the C committee (and existing implementers, with their pre-existing implementations) did not feel the C++ way was the only way to achieve the trick, particularly with corner cases and the use of character sets smaller than Unicode, or just different; for example, the requirement to ship mapping tables from whatever encodings were supported to Unicode was a preoccupation for C vendors in 1998.
The C standard (consciously) avoids deciding this, and lets the compiler choose how to proceed. While your reasoning obviously takes place in the context of UTF-8 character sets used for both source and execution, there is a large (and pre-existing) range of different C99/C11 compilers available which use different sets; and the committee felt it should not restrict the implementers too much on this issue. In my experience, most compilers keep the two forms distinct in practice (for performance reasons).
Because of this freedom, some compilers can have them identical after phase 1 (as a C++ compiler must), while others can leave them distinct as late as phase 7 for the first ° character (the one in the identifier); the second ° character (the one inside the string literal) ought to be the same after phase 5, assuming ° is part of the extended execution character set supported by the implementation.
For the other answers, I won't add anything to Jonathan's.
About your additional question, whether the more deterministic C++ process can be considered Standard-C-compliant: it is clearly a goal to be so; and if you find a corner case which shows otherwise (a C++11-compliant preprocessor which would not conform to the C99 and C11 standards), then you should consider asking the WG14 committee about a potential defect.
Obviously, the reverse is not true: it is possible to write a preprocessor whose handling of UCNs complies with C99/C11 but not with the C++ standards; the most obvious difference is with
#define str(t) #t
#define str_is(x, y) const char * x = y " is " str(y)
str_is(hell°, "hell°");
str_is(hell\u00B0, "hell\u00B0");
which a C-compliant preprocessor can render in the same way as your examples (and most do so), and as such will have distinct renderings; but I am under the impression that a C++-compliant preprocessor is required to transform it into the (strictly equivalent)
const char* hell° = "hell°" " is " "\"hell\\u00b0\"";
const char* hell\u00b0 = "hell\\u00b0" " is " "\"hell\\u00b0\"";
Last, but not least, I believe not many compilers are fully compliant at this level of detail!

Is the backslash acceptable in C and C++ #include directives?

There are two path separators in common use: the Unix forward-slash and the DOS backslash. Rest in peace, Classic Mac colon. If used in an #include directive, are they equal under the rules of the C++11, C++03, and C99 standards?
C99 says (§6.4.7/3):
If the characters ', \, ", //, or /* occur in the sequence between the < and > delimiters, the behavior is undefined. Similarly, if the characters ', \, //, or /* occur in the sequence between the " delimiters, the behavior is undefined.
(footnote: Thus, sequences of characters that resemble escape sequences cause undefined behavior.)
C++03 says (§2.8/2):
If either of the characters ’ or \, or either of the character sequences /* or // appears in a q-char-sequence or a h-char-sequence, or the character " appears in a h-char-sequence, the behavior is undefined.
(footnote: Thus, sequences of characters that resemble escape sequences cause undefined behavior.)
C++11 says (§2.9/2):
The appearance of either of the characters ’ or \ or of either of the character sequences /* or // in a q-char-sequence or an h-char-sequence is conditionally supported with implementation-defined semantics, as is the appearance of the character " in an h-char-sequence.
(footnote: Thus, a sequence of characters that resembles an escape sequence might result in an error, be interpreted as the character corresponding to the escape sequence, or have a completely different meaning, depending on the implementation.)
Therefore, although any compiler might choose to support a backslash in a #include path, it is unlikely that any compiler vendor won't support forward slash, and backslashes are likely to trip some implementations up by virtue of forming escape codes. (Edit: apparently MSVC previously required backslash. Perhaps others on DOS-derived platforms were similar. Hmmm… what can I say.)
C++11 seems to loosen the rules, but "conditionally supported" is not meaningfully better than "causes undefined behavior." The change does more to reflect the existence of certain popular compilers than to describe a portable standard.
Of course, nothing in any of these standards says that there is such a thing as paths. There are filesystems out there with no paths at all! However, many libraries assume pathnames, including POSIX and Boost, so it is reasonable to want a portable way to refer to files within subdirectories.
Forward slash is the correct way; the preprocessor will do whatever it takes on each platform to get to the correct file.
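As a hedged illustration of that practice: <sys/stat.h> is chosen only because it is a real header whose name contains a slash on common toolchains, and the project header in the comment is made up:
// Forward slash inside a header name: resolved by the preprocessor on the
// usual platforms (the header itself is a POSIX/Windows CRT header, not ISO C++).
#include <sys/stat.h>

// The same style works for your own tree, e.g.:
//   #include "mylib/config.h"     // hypothetical project header
// A backslash spelling such as "mylib\config.h" is undefined behavior in
// C99/C++03 and only conditionally-supported in C++11, so avoid it.

int main() { return 0; }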
It depends on what you mean by "acceptable".
There are two senses in which slashes are acceptable and backslashes are not.
If you're writing C99, C++03, or C1x, backslashes are undefined, while slashes are legal, so in this sense, backslashes are not acceptable.
But this is irrelevant for most people. If you're writing C++1x, where backslashes are conditionally-supported, and the platform you're coding for supports them, they're acceptable. And if you're writing an "extended dialect" of C99/C++03/C1x that defines backslashes, same deal.
And, more importantly, this notion of "acceptable" is pretty meaningless in most cases anyway. None of the C/C++ standards define what slashes mean (or what backslashes mean when they're conditionally-supported). Header names are mapped to source files in an implementation-defined manner, period.
If you've got a hierarchy of files, and you're asking whether to use backslashes or slashes to refer to them portably in #include directives, the answer is: neither is portable. If you want to write truly portable code, you can't use hierarchies of header files—in fact, arguably, your best bet is to write everything in a single source file, and not #include anything except standard headers.
However, in the real world, people often want "portable-enough", not "strictly portable". The POSIX standard mandates what slashes mean, and even beyond POSIX, most modern platforms—including Win32 (and Win64), the cross-compilers for embedded and mobile platforms like Symbian, etc.—treat slashes the POSIX way, at least as far as C/C++ #include directives. Any platform that doesn't, probably won't have any way for you to get your source tree onto it, process your makefile/etc., and so on, so #include directives will be the least of your worries. If that's what you care about, then slashes are acceptable, but backslashes are not.
Backslash is undefined behavior, and even with a slash you have to be careful. The C99 standard states:
If the characters ', \, ", //, or /* occur in the sequence between the < and > delimiters, the behavior is undefined. Similarly, if the characters ', \, //, or /* occur in the sequence between the " delimiters, the behavior is undefined.
Always use forward slashes - they work on more platforms. Backslash technically causes undefined behaviour in C++03 (2.8/2 in the standard).
The standard says for #include that it:
searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the delimiters, and causes the replacement of that directive by the entire contents of the header. How the places are specified or the header identified is implementation-defined.
Note the last sentence.