How to disable escape sequence in Visual Studio 2019? [duplicate]

In many scripting languages, such as Perl or Python, the regular expression \w+\d can be written literally. But in C/C++, I must write it as:
const char *re_str = "\\w+\\d";
which is ugly to the eye.
Is there any way to avoid it? Macros are also acceptable.

Just as an FYI, the next C++ standard (C++ 0x) will have something called raw string literals which should let you do something like:
const char *re_str = R"(\w+\d)";
However until then I think you're stuck with the pain of doubling up your backslashes if you want the regex to be a literal in the source file.
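As a rough illustration of the raw string literal mentioned above, here is a minimal sketch. It assumes a C++11 compiler and std::regex, and the function names are made up for this example:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` as a whole consists of one or more
// word characters followed by a single digit, i.e. it matches \w+\d.
bool matches_word_then_digit(const std::string& text) {
    // Raw string literal: backslashes are taken literally, no doubling needed.
    static const std::regex re(R"(\w+\d)");
    return std::regex_match(text, re);
}

// The raw literal and the escaped literal spell exactly the same characters.
bool same_pattern() {
    return std::string(R"(\w+\d)") == "\\w+\\d";
}
```

Calling matches_word_then_digit("abc1") should succeed, while "abc" has no trailing digit and should not match.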

While reading Chapter 3 (Preprocessors) of [C: A Reference Manual], an idea emerged:
#define STR(a) #a
#define R(var, re) static char var##_[] = STR(re);\
const char * var = ( var##_[ sizeof(var##_) - 2] = '\0', (var##_ + 1) );
R(re, "\w\d");
printf("Hello, world[%s]\n", re);
It's portable to both C and C++ and uses only standard preprocessing features. The trick is to let the macro's # operator escape the backslashes inside the string literal and then strip the leading and trailing double-quote characters.
For now I think it's the best way until C++0x actually introduces the new literal string syntax R"...". And for C I think it'll be the best way for a long time.
The side effect is that such a variable cannot be defined at global scope in C, because the initializer contains an assignment that removes the trailing double-quote character. In C++ it's OK.

You can put your regexp in a file and read the file if you have a lot or need to modify them often. That's the only way I see to avoid backslashes.
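A minimal sketch of that file-based approach follows. It assumes the pattern sits on the first line of a text file; the helper names load_pattern and file_pattern_matches are hypothetical:

```cpp
#include <fstream>
#include <regex>
#include <string>

// Hypothetical helper: reads the first line of `path` and returns it as the
// pattern text. Returns an empty string if the file cannot be opened.
std::string load_pattern(const std::string& path) {
    std::ifstream in(path);
    std::string pattern;
    if (in) {
        std::getline(in, pattern);
    }
    return pattern;
}

// Compiles the pattern read from the file and matches it against all of `text`.
bool file_pattern_matches(const std::string& path, const std::string& text) {
    return std::regex_match(text, std::regex(load_pattern(path)));
}
```

The file then contains \w+\d exactly as typed, with no doubled backslashes, because no compiler ever parses it as a string literal.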

No. There is only one kind of string literal in C++, and it's the kind that processes escape sequences.

Related

How to exclude C++ raw string literals from syntax highlighting in Vim?

Quite honestly, raw string literals are a great addition to the C++ language. But (as expected) editors have a hard time displaying those literals properly.
I am using Vim 7.4 and out-of-the-box raw string literals completely break the syntax highlighting. For example in
char const txt[] = R"(printf(")";
the 2nd '(' is highlighted red in vim.
Something like
char const txt2[] = R"( "{{" )";
breaks the highlighting of curly braces and the syntax-based auto-indent, and so on.
For a start I would be happy to have Vim ignore everything between R"( and )" when doing syntax highlighting.
But note that raw string literals are flexible: arbitrary matching delimiter strings are allowed between the first/last double-quote/parenthesis pair, e.g.
R"abcd()")")abcd"
is also a valid raw string literal which encodes
)")"
See also the cppreference link for a general definition of the syntax.
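To make the delimiter rule concrete, here is a small sketch comparing that literal with its escaped spelling (the variable names are made up for illustration):

```cpp
#include <string>

// Delimiter "abcd": the literal ends only at the sequence )abcd"
const std::string delimited = R"abcd()")")abcd";

// The same four characters written as an ordinary, escaped literal.
const std::string escaped = ")\")\"";
```

Both variables should hold the same four characters, which is what a highlighter has to work out from the delimiter alone.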
Thus my question: how can Vim be configured so that C++ raw string literals are properly recognized?
Vim already seems to include some facilities for properly syntax highlighting language fragments embedded in a host language (e.g. for compiler-compiler source files). Perhaps they can be used for the raw string literal case as well?
Add this
syntax match cString 'R"\([^(]*\)(\_.*)\1"'
to your custom C++ syntax file (normally ~/.vim/syntax/cpp.vim ; create this file if you don't have one).
Just add cpp-vim as a plugin. I have added strict support for newer string literals in pull-request #14.
This is what you get: http://bl.ocks.org/anonymous/raw/9442865
cpp-vim adds support for other C++11 stuff too.
A tiny tweak on the above syntax rule:
syntax match cString 'R"\([^(]*\)(\_.\{-})\1"'
The original attempts to greedily select the longest match, so if you have multiple raw strings in a file (using the same open/close pattern) it would break.
This one is non-greedy, and should match correctly.
Thank you so much for the original though, it was a huge help to me!

How to disable the escape sequence in C++

I use C++ to process many files, and I have to write the file name in source code like this:
"F:\\somepath\\subpath\\myfile",
I wonder whether there is any way to avoid typing "\\" to get a '\' character in a string literal, i.e., I hope I can just write "F:\somepath\subpath\myfile" instead of the tedious version.
Solutions:
Use C++11 raw string literals: R"(F:\somepath\subpath\myfile)"
Use boost::filesystem::path with forward slashes:
They will validate your path and raise exceptions for problems.
boost::filesystem::path p = "f:/somepath/subpath";
p /= "myfile";
Just use forward slashes; Windows should understand them.
If you have C++11, you can use raw string literals:
std::string s = R"(F:\somepath\subpath\myfile)";
On the other hand, you can just use forward slashes for filesystem paths:
std::string s = "F:/somepath/subpath/myfile";
Two obvious options:
Windows understands forward slashes (or rather, it translates them to backslashes); use those instead.
C++11 has raw string literals. Stuff inside them doesn't need to be escaped.
R"(F:\somepath\subpath\myfile)"
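The options above can be put side by side in a short sketch. The path is the one from the question; the helper to_backslashes is hypothetical and only there to show that the spellings denote the same location:

```cpp
#include <string>

// Three spellings of the same Windows path.
const std::string escaped_path = "F:\\somepath\\subpath\\myfile";
const std::string raw_path     = R"(F:\somepath\subpath\myfile)";
const std::string slash_path   = "F:/somepath/subpath/myfile";

// Hypothetical helper: converts forward slashes to backslashes.
std::string to_backslashes(std::string path) {
    for (char& c : path) {
        if (c == '/') c = '\\';
    }
    return path;
}
```

The escaped and raw literals are character-for-character identical; the forward-slash form differs only in the separator.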

c++ is a white space independent language, exception to the rule

This Wikipedia page defines C++ as a "white space independent language". While this is mostly true, as with all languages there are exceptions to the rule. The only one I can think of at the moment is this:
vector<vector<double> >
It must have a space, otherwise the compiler interprets the >> as a stream operator. What other ones are there? It would be interesting to compile a list of the exceptions.
Following that logic, you can use any two-character lexeme to produce such "exceptions" to the rule. For example, += and + = would be interpreted differently. I wouldn't call them exceptions though. In C++ in many contexts "no space at all" is quite different from "one or more spaces". When someone says that C++ is space-independent they usually mean that "one space" in C++ is typically the same as "more than one space".
This is reflected in the language specification, which states (see 2.1/1) that at phase 3 of translation the implementation is allowed to replace sequences of multiple whitespace characters with one space character.
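A small sketch of that distinction, with function names invented for the example: extra spaces between tokens change nothing, but writing two '+' characters together produces a different token.

```cpp
// One space and many spaces between tokens mean the same thing...
int add_spaced(int& a, int b) { return a + +b; }       // binary +, then unary +
int add_wide(int& a, int b)   { return a   +    +b; }  // same tokens, more spaces

// ...but two '+' characters written together form the single lexeme "++".
int add_joined(int& a, int b) { return a++ +b; }       // post-increment, then binary +
```

All three return the same sum, but only add_joined modifies a, because there the first two '+' characters were lexed as an increment.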
The syntax and semantic rules for parsing C++ are indeed quite complex (I'm trying to be nice, I think one is authorized to say "a mess"). Proof of this fact is that for YEARS different compiler authors were arguing about what was legal C++ and what was not.
In C++ for example you may need to parse an unbounded number of tokens before deciding what is the semantic meaning of the first of them (the dreaded "most vexing parse rule" that also often bites newcomers).
Your objection IMO however doesn't really make sense... for example ++ has a different meaning from + +, and in Pascal begin is not the same as beg in. Does this make Pascal a space-dependent language? Is there any space-independent language (except brainf*ck)?
The only problem about C++03 >>/> > is that this mistake when typing was very common so they decided to add even more complexity to the language definition to solve this issue in C++11.
The cases in which one whitespace character versus several can make a difference (which is what distinguishes space-dependent languages, and which plays no role in the > > / >> case) are indeed few:
inside double-quoted strings (but everyone wants that and every language that supports string literals that I know does the same)
inside single quotes (the same, even if something that not many C++ programmers know is that there can be more than one char inside single quotes)
in the preprocessor directives because they work on a line basis (newline is a whitespace and it makes a difference there)
in line continuation, as noticed by stefanv: to continue a single line you can put a backslash right before a newline, and in that case the language will ignore both characters (you can do this even in the middle of an identifier, even if the typical use is just to make long preprocessor macros readable). If you put other whitespace characters after the backslash and before the newline, however, the line continuation is not recognized (some compilers accept it anyway and simply check whether the last non-whitespace character of a line is a backslash). Line continuation can also be specified using the trigraph equivalent ??/ of backslash (any reasonable compiler should IMO emit a warning when finding a trigraph, as they most probably were not intended by the programmer).
inside single-line comments // because also there adding a newline to other whitespaces in the middle of a comment makes a difference
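The line-continuation case in the list above can be sketched as follows. The names are made up for illustration, and each backslash must be the very last character on its line:

```cpp
// Backslash-newline is removed before tokenization, even inside an identifier:
int total_\
count = 3;   // declares a single variable named total_count

// Typical use: a macro body spread over several lines.
#define SQUARE_SUM(a, b) \
    ((a) * (a) +         \
     (b) * (b))

int square_sum_3_4() { return SQUARE_SUM(3, 4); }
```

Adding a space after any of those backslashes would, on a strictly conforming compiler, end the splice and change the meaning of the program.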
Like it or not, macros are also part of C++, and the lines of a multi-line macro must be joined with a backslash followed by EOL; there should be no whitespace between the backslash and the EOL.
Not a big issue, but still a whitespace exception.
This was because of limitations in the parser before C++11; it is no longer the case.
The reason was that it was hard to distinguish >> as the end of nested template argument lists from operator >>.
While C++03 interpreted >> as the shift operator in all cases (it is overloaded for use with streams, but it's still the shift operator), the language parser in C++11 will now treat it as closing two template argument lists when reasonable.
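As a sketch of the C++11 behaviour (the names make_grid, Constant and shifted are invented for this example): the nested template closes without a space, and a genuine shift inside a template argument needs parentheses.

```cpp
#include <cstddef>
#include <vector>

// C++11 and later: ">>" closes two template argument lists here.
std::vector<std::vector<double>> make_grid(std::size_t rows, std::size_t cols) {
    return std::vector<std::vector<double>>(rows, std::vector<double>(cols, 0.0));
}

// Where a shift really is meant inside a template argument, parenthesize it.
template <int N> struct Constant { static constexpr int value = N; };
const int shifted = Constant<(16 >> 2)>::value;
```

Under C++03 rules the first declaration would need to be written with a space, as > >.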
Nested template parameters: set<set<int> >.
Character literals: ' '.
String literals: " ".
Juxtaposition of keywords and identifiers: else return x;, void foo(){}, etc.

Why must C/C++ string literal declarations be single-line?

Is there any particular reason that multi-line string literals such as the following are not permitted in C++?
string script =
"
Some
Formatted
String Literal
";
I know that multi-line string literals may be created by putting a backslash before each newline.
I am writing a programming language (similar to C) and would like to allow the easy creation of multi-line strings (as in the above example).
Is there any technical reason for avoiding this kind of string literal? Otherwise I would have to use a python-like string literal with a triple quote (which I don't want to do):
string script =
"""
Some
Formatted
String Literal
""";
Why must C/C++ string literal declarations be single-line?
The terse answer is "because the grammar prohibits multiline string literals." I don't know whether there is a good reason for this other than historical reasons.
There are, of course, ways around this. You can use line splicing:
const char* script = "\
Some\n\
Formatted\n\
String Literal\n\
";
If the \ appears as the last character on the line, the newline will be removed during preprocessing.
Or, you can use string literal concatenation:
const char* script =
" Some\n"
" Formatted\n"
" String Literal\n";
Adjacent string literals are concatenated during preprocessing, so these will end up as a single string literal at compile-time.
Using either technique, the string literal ends up as if it were written:
const char* script = " Some\n Formatted\n String Literal\n";
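A quick way to convince yourself that the two techniques agree is to compare the results. This is only a sketch with made-up variable names; note that with line splicing any indentation on the continuation lines becomes part of the string, so those lines start at column zero here:

```cpp
#include <cstring>

// Line splicing: each backslash-newline pair is deleted before tokenization.
const char* spliced = "\
Some\n\
Formatted\n\
String Literal\n\
";

// Adjacent string literals are merged into one literal.
const char* concatenated =
    "Some\n"
    "Formatted\n"
    "String Literal\n";

// True if both spellings produce the same characters.
bool same_text() {
    return std::strcmp(spliced, concatenated) == 0;
}
```

Concatenation is usually the easier of the two to keep tidy, since the source indentation stays outside the quotes.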
One has to consider that C was not written to be an "Applications" programming language but a systems programming language. It would not be inaccurate to say it was designed expressly to rewrite Unix. With that in mind, there was no EMACS or VIM and your user interfaces were serial terminals. Multiline string declarations would seem a bit pointless on a system that did not have a multiline text editor. Furthermore, string manipulation would not be a primary concern for someone looking to write an OS at that particular point in time. The traditional set of UNIX scripting tools such as AWK and SED (amongst MANY others) are a testament to the fact they weren't using C to do significant string manipulation.
Additional considerations: it was not uncommon in the early 70s (when C was written) to submit your programs on PUNCH CARDS and come back the next day to get them. Would it have eaten up extra processing time to compile a program with multiline strings literals? Not really. It can actually be less work for the compiler. But you were going to come back for it the next day anyhow in most cases. But nobody who was filling out a punch card was going to put large amounts of text that wasn't needed in their programs.
In a modern environment, there is probably no reason not to include multiline string literals other than designer's preference. Grammatically speaking, it's probably simpler because you don't have to take linefeeds into consideration when parsing the string literal.
In addition to the existing answers, you can work around this using C++11's raw string literals, e.g.:
#include <iostream>
#include <string>
int main() {
std::string str = R"(a
b)";
std::cout << str;
}
/* Output:
a
b
*/
[n3290: 2.14.5/4]: [ Note: A source-file new-line in a raw string
literal results in a new-line in the resulting execution
string-literal. Assuming no whitespace at the beginning of lines in
the following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
—end note ]
Though non-normative, this note and the example that follows it in [n3290: 2.14.5/5] serve to complement the indication in the grammar that the production r-char-sequence may contain newlines (whereas the production s-char-sequence, used for normal string literals, may not).
Others have mentioned some excellent workarounds, I just wanted to address the reason.
The reason is simply that C was created at a time when processing was at a premium and compilers had to be simple and as fast as possible. These days, if C were to be updated (I'm looking at you, C1X), it's quite possible to do exactly what you want. It's unlikely, however. Mostly for historical reasons; such a change could require extensive rewrites of compilers, and so will likely be rejected.
The C preprocessor works on a line-by-line basis, but with lexical tokens. That means that the preprocessor understands that "foo" is a token. If C were to allow multi-line literals, however, the preprocessor would be in trouble. Consider:
"foo
#ifdef BAR
bar
#endif
baz"
The preprocessor isn't able to mess with the inside of a token - but it's operating line-by-line. So how is it supposed to handle this case? The easy solution is to simply forbid multiline strings entirely.
Actually, you can break it up thus:
string script =
"\n"
" Some\n"
" Formatted\n"
" String Literal\n";
Adjacent string literals are concatenated by the compiler.
Strings can span multiple lines, but each line has to be quoted individually:
string script =
" \n"
" Some \n"
" Formatted \n"
" String Literal ";
I am writing a programming language (similar to C) and would like to let write multi-line strings easily (like in above example).
There is no reason why you couldn't create a programming language that allows multi-line strings.
For example, Vedit Macro Language (which is a C-like scripting language for the VEDIT text editor) allows multi-line strings:
Reg_Set(1,"
Some
Formatted
String Literal
")
It is up to you how you define your language syntax.
You can also do:
string useMultiple = "this "
"is "
"a string in C.";
Place one literal after another without any special chars.
Literal declarations don't have to be single-line.
GPUImage inlines multiline shader code. Check out its SHADER_STRING macro.
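The general idea behind such a macro can be sketched with the preprocessor's # (stringizing) operator. The macro below is a hypothetical illustration and not GPUImage's actual definition:

```cpp
#include <string>

// Hypothetical macro for illustration; not GPUImage's actual definition.
// The # operator turns the argument tokens into one string literal.
#define MULTILINE_TEXT(...) #__VA_ARGS__

const std::string shader = MULTILINE_TEXT(
    void main() {
        gl_FragColor = vec4(1.0);
    }
);
```

Stringizing collapses each run of whitespace, including newlines, into a single space, so the result is one line of text. The argument also has to be made of valid preprocessing tokens with balanced parentheses, which is why this suits code-like text such as shaders better than arbitrary prose.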