C++ multiline string literal - c++

Is there any way to have multi-line plain-text, constant literals in C++, à la Perl? Maybe some parsing trick with #includeing a file? I can't think of one, but boy, that would be nice. I know it'll be in C++0x.

Well ... Sort of. The easiest is to just use the fact that adjacent string literals are concatenated by the compiler:
const char *text =
"This text is pretty long, but will be "
"concatenated into just a single string. "
"The disadvantage is that you have to quote "
"each part, and newlines must be literal as "
"usual.";
The indentation doesn't matter, since it's not inside the quotes.
You can also do this, as long as you take care to escape the embedded newline. Failure to do so, like my first answer did, will not compile:
const char *text2 =
"Here, on the other hand, I've gone crazy \
and really let the literal span several lines, \
without bothering with quoting each line's \
content. This works, but you can't indent.";
Again, note the backslash at the end of each line. It must come immediately before the line break, because it escapes the newline in the source so that everything acts as if the newline weren't there. You don't get newlines in the string at the locations where you had backslashes. With this form you can't indent the text, since the indentation would become part of the string, garbling it with stray spaces.
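A minimal sketch comparing the two forms side by side; the function names are hypothetical and chosen only for illustration. Both functions should return the same string, since neither form puts a newline into the result.

```cpp
#include <string>

// Adjacent string literals are merged by the compiler, so the
// indentation outside the quotes never reaches the string.
inline std::string concatenated() {
    return "first line "
           "second line";
}

// A backslash directly before the line break splices the two source
// lines together. The continuation starts in column 0 on purpose:
// any leading whitespace there would end up inside the string.
inline std::string spliced() {
    return "first line \
second line";
}
```

If real line breaks are wanted in the result, they still have to be written as \n inside the quotes in either form.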

In C++11 you have raw string literals. Sort of like here-text in shells and script languages like Python and Perl and Ruby.
const char * vogon_poem = R"V0G0N(
O freddled gruntbuggly thy micturations are to me
As plured gabbleblochits on a lurgid bee.
Groop, I implore thee my foonting turlingdromes.
And hooptiously drangle me with crinkly bindlewurdles,
Or I will rend thee in the gobberwarts with my blurlecruncheon, see if I don't.
(by Prostetnic Vogon Jeltz; see p. 56/57)
)V0G0N";
All the spaces and indentation and the newlines in the string are preserved.
These can also be utf-8|16|32 or wchar_t (with the usual prefixes).
I should point out that the delimiter, V0G0N, is not actually needed here. Its presence would allow putting )" inside the string. In other words, I could have put
"(by Prostetnic Vogon Jeltz; see p. 56/57)"
(note extra quotes) and the string above would still be correct. Otherwise I could just as well have used
const char * vogon_poem = R"( ... )";
The parens just inside the quotes are still needed.
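To illustrate when the delimiter matters, here is a small sketch (the function names and the xyz delimiter are made up for this example). The first literal contains )" in its text, so it needs a delimiter; the second does not; the third shows that escape sequences are not interpreted inside a raw literal.

```cpp
#include <string>

// The text contains )" so an undelimited raw literal would end too
// early; the arbitrary delimiter xyz moves the terminator to )xyz".
inline std::string with_delimiter() {
    return R"xyz(say "(hi)" twice)xyz";
}

// No )" inside the text, so no delimiter is needed. The line break
// in the source becomes a newline character in the string.
inline std::string without_delimiter() {
    return R"(line one
line two)";
}

// Backslashes are ordinary characters in a raw literal, so this
// string has four characters and contains no newline.
inline std::string no_escapes() {
    return R"(a\nb)";
}
```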

You can also do this:
const char *longString = R""""(
This is
a very
long
string
)"""";

#define MULTILINE(...) #__VA_ARGS__
Consumes everything between the parentheses.
Replaces any number of consecutive whitespace characters by a single space.

A possibly convenient way to enter multi-line strings is by using macros. This only works if quotes and parentheses are balanced and the text does not contain 'top level' commas:
#define MULTI_LINE_STRING(a) #a
const char *text = MULTI_LINE_STRING(
Using this trick(,) you don't need to use quotes.
Though newlines and multiple white spaces
will be replaced by a single whitespace.
);
printf("[[%s]]\n",text);
Compiled with gcc 4.6 or g++ 4.6, this produces: [[Using this trick(,) you don't need to use quotes. Though newlines and multiple white spaces will be replaced by a single whitespace.]]
Note that a , cannot appear in the string unless it is contained within parentheses or quotes. A single quote is possible, but produces compiler warnings.
Edit: As mentioned in the comments, #define MULTI_LINE_STRING(...) #__VA_ARGS__ allows the use of ,.

You can just do this:
const char *text = "This is my string it is "
"very long";

Just to elucidate a bit on @emsr's comment in @unwind's answer, if one is not fortunate enough to have a C++11 compiler (say GCC 4.2.1), and one wants to embed the newlines in the string (either char * or class string), one can write something like this:
const char *text =
"This text is pretty long, but will be\n"
"concatenated into just a single string.\n"
"The disadvantage is that you have to quote\n"
"each part, and newlines must be literal as\n"
"usual.";
Very obvious, true, but @emsr's short comment didn't jump out at me when I read this the first time, so I had to discover this for myself. Hopefully, I've saved someone else a few minutes.

Since an ounce of experience is worth a ton of theory, I tried a little test program for MULTILINE:
#define MULTILINE(...) #__VA_ARGS__
const char *mstr[] =
{
MULTILINE(1, 2, 3), // "1, 2, 3"
MULTILINE(1,2,3), // "1,2,3"
MULTILINE(1 , 2 , 3), // "1 , 2 , 3"
MULTILINE( 1 , 2 , 3 ), // "1 , 2 , 3"
MULTILINE((1, 2, 3)), // "(1, 2, 3)"
MULTILINE(1
2
3), // "1 2 3"
MULTILINE(1\n2\n3\n), // "1\n2\n3\n"
MULTILINE(1\n
2\n
3\n), // "1\n 2\n 3\n"
MULTILINE(1, "2" \3) // "1, \"2\" \3"
};
Compile this fragment with cpp -P -std=c++11 filename to reproduce.
The trick behind #__VA_ARGS__ is that __VA_ARGS__ does not process the comma separator. So you can pass it to the stringizing operator. Leading and trailing spaces are trimmed, and spaces (including newlines) between words are compressed to a single space then. Parentheses need to be balanced. I think these shortcomings explain why the designers of C++11, despite #__VA_ARGS__, saw the need for raw string literals.

// C++11.
std::string index_html=R"html(
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>VIPSDK MONITOR</title>
<meta http-equiv="refresh" content="10">
</head>
<style type="text/css">
</style>
</html>
)html";

Option 1. Using the Boost library, you can declare the string as below:
const boost::string_view helpText = "This is very long help text.\n"
"Also more text is here\n"
"And here\n";
// Pass help text here
setHelpText(helpText);
Option 2. If Boost is not available in your project, you can use std::string_view (C++17) instead.
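Option 2 might look like the sketch below, assuming a C++17 compiler. setHelpText is the caller's own function from the snippet above and is not shown; count_lines is a hypothetical helper added only to show the view being used.

```cpp
#include <cstddef>
#include <string_view>

// The adjacent literals are concatenated at compile time, and the view
// refers to the resulting static character array without copying it.
constexpr std::string_view helpText =
    "This is very long help text.\n"
    "Also more text is here\n"
    "And here\n";

// Hypothetical helper: counts the newline characters in a view.
constexpr std::size_t count_lines(std::string_view text) {
    std::size_t lines = 0;
    for (char ch : text) {
        if (ch == '\n') ++lines;
    }
    return lines;
}
```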

Related

error: multiple repeat for regex in robot [duplicate]

I'm trying to determine whether a term appears in a string.
Before and after the term must appear a space, and a standard suffix is also allowed.
Example:
term: google
string: "I love google!!! "
result: found
term: dog
string: "I love dogs "
result: found
I'm trying the following code:
regexPart1 = "\s"
regexPart2 = "(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)
and get the error:
raise error("multiple repeat")
sre_constants.error: multiple repeat
Update
Real code that fails:
term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)
On the other hand, the following term passes smoothly (+ instead of ++)
term = 'lg incite" OR author:"http+www.dealitem.com" OR "for sale'
The problem is that, in a non-raw string, \" is ".
You get lucky with all of your other unescaped backslashes—\s is the same as \\s, not s; \( is the same as \\(, not (, and so on. But you should never rely on getting lucky, or assuming that you know the whole list of Python escape sequences by heart.
Either print out your string and escape the backslashes that get lost (bad), escape all of your backslashes (OK), or just use raw strings in the first place (best).
That being said, your regexp as posted won't match some expressions that it should, but it will never raise that "multiple repeat" error. Clearly, your actual code is different from the code you've shown us, and it's impossible to debug code we can't see.
Now that you've shown a real reproducible test case, that's a separate problem.
You're searching for terms that may have special regexp characters in them, like this:
term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
That p++ in the middle of a regexp means "1 or more of 1 or more of the letter p" (in other words, the same as "1 or more of the letter p") in some regexp languages, "always fail" in others, and "raise an exception" in others. Python's re falls into the last group (up to Python 3.10; from 3.11 on, ++ is accepted as a possessive quantifier, which is still not a literal match). In fact, you can test this in isolation:
>>> re.compile('p++')
error: multiple repeat
If you want to put random strings into a regexp, you need to call re.escape on them.
One more problem (thanks to Ωmega):
. in a regexp means "any character". So ,|.|;|: (I've just extracted a short fragment of your longer alternation chain) means "a comma, or any character, or a semicolon, or a colon", which is the same as "any character". You probably wanted to escape the . character.
Putting all three fixes together:
term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|\.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + re.escape(term) + regexPart2 , re.IGNORECASE)
As Ωmega also pointed out in a comment, you don't need to use a chain of alternations if they're all one character long; a character class will do just as well, more concisely and more readably.
And I'm sure there are other ways this could be improved.
The other answer is great, but I would like to point out that using regular expressions to find strings in other strings is not the best way to go about it. In python simply write:
if term in string:
    pass  # do whatever
I have example_str = "i love you c++" and got the "multiple repeat" error when using it in a regex. The error occurs because the string contains "++", and + is a special character in regex syntax. My fix was to use re.escape(); here is my code:
example_str = "i love you c++"
regex_word = re.search(rf'\b{re.escape(word_filter)}\b', word_en)
Also make sure that your arguments are in the correct order!
I was trying to run a regular expression on some html code. I kept getting the multiple repeat error, even with very simple patterns of just a few letters.
Turns out I had the pattern and the html mixed up. I tried re.findall(html, pattern) instead of re.findall(pattern, html).
A general solution to "multiple repeat" is using re.escape to match the literal pattern.
Example:
>>> re.compile(re.escape("c++"))
re.compile('c\\+\\+')
However if you want to match a literal word with space before and after try out this example:
>>> re.findall(rf"\s{re.escape('c++')}\s", "i love c++ you c++")
[' c++ ']

JSONCPP is adding extra double quotes to string

I have a root value in JsonCpp, parsed from a string like this:
Json::Value root;
std::string val = "{\"stringval\": \"mystring\"}";
Json::Reader reader;
bool parsingpassed = reader.parse(val, root, false);
Now when I am trying to retrieve this value using this piece of code.
Json::StreamWriterBuilder builder;
builder.settings_["indentation"] = "";
std::string out = Json::writeString(builder, root["stringval"]);
Ideally, the out string should contain:
"mystring"
whereas it actually contains:
"\"mystring\"" // you see this in debug mode if you check your string content
By the way, if you print this value to stdout, it is printed like this:
"mystring" // because \" is an escape sequence and prints " on stdout
whereas it should be printed like this:
mystring // expected output
Any idea how to avoid this kind of output when converting JSON output to std::string?
Please avoid suggesting FastWriter, as it also adds a newline character and is a deprecated API as well.
Constraint: I do not want to modify the string by removing the extra \" with string manipulation; rather, I would like to know how I can do that with JsonCpp directly.
This is the StreamWriterBuilder reference code that I used.
I also found this solution, which removes the extra quotes from the resulting string, but I don't want them to be there in the first place.
I had this problem also until I realized you have to use the Json::Value class accessor functions, e.g. root["stringval"] will be "mystring", but root["stringval"].asString() will be mystring.
Okay, so this question did not get an answer even after a thorough explanation, and I had to go through the JsonCpp APIs and documentation for a while.
I did not find any API as of now that takes care of this scenario of extra double quotes being added.
From their wikibook I could figure out that some escape sequences might appear in a string. This is by design, though they haven't mentioned the exact scenario.
\" - quote
\\ - backslash
\/ - slash
\n - newline
\t - tabulation
\r - carriage return
\b - backspace
\f - form feed
\uxxxx , where x is a hexadecimal digit - any 2-byte symbol
Link Explaining what all extra Escape Sequence might come in String
Anyone who comes across this and finds a better explanation for the same issue, please feel free to post your answer. Until then, I guess string manipulation is the only option to remove those extra escape sequences.

How to stop Ember.Handlebars.Utils.escapeExpression escaping apostrophes

I'm fairly new to Ember, but I'm on v1.12 and struggling with the following problem.
I'm making a template helper.
The helper takes the bodies of tweets and adds HTML anchors around the hashtags and usernames.
The paradigm I'm following is:
1. use Ember.Handlebars.Utils.escapeExpression(value); to escape the input text
2. do logic
3. use Ember.Handlebars.SafeString(value);
However, step 1 seems to escape apostrophes, which means that any sentences I pass to it come back with escaped characters. How can I avoid this whilst making sure that I'm not introducing potential vulnerabilities?
Edit: Example code
export default Ember.Handlebars.makeBoundHelper(function(value){
// Make sure we're safe kids.
value = Ember.Handlebars.Utils.escapeExpression(value);
value = addUrls(value);
return new Ember.Handlebars.SafeString(value);
});
Where addUrls is a function that uses a RegEx to find and replace hashtags or usernames. For example, if it were given #emberjs foo it would return #emberjs foo with the hashtag wrapped in an HTML anchor.
The result of the above helper function would be displayed in an Ember (HTMLBars) template.
escapeExpression is designed to convert a string into the representation which, when inserted in the DOM, with escape sequences translated by the browser, will result in the original string. So
"1 < 2"
is converted into
"1 &lt; 2"
which when inserted into the DOM is displayed as
1 < 2
If "1 < 2" were inserted directly into the DOM (eg with innerHTML), it would cause quite a bit of trouble, because the browser would interpret < as the beginning of a tag.
So escapeExpression converts ampersands, less than signs, greater than signs, straight single quotes, straight double quotes, and backticks. The conversion of quotes is not necessary for text nodes, but could be for attribute values, since they may be enclosed in either single or double quotes while also containing such quotes.
Here's the list used:
var escape = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "`": "&#x60;"
};
I don't understand why the escaping of the quotes should be causing you a problem. Presumably you're doing the escapeExpression because you want characters such as < to be displayed properly when output into a template using normal double-stashes {{}}. Precisely the same thing applies to the quotes. They may be escaped, but when the string is displayed, it should display fine.
Perhaps you can provide some more information about input and desired output, and how you are "printing" the strings and in what contexts you are seeing the escaped quote marks when you don't want to.

Strategy to replace spaces in string

I need to store a string with its spaces replaced by some character. When I retrieve it, I need to replace that character with spaces again. I have thought of this strategy: while storing, I replace (space with _a) and (_a with _aa), and while retrieving, I replace (_a with space) and (_aa with _a), so that even if the user enters _a in the string it will be handled. But I don't think this is a good strategy. Please let me know if anyone has a better one.
Replacing spaces with something is a problem when something is already in the string. Why don't you simply encode the string - there are many ways to do that, one is to convert all characters to hexadecimal.
For instance
Hello world!
is encoded as
48656c6c6f20776f726c6421
The space is 0x20. Then you simply decode the string back (hex to ASCII).
This way there is no space in the encoded string.
-- Edit - optimization --
You replace all % and all spaces in the string with %xx where xx is the hex code of the character.
For instance
Wine having 12% alcohol
becomes
Wine%20having%2012%25%20alcohol
%20 is space
%25 is the % character
This way, neither % nor (space) are a problem anymore - Decoding is easy.
Encoding algorithm
- replace all `%` with `%25`
- replace all ` ` with `%20`
Decoding algorithm
- replace all `%xx` with the character having `xx` as hex code
(You may even optimize further, since you need to encode only two characters: use %1 for % and %2 for (space). But I recommend the %xx solution, as it is more portable and can be reused later on if you need to encode more characters.)
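The encoding and decoding algorithms above can be sketched as follows. The function names are hypothetical, and how to treat a malformed % sequence while decoding is an assumption of this sketch (it is copied through unchanged), since the answer does not say.

```cpp
#include <cstddef>
#include <string>

// Encoding: '%' becomes "%25" and ' ' becomes "%20"; everything else is copied.
inline std::string encode_spaces(const std::string& in) {
    std::string out;
    for (char ch : in) {
        if (ch == '%')      out += "%25";
        else if (ch == ' ') out += "%20";
        else                out.push_back(ch);
    }
    return out;
}

// Value of one hex digit, or -1 if ch is not a hex digit.
inline int hex_value(char ch) {
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Decoding: every well-formed "%xx" becomes the character with hex code xx.
// Assumption: a '%' not followed by two hex digits is copied unchanged.
inline std::string decode_spaces(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Because % itself is encoded, an input that already contains text such as %20 survives a round trip unchanged.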
I'm not sure your solution will work. When reading, how would you
distinguish between strings that were originally " a" and strings that
were originally "_a": if I understand correctly, both will end up
"_aa".
In general, given a situation where a specific set of characters cannot
appear as such, but must be encoded, the solution is to choose one of
allowed characters as an "escape" character, remove it from the set of
allowed characters, and encode all of the forbidden characters
(including the escape character) as a two (or more) character sequence
starting with the escape character. In C++, for example, a new line is
not allowed in a string or character literal. The escape character is
\; because of that, it must be encoded as an escape sequence as well.
So we have "\n" for a new line (the choice of n is arbitrary), and
"\\" for a \. (The choice of \ for the second character is also
arbitrary, but it is fairly usual to use the escape character, escaped,
to represent itself.) In your case, if you want to use _ as the
escape character, and "_a" to represent a space, the logical choice
would be "__" to represent a _ (but I'd suggest something a little
more visually suggestive—maybe ^ as the escape, with "^_" for
a space and "^^" for a ^). When reading, anytime you see the escape
character, the following character must be mapped (and if it isn't one
of the predefined mappings, the input text is in error). This is simple
to implement, and very reliable; about the only disadvantage is that in
an extreme case, it can double the size of your string.
Do you want to implement this in C/C++? I think you should split your string into multiple parts, separated by spaces.
If your string is like this: "a__b" (multiple consecutive spaces), it will be split into:
sub[0] = "a";
sub[1] = "";
sub[2] = "b";
Hope this will help!
If your strings can already use all X available characters, you cannot encode them with only X-1 characters (everything except the space) while using a single output character per input character.
You can use a combination of 2 chars to replace a given character (this is exactly what you are trying in your example).
To do this, loop through your string to count the spaces and work out the new length, make a new character array, and replace each space with "//" (this is just an example). The problem with this approach is that you cannot have "//" in your input string.
Another approach would be to use a rarely used char, for example "^", to replace the spaces.
The last approach, and a popular one, is a combination of these two. It is used in Unix and PHP to put a syntax character into a string as a literal: if you want a " inside a string, you simply write it as \", and so on.
Why don't you use the Replace function?
String* stringWithoutSpace= stringWithSpace->Replace(S" ", S"replacementCharOrText");
So now stringWithoutSpace contains no spaces. When you want to put those spaces back,
String* stringWithSpacesBack= stringWithoutSpace ->Replace(S"replacementCharOrText", S" ");
I think just coding to ascii hexadecimal is a neat idea, but of course doubles the amount of storage needed.
If you want to do this using less memory, then you will need two-letter sequences, and have to be careful that you can go back easily.
You could e.g. replace blank by _a, but you also need to take care of your escape character _. To do this, replace every _ by __ (two underscores). You need to scan through the string once and do both replacements simultaneously.
This way, in the resulting text all original underscores will be doubled, and the only other occurrence of an underscore will be in the combination _a. You can safely translate this back. Whenever you see an underscore, you need a lookahead of 1 to see what follows. If an a follows, then this was a blank before. If _ follows, then it was an underscore before.
Note that the point is to replace your escape character (_) in the original string, and not the character sequence to which you map the blank. Your idea of replacing _a breaks, as you do not know whether _aa was originally _a or " a" (blank followed by a).
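A minimal sketch of this scheme, assuming C++17 for std::optional; the function names are hypothetical. Both replacements happen in a single pass, and decoding looks one character ahead after every underscore, rejecting input that could not have been produced by the encoder.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// One pass over the input: blank becomes "_a", underscore becomes "__".
inline std::string escape_blanks(const std::string& in) {
    std::string out;
    for (char ch : in) {
        if (ch == ' ')      out += "_a";
        else if (ch == '_') out += "__";
        else                out.push_back(ch);
    }
    return out;
}

// Reverse mapping with a lookahead of 1 after each underscore.
// Returns std::nullopt if an underscore is followed by anything other
// than 'a' or '_', or if the input ends right after an underscore.
inline std::optional<std::string> unescape_blanks(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '_') {
            out.push_back(in[i]);
            continue;
        }
        if (++i == in.size()) return std::nullopt;  // dangling escape character
        if (in[i] == 'a')      out.push_back(' ');
        else if (in[i] == '_') out.push_back('_');
        else                   return std::nullopt; // unknown escape sequence
    }
    return out;
}
```

With this mapping " a" is stored as _aa and _a is stored as __a, so the two inputs that collide in the original idea stay distinguishable.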
I'm guessing that there is more to this question than appears; for example, that the strings you are storing must not only be free of spaces, but must also look like words or some such. You should be clear about your requirements (and you might consider satisfying the curiosity of the spectators by explaining why you need to do such things).
Edit: As JamesKanze points out in a comment, the following won't work in the case where you can have more than one consecutive space. But I'll leave it here anyway, for historical reference. (I modified it to compress consecutive spaces, so it at least produces unambiguous output.)
std::string out;
char prev = 0;
for (char ch : in) {
    if (ch == ' ') {
        if (prev != ' ') out.push_back('_');
    } else {
        if (prev == '_' && ch != '_') out.push_back('_');
        out.push_back(ch);
    }
    prev = ch;
}
if (prev == '_') out.push_back('_');

Using boost::regex to replace a backslash with double backslash and double quote with a slash quote

I'm going batty trying to get this to work. Here's what I have so far, but it doesn't work.
const std::string singleslash("\\\\\\\\");
const std::string doublequote("\\\"\"\\");
const std::string doubleslash("\\\\\\\\\\\\");
const std::string slashquote("\\\\\\\\\"\\");
std::string temp(Variables);
temp.assign(boost::regex_replace(temp,boost::regex(singleslash),doubleslash,boost::match_default));
temp.assign(boost::regex_replace(temp,boost::regex(doublequote),slashquote,boost::match_default));
Someone please save me.
Update It seems that I'm not using regex_replace properly. Here's a simpler example that doesn't work either...
std::string w("Watermelon");
temp.assign(boost::regex_replace(w,boost::regex("W"),"x",boost::match_all | boost::format_all));
MessageBox((HWND)Window, temp.c_str(), "temp", MB_OK);
This gives me "Watermelon" instead of "xatermelon"
Update 2 Using boost::regex wrong... this one works
boost::regex pattern("W");
temp.assign(boost::regex_replace(w,pattern,std::string("x")));
Update 3 Here's what ultimately worked
std::string w("Watermelon wishes backslash \\ and another backslash \\ and \"\"fatness\"\"");
temp.assign(w);
MessageBox((HWND)Window, temp.c_str(), "original", MB_OK);
const boost::regex singlebackslashpat("\\\\");
const std::string doublebackslash("\\\\\\\\");
temp.assign(boost::regex_replace(w,singlebackslashpat,doublebackslash));
MessageBox((HWND)Window, temp.c_str(), "double-backslash", MB_OK);
const boost::regex doublequotepat("\"\"");
const std::string backslashquote("\\\\\\\"");
temp.assign(boost::regex_replace(temp,doublequotepat,backslashquote));
MessageBox((HWND)Window, temp.c_str(), "temp", MB_OK);
So, I'm not a boost::regex expert and don't have Boost conveniently installed where I am right now, but let's try to work this through step by step.
The patterns to match against
To match a double-quote in the input, you just need a double-quote in the regex (double-quotes aren't magical in regexes), which means all you need is a string containing a double-quote. "\"" should be fine.
To match a backslash in the input, you need an escaped backslash in the regex, which means two consecutive backslashes; each of those needs to be doubled again in a string literal. So "\\\\". [EDITED: I typed eight instead of four before, which was a mistake.]
The output formats
Again, double-quotes aren't magical in match replacement formats (or whatever the right terminology is) but backslashes are. So to get two backslashes in the output you need four in the string, which means you need 8 in the string literal. So: "\\\\\\\\".
To get a backslash followed by a double-quote, your string needs to be two backslashes and a double-quote, and all of those need to be preceded with backslashes in the string literal. So: "\\\\\"".
[EDITED to add the actual code for easier copy-and-pasting:]
const std::string singleslash("\\\\");
const std::string doublequote("\"");
const std::string doubleslash("\\\\\\\\");
const std::string slashquote("\\\\\"");
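The literals above are written for Boost's default replacement format, where the backslash is an escape character. As a point of comparison, here is a sketch that swaps in std::regex from the standard library in place of boost::regex (an assumption made for this example; the function name is hypothetical). In std::regex's default ECMAScript format only $ is special in the replacement text, so the replacement needs no extra level of escaping, and raw string literals remove the C++ level as well.

```cpp
#include <regex>
#include <string>

// Doubles every backslash, then puts a backslash before every double quote.
inline std::string escape_backslashes_and_quotes(const std::string& in) {
    // The regex needs an escaped backslash to match one backslash.
    static const std::regex backslash(R"(\\)");
    // A double quote is not special in a regex.
    static const std::regex quote(R"(")");
    // Replacement text is taken literally here: exactly two backslashes.
    const std::string doubled = std::regex_replace(in, backslash, R"(\\)");
    // Replacement text: one backslash followed by a double quote.
    return std::regex_replace(doubled, quote, R"(\")");
}
```

The order matters: backslashes are doubled first, so the backslashes added in front of the quotes are not doubled again.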
Matching flags
After reading tofutim's update, I tried to look up match_all and found no documentation for it. It does, however, appear to be a possible match flag value, and the header file in which it's defined has the following cryptic comment next to it: "must find the whole of input even if match_any is set". The similarly-cryptic comment attached to match_any is "don't care what we match". I'm not sure what any of that means and it seems like these flags are deprecated or something, but in any case you probably don't want to be using them.
(After a very quick look at the source, I think what match_all does is to accept a match only if it ends at the end of the input. So you might try replacing n instead of W in your revised test case and see whether that works. Alternatively, perhaps I missed something and it has to match the entire input, which you could check by replacing Watermelon instead of W or n. Or you could not bother, if you happen not to be curious about this.)
Give that a try and report back...
I have no Boost here, but a single backslash must be written as \\ in a regex, and thus as a C++ string literal it is four backslashes. The replacement string has to be escaped too, and then again in C++, so it's eight backslashes.
A double quote need not be escaped in a regex, so it is "" and in C++ \"\". The replacement again has to be escaped, so it's \\", and then once more in C++, so it is \\\\\".
According to your update 3, the patterns and replacement strings must be initialized like this:
const std::string singleslashpat("\\\\");
const std::string doublequotepat("\"\"");
const std::string doubleslash("\\\\\\\\");
const std::string slashquote("\\\\\"");