I am working to convert multiline strings into a list of tokens that might be easier for me to work with.
To suit my project, I pad every caret that appears in my input with spaces, so that "^" gets turned into " ^ ". I'm using something like the following function to do so:
let bad_function string = Str.global_replace (Str.regexp "^") " ^ " (string)
I then use something like the function below to turn this multiline string into a list of tokens (ignoring whitespace).
let string_to_tokens string = (Str.split (Str.regexp "[ \n\r\x0c\t]+") (string));;
For some reason, bad_function adds carets in places where they shouldn't be. Take the following call:
(bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
The first line of the string turns into:
^ This is some \n ^
When I feed the output from bad_function into string_to_tokens I get the following list:
string_to_tokens (bad_function " This is some
multiline input
with newline characters
and tabs. When I convert this string
into a list of tokens I get ^s showing up where
they shouldn't. ")
["^"; "This"; "is"; "some"; "^"; "multiline"; "input"; "^"; "with";
"newline"; "characters"; "^"; "and"; "tabs."; "When"; "I"; "convert";
"this"; "string"; "^"; "into"; "a"; "list"; "of"; "tokens"; "I"; "get";
"^s"; "showing"; "up"; "where"; "^"; "they"; "shouldn't."]
Why is this happening, and how can I fix it so these functions behave the way I want?
As explained in the documentation of the Str module:
^ Matches at beginning of line: either at the beginning of the
matched string, or just after a '\n' character.
So you have to quote the '^' character using the escape character "\".
However, note that (also from the doc)
any backslash character in the regular expression must be doubled to
make it past the OCaml string parser.
This means you have to write a doubled '\' (that is, "\\^" in the OCaml source) to match a literal caret without getting a warning.
This should do the job:
let bad_function string = Str.global_replace (Str.regexp "\\^") " ^ " (string);;
Take a look at the following example:
cout << "option 1:
\n option 2:
\n option 3";
I know it's not the best way to output a string, but the question is why this causes an error saying that a " character is missing. There is a single string that must go to stdout; it just contains a lot of whitespace characters.
What about this:
string x="
string_test";
One may interpret that string as: "\nxxxxxxxxxxxxstring_test" where x is a whitespace character.
Is it a convention?
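What you are trying to write is a multiline string literal. An ordinary string literal cannot contain an unescaped newline, which is why the compiler complains about a missing " character.
To continue a literal on the next source line, you need to escape the embedded newline. Otherwise, it will not compile: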
std::cout << "Hello world \
and stackoverflow";
Note: The backslash must come immediately before the end of the line, because it escapes the newline in the source. Any leading whitespace on the continued line becomes part of the string.
You can also take advantage of the fact that adjacent string literals are concatenated by the compiler:
std::cout << "Hello World"
"Stack overflow";
Since C++11, we also have raw string literals. They are kind of like here-text.
Syntax:
prefix(optional) R"delimiter( raw_characters )delimiter"
It allows any character sequence, except that it must not contain the
closing sequence )delimiter". It is used to avoid escaping of any
character. Anything between the delimiters becomes part of the string.
const char* s1 = R"foo(
Hello
World
)foo";
Example taken from cppreference.
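To compare the three approaches from this answer side by side, here is a minimal sketch. The function names spliced, adjacent and raw are made up for illustration; only the literal syntax itself comes from the language.

```cpp
#include <string>

// Line continuation: the backslash-newline pair is removed before the
// literal is tokenized, so no '\n' ends up in the string. Indentation on
// the continued line would become part of the string, so it starts at
// column 0 here.
inline std::string spliced() {
    return "option 1: \
option 2";
}

// Adjacent literals: the compiler joins them into one literal. A newline
// in the result has to be written explicitly as \n.
inline std::string adjacent() {
    return "option 1:\n"
           "option 2";
}

// Raw literal (C++11): the line break in the source is kept as '\n'.
inline std::string raw() {
    return R"(option 1:
option 2)";
}
```

With these definitions, adjacent() and raw() yield the same two-line string, while spliced() yields a single line, because the continuation removes the line break instead of embedding it.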
I need to generate a string that can match another both containing special characters. I wrote what I thought would be a simple method, but so far nothing has given me a successful match.
I know that special characters in C++ are preceded by a "\". For example, a single quote would be written as "\'".
string json_string(const string& incoming_str)
{
string str = "\\\"" + incoming_str + "\\\"";
return str;
}
And this is the string I have to compare to:
bool comp = json_string("hello world") == "\"hello world\"";
I can see in the cout stream that in fact I'm generating the string as needed but the comparison still gives a false value.
What am I missing? Any help would be appreciated.
One way is to filter one string and compare this filtered string. For example:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

// Returns a copy of unfiltered without any character found in specialChars.
std::string filterBy(const std::string& unfiltered, const std::string& specialChars)
{
    std::string filtered;
    std::copy_if(unfiltered.begin(), unfiltered.end(),
                 std::back_inserter(filtered),
                 // find returns std::string::npos when c is not a special character
                 [&specialChars](char c){ return specialChars.find(c) == std::string::npos; });
    return filtered;
}

int main() {
    std::string specialChars = "\"";
    std::string string1 = "test";
    std::string string2 = "\"test\"";
    std::cout << (string1 == filterBy(string2, specialChars) ? "match" : "no match");
    return 0;
}
Output is match. This code also works if you add an arbitrary number of characters to specialChars.
If both strings contain special characters, you can also put string1 through the filterBy function. Then, something like:
"\"hello \"world \"" == "\"hello world "
will also match, provided both sides are filtered before the comparison.
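If the comparison is performance-critical, you might also write a comparison that walks both strings with two iterators and skips the special characters on the fly. This avoids allocating the filtered copies and runs in O(N+M) time for a fixed set of special characters, where N and M are the sizes of the two strings.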
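The two-iterator comparison mentioned above could look roughly like this. It is only a sketch: the name equalIgnoring and its signature are hypothetical, and it assumes the set of ignored characters is small enough that looking each character up with find is cheap.

```cpp
#include <string>

// Hypothetical helper: compares a and b while ignoring every character
// that appears in `ignored`. No filtered copies are allocated.
inline bool equalIgnoring(const std::string& a, const std::string& b,
                          const std::string& ignored) {
    auto skip = [&ignored](char c) {
        return ignored.find(c) != std::string::npos;
    };
    auto i = a.begin();
    auto j = b.begin();
    while (true) {
        // Advance both iterators past any ignored characters.
        while (i != a.end() && skip(*i)) ++i;
        while (j != b.end() && skip(*j)) ++j;
        if (i == a.end() || j == b.end()) break;
        if (*i != *j) return false;
        ++i;
        ++j;
    }
    // Equal only if both strings were consumed completely.
    return i == a.end() && j == b.end();
}
```

Each character of either string is visited once, so for a fixed set of ignored characters the work grows linearly with the combined length of the two strings.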
bool comp = json_string("hello world") == "\"hello world\"";
This will definitely yield false. json_string("hello world") creates the string \"hello world\" (backslashes included), but you are comparing it to "hello world" (quotes only).
The problem is here:
string str = "\\\"" + incoming_str + "\\\"";
In the first string literal of str, "\\\"" consists of two escape sequences: \\ produces a literal backslash and \" produces a quote. The leading backslash that you assumed would act as an escape character is therefore not treated as one; it ends up as a backslash in your string. The same happens in your last string literal.
Do this:
string str = "\"" + incoming_str + "\"";
In C++ string literals are delimited by quotes.
Then the problem arises: how can I define a string literal that itself contains quotes? In Python (for comparison), this is easy (though the approach has other drawbacks that are not of interest here): 'a string with " (quote)'.
C++ doesn't have this alternative string representation (see side note 1 below). Instead, you are limited to escape sequences (which are available in Python, too, just for completeness): within a string (or character) literal (but nowhere else!), the sequence \" is replaced by one double-quote character in the resulting string.
So "\"hello world\"" defined as character array would be:
{ '"', 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '"', 0 };
Note that now the escape character is not necessary...
Within your json_string function, you append additional backslashes, though:
"\\\""
{ '\', '"', 0 }
//^^^
Note that I wrote '\' just for illustration! How would you define a single quote as a character literal? By escaping again: '\''. But then you need to escape the escape character, too, so a single backslash actually has to be written as '\\' here (whereas you don't have to escape the single quote in a string literal: "i am 'singly quoted'", just as you didn't have to escape the double quote in the character literal).
As JSON uses double quotes for strings, too, you'd most likely want to change your function:
return "\"" + incoming_str + "\"";
or even much simpler:
return '"' + incoming_str + '"';
Now
json_string("hello world") == "\"hello world\""
would yield true...
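As a quick check of the above, here is a small sketch that puts the corrected function next to the original one. The name json_string_with_backslashes is made up here so that both versions can coexist. Note that neither version escapes quotes that occur inside incoming_str; that is outside the scope of the question.

```cpp
#include <string>

// Corrected version: wraps the input in double quotes. No backslash ends
// up in the result.
inline std::string json_string(const std::string& incoming_str) {
    return '"' + incoming_str + '"';
}

// The version from the question, renamed for comparison. "\\\"" is a
// backslash followed by a quote, so the result is four characters longer
// than the corrected one.
inline std::string json_string_with_backslashes(const std::string& incoming_str) {
    return "\\\"" + incoming_str + "\\\"";
}
```

For the input hello world, the first function produces 13 characters and compares equal to "\"hello world\"", while the second produces 15 characters and does not.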
1 Side note (taken from an answer that has since been deleted): Since C++11, there are raw string literals, too. With these, you don't have to escape the quotes either.
How can I print escape characters without further processing, so that they appear as \t or \n or ... in std::cout?
I don't want to process the text manually before sending it to the output.
Is there any switch for std::cout for this purpose?
Basically, a raw string literal is a string in which C++ escape sequences (like \n, \t or \") are not processed. A raw string literal starts with R"( and ends with )". Here is an example of a raw string in C++:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
cout<<raw_str<<endl;
result:
~$ ./a.out
First line.\nSecond line.\nEnd of message.\n
If you add one extra backslash, as in \\t, you will see \t in the output of std::cout.
For example: cout<<"\\t hello" will print \t hello.
I hope this helps
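Both suggestions above change the literal in the source code. If the text only exists at run time (read from a file, say), the control characters are already in the string, and as far as I know std::cout has no flag that prints them in escaped form. In that case some processing is needed after all. A minimal sketch, where show_escapes is a hypothetical helper and not a standard facility, and only a few common escapes are handled:

```cpp
#include <string>

// Hypothetical helper, not part of the standard library: returns a copy
// of `text` in which some common control characters (and the backslash
// itself) are spelled out as escape sequences.
inline std::string show_escapes(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            case '\r': out += "\\r";  break;
            case '\\': out += "\\\\"; break;
            default:   out += c;      break;
        }
    }
    return out;
}
```

Streaming show_escapes("a\tb\n") to std::cout would then show the text a\tb\n on one line instead of a tab and a line break.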
I can't figure out how it is possible to print a string this way without any complaint from the compiler:
std::cout << "Hello " "World!";
In fact, the above line works exactly like:
std::cout << "Hello " << "World!";
Is there an explanation for this behaviour?
Adjacent string literal tokens are concatenated automatically; it's part of the standard.
2.1 Phases of translation [lex.phases]
6) Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
(C++03)
In C++, string literal tokens can be concatenated like this:
const char* thingy = "Hello" "World";
"Hello" and "World" are each a literal token.
This is normal behavior for string literals. In the first line, the adjacent strings are concatenated by the compiler automatically. You can use the same feature to split a very long string across several source lines, for example:
const char *strLine = "line 1 "
"line 1 "
"line 2 ";
And it will work OK. The second line of the question is straightforward: it simply passes another string to the output stream.
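That the concatenation happens during translation, and not at run time, can be checked with a small sketch like the one below. The helper name long_line and its contents are made up for illustration.

```cpp
#include <cstring>

// The two tokens are merged into one literal during translation, so the
// result is a single array: 12 characters plus the terminating '\0'.
static_assert(sizeof("Hello " "World!") == 13,
              "adjacent literals form one 13-byte array");

// Hypothetical helper returning one literal that is split across several
// source lines purely for readability.
inline const char* long_line() {
    return "line 1 "
           "line 2 "
           "line 3";
}
```

Since the static_assert is evaluated by the compiler, the fact that this compiles already shows that "Hello " "World!" is a single literal by the time sizeof sees it.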