C++ string variables not accepting enter and tab whitespace? - c++

Why does the following work:
string input = "a long string of text pasted from a .txt file";
But this version does not?
string input =
"
some
large
string ";
I thought C++ doesn't care about whitespace.

You can do something like this. It's called a raw string literal:
string input =
R"(
some
large
string )";
This will include the endline characters as well. The format is R"(string-literal)"

For the most parts no, it does not care about whitespace. But there are exceptions and string literals are one of them.
The rule is string literals cannot span multiple lines. But adjacent literals are automatically concatenated so you can just do
const char string[] = "very "
"long "
"string";
and it will be equivalent to
const char string[] = "very long string";
I am not sure about the origin of the rule, I suspect it might have been done to prevent confusion whether the newline should be part of the string or not (it's not unless explicitly escaped). Or maybe just some grammar/parser thing. Compiling C/C++ is kind of complicated and happens in multiple phases, see cppreference - string literals already have plenty of special treatment.

Related

Convert string to raw string

char str[] = "C:\Windows\system32"
auto raw_string = convert_to_raw(str);
std::cout << raw_string;
Desired output:
C:\Windows\system32
Is it possible? I am not a big fan of cluttering my path strings with extra backslash. Nor do I like an explicit R"()" notation.
Any other work-around of reading a backslash in a string literally?
That's not possible, \ has special meaning inside a non-raw string literal, and raw string literals exist precisely to give you a chance to avoid having to escape stuff. Give up, what you need is R"(...)".
Indeed, when you write something like
char const * str{"a\nb"};
you can verify yourself that strlen(str) is 3, not 4, which means that once you compile that line, in the binary/object file there's only one single character, the newline character, corresponding to \n; there's no \ nor n anywere in it, so there's no way you can retrieve them.
As a personal taste, I find raw string literals great! You can even put real Enter in there. Often just for the price of 3 characters - R, (, and ) - in addtion to those you would write anyway. Well, you would have to write more characters to escape anything needs escaping.
Look at
std::string s{R"(Hello
world!
This
is
Me!)"};
That's 28 keystrokes from R to last " included, and you can see in a glimpse it's 6 lines.
The equivalent non-raw string
std::string s{"Hello\nworld!\nThis\nis\nMe!"};
is 30 keystrokes from R to last " included, and you have to parse it carefully to count the lines.
A pretty short string, and you already see the advantage.
To answer the question, as asked, no it is not possible.
As an example of the impossibility, assume we have a path specified as "C:\a\b";
Now, str is actually represented in memory (in your program when running) using a statically allocated array of five characters with values {'C', ':', '\007', '\010', '\000'} where '\xyz' represents an OCTAL representation (so '\010' is a char equal to numerically to 8 in decimal).
The problem is that there is more than one way to produce that array of five characters using a string literal.
char str[] = "C:\a\b";
char str1[] = "C:\007\010";
char str2[] = "C:\a\010";
char str3[] = "C:\007\b";
char str4[] = "C:\x07\x08"; // \xmn uses hex coding
In the above, str1, str2, str3, and str4 are all initialised using equivalent arrays of five char.
That means convert_to_raw("C:\a\b") could quite legitimately assume it is passed ANY of the strings above AND
std::cout << convert_to_raw("C:\a\b") << '\n';
could quite legitimately produce output of
C:\007\010
(or any one of a number of other strings).
The practical problem with this, if you are working with windows paths, is that c:\a\b, C:\007\010, C:\a\010, C:\007\b, and C:\x07\x08 are all valid filenames under windows - that (unless they are hard links or junctions) name DIFFERENT files.
In the end, if you want to have string literals in your code representing filenames or paths, then use \\ or a raw string literal when you need a single backslash. Alternatively, write your paths as string literals in your code using all forward slashes (e.g. "C:/a/b") since windows API functions accept those too.

C++: Quote escapes for an entire line [duplicate]

I came across this code snippet in C++17 draft n4713:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
What is a "raw string"? What does it do?
Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They’re useful for, say, encoding text like HTML. For example, contrast
"C:\\Program Files\\"
which is a regular string literal, with
R"(C:\Program Files\)"
which is a raw string literal. Here, the use of parentheses in addition to quotes allows C++ to distinguish a nested quotation mark from the quotation marks delimiting the string itself.
Basically a raw string literal is a string in which the escape characters (like \n \t or \" ) of C++ are not processed. A raw string literal which starts with R"( and ends in )" ,introduced in C++11
prefix(optional) R "delimiter( raw_characters )delimiter"
prefix - One of L, u8, u, U
Thanks to #Remy Lebeau,
delimiter is optional and is typically omitted, but there are corner cases where it is actually needed, in particular if the string content contains the character sequence )" in it, eg: R"(...)"...)", so you would need a delimiter to avoid an error, eg: R"x(...)"...)x".
See an example:
#include <iostream>
#include <string>
int main()
{
std::string normal_str = "First line.\nSecond line.\nEnd of message.\n";
std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";
std::string raw_str_delim = R"x("(First line.\nSecond line...)")x";
std::cout << normal_str << std::endl;
std::cout << raw_str << std::endl;
std::cout << raw_str_delim << std::endl;
return 0;
}
output:
First line.
Second line.
End of message.
First line.\nSecond line.\nEnd of message.\n
"(First line.\nSecond line...)"
Live on Godbolt
I will make an addition about a concern in one of the comments:
But here in the code the R is defined as "x" and after
expansion of the #define the code is const char* s = "x""y";
and there isn't any R"(.
The code fragment in the question is to show invalid uses of the Raw Strings. Let me get the actual 3-lines of code here:
#define R "x"
const char* s = R"y"; // ill-formed raw string literal, not "x" "y"
const char* s2 = R"(a)" "b)"; // a raw string literal followed by a normal string literal
The first line is there to not get confused by a macro. macros are preprocessed code fragments that replace parts in the source. Raw String, on the other hand, is a feature of the language that is "parsed" according to language rules.
The second line is to show the wrong use of it. Correct way would be R"(x)" where you need parenthesis in it.
And the last is to show how it can be a pain if not written carefully. The string inside parenthesis CANNOT include closing sequence of raw string. A correction might be R"_(a)" "b)_". _ can be replaced by any character (but not parentheses, backslash and spaces) and any number of them as long as closing sequence is not included inside: R"___(a)" "b)___" or R"anything(a)" "b)anything"
So if we wrap these correction within a simple C++ code:
#include <iostream>
using namespace std;
#define R "x" // This is just a macro, not Raw String nor definition of it
const char* s = R"(y)"; // R is part of language, not a macro
const char* s2 = R"_(a)" "b)_"; // Raw String shall not include closing sequence of characters; )_"
int main(){ cout << s <<endl << s2 <<endl << R <<endl; }
then the output will be
y
a)" "b
x
Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.
C++Reference: string literal
a Raw string is defined like this:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
and the difference is that a raw string ignores (escapes) all the special characters like \n ant \t and threats them like normal text.
So the above line would be just one line with 3 actual \n in it, instead of 3 separate lines.
You need to remove the define line and add parentheses around your string to be considered as a raw string.

What is a raw string?

I came across this code snippet in C++17 draft n4713:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
What is a "raw string"? What does it do?
Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They’re useful for, say, encoding text like HTML. For example, contrast
"C:\\Program Files\\"
which is a regular string literal, with
R"(C:\Program Files\)"
which is a raw string literal. Here, the use of parentheses in addition to quotes allows C++ to distinguish a nested quotation mark from the quotation marks delimiting the string itself.
Basically a raw string literal is a string in which the escape characters (like \n \t or \" ) of C++ are not processed. A raw string literal which starts with R"( and ends in )" ,introduced in C++11
prefix(optional) R "delimiter( raw_characters )delimiter"
prefix - One of L, u8, u, U
Thanks to #Remy Lebeau,
delimiter is optional and is typically omitted, but there are corner cases where it is actually needed, in particular if the string content contains the character sequence )" in it, eg: R"(...)"...)", so you would need a delimiter to avoid an error, eg: R"x(...)"...)x".
See an example:
#include <iostream>
#include <string>
int main()
{
std::string normal_str = "First line.\nSecond line.\nEnd of message.\n";
std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";
std::string raw_str_delim = R"x("(First line.\nSecond line...)")x";
std::cout << normal_str << std::endl;
std::cout << raw_str << std::endl;
std::cout << raw_str_delim << std::endl;
return 0;
}
output:
First line.
Second line.
End of message.
First line.\nSecond line.\nEnd of message.\n
"(First line.\nSecond line...)"
Live on Godbolt
I will make an addition about a concern in one of the comments:
But here in the code the R is defined as "x" and after
expansion of the #define the code is const char* s = "x""y";
and there isn't any R"(.
The code fragment in the question is to show invalid uses of the Raw Strings. Let me get the actual 3-lines of code here:
#define R "x"
const char* s = R"y"; // ill-formed raw string literal, not "x" "y"
const char* s2 = R"(a)" "b)"; // a raw string literal followed by a normal string literal
The first line is there to not get confused by a macro. macros are preprocessed code fragments that replace parts in the source. Raw String, on the other hand, is a feature of the language that is "parsed" according to language rules.
The second line is to show the wrong use of it. Correct way would be R"(x)" where you need parenthesis in it.
And the last is to show how it can be a pain if not written carefully. The string inside parenthesis CANNOT include closing sequence of raw string. A correction might be R"_(a)" "b)_". _ can be replaced by any character (but not parentheses, backslash and spaces) and any number of them as long as closing sequence is not included inside: R"___(a)" "b)___" or R"anything(a)" "b)anything"
So if we wrap these correction within a simple C++ code:
#include <iostream>
using namespace std;
#define R "x" // This is just a macro, not Raw String nor definition of it
const char* s = R"(y)"; // R is part of language, not a macro
const char* s2 = R"_(a)" "b)_"; // Raw String shall not include closing sequence of characters; )_"
int main(){ cout << s <<endl << s2 <<endl << R <<endl; }
then the output will be
y
a)" "b
x
Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.
C++Reference: string literal
a Raw string is defined like this:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
and the difference is that a raw string ignores (escapes) all the special characters like \n ant \t and threats them like normal text.
So the above line would be just one line with 3 actual \n in it, instead of 3 separate lines.
You need to remove the define line and add parentheses around your string to be considered as a raw string.

How to save text file to struct with string in C++

I'm wanting to save the content of a file to a struct. I've tried to use seekg and read to write to it but it isn't working.
My file is something like:
johnmayer24ericclapton32
I want to store the name, the last name and the age in a struct like that
typedef struct test_struct{
string name;
string last_name;
int age;
} test_struct;
Here is my code
int main(){
test_struct ts;
ifstream data_base;
data_base.open("test_file.txt");
data_base.seekg(0, ios_base::beg);
data_base.read(ts, sizeof(test_struct));
data_base.close();
return 0;
}
It doesn't compile as it don't want me to use ts on the read function. Is there another way - or a way - of doing it?
Serialization/Deserialization of strings is tricky.
As binary data the convention is to output the length of the string first, then the string data.
https://isocpp.org/wiki/faq/serialization#serialize-binary-format
String data is tricky because you have to unambiguously know when the string’s body stops. You can’t unambiguously terminate all strings with a '\0' if some string might contain that character; recall that std::string can store '\0'. The easiest solution is to write the integer length just before the string data. Make sure the integer length is written in “network format” to avoid sizeof and endian problems (see the solutions in earlier bullets).
That way when reading the data back in you know the length of the string to expect and can preallocate the size of the string then just read that much data from the stream.
If your data is a non-binary (text) format it's a little trickier:
https://isocpp.org/wiki/faq/serialization#serialize-text-format
String data is tricky because you have to unambiguously know when the string’s body stops. You can’t unambiguously terminate all strings with a '\n' or '"' or even '\0' if some string might contain those characters. You might want to use C++ source-code escape-sequences, e.g., writing '\' followed by 'n' when you see a newline, etc. After this transformation, you can either make strings go until end-of-line (meaning they are deliminated by '\n') or you can delimit them with '"'.
If you use C++-like escape-sequences for your string data, be sure to always use the same number of hex digits after '\x' and '\u'. I typically use 2 and 4 digits respectively. Reason: if you write a smaller number of hex digits, e.g., if you simply use stream << "\x" << hex << unsigned(theChar), you’ll get errors when the next character in the string happens to be a hex digit. E.g., if the string contains '\xF' followed by 'A', you should write "\x0FA", not "\xFA".
If you don’t use some sort of escape sequence for characters like '\n', be careful that the operating system doesn’t mess up your string data. In particular, if you open a std::fstream without std::ios::binary, some operating systems translate end-of-line characters.
Another approach for string data is to prefix the string’s data with an integer length, e.g., to write "now is the time" as 15:now is the time. Note that this can make it hard for people to read/write the file, since the value just after that might not have a visible separator, but you still might find it useful.
Text-based serialization/deserialization convention varies but one field per line is an accepted practice.
You'll have to develop a specific algorithm, since there is no separator character between the "fields".
static const std::string input_text = "johnmayer24ericclapton32";
static const std::string alphabet = "abcdefghijklmnopqrstuvwxyz";
static const std::string decimal_digit = "0123456789";
std::string::size_type position = 0;
std::string artist_name;
position = input_text.find_first_not_of(alphabet);
if (position != std::string::npos)
{
artist_name = input_text.substr(0, position - 1);
}
else
{
cerr << "Artist name not found.";
return EXIT_FAILURE;
}
Similarly, you can extract out the number, then use std::stoi to convert the numeric string to internal representation number.
Edit 1: Splitting the name
Since there is no separator character between the first and last name, you may want to have a list of possible first names and use that to find out where the first name ends and the surname starts.

Why can I construct a string with multiple string literals? [duplicate]

This question already has answers here:
Why allow concatenation of string literals?
(10 answers)
Closed 9 years ago.
#include <iostream>
#include <string>
int main() {
std::string str = "hello " "world" "!";
std::cout << str;
}
The following compiles, runs, and prints:
hello world!
see live
It seems as though the string literals are being concatenated together, but interestingly this can not be done with operator +:
#include <iostream>
#include <string>
int main() {
std::string str = "hello " + "world";
std::cout << str;
}
This will fail to compile.
see live
Why is this behavior in the language? My theory is that it is allows strings to be constructed with multiple #include statements because #include statements are required to be on their own line. Is this behavior simply possible due to the grammar of the language, or is it an exception that was added to help solve a problem?
Adjacent string literals are concatenated we can see this in the draft C++ standard section 2.2 Phases of translation paragraph 6 which says:
Adjacent string literal tokens are concatenated
In your other case, there is no operator+ defined to take two *const char**.
As to why, this comes from C and we can go to the Rationale for International Standard—Programming Languages—C and it says in section 6.4.5 String literals:
A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.
without this feature you would have to do this to continue a string literal over multiple lines:
std::string str = "hello \
world\
!";
which is pretty ugly.
Like #erenon said, the compiler will merge multiple string literals into one, which is especially helpful if you want to use multiple lines like so:
cout << "This is a very long string-literal, "
"which for readability in the code "
"is divided over multiple lines.";
However, when you try to concatenate string-literals together using operator+, the compiler will complain because there is no operator+ defined for two char const *'s. The operator is defined for the string class (which is totally different from C-strings), so it is legal to do this:
string str = string("Hello ") + "world";
The compiler concatenates the string literals automatically into a single one.
When the compiler sees "hello " + "world"; is looking for a global + operator which takes two const char* ... And since by default there is none it fails.
The "hello " "world" "!" is resolved by the compiler as a single string. This allows you to have concatenated strings written over multiple lines .
In the first example, the consecutive string literals are concatenated by magic, before compilation has properly started. The compiler sees a single literal, as if you'd written "hello world!".
In the second example, once compilation has begun, the literals become static arrays. You can't apply + to two arrays.
Why is this behavior in the language?
This is a legacy of C, which comes from a time when memory was a precious resource. It allows you to do quite a lot of string manipulation without requiring dynamic memory allocation (as more modern idioms like std::string often do); the price for that is some rather quirky semantics.