String literals in C++ with _T macro - c++

What is the difference (if any) between this
_T("a string")
and
_T('a string')
?

First, _T isn't a standard part of C++. I've added the "windows" tag to your question.
Now, the difference between these is that the first is correct and the second is not. In C++, ' is for quoting single characters, and " is for quoting strings.

The second is wrong. You are placing a string literal in between single quotes.

'a string' is a so-called "multicharacter literal". It has type int, and an implementation-defined value. This is [lex.ccon] in the standard.
I don't know what values MSVC gives to multicharacter literals, and I don't know for sure what the MS-specific _T macro ends up doing with it, but I expect you get a narrow multicharacter literal on narrow builds, and a wide multicharacter literal on wide builds. The prefix L is the same for strings and character literals.
It's wrong, anyway: multicharacter literals are pretty much useless and certainly are no substitute for strings. "a string" is a string literal, which is what you want.

You use '' for single character and "" for strings. _T('a string') is wrong and its behaviour is compiler-specific.
In the case of MSVC, it uses the first character only. Example:
#include <iostream>
#include <tchar.h>
int main()
{
    if (_T('a string') == _T('a'))
        std::cout << (int)'a' << " = " << _T('a');
}
output: 97 = 97

Single quotations are primarily used when denoting a single character:
char c = 'e' ;
Double quotations are used with strings and output statements:
string s = "This is a string";
cout << "Output where double quotations are used.";

Related

String literals concatenation [duplicate]

char* a="dsa" "qwe";
printf("%s", a);
output: dsaqwe
My question is why does this thing work. If I give a space or nothing in between two string literals it concatenates the string literals.
How is this working?
It's defined by the ISO C standard: adjacent string literals are combined into a single one.
The language is a little dry (it is a standard after all) but section 6.4.5 String literals of C11 states:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence.
This is also mentioned in 5.1.1.2 Translation phases, point 6 of the same standard, though a little more succinctly:
Adjacent string literal tokens are concatenated.
This basically means that "abc" "def" is no different to "abcdef".
It's often useful for making long strings while still having nice formatting, something like:
const char *myString = "This is a really long "
"string and I don't want "
"to make my lines in the "
"editor too long, because "
"I'm basically anal retentive :-)";
And to answer your unasked question, "What is this good for?"
For one thing, you can put constants in string literals. You can write
#define FIRST "John"
#define LAST "Doe"
const char* name = FIRST " " LAST;
const char* salutation = "Dear " FIRST ",";
and then if you'll need to change the name later, you'll only have to change it in one spot.
Things like that.
You answered your own question.
If I give a space or nothing in between two string literals it concatenates the string literals.
That's one of the features of the C syntax.
ISO C standard §5.1.1.2 says:-
Adjacent string literal tokens are concatenated.
White-space characters separating tokens are no longer significant.

C++: Quote escapes for an entire line [duplicate]

I came across this code snippet in C++17 draft n4713:
#define R "x"
const char* s = R"y"; // ill-formed raw string, not "x" "y"
What is a "raw string"? What does it do?
Raw string literals are string literals that are designed to make it easier to include nested characters like quotation marks and backslashes that normally have meanings as delimiters and escape sequence starts. They’re useful for, say, encoding text like HTML. For example, contrast
"C:\\Program Files\\"
which is a regular string literal, with
R"(C:\Program Files\)"
which is a raw string literal. Here, the use of parentheses in addition to quotes allows C++ to distinguish a nested quotation mark from the quotation marks delimiting the string itself.
Basically, a raw string literal is a string in which the escape sequences of C++ (like \n, \t or \") are not processed. A raw string literal starts with R"( and ends with )"; it was introduced in C++11.
prefix(optional) R "delimiter( raw_characters )delimiter"
prefix - One of L, u8, u, U
Thanks to @Remy Lebeau,
delimiter is optional and is typically omitted, but there are corner cases where it is actually needed, in particular if the string content contains the character sequence )" in it, eg: R"(...)"...)", so you would need a delimiter to avoid an error, eg: R"x(...)"...)x".
See an example:
#include <iostream>
#include <string>
int main()
{
std::string normal_str = "First line.\nSecond line.\nEnd of message.\n";
std::string raw_str = R"(First line.\nSecond line.\nEnd of message.\n)";
std::string raw_str_delim = R"x("(First line.\nSecond line...)")x";
std::cout << normal_str << std::endl;
std::cout << raw_str << std::endl;
std::cout << raw_str_delim << std::endl;
return 0;
}
output:
First line.
Second line.
End of message.
First line.\nSecond line.\nEnd of message.\n
"(First line.\nSecond line...)"
I will make an addition about a concern in one of the comments:
But here in the code the R is defined as "x" and after
expansion of the #define the code is const char* s = "x""y";
and there isn't any R"(.
The code fragment in the question is to show invalid uses of the Raw Strings. Let me get the actual 3-lines of code here:
#define R "x"
const char* s = R"y"; // ill-formed raw string literal, not "x" "y"
const char* s2 = R"(a)" "b)"; // a raw string literal followed by a normal string literal
The first line is there so that you do not get confused by a macro. Macros are preprocessed code fragments that replace parts of the source. A raw string, on the other hand, is a feature of the language that is "parsed" according to language rules.
The second line shows the wrong use of it. The correct way would be R"(y)": the parentheses are required.
And the last shows how it can be a pain if not written carefully. The string inside the parentheses CANNOT include the closing sequence of the raw string. A correction might be R"_(a)" "b)_". The _ can be replaced by any delimiter of up to 16 characters (excluding parentheses, backslashes and whitespace), as long as the closing sequence does not appear inside the string: R"___(a)" "b)___" or R"anything(a)" "b)anything".
So if we wrap these correction within a simple C++ code:
#include <iostream>
using namespace std;
#define R "x" // This is just a macro, not Raw String nor definition of it
const char* s = R"(y)"; // R is part of language, not a macro
const char* s2 = R"_(a)" "b)_"; // Raw String shall not include closing sequence of characters; )_"
int main(){ cout << s <<endl << s2 <<endl << R <<endl; }
then the output will be
y
a)" "b
x
Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.
C++Reference: string literal
a Raw string is defined like this:
string raw_str=R"(First line.\nSecond line.\nEnd of message.\n)";
and the difference is that a raw string does not process special sequences like \n and \t; it treats them as normal text.
So the above line would be just one line with 3 actual \n in it, instead of 3 separate lines.
You need to remove the define line and add parentheses around your string to be considered as a raw string.

const char* initialization

This is a usage I found in an open source project, and I don't understand how it works.
When I output it to stdout, it was "version 0.8.0".
const char version[] = " version " "0" "." "8" "." "0";
It's called string concatenation -- when you put two (or more) quoted strings next to each other in the source code with nothing between them, the compiler puts them together into a single string. This is most often used for long strings -- anything more than one line long:
char whatever[] = "this is the first line of the string\n"
"this is the second line of the string\n"
"This is the third line of the string";
Before string concatenation was invented, you had to do that with a rather clumsy line continuation, putting a backslash at the end of each line (and making sure it was the end, because most compilers wouldn't treat it as line continuation if there was any whitespace after the backslash). There was also ugliness with it throwing off indentation, because any whitespace at the beginning of subsequent lines might be included in the string.
This can cause a minor problem if you intended to put a comma between the strings, such as when initializing an array of pointers to char. If you miss a comma, the compiler won't warn you about it -- you'll just get one string that includes what was intended to be two separate ones.
This is a basic feature of both C89 and C++98 called 'adjacent string concatenation' or thereabouts.
Basically, if two string literals are adjacent to each other with no punctuation in between, they are merged into a single string, as your output shows.
In the C++98 standard, section §2.1 'Phases of translation [lex.phases]' says:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is after the preprocessor has completed.
In the C99 standard, the corresponding section is §5.1.1.2 'Translation phases' and it says:
6 Adjacent string literal tokens are concatenated.
The wording would be very similar in any other C or C++ standard you can lay hands on (and I do recognize that both C++98 and C99 are superseded by C++11 and C11; I just don't have electronic copies of the final standards, yet).
The C++ standard states that string literals placed next to each other are concatenated.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character
sequences specified by any sequence of adjacent character and
identically-prefixed string literal tokens are concatenated into a
single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string
literals are concatenated."
const char version[] = " version " "0" "." "8" "." "0";
is same as:
const char version[] = " version 0.8.0";
Compiler concatenates the adjacent pieces of string-literals, making one bigger piece of string-literal.
As a sidenote, const char* (which is in your title) is not the same as const char[] (which is in your posted code).
The compiler automatically concatenates string literals written after each other (separated by white-space only). It is as if you had written
const char version[] = " version 0.8.0";
EDIT: corrected pre-processor to compiler
Adjacent string literals are concatenated:
When specifying string literals, adjacent strings are concatenated.
Therefore, this declaration:
char szStr[] = "12" "34";
is identical to this declaration:
char szStr[] = "1234";
This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";
Simply putting strings one after the other concatenates them at compile time, so:
"Hello" ", " "World!" => "Hello, World!"
This is a strange usage of the feature; usually it is there to allow #define strings to be used:
#define FOO "World!"
puts("Hello, " FOO);
Will compile to the same as:
puts("Hello, World!");

How can the C++ Preprocessor be used on strings?

The preprocessor can be used to replace certain keywords with other words using #define. For example I could do #define name "George" and every time the preprocessor finds 'name' in the program it will replace it with "George".
However, this only seems to work with code. How could I do this with strings and text? For example if I print "Hello I am name" to the screen, I want 'name' to be replaced with "George" even though it is in a string and not code.
I do not want to manually search the string for keywords and then replace them, but instead want to use the preprocessor to just switch the words.
Is this possible? If so how?
I am using C++ but C solutions are also acceptable.
#define name "George"
printf("Hello I am " name "\n");
Adjacent string literals are concatenated in C and C++.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string literals are concatenated."
EDIT: as requested, add quotes from C and C++ Standard. Thanks to @MatteoItalia for the C++11 quote.
#define name "George"
printf("Hello I am %s\n", name);
Here name will be replaced by "George"
Your issue is that the preprocessor will (wisely) not replace tokens that are inside string literals.
So you must either use a function like printf or a variable rather than the preprocessor, or pull the token out of the string like so:
#include <iostream>
#define name "George"
int main(int argc, char** argv) {
std::cout << "Hello I am " << name << std::endl;
}

What is wrong with this string assignment?

string s="abcdefghijklmnopqrstuvwxyz"
char f[]=" " (s.substr(s.length()-10,9)).c_str() " ";
I want to get the last 9 characters of s and add " " to the beginning and the end of the substring, and store it as a char[]. I don't understand why this doesn't work even though char f[]=" " "a" " " does.
Is (s.substr(s.length()-10,9)).c_str() not a string literal?
No, it's not a string literal. String literals always have the form "<content>" or expand to that (macros, like __FILE__ for example).
Just use another std::string instead of char[].
std::string f = " " + s.substr(s.size()-10, 9) + " ";
First, consider whether you should be using cstrings. In C++, generally, use string.
However, if you want to use cstrings, the concatenation of "abc" "123" -> "abc123" is performed during translation on literal tokens only, and so cannot be used with string::c_str(). Instead, the easiest way is to construct a new string and take the .c_str() of that:
string s="abcdefghijklmnopqrstuvwxyz";
const char* f = (string(" ") + s.substr(s.length()-10,9) + " ").c_str();
(EDIT: You know what, on second thought, that's a really bad idea. The temporary string is destroyed at the end of this statement, so f is left dangling and using it is undefined behavior (it can segfault). Just don't use cstrings unless you're prepared to mess with strcpy and all that ugly stuff. Seriously.)
If you want to use strings instead, consider something like the following:
#include <sstream>
...
string s="abcdefghijklmnopqrstuvwxyz";
stringstream tmp;
tmp << " " << s.substr(s.length()-10,9) << " ";
string f = tmp.str();
@Xeo tells you how to solve your problem. Here's some complementary background on how string literals are handled in the compilation process.
From section A.12 Preprocessing of The C Programming language:
Escape sequences in character constants and string literals (Pars. A.2.5.2, A.2.6) are
replaced by their equivalents; then adjacent string literals are concatenated.
The concatenation is done in an early translation phase (K&R describes it under preprocessing; the ISO standards call it translation phase 6), not when the program runs. (You asked for a C++ answer; C++ treats adjacent string literals the same way as C.) That phase only sees literal tokens: the (s.substr(s.length()-10,9)).c_str() part is an expression evaluated at run time, so it cannot take part in the concatenation.