const char* initialization - c++

This is a usage I found in an open-source project, and I don't understand how it works.
When I output it to stdout, it printed "version 0.8.0".
const char version[] = " version " "0" "." "8" "." "0";

It's called string concatenation -- when you put two (or more) quoted strings next to each other in the source code with nothing between them, the compiler puts them together into a single string. This is most often used for long strings -- anything more than one line long:
char whatever[] = "this is the first line of the string\n"
"this is the second line of the string\n"
"This is the third line of the string";
Before string concatenation was invented, you had to do that with a rather clumsy line continuation, putting a backslash at the end of each line (and making sure it was the end, because most compilers wouldn't treat it as line continuation if there was any whitespace after the backslash). There was also ugliness with it throwing off indentation, because any whitespace at the beginning of subsequent lines might be included in the string.
This can cause a minor problem if you intended to put a comma between the strings, such as when initializing an array of pointers to char. If you miss a comma, the compiler won't warn you about it -- you'll just get one string that includes what was intended to be two separate ones.

This is a basic feature of both C89 and C++98 called 'adjacent string concatenation' or thereabouts.
Basically, if two string literals are adjacent to each other with no punctuation in between, they are merged into a single string, as your output shows.
In the C++98 standard, section §2.1 'Phases of translation [lex.phases]' says:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is after the preprocessor has completed.
In the C99 standard, the corresponding section is §5.1.1.2 'Translation phases' and it says:
6 Adjacent string literal tokens are concatenated.
The wording would be very similar in any other C or C++ standard you can lay hands on (and I do recognize that both C++98 and C99 are superseded by C++11 and C11; I just don't have electronic copies of the final standards, yet).

The C++ standard specifies that string literals placed next to each other are concatenated.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character
sequences specified by any sequence of adjacent character and
identically-prefixed string literal tokens are concatenated into a
single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string
literals are concatenated."

const char version[] = " version " "0" "." "8" "." "0";
is the same as:
const char version[] = " version 0.8.0";
The compiler concatenates the adjacent string literals into one larger string literal.
As a side note, const char* (which is in your title) is not the same as const char[] (which is in your posted code).

The compiler automatically concatenates string literals written one after another (separated only by white-space). It is as if you had written
const char version[] = " version 0.8.0";

Adjacent string literals are concatenated:
When specifying string literals, adjacent strings are concatenated. Therefore, this declaration:
char szStr[] = "12" "34";
is identical to this declaration:
char szStr[] = "1234";
This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";

Simply putting strings one after the other concatenates them at compile time, so:
"Hello" ", " "World!" => "Hello, World!"
This is an unusual use of the feature; more commonly it is used to combine #define'd strings with other literals:
#define FOO "World!"
puts("Hello, " FOO);
Will compile to the same as:
puts("Hello, World!");


Convert string to raw string

char str[] = "C:\Windows\system32";
auto raw_string = convert_to_raw(str);
std::cout << raw_string;
Desired output:
C:\Windows\system32
Is it possible? I am not a big fan of cluttering my path strings with extra backslash. Nor do I like an explicit R"()" notation.
Any other work-around of reading a backslash in a string literally?
That's not possible, \ has special meaning inside a non-raw string literal, and raw string literals exist precisely to give you a chance to avoid having to escape stuff. Give up, what you need is R"(...)".
Indeed, when you write something like
char const * str{"a\nb"};
you can verify for yourself that strlen(str) is 3, not 4, which means that once you compile that line, the binary/object file contains only one single character, the newline character, corresponding to \n; there's no \ or n anywhere in it, so there's no way you can retrieve them.
As a matter of personal taste, I find raw string literals great! You can even put a real line break (Enter) in there, often for the price of just 3 characters (R, (, and )) in addition to those you would write anyway. In a non-raw string you would have to write more characters to escape anything that needs escaping.
Look at
std::string s{R"(Hello
world!
This
is
Me!)"};
That's 29 keystrokes from R to the last " inclusive (counting each Enter as one), and you can see at a glance that it's 5 lines.
The equivalent non-raw string
std::string s{"Hello\nworld!\nThis\nis\nMe!"};
is 30 keystrokes from the first " to the last " inclusive, and you have to parse it carefully to count the lines.
A pretty short string, and you already see the advantage.
To answer the question as asked: no, it is not possible.
As an example of the impossibility, assume we have a path specified as the string literal "C:\a\b", used to initialise an array named str.
Now, str is actually represented in memory (in your program when running) using a statically allocated array of five characters with values {'C', ':', '\007', '\010', '\000'}, where '\xyz' is an OCTAL escape (so '\010' is a char numerically equal to 8 in decimal).
The problem is that there is more than one way to produce that array of five characters using a string literal.
char str[] = "C:\a\b";
char str1[] = "C:\007\010";
char str2[] = "C:\a\010";
char str3[] = "C:\007\b";
char str4[] = "C:\x07\x08"; // \xmn uses hex coding
In the above, str, str1, str2, str3, and str4 are all initialised with identical arrays of five char.
That means convert_to_raw("C:\a\b") could quite legitimately assume it is passed ANY of the strings above AND
std::cout << convert_to_raw("C:\a\b") << '\n';
could quite legitimately produce output of
C:\007\010
(or any one of a number of other strings).
The practical problem with this, if you are working with Windows paths, is that C:\a\b, C:\007\010, C:\a\010, C:\007\b, and C:\x07\x08 are all valid filenames under Windows that (unless they are hard links or junctions) name DIFFERENT files.
In the end, if you want to have string literals in your code representing filenames or paths, then use \\ or a raw string literal when you need a single backslash. Alternatively, write your paths as string literals in your code using all forward slashes (e.g. "C:/a/b"), since Windows API functions accept those too.

String literals concatenation [duplicate]

char* a="dsa" "qwe";
printf("%s", a);
output: dsaqwe
My question is why does this thing work. If I give a space or nothing in between two string literals it concatenates the string literals.
How is this working?
It's defined by the ISO C standard: adjacent string literals are combined into a single one.
The language is a little dry (it is a standard after all) but section 6.4.5 String literals of C11 states:
In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence.
This is also mentioned in 5.1.1.2 Translation phases, point 6 of the same standard, though a little more succinctly:
Adjacent string literal tokens are concatenated.
This basically means that "abc" "def" is no different to "abcdef".
It's often useful for making long strings while still having nice formatting, something like:
const char *myString = "This is a really long "
"string and I don't want "
"to make my lines in the "
"editor too long, because "
"I'm basically anal retentive :-)";
And to answer your unasked question, "What is this good for?"
For one thing, you can put constants in string literals. You can write
#define FIRST "John"
#define LAST "Doe"
const char* name = FIRST " " LAST;
const char* salutation = "Dear " FIRST ",";
and then if you'll need to change the name later, you'll only have to change it in one spot.
Things like that.
You answered your own question.
If I give a space or nothing in between two string literals it concatenates the string literals.
That's one of the features of the C syntax.
ISO C standard §5.1.1.2 says:-
Adjacent string literal tokens are concatenated.
White-space characters separating tokens are no longer significant.

Why can I construct a string with multiple string literals? [duplicate]

#include <iostream>
#include <string>
int main() {
std::string str = "hello " "world" "!";
std::cout << str;
}
This compiles, runs, and prints:
hello world!
It seems as though the string literals are being concatenated together, but interestingly this can not be done with operator +:
#include <iostream>
#include <string>
int main() {
std::string str = "hello " + "world";
std::cout << str;
}
This will fail to compile.
Why is this behavior in the language? My theory is that it allows strings to be constructed with multiple #include statements, because #include statements are required to be on their own line. Is this behavior simply possible due to the grammar of the language, or is it an exception that was added to help solve a problem?
Adjacent string literals are concatenated. We can see this in the draft C++ standard, section 2.2 Phases of translation, paragraph 6, which says:
Adjacent string literal tokens are concatenated.
In your other case, there is no operator+ defined to take two const char*.
As to why, this comes from C and we can go to the Rationale for International Standard—Programming Languages—C and it says in section 6.4.5 String literals:
A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.
Without this feature, you would have to do this to continue a string literal over multiple lines:
std::string str = "hello \
world\
!";
which is pretty ugly.
As @erenon said, the compiler will merge multiple string literals into one, which is especially helpful if you want to use multiple lines like so:
cout << "This is a very long string-literal, "
"which for readability in the code "
"is divided over multiple lines.";
However, when you try to concatenate string-literals together using operator+, the compiler will complain because there is no operator+ defined for two char const *'s. The operator is defined for the string class (which is totally different from C-strings), so it is legal to do this:
string str = string("Hello ") + "world";
The compiler concatenates the string literals automatically into a single one.
When the compiler sees "hello " + "world"; it looks for a global + operator which takes two const char*, and since by default there is none, it fails.
The "hello " "world" "!" is resolved by the compiler as a single string. This allows you to write concatenated strings over multiple lines.
In the first example, the consecutive string literals are concatenated by magic, before compilation has properly started. The compiler sees a single literal, as if you'd written "hello world!".
In the second example, once compilation has begun, the literals become static arrays. You can't apply + to two arrays.
Why is this behavior in the language?
This is a legacy of C, which comes from a time when memory was a precious resource. It allows you to do quite a lot of string manipulation without requiring dynamic memory allocation (as more modern idioms like std::string often do); the price for that is some rather quirky semantics.

String literals in C++ with _T macro

What is the difference (if any) between this
_T("a string")
and
_T('a string')
?
First, _T isn't a standard part of C++. I've added the "windows" tag to your question.
Now, the difference between these is that the first is correct and the second is not. In C++, ' is for quoting single characters, and " is for quoting strings.
The second is wrong. You are placing a string literal in between single quotes.
'a string' is a so-called "multicharacter literal". It has type int, and an implementation-defined value. This is [lex.ccon] in the standard.
I don't know what values MSVC gives to multicharacter literals, and I don't know for sure what the MS-specific _T macro ends up doing with it, but I expect you get a narrow multicharacter literal on narrow builds, and a wide multicharacter literal on wide builds. The prefix L is the same for strings and character literals.
It's wrong, anyway: multicharacter literals are pretty much useless and certainly are no substitute for strings. "a string" is a string literal, which is what you want.
You use '' for single character and "" for strings. _T('a string') is wrong and its behaviour is compiler-specific.
In the case of MSVC, it uses the first character only. Example:
#include <iostream>
#include <tchar.h>
int main()
{
if (_T('a string') == _T('a'))
std::cout << (int)'a' << " = " << _T('a');
}
output: 97 = 97
Single quotations are primarily used when denoting a single character:
char c = 'e' ;
Double quotations are used with strings and output statements:
string s = "This is a string";
cout << "Output where double quotations are used.";

How can the C++ Preprocessor be used on strings?

The preprocessor can be used to replace certain keywords with other words using #define. For example I could do #define name "George" and every time the preprocessor finds 'name' in the program it will replace it with "George".
However, this only seems to work with code. How could I do this with strings and text? For example if I print "Hello I am name" to the screen, I want 'name' to be replaced with "George" even though it is in a string and not code.
I do not want to manually search the string for keywords and then replace them, but instead want to use the preprocessor to just switch the words.
Is this possible? If so how?
I am using C++ but C solutions are also acceptable.
#define name "George"
printf("Hello I am " name "\n");
Adjacent string literals are concatenated in C and C++.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character sequences specified by any sequence of adjacent character and identically-prefixed string literal tokens are concatenated into a single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string literals are concatenated."
EDIT: as requested, added quotes from the C and C++ Standards. Thanks to @MatteoItalia for the C++11 quote.
#define name "George"
printf("Hello I am %s\n", name);
Here name will be replaced by "George".
Your issue is that the preprocessor will (wisely) not replace tokens that are inside string literals.
So you must either use a function like printf or a variable rather than the preprocessor, or pull the token out of the string like so:
#include <iostream>
#define name "George"
int main(int argc, char** argv) {
std::cout << "Hello I am " << name << std::endl;
}