#include <iostream>
#include <string>
int main() {
std::string str = "hello " "world" "!";
std::cout << str;
}
The following compiles, runs, and prints:
hello world!
It seems as though the string literals are being concatenated together, but interestingly this can not be done with operator +:
#include <iostream>
#include <string>
int main() {
std::string str = "hello " + "world";
std::cout << str;
}
This will fail to compile.
Why is this behavior in the language? My theory is that it allows strings to be constructed with multiple #include statements, because #include statements are required to be on their own line. Is this behavior simply possible due to the grammar of the language, or is it an exception that was added to help solve a problem?
Adjacent string literals are concatenated. We can see this in the draft C++ standard, section 2.2 Phases of translation, paragraph 6, which says:
Adjacent string literal tokens are concatenated
In your other case, there is no operator+ defined to take two const char*.
As to why, this comes from C and we can go to the Rationale for International Standard—Programming Languages—C and it says in section 6.4.5 String literals:
A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.
Without this feature, you would have to do this to continue a string literal over multiple lines:
std::string str = "hello \
world\
!";
which is pretty ugly.
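A small sketch of the point made in the Rationale: the pieces are merged during translation, so the result is a single array with a single terminator, and each piece can sit on its own indented line. The function name greeting() is only for illustration and is not part of the question.

```cpp
#include <string>

// Three adjacent literals: the compiler merges them into one array,
// "hello world!" plus a single terminating '\0' (12 + 1 = 13 chars).
static_assert(sizeof("hello " "world" "!") == 13,
              "adjacent literals form one 13-char array");

// The pieces may sit on separate, freely indented lines.
std::string greeting() {
    return "hello "
           "world"
           "!";
}
```

Calling greeting() yields the same text as the single literal "hello world!".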
Like @erenon said, the compiler will merge multiple string literals into one, which is especially helpful if you want to use multiple lines like so:
cout << "This is a very long string-literal, "
"which for readability in the code "
"is divided over multiple lines.";
However, when you try to concatenate string-literals together using operator+, the compiler will complain because there is no operator+ defined for two char const *'s. The operator is defined for the string class (which is totally different from C-strings), so it is legal to do this:
string str = string("Hello ") + "world";
The compiler concatenates the string literals automatically into a single one.
When the compiler sees "hello " + "world", it looks for a global operator+ that takes two const char*. Since there is none by default, it fails.
The "hello " "world" "!" is resolved by the compiler as a single string. This allows you to write concatenated strings over multiple lines.
In the first example, the consecutive string literals are concatenated by magic, before compilation has properly started. The compiler sees a single literal, as if you'd written "hello world!".
In the second example, once compilation has begun, the literals become static arrays. You can't apply + to two arrays.
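To make the "literals become arrays" remark concrete, here is a minimal sketch that checks the type of a literal at compile time and shows the usual workaround. The function name join_with_plus() is hypothetical.

```cpp
#include <string>
#include <type_traits>

// A string literal is an lvalue of type "array of N const char"
// (N counts the terminating '\0').
static_assert(std::is_same<decltype("hello "), const char (&)[7]>::value,
              "\"hello \" has type const char[7]");

// In most expressions the array decays to const char*, and there is no
// operator+ taking two pointers. Making one operand a std::string selects
// std::string's operator+ instead.
std::string join_with_plus() {
    return std::string("hello ") + "world";
}
```

Only one operand needs to be a std::string; the other can stay a plain literal.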
Why is this behavior in the language?
This is a legacy of C, which comes from a time when memory was a precious resource. It allows you to do quite a lot of string manipulation without requiring dynamic memory allocation (as more modern idioms like std::string often do); the price for that is some rather quirky semantics.
Related
Why does the following work:
string input = "a long string of text pasted from a .txt file";
But this version does not?
string input =
"
some
large
string ";
I thought C++ doesn't care about whitespace.
You can do something like this. It's called a raw string literal:
string input =
R"(
some
large
string )";
This will include the endline characters as well. The format is R"(string-literal)"
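As a rough comparison, the sketch below builds the same text twice: once as a raw string literal, where the line breaks in the source become part of the value, and once from adjacent ordinary literals with explicit "\n" escapes. Both function names are made up for the example.

```cpp
#include <string>

// Raw string literal: the line breaks inside R"( ... )" are part of the value.
std::string raw_text() {
    return R"(
some
large
string )";
}

// Adjacent ordinary literals: line breaks appear only where "\n" is written.
std::string escaped_text() {
    return "\n"
           "some\n"
           "large\n"
           "string ";
}
```

The two functions return equal strings, so the choice is mostly about which form is easier to read for a given block of text.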
For the most part, no, it does not care about whitespace. But there are exceptions, and string literals are one of them.
The rule is string literals cannot span multiple lines. But adjacent literals are automatically concatenated so you can just do
const char string[] = "very "
"long "
"string";
and it will be equivalent to
const char string[] = "very long string";
I am not sure about the origin of the rule; I suspect it might have been done to prevent confusion about whether the newline should be part of the string or not (it's not, unless explicitly escaped). Or maybe it is just some grammar/parser thing. Compiling C/C++ is kind of complicated and happens in multiple phases (see cppreference), and string literals already have plenty of special treatment.
tl;dr
How can I concatenate const char* with std::string, neatly and
elegantly, without multiple function calls. Ideally in one function
call and have the output be a const char*. Is this impossible, what
is an optimum solution?
Initial Problem
The biggest barrier I have experienced with C++ so far is how it handles strings. In my opinion, of all the widely used languages, it handles strings the most poorly. I've seen other questions similar to this that either have an answer saying "use std::string" or simply point out that one of the options is going to be best for your situation.
However, this is useless advice when trying to use strings dynamically like how they are used in other languages. I cannot guarantee that I will always be able to use std::string, and for the times when I have to use const char* I hit the obvious wall of "it's constant, you can't concatenate it".
Every solution to any string manipulation problem I've seen in C++ requires repetitive multiple lines of code that only work well for that format of string.
I want to be able to concatenate any set of characters with the + symbol or make use of a simple format() function just how I can in C# or Python. Why is there no easy option?
Current Situation
Standard Output
I'm writing a DLL and so far I've been outputting text to cout via the << operator. Everything has been going fine so far using simple char arrays in the form:
cout << "Hello world!"
Runtime Strings
Now it comes to the point where I want to construct a string at runtime and store it with a class, this class will hold a string that reports on some errors so that they can be picked up by other classes and maybe sent to cout later, the string will be set by the function SetReport(const char* report). So I really don't want to use more than one line for this so I go ahead and write something like:
SetReport("Failure in " + __FUNCTION__ + ": foobar was " + foobar + "\n"); // __FUNCTION__ gets the name of the current function, foobar is some variable
Immediately of course I get:
expression must have integral or unscoped enum type and...
'+': cannot add two pointers
Ugly Strings
Right. So I'm trying to add two or more const char*s together and this just isn't an option. So I find that the main suggestion here is to use std::string, sort of weird that typing "Hello world!" doesn't just give you one of those in the first place but let's give it a go:
SetReport(std::string("Failure in ") + std::string(__FUNCTION__) + std::string(": foobar was ") + std::to_string(foobar) + std::string("\n"));
Brilliant! It works! But look how ugly that is!! That's some of the ugliest code I've ever seen. We can simplify to this:
SetReport(std::string("Failure in ") + __FUNCTION__ + ": foobar was " + std::to_string(foobar) + "\n");
Still possibly the worst way I've ever encountered of getting to a simple one-line string concatenation, but everything should be fine now, right?
Convert Back To Constant
Well no, if you're working on a DLL, something that I tend to do a lot because I like to unit test so I need my C++ code to be imported by the unit test library, you will find that when you try to set that report string to a member variable of a class as a std::string the compiler throws a warning saying:
warning C4251: class 'std::basic_string<_Elem,_Traits,_Alloc>' needs to have dll-interface to be used by clients of class'
The only real solution to this problem that I've found other than "ignore the warning"(bad practice!) is to use const char* for the member variable rather than std::string but this is not really a solution, because now you have to convert your ugly concatenated (but dynamic) string back to the const char array you need. But you can't just tag .c_str() on the end (even though why would you want to because this concatenation is becoming more ridiculous by the second?) you have to make sure that std::string doesn't clean up your newly constructed string and leave you with garbage. So you have to do this inside the function that receives the string:
const std::string constString = (input);
m_constChar = constString.c_str();
Which is insane. Because now I traipsed across several different types of string, made my code ugly, added more lines than should need and all just to stick some characters together. Why is this so hard?
Solution?
So what's the solution? I feel that I should be able to make a function that concatenates const char*s together but also handle other object types such as std::string, int or double, I feel strongly that this should be capable in one line, and yet I'm unable to find any examples of it being achieved. Should I be working with char* rather than the constant variant, even though I've read that you should never change the value of char* so how would this help?
Are there any experienced C++ programmers who have resolved this issue and are now comfortable with C++ strings, what is your solution? Is there no solution? Is it impossible?
The standard way to build a string, formatting non-string types as strings, is a string stream:
#include <sstream>
std::ostringstream ss;
ss << "Failure in " << __FUNCTION__ << ": foobar was " << foobar << "\n";
SetReport(ss.str());
If you do this often, you could write a variadic template to do that:
template <typename... Ts> std::string str(Ts&&...);
SetReport(str("Failure in ", __FUNCTION__, ": foobar was ", foobar, '\n'));
The implementation is left as an exercise for the reader.
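For readers who would rather not do the exercise, one possible implementation is sketched below. It assumes only that every argument can be written to a std::ostream with operator<<, and it uses a C++11 pack-expansion idiom (in C++17 a fold expression could replace the helper array). It is one way to fill in the declaration above, not necessarily what the answer's author had in mind.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Streams every argument into one std::ostringstream and returns the result.
// Anything that supports operator<< on std::ostream is accepted.
template <typename... Ts>
std::string str(Ts&&... args) {
    std::ostringstream out;
    // C++11 pack-expansion idiom: the braced list evaluates each
    // "out << arg" left to right; the array itself is discarded.
    int expand[] = {0, ((void)(out << std::forward<Ts>(args)), 0)...};
    (void)expand;
    return out.str();
}
```

With this in place, str("Failure in ", __FUNCTION__, ": foobar was ", foobar, '\n') builds the report in one call for any streamable foobar.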
In this particular case, string literals can be concatenated by simply writing one after the other (this includes __FUNCTION__ only on compilers, such as Visual C++, where it expands to a string literal); and, assuming foobar is a std::string, that can be concatenated with string literals using +:
SetReport("Failure in " __FUNCTION__ ": foobar was " + foobar + "\n");
If foobar is a numeric type, you could use std::to_string(foobar) to convert it.
Plain string literals (e.g. "abc" and __FUNCTION__) and char const* do not support concatenation. These are just plain C-style char const[] and char const*.
Solutions are to use some string formatting facilities or libraries, such as:
std::string and concatenation using +. May involve too many unnecessary allocations, unless operator+ employs expression templates.
std::snprintf. This one does not allocate buffers for you and is not type safe, so people end up creating wrappers for it.
std::stringstream. Ubiquitous and standard but its syntax is at best awkward.
boost::format. Type safe but reportedly slow.
cppformat. Reportedly modern and fast.
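As an illustration of the kind of wrapper people write around std::snprintf, here is a minimal sketch. The name format_string is hypothetical, and the wrapper inherits snprintf's lack of type safety: the format specifiers must match the arguments.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical wrapper: formats into a std::string using two snprintf passes.
// Not type safe: the caller must keep the format string and arguments in sync.
template <typename... Args>
std::string format_string(const char* fmt, Args... args) {
    // Pass 1: a null buffer of size 0 makes snprintf report the needed length.
    const int needed = std::snprintf(nullptr, 0, fmt, args...);
    if (needed < 0) {
        return std::string();  // encoding error reported by snprintf
    }
    // Pass 2: write into a buffer with room for the terminating '\0'.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    std::snprintf(buffer.data(), buffer.size(), fmt, args...);
    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

A call such as format_string("Failure in %s: foobar was %d\n", __FUNCTION__, foobar) would then work for an int foobar.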
One of the simplest solutions is to use an empty C++ string. Here I declare an empty string variable named _ and use it at the front of the string concatenation. Make sure you always put it at the front.
#include <cstdio>
#include <string>
using namespace std;
string _ = "";
int main() {
char s[] = "chararray";
string result =
_ + "function name = [" + __FUNCTION__ + "] "
"and s is [" + s + "]\n";
printf( "%s", result.c_str() );
return 0;
}
Output:
function name = [main] and s is [chararray]
Regarding __FUNCTION__, I found that in Visual C++ it is a macro while in GCC it is a variable, so SetReport("Failure in " __FUNCTION__ "; foobar was " + foobar + "\n"); will only work on Visual C++. See: https://msdn.microsoft.com/en-us/library/b0084kay.aspx and https://gcc.gnu.org/onlinedocs/gcc/Function-Names.html
The solution using empty string variable above should work on both Visual C++ and GCC.
My Solution
I've continued to experiment with different things and I've got a solution which combines tivn's answer that involves making an empty string to help concatenate long std::string and character arrays together and a function of my own which allows single line copying of that std::string to a const char* which is safe to use when the string object leaves scope.
I would have used Mike Seymour's variadic templates but they don't seem to be supported by the Visual Studio 2012 I'm running and I need this solution to be very general so I can't rely on them.
Here is my solution:
Strings.h
#ifndef _STRINGS_H_
#define _STRINGS_H_
#include <string>
// tivn's empty string in the header file
extern const std::string _;
// My own version of .c_str() which produces a copy of the contents of the string input
const char* ToCString(std::string input);
#endif
Strings.cpp
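#include "Strings.h"
#include <cstring> // strcpy_s (Microsoft CRT / C11 Annex K)
// Definition of the empty string declared extern in Strings.h
const std::string _ = "";
// Returns a heap copy of input; the caller must delete[] the result
const char* ToCString(std::string input)
{
char* result = new char[input.length()+1];
strcpy_s(result, input.length()+1, input.c_str());
return result;
}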
Usage
m_someMemberConstChar = ToCString(_ + "Hello, world! " + someDynamicValue);
I think this is pretty neat and works in most cases. Thank you everyone for helping me with this.
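One thing to keep in mind with ToCString is that the buffer comes from new[], so somebody has to delete[] it eventually. The sketch below is a hypothetical variant (CopyToCString is not from the answer) that makes the same copy with standard std::memcpy and hands ownership to a std::unique_ptr<char[]>, assuming a C++11 compiler.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical variant of ToCString: same copy, but the returned smart
// pointer frees the buffer automatically when it goes out of scope.
std::unique_ptr<char[]> CopyToCString(const std::string& input) {
    std::unique_ptr<char[]> result(new char[input.size() + 1]);
    // c_str() is null-terminated, so size() + 1 bytes includes the '\0'.
    std::memcpy(result.get(), input.c_str(), input.size() + 1);
    return result;
}
```

The raw pointer for a C-style interface is then available through get(), and the copy stays valid for as long as the unique_ptr is alive.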
As of C++20, fmtlib has made its way into the ISO standard (as std::format in the <format> header) but, even on older iterations, you can still download and use it.
It gives similar capabilities as Python's str.format()(a), and your "ugly strings" example then becomes a relatively simple:
#include <fmt/format.h>
// Later on, where code is allowed (inside a function for example) ...
SetReport(fmt::format("Failure in {}: foobar was {}\n", __FUNCTION__, foobar));
It's much like the printf() family but with extensibility and type safety built in.
(a) But, unfortunately, not its string interpolation feature (use of f-strings), which has the added advantage of putting the expressions in the string at the place where they're output, something like:
set_report(f"Failure in {__FUNCTION__}: foobar was {foobar}\n");
If fmtlib ever got that capability, I'd probably wet my pants in excitement :-)
This is a usage I found in some open source software, and I don't understand how it works.
When I output it to stdout, it was "version 0.8.0".
const char version[] = " version " "0" "." "8" "." "0";
It's called string concatenation -- when you put two (or more) quoted strings next to each other in the source code with nothing between them, the compiler puts them together into a single string. This is most often used for long strings -- anything more than one line long:
char whatever[] = "this is the first line of the string\n"
"this is the second line of the string\n"
"This is the third line of the string";
Before string concatenation was invented, you had to do that with a rather clumsy line continuation, putting a backslash at the end of each line (and making sure it was the end, because most compilers wouldn't treat it as line continuation if there was any whitespace after the backslash). There was also ugliness with it throwing off indentation, because any whitespace at the beginning of subsequent lines might be included in the string.
This can cause a minor problem if you intended to put a comma between the strings, such as when initializing an array of pointers to char. If you miss a comma, the compiler won't warn you about it -- you'll just get one string that includes what was intended to be two separate ones.
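The missing-comma pitfall is easy to reproduce. In the sketch below (all names are invented for the example) the array was meant to hold four strings, but the element count shows that two of them were merged.

```cpp
#include <cstddef>
#include <cstring>

// Intended: four entries. The comma after "beta" is missing, so the
// compiler silently merges "beta" and "gamma" into "betagamma".
const char* const names[] = {
    "alpha",
    "beta"
    "gamma",
    "delta"
};

// Number of elements actually in the array.
constexpr std::size_t name_count = sizeof(names) / sizeof(names[0]);
static_assert(name_count == 3, "one entry was swallowed by concatenation");

// True when the second entry is the merged text.
bool second_is_merged() {
    return std::strcmp(names[1], "betagamma") == 0;
}
```

A static_assert on the expected element count, as shown, is one cheap way to catch this kind of slip.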
This is a basic feature of both C89 and C++98 called 'adjacent string concatenation' or thereabouts.
Basically, if two string literals are adjacent to each other with no punctuation in between, they are merged into a single string, as your output shows.
In the C++98 standard, section §2.1 'Phases of translation [lex.phases]' says:
6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
This is after the preprocessor has completed.
In the C99 standard, the corresponding section is §5.1.2.1 'Translation Phases' and it says:
6 Adjacent string literal tokens are concatenated.
The wording would be very similar in any other C or C++ standard you can lay hands on (and I do recognize that both C++98 and C99 are superseded by C++11 and C11; I just don't have electronic copies of the final standards, yet).
The C++ standard states that string literals that are beside each other will be concatenated together.
Quotes from C and C++ Standard:
For C (quoting C99, but C11 has something similar in 6.4.5p5):
(C99, 6.4.5p5) "In translation phase 6, the multibyte character
sequences specified by any sequence of adjacent character and
identically-prefixed string literal tokens are concatenated into a
single multibyte character sequence."
For C++:
(C++11, 2.14.5p13) "In translation phase 6 (2.2), adjacent string
literals are concatenated."
const char version[] = " version " "0" "." "8" "." "0";
is the same as:
const char version[] = " version 0.8.0";
The compiler concatenates the adjacent string literals into one bigger string literal.
As a sidenote, const char* (which is in your title) is not the same as const char[] (which is in your posted code).
The compiler automatically concatenates string literals written one after another (separated by whitespace only). It is as if you had written
const char version[] = " version 0.8.0";
EDIT: corrected pre-processor to compiler
Adjacent string literals are concatenated:
When specifying string literals, adjacent strings are concatenated.
Therefore, this declaration:
char szStr[] = "12" "34";
is identical to this declaration:
char szStr[] = "1234";
This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:
cout << "Four score and seven years "
"ago, our forefathers brought forth "
"upon this continent a new nation.";
Simply putting strings one after the other concatenates them at compile time, so:
"Hello" ", " "World!" => "Hello, World!"
This is a strange usage of the feature; usually it is used to allow #define strings to be combined:
#define FOO "World!"
puts("Hello, " FOO);
Will compile to the same as:
puts("Hello, World!");
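The same idea probably explains the version string from the question: if each component comes from its own macro, the pieces end up adjacent after macro expansion and are merged. A sketch with hypothetical macro names:

```cpp
#include <string>

// Hypothetical version macros; each expands to a string literal.
#define APP_NAME  "demo"
#define APP_MAJOR "0"
#define APP_MINOR "8"
#define APP_PATCH "0"

// After macro expansion the literals are adjacent, so they merge into
// the single literal "demo version 0.8.0".
const char app_banner[] = APP_NAME " version " APP_MAJOR "." APP_MINOR "." APP_PATCH;

static_assert(sizeof(app_banner) == 19, "18 characters plus the terminator");
```

Changing one macro updates the banner without touching the line that assembles it.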
Ps: This is more of a conceptual question.
I know this makes things more complicated for no good reason, but here is what I'm wondering. If I'm not mistaken, a const char* "like this" in C++ points to the first character, l, and is automatically zero-terminated at compile time. I believe it is creating a temporary variable const char* to hold it, unless it is keeping track of the offset using a byte variable (I didn't check the disassembly). My question is: how would you, if it is even possible, add characters to this string without having to call functions or instantiate strings?
Example (This is wrong, just so you can visualize what I meant):
"Like thi" + 's';
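The closest thing I came up with was to store it in a char array with enough space and set the remaining characters.
Example:
char str[10]; // 9 characters plus the terminating '\0'
strcpy(str, "Like thi");
str[8] = 's';
str[9] = '\0'; // strcpy's terminator was overwritten, so restore it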
Clarification:
Down vote: This question does not show any research effort; it is unclear or not useful
Ok, so the question has been highly down voted. There wasn't much reasoning on which of these my question was lacking on, so I'll try to improve all of those qualities.
My question was more so I could have a better understanding of what goes on when you simply create a string "like this" without storing the address of that string in a const char* I also wanted to know if it was possible to concatenate/change the content of that string without using functions like strcat() and without using the overloaded operator + from the class string. I'm aware this is not exactly useful for dealing with strings in C++, but I was curious whether or not there was a way besides the standard ways for doing so.
string example = "Like thi" + "s"; //I'm aware of the string class and its member functions
const char* example2 = "Like this"; //I'm also aware of C-type Strings (CString as well)
It is also possible that not having English as my native language made things even worse; I apologize for the confusion.
Instead of using a plain char string, you should use the std::string class provided by the C++ standard library:
#include <string>
#include <iostream>
using namespace std;
int main()
{
string str = "Like thi";
cout << str << endl;
str = str + "s";
cout << str << endl;
return 0;
}
Normally, it's not possible to simply concatenate plain char * strings in C or C++, because they are merely pointers to arrays of characters. There's almost no reason you should be using a bare character array in C++ if you intend on doing any string manipulations within your own code.
Even if you need access to the C representation (e.g. for an external library) you can use string::c_str().
First, the term is zero-terminated rather than null-terminated: all char* strings in C end with '\0'.
When you in code do something like this:
char *name="Daniel";
compiler will generate a string that has a contents:
Daniel\0
and will initialize name pointer to point at it at a certain time during program execution depending on the variable context (member, static, ...).
Appending ANYTHING to the name won't work as you expect, since memory pointed to by name isn't changeable, and you'll probably get either access violation error or will overwrite something else.
Having
char* copyOfTheName = name;
won't create a copy of the string in question, it will only have copyOfTheName point to the original string, so having
copyOfTheName[6]='A';
will be exactly as
name[6]='A';
and will only cause problems to you.
Use std::strcat instead. And please, do some investigating how the basic string operations work in C.
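For completeness, a minimal sketch of the strcat route: the literal is first copied into a writable buffer that is large enough for the final text, and only then appended to. The function name and the fixed buffer size are assumptions made for the example.

```cpp
#include <cstring>

// Copies "Like thi" into a writable buffer and appends "s" with strcat.
// The buffer holds exactly the 9 characters of "Like this" plus '\0'.
void build_like_this(char (&buffer)[10]) {
    std::strcpy(buffer, "Like thi");  // writes 8 chars + '\0'
    std::strcat(buffer, "s");         // finds the '\0' and appends there
}
```

The string literal itself is never modified; only the caller's array is written to.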
string s="abcdefghijklmnopqrstuvwxyz";
char f[]=" " (s.substr(s.length()-10,9)).c_str() " ";
I want to get the last 9 characters of s and add " " to the beginning and the end of the substring, and store it as a char[]. I don't understand why this doesn't work even though char f[]=" " "a" " " does.
Is (s.substr(s.length()-10,9)).c_str() not a string literal?
No, it's not a string literal. String literals always have the form "<content>" or expand to that (macros, like __FILE__ for example).
Just use another std::string instead of char[].
std::string f = " " + s.substr(s.size()-10, 9) + " ";
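Wrapped in a function, the std::string approach might look like the sketch below. The name padded_slice is hypothetical, and the code assumes the input has at least 10 characters, as the string in the question does.

```cpp
#include <string>

// Builds " <9 characters> " as a std::string; c_str() on the result stays
// valid for as long as the returned string object is alive.
// Assumes s.size() >= 10, otherwise s.size() - 10 would wrap around.
std::string padded_slice(const std::string& s) {
    // Same slice as in the question: 9 characters starting 10 from the end.
    return " " + s.substr(s.size() - 10, 9) + " ";
}
```

If a C-style string is needed afterwards, keep the returned std::string in a variable and call c_str() on that variable rather than on a temporary.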
First, consider whether you should be using cstrings. In C++, generally, use string.
However, if you want to use cstrings, the concatenation of "abc" "123" -> "abc123" is a preprocessor operation and so cannot be used with string::c_str(). Instead, the easiest way is to construct a new string and take the .c_str() of that:
string s="abcdefghijklmnopqrstuvwxyz";
char f[]= (string(" ") + s.substr(s.length()-10,9) + " ").c_str();
(EDIT: You know what, on second thought, that's a really bad idea. The cstring should be deallocated after the end of this statement, so using f can cause a segfault. Just don't use cstrings unless you're prepared to mess with strcpy and all that ugly stuff. Seriously.)
If you want to use strings instead, consider something like the following:
#include <sstream>
...
string s="abcdefghijklmnopqrstuvwxyz";
stringstream tmp;
tmp << " " << s.substr(s.length()-10,9) << " ";
string f = tmp.str();
@Xeo tells you how to solve your problem. Here's some complementary background on how string literals are handled in the compilation process.
From section A.12 Preprocessing of The C Programming language:
Escape sequences in character constants and string literals (Pars. A.2.5.2, A.2.6) are
replaced by their equivalents; then adjacent string literals are concatenated.
It's the Preprocessor, not the compiler, who's responsible for the concatenation. (You asked for a C++ answer. I expect that C++ treats string literals the same way as C). The preprocessor has only a limited knowledge of the C/C++ language; the (s.substr(s.length()-10,9)).c_str() part is not evaluated at the preprocessor stage.