C++ const char* pointer assignment - c++

I must have missed an obvious fact here -- haven't been programming C++ for a while. Why can't I print the c-style string after assigning it to a const char* variable? But if I try to print it directly without assigning it works fine:
#include "boost/lexical_cast.hpp"
using namespace std;
using boost::lexical_cast;
int main (int argc, char** argv)
{
int aa=500;
cout << lexical_cast<string>(aa).c_str() << endl; // prints the string "500" fine
const char* bb = lexical_cast<string>(aa).c_str();
cout << bb << endl; // prints nothing
return EXIT_SUCCESS;
}

The C String returned by c_str is only usable while the std::string from which it was obtained exists. Once that std::string is destroyed, the C String is gone too. (At that point, attempting to use the C String yields undefined behavior.)
Other operations may also invalidate the C String. In general, any operation that modifies the string will invalidate the pointer returned by c_str.

c_str function is called on the result of the temporary string which is created from the lexical_cast. Since you don't save it, the string is destroyed at the end of that expression and thus accessing the pointer to the c_str of the string that has been destroyed is undefined behaviour.

Related

why stringsteam.str().c_str() is not giving expected results? [duplicate]

My question can be boiled down to, where does the string returned from stringstream.str().c_str() live in memory, and why can't it be assigned to a const char*?
This code example will explain it better than I can
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const char* cstr2 = ss.str().c_str();
cout << cstr1 // Prints correctly
<< cstr2; // ERROR, prints out garbage
system("PAUSE");
return 0;
}
The assumption that stringstream.str().c_str() could be assigned to a const char* led to a bug that took me a while to track down.
For bonus points, can anyone explain why replacing the cout statement with
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
prints the strings correctly?
I'm compiling in Visual Studio 2008.
stringstream.str() returns a temporary string object that's destroyed at the end of the full expression. If you get a pointer to a C string from that (stringstream.str().c_str()), it will point to a string which is deleted where the statement ends. That's why your code prints garbage.
You could copy that temporary string object to some other string object and take the C string from that one:
const std::string tmp = stringstream.str();
const char* cstr = tmp.c_str();
Note that I made the temporary string const, because any changes to it might cause it to re-allocate and thus render cstr invalid. It is therefor safer to not to store the result of the call to str() at all and use cstr only until the end of the full expression:
use_c_str( stringstream.str().c_str() );
Of course, the latter might not be easy and copying might be too expensive. What you can do instead is to bind the temporary to a const reference. This will extend its lifetime to the lifetime of the reference:
{
const std::string& tmp = stringstream.str();
const char* cstr = tmp.c_str();
}
IMO that's the best solution. Unfortunately it's not very well known.
What you're doing is creating a temporary. That temporary exists in a scope determined by the compiler, such that it's long enough to satisfy the requirements of where it's going.
As soon as the statement const char* cstr2 = ss.str().c_str(); is complete, the compiler sees no reason to keep the temporary string around, and it's destroyed, and thus your const char * is pointing to free'd memory.
Your statement string str(ss.str()); means that the temporary is used in the constructor for the string variable str that you've put on the local stack, and that stays around as long as you'd expect: until the end of the block, or function you've written. Therefore the const char * within is still good memory when you try the cout.
In this line:
const char* cstr2 = ss.str().c_str();
ss.str() will make a copy of the contents of the stringstream. When you call c_str() on the same line, you'll be referencing legitimate data, but after that line the string will be destroyed, leaving your char* to point to unowned memory.
The std::string object returned by ss.str() is a temporary object that will have a life time limited to the expression. So you cannot assign a pointer to a temporary object without getting trash.
Now, there is one exception: if you use a const reference to get the temporary object, it is legal to use it for a wider life time. For example you should do:
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const std::string& resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
cout << cstr1 // Prints correctly
<< cstr2; // No more error : cstr2 points to resultstr memory that is still alive as we used the const reference to keep it for a time.
system("PAUSE");
return 0;
}
That way you get the string for a longer time.
Now, you have to know that there is a kind of optimisation called RVO that say that if the compiler see an initialization via a function call and that function return a temporary, it will not do the copy but just make the assigned value be the temporary. That way you don't need to actually use a reference, it's only if you want to be sure that it will not copy that it's necessary. So doing:
std::string resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
would be better and simpler.
The ss.str() temporary is destroyed after initialization of cstr2 is complete. So when you print it with cout, the c-string that was associated with that std::string temporary has long been destoryed, and thus you will be lucky if it crashes and asserts, and not lucky if it prints garbage or does appear to work.
const char* cstr2 = ss.str().c_str();
The C-string where cstr1 points to, however, is associated with a string that still exists at the time you do the cout - so it correctly prints the result.
In the following code, the first cstr is correct (i assume it is cstr1 in the real code?). The second prints the c-string associated with the temporary string object ss.str(). The object is destroyed at the end of evaluating the full-expression in which it appears. The full-expression is the entire cout << ... expression - so while the c-string is output, the associated string object still exists. For cstr2 - it is pure badness that it succeeds. It most possibly internally chooses the same storage location for the new temporary which it already chose for the temporary used to initialize cstr2. It could aswell crash.
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
The return of c_str() will usually just point to the internal string buffer - but that's not a requirement. The string could make up a buffer if its internal implementation is not contiguous for example (that's well possible - but in the next C++ Standard, strings need to be contiguously stored).
In GCC, strings use reference counting and copy-on-write. Thus, you will find that the following holds true (it does, at least on my GCC version)
string a = "hello";
string b(a);
assert(a.c_str() == b.c_str());
The two strings share the same buffer here. At the time you change one of them, the buffer will be copied and each will hold its separate copy. Other string implementations do things different, though.

Why can we return char* from function?

Here is a piece of C++ code that shows some very peculiar behavior. Who can tell me why strB can print out the stuff?
char* strA()
{
char str[] = "hello word";
return str;
}
char* strB()
{
char* str = "hello word";
return str;
}
int main()
{
cout<<strA()<<endl;
cout<<strB()<<endl;
}
Why does strB() work?
A string literal (e.g. "a string literal") has static storage duration. That means its lifetime spans the duration of your program's execution. This can be done because the compiler knows every string literal that you are going to use in your program, hence it can store their data directly into the data section of the compiled executable (example: https://godbolt.org/z/7nErYe)
When you obtain a pointer to it, this pointer can be passed around freely (including being returned from a function) and dereferenced as the object it points to is always alive.
Why doesn't strA() work?
However, initializing an array of char from a string literal copies the content of the string literal. The created array is a different object from the original string literal. If such array is a local variable (i.e. has automatic storage duration), as in your strA(), then it is destroyed after the function returns.
When you return from strA(), since the return type is char* an "array-to-pointer-conversion" is performed, creating a pointer to the first element of the array. However, since the array is destroyed when the function returns, the pointer returned becomes invalid. You should not try to dereference such pointers (and avoid creating them in the first place).
String literals exist for the life of the program.
String literals have static storage duration, and thus exist in memory for the life of the program.
That means cout<<strB()<<endl; is fine, the returned pointer pointing to string literal "hello word" remains valid.
On the other hand, cout<<strA()<<endl; leads to UB. The returned pointer is pointing to the 1st element of the local array str; which is destroyed when strA() returns, left the returned pointer dangled.
BTW: String literals are of type const char[], char* str = "hello word"; is invalid since C++11 again. Change it to const char* str = "hello word";, and change the return type of strB() to const char* too.
String literals are not convertible or assignable to non-const CharT*. An explicit cast (e.g. const_cast) must be used if such conversion is wanted. (since C++11)
case 1:
#include <stdio.h>
char *strA() {
char str[] = "hello world";
return str;
}
int main(int argc, char **argv) {
puts(strA());
return 0;
}
The statement char str[] = "hello world"; is (probably) put on the stack when called, and expires once the function exits. If you are naïve enough to assume this is how it works on all target systems, you can write cute code like this, since the continuation is called ON TOP of the existing stack(so the data of the function still exists since it hasn't returned yet):
You can kinda cheat this with a continuation:
#include <stdio.h>
void strA(void (*continuation)(char *)) {
char str[] = "hello world";
continuation(str);
}
void myContinuation(char *arg) {
puts(arg);
}
int main(int argc, char **argv) {
strA(myContinuation);
return 0;
}
case 2:
If you use the snippet below, the literal "hello world" is usually stored in a protected read-only memory(trying to modify this string will cause a segmentation fault on many systems, this is similar to how your main, and strA are stored, c code is basically just a string of instructions/memory blob in the same way a string is a string of characters, but I digress), This string will be available to the program even if the function was never called if you just know the address it's suppose to be on the specific system. In the snippet below, the program prints the string without even calling the function, this will often work on the same platform, with a relatively same code and same compiler. It is considered undefined behavior though.
#include <stdio.h>
char *strB() {
char *str = "hello world";
return str;
}
int main(int argc, char **argv) {
char *myStr;
// comment the line below and replace it with
// result of &myStr[0], in my case, result of &myStr[0] is 4231168
printf("is your string: %s.\n", (char *)4231168);
myStr = strB();
printf("str is at: %lld\n", &myStr[0]);
return 0;
}
You can opt for a strC using structs and relative safety. This structure is created on the stack and FULLY returned. The return of strC is 81(an arbitrary number I made up for the structure, that I trust myself to respect) bytes in size.
#include <stdio.h>
typedef struct {
char data[81];
} MY_STRING;
MY_STRING strC() {
MY_STRING str = {"what year is this?"};
return str;
}
int main(int argc, char **argv) {
puts(strC().data);
printf("size of strC's return: %d.\n", sizeof(strC()));
return 0;
}
tldr; strB is likely corrupted by printf as soon as it returns from the function(since printf now has its' own stack), whereas string used in strA exists outside the function, it's basically a pointer to a global constant available as soon as program starts(the string is there in memory no different to how the code is in memory).

std::to_string store in const char*

I need to convert number to string and store it into a const char* but problem is that const char* variable is blank after assignment.
In the following example code I expect to see number output converted to const char*
#include <iostream>
#include <string>
int main()
{
int number = 123;
const char* ptr_num_string = std::to_string(number).c_str();
std::cout << "number to string is: " << ptr_num_string << std::endl;
std::cin.get();
return 0;
}
Output is blank:
number to string is:
How do I convert number into a const char* ?
std::to_string returns a temporary std::string.
The pointer returned by std::string::c_str is invalidated by any non-const operation on the string itself - this is because it basically gives you a pointer to the string's internal buffer.
Destroying a std::string is definitely a non-const operation!
Therefore, you can't expect to take a pointer into the string returned by to_string and ever be able to use that. You need to save a copy of that string in the first place:
int number = 123;
std::string const numAsString = std::to_string(number);
char const* ptrToNumString = numAsString.c_str(); // use this as long as numAsString is alive
What you are doing is called Undefined Behaviour - this means that your program is invalid, and anything could happen. It could crash, it could print out some garbage, it could appear to work normally, it could do a different one of these each time you run your program...
It is not "blank" it points to the buffer of string that went out of scope. So accessing it causes Undefined Behavior. You need to keep string alive:
auto str{std::to_string(number)};
auto ptr_num_string{str.c_str()};
std::cout << "number to string is: " << ptr_num_string << std::endl;

string.c_str() outputs gibberish - but only sometimes [duplicate]

My question can be boiled down to, where does the string returned from stringstream.str().c_str() live in memory, and why can't it be assigned to a const char*?
This code example will explain it better than I can
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const char* cstr2 = ss.str().c_str();
cout << cstr1 // Prints correctly
<< cstr2; // ERROR, prints out garbage
system("PAUSE");
return 0;
}
The assumption that stringstream.str().c_str() could be assigned to a const char* led to a bug that took me a while to track down.
For bonus points, can anyone explain why replacing the cout statement with
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
prints the strings correctly?
I'm compiling in Visual Studio 2008.
stringstream.str() returns a temporary string object that's destroyed at the end of the full expression. If you get a pointer to a C string from that (stringstream.str().c_str()), it will point to a string which is deleted where the statement ends. That's why your code prints garbage.
You could copy that temporary string object to some other string object and take the C string from that one:
const std::string tmp = stringstream.str();
const char* cstr = tmp.c_str();
Note that I made the temporary string const, because any changes to it might cause it to re-allocate and thus render cstr invalid. It is therefor safer to not to store the result of the call to str() at all and use cstr only until the end of the full expression:
use_c_str( stringstream.str().c_str() );
Of course, the latter might not be easy and copying might be too expensive. What you can do instead is to bind the temporary to a const reference. This will extend its lifetime to the lifetime of the reference:
{
const std::string& tmp = stringstream.str();
const char* cstr = tmp.c_str();
}
IMO that's the best solution. Unfortunately it's not very well known.
What you're doing is creating a temporary. That temporary exists in a scope determined by the compiler, such that it's long enough to satisfy the requirements of where it's going.
As soon as the statement const char* cstr2 = ss.str().c_str(); is complete, the compiler sees no reason to keep the temporary string around, and it's destroyed, and thus your const char * is pointing to free'd memory.
Your statement string str(ss.str()); means that the temporary is used in the constructor for the string variable str that you've put on the local stack, and that stays around as long as you'd expect: until the end of the block, or function you've written. Therefore the const char * within is still good memory when you try the cout.
In this line:
const char* cstr2 = ss.str().c_str();
ss.str() will make a copy of the contents of the stringstream. When you call c_str() on the same line, you'll be referencing legitimate data, but after that line the string will be destroyed, leaving your char* to point to unowned memory.
The std::string object returned by ss.str() is a temporary object that will have a life time limited to the expression. So you cannot assign a pointer to a temporary object without getting trash.
Now, there is one exception: if you use a const reference to get the temporary object, it is legal to use it for a wider life time. For example you should do:
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const std::string& resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
cout << cstr1 // Prints correctly
<< cstr2; // No more error : cstr2 points to resultstr memory that is still alive as we used the const reference to keep it for a time.
system("PAUSE");
return 0;
}
That way you get the string for a longer time.
Now, you have to know that there is a kind of optimisation called RVO that say that if the compiler see an initialization via a function call and that function return a temporary, it will not do the copy but just make the assigned value be the temporary. That way you don't need to actually use a reference, it's only if you want to be sure that it will not copy that it's necessary. So doing:
std::string resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
would be better and simpler.
The ss.str() temporary is destroyed after initialization of cstr2 is complete. So when you print it with cout, the c-string that was associated with that std::string temporary has long been destoryed, and thus you will be lucky if it crashes and asserts, and not lucky if it prints garbage or does appear to work.
const char* cstr2 = ss.str().c_str();
The C-string where cstr1 points to, however, is associated with a string that still exists at the time you do the cout - so it correctly prints the result.
In the following code, the first cstr is correct (i assume it is cstr1 in the real code?). The second prints the c-string associated with the temporary string object ss.str(). The object is destroyed at the end of evaluating the full-expression in which it appears. The full-expression is the entire cout << ... expression - so while the c-string is output, the associated string object still exists. For cstr2 - it is pure badness that it succeeds. It most possibly internally chooses the same storage location for the new temporary which it already chose for the temporary used to initialize cstr2. It could aswell crash.
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
The return of c_str() will usually just point to the internal string buffer - but that's not a requirement. The string could make up a buffer if its internal implementation is not contiguous for example (that's well possible - but in the next C++ Standard, strings need to be contiguously stored).
In GCC, strings use reference counting and copy-on-write. Thus, you will find that the following holds true (it does, at least on my GCC version)
string a = "hello";
string b(a);
assert(a.c_str() == b.c_str());
The two strings share the same buffer here. At the time you change one of them, the buffer will be copied and each will hold its separate copy. Other string implementations do things different, though.

Returning value from a function

const char *Greet(const char *c) {
string name;
if(c)
name = c;
if (name.empty())
return "Hello, Unknown";
return name.c_str();
}
int _tmain(int argc, _TCHAR* argv[])
{
cout << Greet(0) << '\t' << Greet("Hello, World") << endl;
return 0;
}
I see 2 bugs with the above code.
Returning c_str from a string object that is defined local to the function. String gets destroyed when function returns and clearly c_str() will point to some memory that is de-allocated.
Returning "Hello, Unknown" from within the function. This is again an array of const chars allocated in the stack which should get de-allocated as well when the function returns. However, it does not and I am guessing that is because of Return Value Optimization.
Is my above understanding correct?
PS: I tested the above code with both gcc and MSVC10. GCC runs the above code fine and does not generate any runtime errors or undefined behaviors both for the string object as well as for the constant string. MSVC10 displays garbage data for the string object but prints the constant string correctly.
Number 1 is correct. The pointer returned from c_str() is invalidated when name is destroyed. Dereferencing the pointer after name results in undefined behavior. In your tests, under gcc it appears to work; under Visual C++ it prints garbage. Any results are possible when the behavior is undefined.
Number 2 is incorrect. "Hello, Unknown" is a string literal. String literals have static storage duration (they exist from when the program starts up to when it terminates. You are returning a pointer to this string literal, and that pointer is valid even after the function returns.
String literals have static storage, so are not deallocated at the end of the function.