string.c_str() outputs gibberish - but only sometimes [duplicate] - c++

My question can be boiled down to, where does the string returned from stringstream.str().c_str() live in memory, and why can't it be assigned to a const char*?
This code example will explain it better than I can
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const char* cstr2 = ss.str().c_str();
cout << cstr1 // Prints correctly
<< cstr2; // ERROR, prints out garbage
system("PAUSE");
return 0;
}
The assumption that stringstream.str().c_str() could be assigned to a const char* led to a bug that took me a while to track down.
For bonus points, can anyone explain why replacing the cout statement with
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
prints the strings correctly?
I'm compiling in Visual Studio 2008.

stringstream.str() returns a temporary string object that's destroyed at the end of the full expression. If you get a pointer to a C string from that (stringstream.str().c_str()), it will point to a string which is deleted where the statement ends. That's why your code prints garbage.
You could copy that temporary string object to some other string object and take the C string from that one:
const std::string tmp = stringstream.str();
const char* cstr = tmp.c_str();
Note that I made the temporary string const, because any changes to it might cause it to re-allocate and thus render cstr invalid. It is therefor safer to not to store the result of the call to str() at all and use cstr only until the end of the full expression:
use_c_str( stringstream.str().c_str() );
Of course, the latter might not be easy and copying might be too expensive. What you can do instead is to bind the temporary to a const reference. This will extend its lifetime to the lifetime of the reference:
{
const std::string& tmp = stringstream.str();
const char* cstr = tmp.c_str();
}
IMO that's the best solution. Unfortunately it's not very well known.

What you're doing is creating a temporary. That temporary exists in a scope determined by the compiler, such that it's long enough to satisfy the requirements of where it's going.
As soon as the statement const char* cstr2 = ss.str().c_str(); is complete, the compiler sees no reason to keep the temporary string around, and it's destroyed, and thus your const char * is pointing to free'd memory.
Your statement string str(ss.str()); means that the temporary is used in the constructor for the string variable str that you've put on the local stack, and that stays around as long as you'd expect: until the end of the block, or function you've written. Therefore the const char * within is still good memory when you try the cout.

In this line:
const char* cstr2 = ss.str().c_str();
ss.str() will make a copy of the contents of the stringstream. When you call c_str() on the same line, you'll be referencing legitimate data, but after that line the string will be destroyed, leaving your char* to point to unowned memory.

The std::string object returned by ss.str() is a temporary object that will have a life time limited to the expression. So you cannot assign a pointer to a temporary object without getting trash.
Now, there is one exception: if you use a const reference to get the temporary object, it is legal to use it for a wider life time. For example you should do:
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const std::string& resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
cout << cstr1 // Prints correctly
<< cstr2; // No more error : cstr2 points to resultstr memory that is still alive as we used the const reference to keep it for a time.
system("PAUSE");
return 0;
}
That way you get the string for a longer time.
Now, you have to know that there is a kind of optimisation called RVO that say that if the compiler see an initialization via a function call and that function return a temporary, it will not do the copy but just make the assigned value be the temporary. That way you don't need to actually use a reference, it's only if you want to be sure that it will not copy that it's necessary. So doing:
std::string resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
would be better and simpler.

The ss.str() temporary is destroyed after initialization of cstr2 is complete. So when you print it with cout, the c-string that was associated with that std::string temporary has long been destoryed, and thus you will be lucky if it crashes and asserts, and not lucky if it prints garbage or does appear to work.
const char* cstr2 = ss.str().c_str();
The C-string where cstr1 points to, however, is associated with a string that still exists at the time you do the cout - so it correctly prints the result.
In the following code, the first cstr is correct (i assume it is cstr1 in the real code?). The second prints the c-string associated with the temporary string object ss.str(). The object is destroyed at the end of evaluating the full-expression in which it appears. The full-expression is the entire cout << ... expression - so while the c-string is output, the associated string object still exists. For cstr2 - it is pure badness that it succeeds. It most possibly internally chooses the same storage location for the new temporary which it already chose for the temporary used to initialize cstr2. It could aswell crash.
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
The return of c_str() will usually just point to the internal string buffer - but that's not a requirement. The string could make up a buffer if its internal implementation is not contiguous for example (that's well possible - but in the next C++ Standard, strings need to be contiguously stored).
In GCC, strings use reference counting and copy-on-write. Thus, you will find that the following holds true (it does, at least on my GCC version)
string a = "hello";
string b(a);
assert(a.c_str() == b.c_str());
The two strings share the same buffer here. At the time you change one of them, the buffer will be copied and each will hold its separate copy. Other string implementations do things different, though.

Related

why stringsteam.str().c_str() is not giving expected results? [duplicate]

My question can be boiled down to, where does the string returned from stringstream.str().c_str() live in memory, and why can't it be assigned to a const char*?
This code example will explain it better than I can
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const char* cstr2 = ss.str().c_str();
cout << cstr1 // Prints correctly
<< cstr2; // ERROR, prints out garbage
system("PAUSE");
return 0;
}
The assumption that stringstream.str().c_str() could be assigned to a const char* led to a bug that took me a while to track down.
For bonus points, can anyone explain why replacing the cout statement with
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
prints the strings correctly?
I'm compiling in Visual Studio 2008.
stringstream.str() returns a temporary string object that's destroyed at the end of the full expression. If you get a pointer to a C string from that (stringstream.str().c_str()), it will point to a string which is deleted where the statement ends. That's why your code prints garbage.
You could copy that temporary string object to some other string object and take the C string from that one:
const std::string tmp = stringstream.str();
const char* cstr = tmp.c_str();
Note that I made the temporary string const, because any changes to it might cause it to re-allocate and thus render cstr invalid. It is therefor safer to not to store the result of the call to str() at all and use cstr only until the end of the full expression:
use_c_str( stringstream.str().c_str() );
Of course, the latter might not be easy and copying might be too expensive. What you can do instead is to bind the temporary to a const reference. This will extend its lifetime to the lifetime of the reference:
{
const std::string& tmp = stringstream.str();
const char* cstr = tmp.c_str();
}
IMO that's the best solution. Unfortunately it's not very well known.
What you're doing is creating a temporary. That temporary exists in a scope determined by the compiler, such that it's long enough to satisfy the requirements of where it's going.
As soon as the statement const char* cstr2 = ss.str().c_str(); is complete, the compiler sees no reason to keep the temporary string around, and it's destroyed, and thus your const char * is pointing to free'd memory.
Your statement string str(ss.str()); means that the temporary is used in the constructor for the string variable str that you've put on the local stack, and that stays around as long as you'd expect: until the end of the block, or function you've written. Therefore the const char * within is still good memory when you try the cout.
In this line:
const char* cstr2 = ss.str().c_str();
ss.str() will make a copy of the contents of the stringstream. When you call c_str() on the same line, you'll be referencing legitimate data, but after that line the string will be destroyed, leaving your char* to point to unowned memory.
The std::string object returned by ss.str() is a temporary object that will have a life time limited to the expression. So you cannot assign a pointer to a temporary object without getting trash.
Now, there is one exception: if you use a const reference to get the temporary object, it is legal to use it for a wider life time. For example you should do:
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const std::string& resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
cout << cstr1 // Prints correctly
<< cstr2; // No more error : cstr2 points to resultstr memory that is still alive as we used the const reference to keep it for a time.
system("PAUSE");
return 0;
}
That way you get the string for a longer time.
Now, you have to know that there is a kind of optimisation called RVO that say that if the compiler see an initialization via a function call and that function return a temporary, it will not do the copy but just make the assigned value be the temporary. That way you don't need to actually use a reference, it's only if you want to be sure that it will not copy that it's necessary. So doing:
std::string resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
would be better and simpler.
The ss.str() temporary is destroyed after initialization of cstr2 is complete. So when you print it with cout, the c-string that was associated with that std::string temporary has long been destoryed, and thus you will be lucky if it crashes and asserts, and not lucky if it prints garbage or does appear to work.
const char* cstr2 = ss.str().c_str();
The C-string where cstr1 points to, however, is associated with a string that still exists at the time you do the cout - so it correctly prints the result.
In the following code, the first cstr is correct (i assume it is cstr1 in the real code?). The second prints the c-string associated with the temporary string object ss.str(). The object is destroyed at the end of evaluating the full-expression in which it appears. The full-expression is the entire cout << ... expression - so while the c-string is output, the associated string object still exists. For cstr2 - it is pure badness that it succeeds. It most possibly internally chooses the same storage location for the new temporary which it already chose for the temporary used to initialize cstr2. It could aswell crash.
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
The return of c_str() will usually just point to the internal string buffer - but that's not a requirement. The string could make up a buffer if its internal implementation is not contiguous for example (that's well possible - but in the next C++ Standard, strings need to be contiguously stored).
In GCC, strings use reference counting and copy-on-write. Thus, you will find that the following holds true (it does, at least on my GCC version)
string a = "hello";
string b(a);
assert(a.c_str() == b.c_str());
The two strings share the same buffer here. At the time you change one of them, the buffer will be copied and each will hold its separate copy. Other string implementations do things different, though.

Question related to std::string object and c_str() method in 3 different implementations

I saw a strange behavior the other day.
So I wanted to store lines(present in a vector) in a char array and wanted to use '\n' as delimiter.
I know c_str() method in string class returns a pointer to a char array ending in '\0'.
Based on my experience/understanding of C++.(see greet0 and greet2 functions).
I assumed it should work but it didn't.
Can anyone explain the different behavior in three greet functions? What is the the scope of the object mentioned in each of the greet function?
(also i had a guess that the string object was destroyed in greet1 function but if that would have been the case there should be segmentation fault in cout<<"greet1:"<<w1<<endl; but that does not happen so what exactly is happening in background).
//The snippet that where i first encountered the issue.
const char* concatinated_str(std::vector<std::string> lines, const char *delimiter)
{
std::stringstream buf;
std::copy(lines.begin(), lines.end(), std::ostream_iterator<std::string>(buf, delimiter));
string w = buf.str();
const char *ret = w.c_str();
return ret;
}
//Implementation 0
string greet0(){
string msg = "hello";
return msg;
}
//Implementation 1
const char* greet1(){
string msg = "hello";
cout<<&msg<<endl;
return msg.c_str();
}
//Implementation 2
const char* greet2(){
const char* msg = "hello";
return msg;
}
int main(){
auto w0 = greet0();
cout<<&w0<<endl;
cout<<"greet0:"<<w0<<endl;
auto w1 = greet1();
cout<<"greet1:"<<w1<<endl;
const char* w2 = greet2();
cout<<"greet2:"<<w2<<endl;
}
Output:
0x7fff0ff3e8e0
0x7fff0ff3e8e0
greet0:hello
greet1:
greet2:hello
Returning a std::string or the pointer to a string-literal by value is perfectly fine.
Using the return-value of greet1() though has Undefined Behavior because the std::string whose elements you try to print died at the end of its enclosing function, leaving the returned pointer dangling.
What happens if you dereference a dangling pointer is not defined, acting as if you had a pointer to an empty string due to storage being re-used being one of the more benign possibilities.
As an aside, the address of a std::string is rarely that interesting to someone executing your program, though printing it is perfectly fine.
In statements cout<<&w0<<endl; cout<<&msg<<endl; you're outputting a pointer to std::string. Remove the & to actually print string, not its address. IF you're mystified by same result for two different objects, that might be because of they are addresses of local variables. The memory could be reused as those objects are limited in their lifetime not necessary have unique locations.
In greet0 technically msg is a local variable and stops existing on exit from function but compiler may optimize returned value and instead of copying msg to outside, the actual code would form a proper object at destination w0. With newer compilers Returned Value Optimization is guaranteed.
In function
const char* greet1(){
string msg = "hello";
cout<<&msg<<endl;
return msg.c_str();
}
msg here is a function-local variable, so it represents an object that stops existing at end of scope containing it, i.e. after function had returned. After return line the pointer taken from c_str() is dangling, because that method returns a pointer to the internal storage of std::string. The storage of msg was destroyed and you're invoking Undefined Behaviour by accessing it. Segmentation fault (which is purely Linux event by the way, mechanics in Windows are different) is possible outcome but not necessary.
In third function
const char* greet2(){
const char* msg = "hello";
return msg;
}
msg points to a array containing the constant string "hello". Constant strings created by string literals have same lifespan as a global static object. Those strings are formed during compilation. Exiting function doesn't invalidate the pointer, you still can dereference it because string still exists.
The only code that invokes undefined behavior is related to this function
#Implementation 1
const char* greet1(){
string msg = "hello";
cout<<&msg<<endl;
return msg.c_str();
}
The local object msg of the type std::string will not be alive after exiting the function. It will be destroyed. So the function returns an invalid pointer.
In this function implementation
#Implementation 2
const char* greet2(){
const char* msg = "hello";
return msg;
}
there is returned a pointer to the first character of the string literal "hello" that has static storage duration. It means that the string literal will be alive after exiting the function. Thus the function returns a valid pointer.
This function
#Implementation 0
string greet0(){
string msg = "hello";
return msg;
}
returns a temporary object of the type std::string that is moved (possibly with the move elision) to the variable w0 in main
auto w0 = greet0();
So this function is correct.

C++ substring from string

I'm pretty new to C++ and I'm need to create MyString class, and its method to create new MyString object from another's substring, but chosen substring changes while class is being created and when I print it with my method.
Here is my code:
#include <iostream>
#include <cstring>
using namespace std;
class MyString {
public:
char* str;
MyString(char* str2create){
str = str2create;
}
MyString Substr(int index2start, int length) {
char substr[length];
int i = 0;
while(i < length) {
substr[i] = str[index2start + i];
i++;
}
cout<<substr<<endl; // prints normal string
return MyString(substr);
}
void Print() {
cout<<str<<endl;
}
};
int main() {
char str[] = {"hi, I'm a string"};
MyString myStr = MyString(str);
myStr.Print();
MyString myStr1 = myStr.Substr(10, 7);
cout<<myStr1.str<<endl;
cout<<"here is the substring I've done:"<<endl;
myStr1.Print();
return 0;
}
And here is the output:
hi, I'm a string
string
stri
here is the substring I've done:
♦
Have to walk this through to explain what's going wrong properly so bear with me.
int main() {
char str[] = {"hi, I'm a string"};
Allocated a temporary array of 17 characters (16 letters plus a the terminating null), placed the characters "hi, I'm a string" in it, and ended it off with a null. Temporary means what it sound like. When the function ends, str is gone. Anything pointing at str is now pointing at garbage. It may shamble on for a while and give some semblance of life before it is reused and overwritten, but really it's a zombie and can only be trusted to kill your program and eat its brains.
MyString myStr = MyString(str);
Creates myStr, another temporary variable. Called the constructor with the array of characters. So let's take a look at the constructor:
MyString(char* str2create){
str = str2create;
}
Take a pointer to a character, in this case it will have a pointer to the first element of main's str. This pointer will be assigned to MyString's str. There is no copying of the "hi, I'm a string". Both mains's str and MyString's strpoint to the same place in memory. This is a dangerous condition because alterations to one will affect the other. If one str goes away, so goes the other. If one str is overwritten, so too is the other.
What the constructor should do is:
MyString(char* str2create){
size_t len = strlen(str2create); //
str = new char[len+1]; // create appropriately sized buffer to hold string
// +1 to hold the null
strcpy(str, str2create); // copy source string to MyString
}
A few caveats: This is really really easy to break. Pass in a str2create that never ends, for example, and the strlen will go spinning off into unassigned memory and the results will be unpredictable.
For now we'll assume no one is being particularly malicious and will only enter good values, but this has been shown to be really bad assumption in the real world.
This also forces a requirement for a destructor to release the memory used by str
virtual ~MyString(){
delete[] str;
}
It also adds a requirement for copy and move constructors and copy and move assignment operators to avoid violating the Rule of Three/Five.
Back to OP's Code...
str and myStr point at the same place in memory, but this isn't bad yet. Because this program is a trivial one, it never becomes a problem. myStr and str both expire at the same point and neither are modified again.
myStr.Print();
Will print correctly because nothing has changed in str or myStr.
MyString myStr1 = myStr.Substr(10, 7);
Requires us to look at MyString::Substr to see what happens.
MyString Substr(int index2start, int length) {
char substr[length];
Creates a temporary character array of size length. First off, this is non-standard C++. It won't compile under a lot of compilers, do just don't do this in the first place. Second, it's temporary. When the function ends, this value is garbage. Don't take any pointers to substr because it won't be around long enough to use them. Third, no space was reserved for the terminating null. This string will be a buffer overrun waiting to happen.
int i = 0;
while(i < length) {
substr[i] = str[index2start + i];
i++;
}
OK that's pretty good. Copy from source to destination. What it is missing is the null termination so users of the char array knows when it ends.
cout<<substr<<endl; // prints normal string
And that buffer overrun waiting to happen? Just happened. Whups. You got unlucky because it looks like it worked rather than crashing and letting you know that it didn't. Must have been a null in memory at exactly the right place.
return MyString(substr);
And this created a new MyString that points to substr. Right before substr hit the end of the function and died. This new MyString points to garbage almost instantly.
}
What Substr should do:
MyString Substr(int index2start, int length)
{
std::unique_ptr<char[]> substr(new char[length + 1]);
// unique_ptr is probably paranoid overkill, but if something does go
// wrong, the array's destruction is virtually guaranteed
int i = 0;
while (i < length)
{
substr[i] = str[index2start + i];
i++;
}
substr[length] = '\0';// null terminate
cout<<substr.get()<<endl; // get() gets the array out of the unique_ptr
return MyString(substr.get()); // google "copy elision" for more information
// on this line.
}
Back in OP's code, with the return to the main function that which was substr starts to be reused and overwritten.
cout<<myStr1.str<<endl;
Prints myStr1.str and already we can see some of it has been reused and destroyed.
cout<<"here is the substring I've done:"<<endl;
myStr1.Print();
More death, more destruction, less string.
Things to not do in the future:
Sharing pointers where data should have been copied.
Pointers to temporary data.
Not null terminating strings.
Your function Substr returns the address of a local variable substr indirectly by storing a pointer to it in the return value MyString object. It's invalid to dereference a pointer to a local variable once it has gone out of scope.
I suggest you decide whether your class wraps an external string, or owns its own string data, in which case you will need to copy the input string to a member buffer.

Adding a pointer to a string in C++

I am confused with const pointers in C++ and wrote a small application to see what the output would be. I am attempting (I believe) to add a pointer to a string, which should not work correctly, but when I run the program I correctly get "hello world". Can anyone help me figure out what how this line (s += s2) is working?
My code:
#include <iostream>
#include <stdio.h>
#include <string>
using namespace std;
const char* append(const char* s1, const char* s2){
std::string s(s1); //this will copy the characters in s1
s += s2; //add s and s2, store the result in s (shouldn't work?)
return s.c_str(); //return result to be printed
}
int main() {
const char* total = append("hello", "world");
printf("%s", total);
return 0;
}
The variable s is local inside the append function. Once the append function returns that variable is destructed, leaving you with a pointer to a string that no longer exists. Using this pointer leads to undefined behavior.
My tip to you on how to solve this: Use std::string all the way!
you're adding const char* pointer to a std::string and that is possible (see this reference). it wouldn't be possible to make that operation on char* type (C style string).
however, you're returning a pointer to local variable, so once function append returns and gets popped of the stack, the string that your returned pointer is pointing to would not exist. this leads to an undefined behavior.
Class std::string has overloaded operator += for an operand of type const char *
basic_string& operator+=(const charT* s);
In fact it simply appends the string pointed to by this pointer to the contents of the object of type std::string allocating additionly memory if required. For example internally the overloaded operator could use standard C function strcat
Conceptually it is similar to the following code snippet.
char s[12] = "Hello ";
const char *s2 = "World";
std::strcat( s, s2 );
Take into account that your program has undefined behaviour because total will be invalid after destroying local object s after exiting function append. So the next statemnent in main
printf("%s", total);
can result in undefined behaviour.

assignment of c_str() to a string

I have a problem which i cannot fix on my own.
string filenameRaw;
filenameRaw= argv[1];
function(filenameRaw.c_str(),...);
function(const char* rawDataFile,const char* targetfieldFile,const char* resultFile,const char* filename)
...
this->IOPaths.rawData=rawDataFile;
...
works very fine so far. Now I try to put another string in the variable IOPaths.rawData...
function(const char* rawDataFile,const char* targetfieldFile,const char* resultFile,const char* filename)
...
string filenameRaw;
filenameRaw=reader.Get("paths", "rawData", "UNKNOWN")
...
const char* rawDataFile1=filenameRaw.c_str();
cout << "Compare: " << strcmp(rawDataFile,rawDataFile1) <<endl;
...
this->IOPaths.rawData=rawDataFile1;
this does not work any more. Later in my programm I get errors with the filename. The strcmp definitly gives a 0, so the strings must be equal. Does anyone has an idea what i am doing wrong?
The validity of the output of c_str() is limited to, at most, the lifetime of the object on which c_str() was called.1
I suspect that this->IOPaths.rawData is pointing to deallocated memory once filenameRaw is out of scope.
An adequate remedy would be to pass the std::string around rather than [const] char*. A good stl implementation would use copy on write semantics for the string class so perhaps you wouldn't be repeatedly copying string data.
1In certain instances (such as if the object is modified), it could be less.