I am confused by const pointers in C++ and wrote a small application to see what the output would be. I am attempting (I believe) to add a pointer to a string, which should not work, but when I run the program I correctly get "hello world". Can anyone help me figure out how this line (s += s2) is working?
My code:
#include <iostream>
#include <stdio.h>
#include <string>
using namespace std;
const char* append(const char* s1, const char* s2){
std::string s(s1); //this will copy the characters in s1
s += s2; //add s and s2, store the result in s (shouldn't work?)
return s.c_str(); //return result to be printed
}
int main() {
const char* total = append("hello", "world");
printf("%s", total);
return 0;
}
The variable s is local inside the append function. Once the append function returns that variable is destructed, leaving you with a pointer to a string that no longer exists. Using this pointer leads to undefined behavior.
My tip to you on how to solve this: Use std::string all the way!
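A minimal sketch of what "std::string all the way" could look like for the append function from the question. The signature is an assumption for illustration; the point is only that the result is returned by value, so no pointer into a destroyed local escapes.

```cpp
#include <string>

// Returns the concatenation by value, so the caller owns the result.
// Nothing in here hands out a pointer into the local variable s.
std::string append(const std::string& s1, const std::string& s2) {
    std::string s(s1);  // copy the characters of s1
    s += s2;            // std::string::operator+= appends s2
    return s;           // returned by value (moved or elided), never dangling
}
```

A caller would keep the result in a std::string, for example std::string total = append("hello", "world");, and call total.c_str() only at the point where a C string is really needed, while total is still alive.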
You're adding a const char* pointer to a std::string, and that is possible because std::string overloads operator+= for that operand type. It wouldn't be possible if the left-hand side were a plain char* (C-style string).
However, you're returning a pointer into a local variable, so once the function append returns and its frame is popped off the stack, the string that your returned pointer refers to no longer exists. Using it leads to undefined behavior.
Class std::string has overloaded operator += for an operand of type const char *
basic_string& operator+=(const charT* s);
In fact it simply appends the string pointed to by this pointer to the contents of the object of type std::string, allocating additional memory if required. For example, internally the overloaded operator could use the standard C function strcat.
Conceptually it is similar to the following code snippet.
char s[12] = "Hello ";
const char *s2 = "World";
std::strcat( s, s2 );
Take into account that your program has undefined behaviour, because total becomes invalid once the local object s is destroyed on exit from the function append. So the next statement in main
printf("%s", total);
results in undefined behaviour.
My question can be boiled down to, where does the string returned from stringstream.str().c_str() live in memory, and why can't it be assigned to a const char*?
This code example will explain it better than I can
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const char* cstr2 = ss.str().c_str();
cout << cstr1 // Prints correctly
<< cstr2; // ERROR, prints out garbage
system("PAUSE");
return 0;
}
The assumption that stringstream.str().c_str() could be assigned to a const char* led to a bug that took me a while to track down.
For bonus points, can anyone explain why replacing the cout statement with
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
prints the strings correctly?
I'm compiling in Visual Studio 2008.
stringstream.str() returns a temporary string object that's destroyed at the end of the full expression. If you get a pointer to a C string from that (stringstream.str().c_str()), it will point to a string which is deleted where the statement ends. That's why your code prints garbage.
You could copy that temporary string object to some other string object and take the C string from that one:
const std::string tmp = stringstream.str();
const char* cstr = tmp.c_str();
Note that I made the copied string const, because any changes to it might cause it to re-allocate and thus render cstr invalid. It is therefore safer not to store the result of the call to str() at all and to use the C string only until the end of the full expression:
use_c_str( stringstream.str().c_str() );
Of course, the latter might not be easy and copying might be too expensive. What you can do instead is to bind the temporary to a const reference. This will extend its lifetime to the lifetime of the reference:
{
const std::string& tmp = stringstream.str();
const char* cstr = tmp.c_str();
}
IMO that's the best solution. Unfortunately it's not very well known.
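The lifetime extension described above can be observed with a small sketch. Tracer, make, events and demo are hypothetical names introduced only for this illustration; Tracer stands in for std::string and records when it is destroyed.

```cpp
#include <string>
#include <vector>

// Hypothetical event log, used only to make the order of events visible.
std::vector<std::string> events;

// Hypothetical stand-in for std::string that records its own destruction.
struct Tracer {
    ~Tracer() { events.push_back("destroyed"); }
};

// Returns a temporary, like stringstream::str() does.
Tracer make() { return Tracer(); }

void demo() {
    events.clear();
    {
        const Tracer& ref = make();       // temporary bound to a const reference
        events.push_back("still alive");  // the temporary has not been destroyed yet
        (void)ref;                        // silence unused-variable warnings
    }                                     // ref goes out of scope: temporary destroyed here
    events.push_back("after scope");
}
```

After calling demo(), the log should end with "still alive", "destroyed", "after scope", which shows that the temporary lives exactly as long as the reference that is bound to it.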
What you're doing is creating a temporary. That temporary lives only until the end of the full expression that creates it, which is long enough to satisfy the requirements of where it's going.
As soon as the statement const char* cstr2 = ss.str().c_str(); is complete, there is no reason to keep the temporary string around, so it's destroyed, and thus your const char * is pointing to freed memory.
Your statement string str(ss.str()); means that the temporary is used in the constructor for the string variable str that you've put on the local stack, and that stays around as long as you'd expect: until the end of the block, or function you've written. Therefore the const char * within is still good memory when you try the cout.
In this line:
const char* cstr2 = ss.str().c_str();
ss.str() will make a copy of the contents of the stringstream. When you call c_str() on the same line, you'll be referencing legitimate data, but after that line the string will be destroyed, leaving your char* pointing to memory you no longer own.
The std::string object returned by ss.str() is a temporary object whose lifetime is limited to the expression. So you cannot keep a pointer into a temporary object past that expression without getting trash.
Now, there is one exception: if you bind the temporary object to a const reference, it is legal to use it for a longer lifetime. For example, you could do:
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
stringstream ss("this is a string\n");
string str(ss.str());
const char* cstr1 = str.c_str();
const std::string& resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
cout << cstr1 // Prints correctly
<< cstr2; // No more error : cstr2 points to resultstr memory that is still alive as we used the const reference to keep it for a time.
system("PAUSE");
return 0;
}
That way you get the string for a longer time.
Now, you should know that there is an optimisation called RVO (return value optimisation): if the compiler sees an initialization via a function call and that function returns a temporary, it does not make a copy but constructs the result directly in the initialized variable. That way you don't actually need a reference; it is only necessary if you want to be sure that no copy is made. So doing:
std::string resultstr = ss.str();
const char* cstr2 = resultstr.c_str();
would be better and simpler.
The ss.str() temporary is destroyed after the initialization of cstr2 is complete. So when you print it with cout, the C string that was associated with that std::string temporary has long been destroyed, and you will be lucky if it crashes or asserts, and unlucky if it prints garbage or appears to work.
const char* cstr2 = ss.str().c_str();
The C-string where cstr1 points to, however, is associated with a string that still exists at the time you do the cout - so it correctly prints the result.
In the following code, the first cstr is correct (I assume it is cstr1 in the real code?). The second prints the C string associated with the temporary string object ss.str(). That object is destroyed at the end of the full-expression in which it appears. The full-expression is the entire cout << ... expression, so while the C string is output, the associated string object still exists. For cstr2, it is pure luck that it succeeds: the new temporary most likely reuses the same storage location that was chosen for the temporary used to initialize cstr2. It could just as well crash.
cout << cstr // Prints correctly
<< ss.str().c_str() // Prints correctly
<< cstr2; // Prints correctly (???)
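The rule that a temporary survives until the end of its full-expression can be sketched as follows. Tracer, make, events and demo are hypothetical names for this illustration; use_c_str mirrors the placeholder call used earlier in this thread. Tracer::c_str() returns a string literal so that the sketch itself contains no dangling pointer.

```cpp
#include <string>
#include <vector>

// Hypothetical event log, used only to make the order of events visible.
std::vector<std::string> events;

// Hypothetical stand-in for std::string that records its own destruction.
struct Tracer {
    ~Tracer() { events.push_back("temporary destroyed"); }
    // Returns a literal so the demo never reads freed memory.
    const char* c_str() const { return "data"; }
};

// Returns a temporary, like ss.str() does.
Tracer make() { return Tracer(); }

// Consumer that runs while the temporary is still alive.
void use_c_str(const char*) { events.push_back("pointer used"); }

void demo() {
    events.clear();
    use_c_str(make().c_str());           // the temporary outlives the call to use_c_str
    events.push_back("next statement");  // by now the temporary is gone
}
```

After demo() runs, the log should end with "pointer used", "temporary destroyed", "next statement". This is why passing ss.str().c_str() directly to a function or to operator<< is fine, while storing the pointer for a later statement is not.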
c_str() will usually just return a pointer to the internal string buffer, but that's not a requirement. The string could build a separate buffer if, for example, its internal implementation is not contiguous (that's entirely possible, although from C++11 onward strings must be stored contiguously).
In older GCC releases (libstdc++ before the C++11-conforming string ABI introduced with GCC 5), strings use reference counting and copy-on-write. Thus, you will find that the following holds true (it does, at least on my GCC version)
string a = "hello";
string b(a);
assert(a.c_str() == b.c_str());
The two strings share the same buffer here. At the time you change one of them, the buffer will be copied and each will hold its own separate copy. Other string implementations do things differently, though.
Code :
#include <iostream>
using namespace std;
int main() {
string str("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
const char* temp;
temp = str.substr(0, str.length()).c_str();
printf(str.substr(0, str.length()).c_str());
printf(temp);
const char* test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
printf(test);
return 0;
}
Output:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
�$P
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Can someone explain this?
Your compiler should warn you about this line:
temp = str.substr(0, str.length()).c_str();
Warning C26815 The pointer is dangling because it points at a temporary instance which was destroyed.
What is happening is that str.substr() creates and returns a std::string object, but it is not assigned to a variable. Instead, a pointer to its buffer is retrieved with c_str(), and the temporary object itself is destroyed at the end of that statement.
Thus the pointer to its buffer is no longer valid. It is only by accident that the memory still holds data that partially looks right. Using the pointer is undefined behavior.
The way you are assigning temp creates a dangling pointer. A dangling pointer is a pointer that points to invalid data; in this case, the invalid data is the sub-string returned by str.substr. The sub-string is a temporary, so it is destroyed at the end of the statement that creates it. You can correct this by storing the sub-string in a named variable:
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;
int main() {
string str("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
const char* temp;
// This is the important line: the named variable keeps the sub-string alive.
string substr = str.substr(0, str.length());
temp = substr.c_str();
printf(str.substr(0, str.length()).c_str());
printf(temp);
const char* test = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
printf(test);
return 0;
}
I found this (simplified) piece of code in our code base and it's leaving me uneasy. It either works, doesn't work, or is never called anyway. I would expect some buffer overflow, but when I try it in an online compiler it certainly doesn't work, but doesn't overflow either. I'm looking at the definition of strcat, and it will write the source to the destination starting at its null terminator, but I am assuming that in this scenario the destination buffer (which was created as a std::string) should be too small.
#include <iostream>
#include "string.h"
using namespace std;
void addtostring(char* str){
char str2[12] = "goodbye";
strcat(str, str2);
}
int main()
{
std::string my_string = "hello";
addtostring((char*)my_string.c_str());
cout << my_string << endl;
return 0;
}
What would be the actual behaviour of this operation?
The behavior is undefined. First, writing to any character through c_str is undefined behavior. Secondly, had you used data instead to get a char*, overwriting the null terminator is also undefined behavior. Lastly, both c_str and data only give you a pointer (p) that has a valid range of elements from [p, p + size()]. Writing to any element outside that range is also undefined behavior.
If you want to modify the string you need to use the string's member/free functions to do so. Your function could be rewritten to
void addtostring(std::string& str){
str += "goodbye";
}
and that will have well defined behavior.
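If a raw character buffer really has to be written (for instance to mimic what the strcat call was trying to do), a minimal sketch is to grow the string first and then write only inside [0, size()). The function name append_via_buffer is hypothetical, and the sketch assumes C++11 or later, where std::string storage is contiguous.

```cpp
#include <cstring>
#include <string>

// Appends a C string by resizing the std::string first, so every write
// lands in storage the string actually owns.
void append_via_buffer(std::string& str, const char* suffix) {
    const std::size_t old_size = str.size();
    const std::size_t extra = std::strlen(suffix);
    str.resize(old_size + extra);                // make room for the new characters
    std::memcpy(&str[old_size], suffix, extra);  // writes stay within [0, size())
}
```

Calling append_via_buffer(my_string, "goodbye") on a string holding "hello" leaves it holding "hellogoodbye". In ordinary code str += "goodbye" remains the simpler choice; the sketch only shows where writing through a pointer is well defined.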
GetTypeName returns a std::string. The following code
printf("%#x\n", proto->GetTypeName().c_str());
printf("%s\n", proto->GetTypeName().c_str());
const char *res = proto->GetTypeName().c_str();
printf("%#x\n",res);
printf("%s\n",res);
produces this output:
0x90ef78
ValidTypeName
0x90ef78
ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■ю■←ЬЬQщZ
The addresses are always the same. The following code (with the lines exchanged)
const char *res = proto->GetTypeName().c_str();
printf("%#x\n",res);
printf("%s\n",res);
printf("%#x\n", proto->GetTypeName().c_str());
printf("%s\n", proto->GetTypeName().c_str());
produces this output, addresses are always different:
0x57ef78
Y
0x580850
ValidTypeName
What am I doing wrong?
strlen(res)
returns invalid size, so I can't even strcpy.
Your GetTypeName function returns a std::string, and you are calling c_str to get a pointer to the internal data of that string.
Because it's a temporary, the std::string you return is destroyed at the end of the statement
const char *res = proto->GetTypeName().c_str();
But you still have res pointing to the now deleted data.
Edit: Change your code to something like:
const std::string& res = proto->GetTypeName();
and call .c_str() on that string in the printf calls, like this:
printf("%p\n", static_cast<const void*>(res.c_str()));
printf("%s\n", res.c_str());
Binding a temporary to a const reference extends the lifetime of that temporary to match the lifetime of the reference.
Better still, just use std::string and iostream for printing and stop messing about with low level pointers when unnecessary :)
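A rough sketch of that advice, assuming a stand-in class: Proto and describe are hypothetical names, and only GetTypeName() comes from the question. The returned string is kept in a named std::string and streamed directly, so no raw pointer is stored at all.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the asker's class; only GetTypeName() appears in the question.
struct Proto {
    std::string GetTypeName() const { return "ValidTypeName"; }
};

// Keeps the returned string alive in a named object and streams it directly.
std::string describe(const Proto& proto) {
    const std::string name = proto.GetTypeName();  // named object, no dangling pointer
    std::ostringstream out;
    out << name << " (" << name.size() << " characters)";
    return out.str();
}
```

The same pattern works with std::cout in place of the std::ostringstream; strlen and strcpy become unnecessary because name.size() and ordinary string copies do the job.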