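#include <iostream>
#include <string>
#include <tchar.h> // MSVC-specific: provides _tmain and _TCHAR
using namespace std;
const char *Greet(const char *c) {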
string name;
if(c)
name = c;
if (name.empty())
return "Hello, Unknown";
return name.c_str();
}
int _tmain(int argc, _TCHAR* argv[])
{
cout << Greet(0) << '\t' << Greet("Hello, World") << endl;
return 0;
}
I see 2 bugs with the above code.
Returning c_str() from a string object that is local to the function. The string is destroyed when the function returns, so the pointer from c_str() clearly points to memory that has been deallocated.
Returning "Hello, Unknown" from within the function. This is again an array of const chars allocated on the stack, which should be deallocated as well when the function returns. However, it is not, and I am guessing that is because of Return Value Optimization.
Is my above understanding correct?
PS: I tested the above code with both GCC and MSVC10. GCC runs it fine and shows no runtime errors or visible misbehavior, for either the string object or the constant string. MSVC10 displays garbage for the string object but prints the constant string correctly.
Number 1 is correct. The pointer returned from c_str() is invalidated when name is destroyed. Dereferencing the pointer after name is destroyed results in undefined behavior. In your tests, under gcc it appears to work; under Visual C++ it prints garbage. Any results are possible when the behavior is undefined.
Number 2 is incorrect. "Hello, Unknown" is a string literal. String literals have static storage duration (they exist from when the program starts up to when it terminates). You are returning a pointer to this string literal, and that pointer is valid even after the function returns.
String literals have static storage duration, so they are not deallocated at the end of the function.
Related
I saw a strange behavior the other day.
I wanted to store lines (held in a vector) in a char array, using '\n' as the delimiter.
I know the c_str() method of the string class returns a pointer to a char array ending in '\0'.
Based on my experience and understanding of C++ (see the greet0 and greet2 functions),
I assumed it should work, but it didn't.
Can anyone explain the different behavior of the three greet functions? What is the scope of the object mentioned in each greet function?
(I also guessed that the string object was destroyed in greet1, but if that were the case there should be a segmentation fault in cout<<"greet1:"<<w1<<endl;, and that does not happen. So what exactly is happening in the background?)
// The snippet where I first encountered the issue.
const char* concatinated_str(std::vector<std::string> lines, const char *delimiter)
{
std::stringstream buf;
std::copy(lines.begin(), lines.end(), std::ostream_iterator<std::string>(buf, delimiter));
string w = buf.str();
const char *ret = w.c_str();
return ret;
}
//Implementation 0
string greet0(){
string msg = "hello";
return msg;
}
//Implementation 1
const char* greet1(){
string msg = "hello";
cout<<&msg<<endl;
return msg.c_str();
}
//Implementation 2
const char* greet2(){
const char* msg = "hello";
return msg;
}
int main(){
auto w0 = greet0();
cout<<&w0<<endl;
cout<<"greet0:"<<w0<<endl;
auto w1 = greet1();
cout<<"greet1:"<<w1<<endl;
const char* w2 = greet2();
cout<<"greet2:"<<w2<<endl;
}
Output:
0x7fff0ff3e8e0
0x7fff0ff3e8e0
greet0:hello
greet1:
greet2:hello
Returning a std::string or the pointer to a string-literal by value is perfectly fine.
Using the return-value of greet1() though has Undefined Behavior because the std::string whose elements you try to print died at the end of its enclosing function, leaving the returned pointer dangling.
What happens when you dereference a dangling pointer is not defined; behaving as if you had a pointer to an empty string, because the storage was reused, is one of the more benign possibilities.
As an aside, the address of a std::string is rarely that interesting to someone executing your program, though printing it is perfectly fine.
In the statements cout<<&w0<<endl; and cout<<&msg<<endl; you're outputting a pointer to a std::string. Remove the & to print the string itself rather than its address. If you're mystified by the same result for two different objects, that is because they are addresses of local variables: their lifetimes are limited, so the same memory can be reused and they do not necessarily have unique locations.
In greet0, msg is technically a local variable and stops existing on exit from the function, but the compiler may optimize the return: instead of copying msg out, it constructs the object directly at its destination w0. Since C++17 this elision is guaranteed when returning a prvalue (an unnamed temporary); for a named local like msg (NRVO) it remains an optional optimization, although mainstream compilers perform it, and otherwise msg is moved.
In the function
const char* greet1(){
string msg = "hello";
cout<<&msg<<endl;
return msg.c_str();
}
msg here is a function-local variable, so it represents an object that stops existing at the end of its enclosing scope, i.e. when the function returns. After the return statement the pointer obtained from c_str() is dangling, because that method returns a pointer to the internal storage of the std::string. The storage of msg has been destroyed, and you're invoking undefined behavior by accessing it. A segmentation fault (a POSIX signal, by the way; Windows reports an access violation instead) is a possible outcome but not a necessary one.
In the third function
const char* greet2(){
const char* msg = "hello";
return msg;
}
msg points to an array containing the constant string "hello". Arrays created by string literals have static storage duration, the same lifespan as a global static object, and their contents are fixed at compile time. Exiting the function doesn't invalidate the pointer; you can still dereference it because the string still exists.
The only code that invokes undefined behavior is related to this function:
//Implementation 1
const char* greet1(){
string msg = "hello";
cout<<&msg<<endl;
return msg.c_str();
}
The local object msg of the type std::string will not be alive after exiting the function. It will be destroyed. So the function returns an invalid pointer.
In this function implementation
//Implementation 2
const char* greet2(){
const char* msg = "hello";
return msg;
}
the function returns a pointer to the first character of the string literal "hello", which has static storage duration. This means the string literal is still alive after the function exits, so the function returns a valid pointer.
This function
//Implementation 0
string greet0(){
string msg = "hello";
return msg;
}
returns an object of type std::string: msg is moved into the return value (or the move is elided), and that result directly initializes the variable w0 in main
auto w0 = greet0();
So this function is correct.
Here is a piece of C++ code that shows some very peculiar behavior. Can anyone tell me why strB can print the string?
char* strA()
{
char str[] = "hello word";
return str;
}
char* strB()
{
char* str = "hello word";
return str;
}
int main()
{
cout<<strA()<<endl;
cout<<strB()<<endl;
}
Why does strB() work?
A string literal (e.g. "a string literal") has static storage duration. That means its lifetime spans the duration of your program's execution. This can be done because the compiler knows every string literal that you are going to use in your program, hence it can store their data directly into the data section of the compiled executable (example: https://godbolt.org/z/7nErYe)
When you obtain a pointer to it, this pointer can be passed around freely (including being returned from a function) and dereferenced as the object it points to is always alive.
Why doesn't strA() work?
However, initializing an array of char from a string literal copies the content of the string literal. The created array is a different object from the original string literal. If such array is a local variable (i.e. has automatic storage duration), as in your strA(), then it is destroyed after the function returns.
When you return from strA(), since the return type is char* an "array-to-pointer-conversion" is performed, creating a pointer to the first element of the array. However, since the array is destroyed when the function returns, the pointer returned becomes invalid. You should not try to dereference such pointers (and avoid creating them in the first place).
String literals exist for the life of the program.
String literals have static storage duration, and thus exist in memory for the life of the program.
That means cout<<strB()<<endl; is fine, the returned pointer pointing to string literal "hello word" remains valid.
On the other hand, cout<<strA()<<endl; leads to UB. The returned pointer points to the first element of the local array str, which is destroyed when strA() returns, leaving the returned pointer dangling.
BTW: string literals have type const char[N], so char* str = "hello word"; is ill-formed since C++11 (it was already deprecated before that). Change it to const char* str = "hello word";, and change the return type of strB() to const char* too.
String literals are not convertible or assignable to non-const CharT*. An explicit cast (e.g. const_cast) must be used if such conversion is wanted. (since C++11)
case 1:
#include <stdio.h>
char *strA() {
char str[] = "hello world";
return str;
}
int main(int argc, char **argv) {
puts(strA());
return 0;
}
The array in char str[] = "hello world"; (probably) lives on the stack while the function runs, and it expires once the function exits.
You can kinda cheat this with a continuation: the continuation is called ON TOP of the existing stack (so the array still exists, since strA hasn't returned yet). This doesn't depend on any particular stack layout; it is well-defined because the array's lifetime hasn't ended when the continuation uses it:
#include <stdio.h>
void strA(void (*continuation)(char *)) {
char str[] = "hello world";
continuation(str);
}
void myContinuation(char *arg) {
puts(arg);
}
int main(int argc, char **argv) {
strA(myContinuation);
return 0;
}
case 2:
If you use the snippet below, the literal "hello world" is usually stored in protected read-only memory (trying to modify it will cause a segmentation fault on many systems). This is similar to how main and strA themselves are stored: compiled code is basically a blob of instructions in memory, in the same way a string is a sequence of characters, but I digress. The string is in memory even if the function is never called, so if you know the address it is placed at on a specific system, you can read it. The program below prints the string without calling the function first; this will often work on the same platform, with roughly the same code and compiler, but it is undefined behavior.
#include <stdint.h>
#include <stdio.h>
char *strB() {
char *str = "hello world";
return str;
}
int main(int argc, char **argv) {
char *myStr;
// Replace 4231168 below with the address printed for &myStr[0] on your system
// (comment this line out on the first run); in my case it was 4231168.
printf("is your string: %s.\n", (char *)4231168);
myStr = strB();
printf("str is at: %llu\n", (unsigned long long)(uintptr_t)&myStr[0]);
return 0;
}
You can opt for a strC using structs for relative safety. The structure is created on the stack and returned in FULL (copied by value). strC's return value is 81 bytes in size (an arbitrary size I made up for the structure, which I trust myself to respect).
#include <stdio.h>
typedef struct {
char data[81];
} MY_STRING;
MY_STRING strC() {
MY_STRING str = {"what year is this?"};
return str;
}
int main(int argc, char **argv) {
puts(strC().data);
printf("size of strC's return: %zu.\n", sizeof(strC()));
return 0;
}
tl;dr: strA's array is likely overwritten by the next call (puts or printf) as soon as strA returns, since that call now uses the same stack space for its own frame, whereas the string used in strB exists outside the function: it's basically a pointer to a global constant available as soon as the program starts (the string is in memory no differently from how the code is in memory).
I have three functions like this:
MyStruct foo() {
//do something...
return get_var("string literal");
}
MyStruct get_var(const string &literal) {
return (MyStruct) {some_attribute, &*literal.begin(), literal.size()}; //struct needs const char*
}
void bar() {
MyStruct var;
//do stuff
var = foo();
std::cout << var.string_attribute;
}
This should print "string literal", but instead, the first half of the string is a random jumble of characters.
If I do this:
MyStruct get_var(const string &literal) {
std::cout << literal;
return (MyStruct) {some_attribute, &*literal.begin(), literal.size()}; //struct needs const char*
}
It prints correctly only the first time. And if I do this:
MyStruct foo() {
//do something...
string my_literal = "string literal";
std::cout << my_literal;
return get_var(my_literal);
}
It prints correctly the first and second times, but not the third. I have no idea what's happening; I thought string literals lasted forever, so it shouldn't be overwritten or anything.
Any help is greatly appreciated.
C++ is an old language that grew out of C; as a result, both the behavior and the terminology used to describe that behavior can be rather confusing.
A "string literal" is a sequence of characters in the source code surrounded by quotes. In most contexts it evaluates to a pointer to a null-terminated sequence of characters (a "C string"). Under normal circumstances* that sequence of characters will indeed remain valid for the entire lifetime of the program.
The type string in your code, on the other hand, probably refers to std::string (via using namespace std somewhere), which is a class representing an automatically managed string.
When you do get_var("string literal"); or string my_literal = "string literal"; the "C string" is implicitly converted to a std::string. This operation creates a copy of the sequence of characters. Unlike the original sequence of characters this sequence of characters will be freed when the std::string that owns it is destroyed.
&*literal.begin() is a somewhat unorthodox way to get a pointer to the sequence of characters owned by the std::string; using c_str() would be more usual. That isn't relevant to your problem, though. The important bit is that the sequence of characters in memory is the one owned by the std::string, not the original sequence from the string literal.
In the case of get_var("string literal"); the std::string is destroyed as soon as the statement completes. In the case of string my_literal = "string literal"; it is destroyed when the variable my_literal goes out of scope. Either way it is destroyed before foo() returns. So when you do std::cout << var.string_attribute; you are referencing a stale pointer for which the associated memory has already been freed.
The reason it works "sometimes" is that memory managers do not generally overwrite memory as soon as it is freed. Typically the memory is not actually overwritten until something re-uses it.
Edit: misread your question. It is possible for a use-after free to "work" sometimes but that is not what is going on here. The cout calls you say are working are at points in the code where the std::string is still alive.
* Excluding cases like unloading shared libraries at runtime that are beyond the scope of the C standard.
Enable maximum compiler warnings. The compiler may alert you that you're returning a pointer into a temporary, though it cannot catch every such case.
The line string my_literal = "string literal"; creates a string, then passes a const reference to that string into the function. Then at the end of foo(), my_literal is DESTROYED. It is GONE. Any pointers to that are now INVALID.
Absolutely any bad thing can happen after that, it is undefined behavior.
I learned that when I initialize an array of chars it's just like initializing a pointer to chars.
But, if that is the situation, why does the following code output strange characters?
char* returnMe()
{
char text[] = "Will I live forever?";
return text;
}
While the following code:
char* returnMe()
{
char* text = "Will I live forever?";
return text;
}
outputs:
Will I live forever?
What exactly are the differences between these two initializations?
They both act like pointers, so if I do:
puts(X); // puts takes a const char* parameter.
It works in both cases (as long as I haven't gone out of scope yet).
The function containing this:
char text[] = "Will I live forever?";
returns a pointer to a local variable called text, which contains the string "Will I live forever?". As with all local variables, that variable evaporates after the function returns, and so you are off in undefined behaviour land if you try to access it.
The function containing this:
char* text = "Will I live forever?";
returns a pointer to some magic place in memory (but not a local variable) containing the string, which persists for the program's execution, so using that pointer is not undefined behaviour.
I must have missed an obvious fact here -- haven't been programming C++ for a while. Why can't I print the c-style string after assigning it to a const char* variable? But if I try to print it directly without assigning it works fine:
#include <cstdlib>
#include <iostream>
#include <string>
#include "boost/lexical_cast.hpp"
using namespace std;
using boost::lexical_cast;
int main (int argc, char** argv)
{
int aa=500;
cout << lexical_cast<string>(aa).c_str() << endl; // prints the string "500" fine
const char* bb = lexical_cast<string>(aa).c_str();
cout << bb << endl; // prints nothing
return EXIT_SUCCESS;
}
The C String returned by c_str is only usable while the std::string from which it was obtained exists. Once that std::string is destroyed, the C String is gone too. (At that point, attempting to use the C String yields undefined behavior.)
Other operations may also invalidate the C String. In general, any operation that modifies the string will invalidate the pointer returned by c_str.
c_str() is called on the temporary string returned by lexical_cast. Since you don't save it, the string is destroyed at the end of that full expression, so accessing the pointer returned by c_str() afterwards is undefined behavior.