Why can we return char* from function? - c++

Here is a piece of C++ code that shows some very peculiar behavior. Can anyone tell me why strB() prints the string correctly?
char* strA()
{
char str[] = "hello word";
return str;
}
char* strB()
{
char* str = "hello word";
return str;
}
int main()
{
cout<<strA()<<endl;
cout<<strB()<<endl;
}

Why does strB() work?
A string literal (e.g. "a string literal") has static storage duration. That means its lifetime spans the duration of your program's execution. This is possible because the compiler knows every string literal that you use in your program, so it can store their data directly in the data section of the compiled executable (example: https://godbolt.org/z/7nErYe).
When you obtain a pointer to it, this pointer can be passed around freely (including being returned from a function) and dereferenced as the object it points to is always alive.
Why doesn't strA() work?
However, initializing an array of char from a string literal copies the content of the string literal. The created array is a different object from the original string literal. If such array is a local variable (i.e. has automatic storage duration), as in your strA(), then it is destroyed after the function returns.
When you return from strA(), since the return type is char* an "array-to-pointer-conversion" is performed, creating a pointer to the first element of the array. However, since the array is destroyed when the function returns, the pointer returned becomes invalid. You should not try to dereference such pointers (and avoid creating them in the first place).
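A minimal sketch of two ways to return the text safely, assuming C++11 or later. The names greeting_literal and greeting_copy are hypothetical and exist only for this illustration; they are not part of the question's code.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Returns a pointer to a string literal. The literal has static storage
// duration, so the pointer stays valid after the function returns.
const char* greeting_literal()
{
    return "hello word";
}

// Returns an independent copy by value. The caller owns the std::string,
// so nothing dangles when the function's locals are destroyed.
std::string greeting_copy()
{
    char str[] = "hello word";   // local array, destroyed on return
    return std::string(str);     // contents are copied before that happens
}
```

Both functions can be streamed to cout directly; only the second one hands the caller a buffer it is allowed to modify.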

String literals exist for the life of the program.
String literals have static storage duration, and thus exist in memory for the life of the program.
That means cout<<strB()<<endl; is fine, the returned pointer pointing to string literal "hello word" remains valid.
On the other hand, cout<<strA()<<endl; leads to UB. The returned pointer points to the 1st element of the local array str, which is destroyed when strA() returns, leaving the returned pointer dangling.
BTW: String literals are of type const char[], so char* str = "hello word"; is invalid since C++11. Change it to const char* str = "hello word";, and change the return type of strB() to const char* too.
String literals are not convertible or assignable to non-const CharT*. An explicit cast (e.g. const_cast) must be used if such conversion is wanted. (since C++11)
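The type claim above can be checked at compile time. This is a small sketch, assuming C++11 or later; the count 11 is the ten characters of "hello word" plus the terminating '\0'.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue of type "array of N const char",
// where N includes the terminating '\0'.
static_assert(std::is_same<decltype("hello word"), const char (&)[11]>::value,
              "a string literal is an array of const char");

// Const-correct version of strB: both the local pointer and the
// return type are pointers to const char.
const char* strB()
{
    const char* str = "hello word";
    return str;
}
```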

case 1:
#include <stdio.h>
char *strA() {
char str[] = "hello world";
return str;
}
int main(int argc, char **argv) {
puts(strA());
return 0;
}
The array declared by char str[] = "hello world"; is (probably) placed on the stack when strA is called, and it expires once the function exits.
You can work around this with a continuation. The continuation is called while strA is still running, so the array is still alive (strA has not returned yet) and using it there is well defined:
#include <stdio.h>
void strA(void (*continuation)(char *)) {
char str[] = "hello world";
continuation(str);
}
void myContinuation(char *arg) {
puts(arg);
}
int main(int argc, char **argv) {
strA(myContinuation);
return 0;
}
case 2:
If you use the snippet below, the literal "hello world" is usually stored in protected read-only memory (trying to modify it causes a segmentation fault on many systems). This is similar to how main and strA themselves are stored: C code is basically a blob of instructions in memory in the same way that a string is a sequence of characters, but I digress. The string is available to the program even if the function is never called, provided you know the address it ends up at on your specific system. In the snippet below, the program prints the string before calling the function. This will often work on the same platform with the same code and the same compiler, but reading through a hard-coded address like this is undefined behavior.
#include <stdio.h>
char *strB() {
char *str = "hello world";
return str;
}
int main(int argc, char **argv) {
char *myStr;
// comment the line below and replace it with
// result of &myStr[0], in my case, result of &myStr[0] is 4231168
printf("is your string: %s.\n", (char *)4231168);
myStr = strB();
printf("str is at: %llu\n", (unsigned long long)(size_t)&myStr[0]);
return 0;
}
You can opt for a strC using structs and relative safety. This structure is created on the stack and FULLY returned. The return of strC is 81(an arbitrary number I made up for the structure, that I trust myself to respect) bytes in size.
#include <stdio.h>
typedef struct {
char data[81];
} MY_STRING;
MY_STRING strC() {
MY_STRING str = {"what year is this?"};
return str;
}
int main(int argc, char **argv) {
puts(strC().data);
printf("size of strC's return: %zu.\n", sizeof(strC()));
return 0;
}
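In C++ the same return-by-value idea can be sketched with std::array instead of a hand-written struct. FixedString is a hypothetical alias introduced for this example, and 81 simply mirrors the size chosen above.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical fixed-capacity string; 81 mirrors the size used in the text.
using FixedString = std::array<char, 81>;

// Builds the array locally and returns it by value, so the caller gets
// its own copy of the bytes rather than a pointer into a dead stack frame.
FixedString strC()
{
    FixedString str{};  // zero-initialized, so the result is always terminated
    std::strncpy(str.data(), "what year is this?", str.size() - 1);
    return str;
}
```

A caller can write puts(strC().data()); because the temporary array lives until the end of that full expression.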
tl;dr: the array in strA is likely to be overwritten as soon as the function returns and another function such as puts or printf is called (since that call now uses the same stack space), whereas the string used in strB exists outside the function. It is basically a pointer to a global constant that is available as soon as the program starts (the string sits in memory no differently from how the code sits in memory).

Related

Error returning char* in c++ function [duplicate]

I have this function:
char* return_string(){
char buffer[] = "Hi world!";
return buffer;
}
bool test08()
{
char compare[] = "Hi world!";
int result = strcmp(compare,return_string());
if (result == 0) return true;
return false;
}
int main()
{
if(test08()) printf("\nTRUE");
else printf("\nFALSE");
}
Why does this code run in C++ Shell but not in Code::Blocks v. 13.12 (segmentation fault)? It works if I change my char buffer[] = declaration to char *buffer =. I'm a beginner at C++ (easy to tell), so please be clear...
Just change the function return_string the following way
const char* return_string(){
const char *buffer = "Hi world!";
return buffer;
}
The problem with the original function is that buffer is a local array with automatic storage duration, so it no longer exists after the function exits.
The modified function uses a string literal, which has static storage duration, so you may return a pointer to its first character.
The function test08 can be written simpler
bool test08()
{
char compare[] = "Hi world!";
return strcmp( compare, return_string() ) == 0;
}
This:
char* return_string(){
char buffer[] = "Hi world!";
return buffer;
}
copies the string "Hi world!" into the local variable buffer and then returns that local variable - this results in undefined behaviour if you try to use that return value, because the buffer is discarded on function exit.
This:
char* return_string(){
char *buffer = "Hi world!";
return buffer;
}
would actually be OK, though probably not what you want. You would be returning the address of the start of a string literal (the standard does not say where it is stored, only that it has static storage duration), which is OK. In C++, you would have problems with const, because a string literal cannot be assigned to a plain char*.
Your function return_string is returning pointer to a local variable buffer that is defined only in the scope of return_string. When the function returns, you return the address of where buffer used to be stored while the value stored in that address is no longer valid.
Well, if you insist on returning a pointer to character, you can dynamically allocate "Hello World" and remember to free it later:
char * return_string() {
char local_var[] = "hello world";
char *buffer = (char *) malloc(sizeof(local_var));
strcpy(buffer, local_var); // we know that buffer is big enough to hold local_var
return buffer;
}
Since the memory in which "hello world" is stored is not local to the function, it won't be released when the function exits.
Another (better) solution is to pass a caller-owned buffer (hopefully large enough to hold your value) to return_string and fill in its content there.
void return_string(char *buffer, size_t buffer_size) {
if (buffer_size == 0) return;
strncpy(buffer, "hello world", buffer_size - 1);
buffer[buffer_size - 1] = '\0'; // strncpy does not terminate on truncation
}
You are returning a pointer to a buffer but the buffer variable gets destroyed after the function returns.
You need to make your buffer variable static.
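A minimal sketch of the static-buffer suggestion, keeping the function name from the question. The caveats in the comments are worth noting before relying on this approach.

```cpp
#include <cassert>
#include <cstring>

// The buffer has static storage duration, so it outlives every call.
// Caveats: all callers share this single buffer, and writing to it from
// several threads at once would need synchronization.
char* return_string()
{
    static char buffer[] = "Hi world!";
    return buffer;
}
```

Every call returns the same address, so a change made through one returned pointer is visible through all of them.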
This isn't an answer at all, but still...
I've tried on CodeBlocks 16.01 (mingw) and everything works smoothly.
So a quick suggestion: always try to work with up-to-date versions of any software, especially if you are still in the learning process.

char pointer parameter different behaviour

I have the following code:
void uppercase(char *sir)
{
for(int i=0;i<strlen(sir);i++)
{
sir[i]=(char)toupper(sir[i]);
}
}
int _tmain(int argc, _TCHAR* argv[])
{
//char lower[]="u forgot the funny"; this works
//char *lower="u forgot the funny"; this gives me a runtime error
uppercase(lower);
cout<<lower<<"\n\n";
system("PAUSE");
return 0;
}
I have noted that if I run with the char vector it works.
When I try to run with the second method it generates a runtime error.
I would like to know the reason for this behaviour please.
You cannot modify string literals; doing so (as in your second case) is undefined behaviour.
char x[] = "foo";
creates a character array containing the characters f,o,o,\0. It's basically a mutable copy of the string.
char *x = "foo";
creates a pointer to the "foo" string literal. The literal may live in some read-only memory, in the program memory, or in a constant pool. Writing to it is undefined behaviour. Also, note that the type of a string literal is always const char[], so assigning it to a char * violates const-correctness.
The former creates a character array which can be mutated, the latter is a pointer to fixed memory (which cannot be manipulated)
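The difference can be sketched as follows. uppercase follows the question's function; uppercase_copy_matches is a hypothetical helper added only to exercise it on a writable copy.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Uppercases a writable, null-terminated buffer in place.
void uppercase(char* sir)
{
    for (std::size_t i = 0, n = std::strlen(sir); i < n; ++i) {
        // std::toupper expects a value representable as unsigned char.
        sir[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(sir[i])));
    }
}

// Hypothetical helper: copies the literal into a local array (which is
// writable), modifies the copy, and checks the result.
bool uppercase_copy_matches()
{
    char lower[] = "u forgot the funny";   // mutable copy of the literal
    uppercase(lower);
    return std::strcmp(lower, "U FORGOT THE FUNNY") == 0;
}
```

Passing a pointer to the literal itself instead of the array would compile only with a cast in modern C++, and writing through it would be undefined behaviour.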

C++ const char* pointer assignment

I must have missed an obvious fact here -- haven't been programming C++ for a while. Why can't I print the c-style string after assigning it to a const char* variable? But if I try to print it directly without assigning it works fine:
#include "boost/lexical_cast.hpp"
using namespace std;
using boost::lexical_cast;
int main (int argc, char** argv)
{
int aa=500;
cout << lexical_cast<string>(aa).c_str() << endl; // prints the string "500" fine
const char* bb = lexical_cast<string>(aa).c_str();
cout << bb << endl; // prints nothing
return EXIT_SUCCESS;
}
The C String returned by c_str is only usable while the std::string from which it was obtained exists. Once that std::string is destroyed, the C String is gone too. (At that point, attempting to use the C String yields undefined behavior.)
Other operations may also invalidate the C String. In general, any operation that modifies the string will invalidate the pointer returned by c_str.
Here c_str is called on the temporary string created by lexical_cast. Since you don't store that string anywhere, it is destroyed at the end of the expression, and accessing the pointer obtained from a destroyed string is undefined behaviour.
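One way to avoid the problem is to give the string a name so that it outlives the pointer. This sketch swaps in std::to_string for boost::lexical_cast<string> to stay self-contained; c_str_stays_valid is a hypothetical function name.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// std::to_string stands in for boost::lexical_cast<std::string> here.
bool c_str_stays_valid()
{
    int aa = 500;
    std::string text = std::to_string(aa);  // named object owns the characters
    const char* bb = text.c_str();          // valid while text is alive and unmodified
    return std::strcmp(bb, "500") == 0;
}
```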

Why is main() argument argv of type char*[] rather than const char*[]?

When I wrote the following code and executed it, the compiler said
deprecated conversion from string constant to char*
int main()
{
char *p;
p=new char[5];
p="how are you";
cout<< p;
return 0;
}
It means that I should have written const char *.
But when we pass arguments into main using char* argv[] we don't write const char* argv[].
Why?
Because ... argv[] isn't const. And it certainly isn't a (static) string literal since it's being created at runtime.
You're declaring a char * pointer and then assigning a string literal to it, which is by definition constant; the actual data is typically in read-only memory.
int main(int argc, char **argv) {
// Yes, I know I'm not checking anything - just a demo
argv[1][0] = 'f';
std::cout << argv[1] << std::endl;
}
Input:
g++ -o test test.cc
./test hoo
Output:
foo
This is not a comment on why you'd want to change argv, but it certainly is possible.
Historical reasons. Changing the signature of main() would break too much existing code. And it is possible that some implementations allow you to change the parameters to main from your code. However code like this:
char * p = "hello";
* p = 'x';
is always wrong: modifying a string literal is undefined behavior, so the pointer should be to a const char.
why is it required for char* to be constant while assigning it to a string
Because such literal strings (like "hi", "hello what's going on", etc.) are typically stored in the read-only segment of your executable. As such, the pointers that point to them need to point to constant characters (i.e. you can't change them).
You are assigning a string constant (an array of const char, which decays to const char*) to a pointer to a non-constant string (char *p). This would allow you to attempt to modify the string constant, e.g. by doing p[0] = 'n'.
Anyway, why don't you use std::string instead ? (you seem to be using C++).
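A short sketch of the std::string suggestion. make_greeting is a hypothetical name; the point is that the string owns a writable copy of the characters, so modifying it is fine.

```cpp
#include <cassert>
#include <string>

// std::string copies the literal into storage it owns, so the copy can
// be modified without touching the literal itself.
std::string make_greeting()
{
    std::string p = "how are you";
    p[0] = 'n';          // modifies the copy, not the literal
    return p;
}
```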
If you look at execution functions like execve, you will see that they actually don't accept const char* as parameters, but do indeed require char*, therefore you can't use a string constant to invoke main.

Returning value from a function

const char *Greet(const char *c) {
string name;
if(c)
name = c;
if (name.empty())
return "Hello, Unknown";
return name.c_str();
}
int _tmain(int argc, _TCHAR* argv[])
{
cout << Greet(0) << '\t' << Greet("Hello, World") << endl;
return 0;
}
I see 2 bugs with the above code.
Returning c_str from a string object that is defined local to the function. String gets destroyed when function returns and clearly c_str() will point to some memory that is de-allocated.
Returning "Hello, Unknown" from within the function. This is again an array of const chars allocated in the stack which should get de-allocated as well when the function returns. However, it does not and I am guessing that is because of Return Value Optimization.
Is my above understanding correct?
PS: I tested the above code with both gcc and MSVC10. GCC runs the above code fine and does not generate any runtime errors or undefined behaviors both for the string object as well as for the constant string. MSVC10 displays garbage data for the string object but prints the constant string correctly.
Number 1 is correct. The pointer returned from c_str() is invalidated when name is destroyed. Dereferencing the pointer after name is destroyed results in undefined behavior. In your tests, under gcc it appears to work; under Visual C++ it prints garbage. Any results are possible when the behavior is undefined.
Number 2 is incorrect. "Hello, Unknown" is a string literal. String literals have static storage duration (they exist from when the program starts up to when it terminates). You are returning a pointer to this string literal, and that pointer is valid even after the function returns.
String literals have static storage, so are not deallocated at the end of the function.
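One possible fix for the first bug, sketched here as an assumption about what the caller needs, is to return the std::string by value instead of a pointer into it.

```cpp
#include <cassert>
#include <string>

// Returning by value hands the caller its own string, so nothing points
// into the destroyed local variable.
std::string Greet(const char* c)
{
    std::string name;
    if (c)
        name = c;
    if (name.empty())
        return "Hello, Unknown";   // the literal is copied into the result
    return name;
}
```

The call site in the question keeps working unchanged, since a std::string can be streamed to cout directly.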