address of local variable - c++

I have a hard time understanding the difference between these three:
const char * f() {
return "this is a test";
}
const char * g() {
const char * str = "test again";
return str;
}
const double * h() {
const double a = 2.718;
return &a;
}
I get a warning for the h(), as warning: address of local variable ‘a’ returned. Which makes sense, but I do not understand why the compiler (gcc -Wall) is ok with the f() and g() function.
Isn't there a local variable there?
When and how does the pointer returned by f() or g() get deallocated?

String literals are not stored in the local stack frame. They live in a fixed place in your executable. Contrast:
const char * g() {
const char * p = "test again";
return p;
}
with
const char * g() {
const char a[] = "test again";
return a;
}
In the former, the return value points to a fixed place in your executable. In the latter, the return value points to (a now invalid location in) the stack.

It's string literals.
n3337 2.14.5/8
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow
string literal has type “array of n const char”, where n is the size of the string as defined below, and has
static storage duration

const char * g() {
const char * str = "test again";
return str;
}
This doesn't return the address of a local variable. The variable is str, so its address is &str, which is different from the value of str itself:
std::cout << (void*) str << std::endl;
std::cout << (void*) &str << std::endl; //address of str (local variable)
They would print different values!
So a more apt example would be this:
const char ** g() {
const char * str = "test again";
return &str; //difference!
}
Now it returns the address of the local variable. A good compiler may issue warning for this.
Another example would be this:
const char * g() {
const char str[] = "test again"; //difference!
return str; //same as before
}
Now, even though you return str, which doesn't look like the address of a local variable, the compiler may warn, because in this case str and &str have exactly the same value. Try printing these now:
std::cout << (void*) str << std::endl;
std::cout << (void*) &str << std::endl; //address of str (local variable)
They would print the same value!
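The comparison above can also be checked without printing. This is only a sketch: the helper names are made up for illustration, and each function just compares the two addresses discussed in this answer.

```cpp
// Hypothetical helpers (not from the original answer) that compare the
// value of str with the address of str in the two cases discussed above.

bool pointerValueDiffersFromItsAddress()
{
    const char* str = "test again";   // local pointer, pointee has static storage
    // str holds the literal's address; &str is the address of the local pointer.
    return static_cast<const void*>(str) != static_cast<const void*>(&str);
}

bool arrayDecaysToItsOwnAddress()
{
    const char str[] = "test again";  // local array holding a copy of the literal
    // str decays to a pointer to its first element, which is where the array starts.
    return static_cast<const void*>(str) == static_cast<const void*>(&str);
}
```

Both helpers should return true, which is why returning str is harmless in the pointer case and dangling in the array case.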

The string literals aren't local variables. The string equivalent of the third function is this
const char * f() {
const char str[] = "this is a test";
return str;
}

In the function h, a is a local variable that won't exist after the function returns. You're returning a pointer to that variable, and so dereferencing the pointer outside the function is incorrect, and undefined behavior.
In f and g you're returning literal strings. Literal strings have static storage: they aren't allocated on the stack, and they'll exist beyond the lifetime of the functions.
In the definition of g:
const char *g()
{
const char *str = "test again";
return str;
}
str is a local variable, but it's a pointer to non-local - statically allocated - memory. It's that address that you're returning, not a reference to the local variable.
Consider another definition of g:
const char *g()
{
const char str[] = "test again";
// incorrect: can't use str after the return:
return str;
}
Now g has the same problem as your function h, and when compiling it you should see the same warning about returning the address of a local variable.
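For completeness, here is a minimal sketch of two ways h could be written without the dangling pointer. The names h_by_value and h_static are hypothetical, and which variant fits depends on what the caller actually needs.

```cpp
// Hypothetical alternatives to h(); neither appears in the question.

// Return the value itself: the caller gets a copy, so lifetime is not an issue.
double h_by_value()
{
    const double a = 2.718;
    return a;
}

// Give the variable static storage duration: it then outlives the call,
// just like a string literal does, so returning its address is fine.
const double* h_static()
{
    static const double a = 2.718;
    return &a;
}
```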

Storage for string literals is static; that is why you don't get a warning.
Try this and you will get undefined behavior:
const char* getFoo()
{
std::string foo("hi");
return foo.c_str();
}
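This is because foo holds its own copy of the literal, and that copy is destroyed when foo goes out of scope on return, so the pointer obtained from c_str() dangles.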
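A common way around this, sketched here under the assumption that the caller can accept a std::string, is to return the string by value instead of a pointer into it. The name getFooCopy is made up for this example.

```cpp
#include <string>

// Hypothetical safe counterpart of getFoo(): the string is returned by value,
// so the caller owns the characters and nothing dangles.
std::string getFooCopy()
{
    std::string foo("hi");
    return foo;   // moved or copy-elided into the caller's object
}
```

The caller can still call c_str() on the returned object, as long as that object is kept in a named variable for as long as the pointer is used.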

These strings are placed permanently in your program's data memory, so their addresses stay valid for the whole run. An automatic variable lives on the stack, so it disappears the moment you return from the call.

Related

c++ change array by index inside function

void changeArray(char* str1) {
str1[0] = 'f';
}
int main() {
char* msg1 = "andrew";
changeArray(msg1);
cout << msg1 << endl;
return 0;
}
Hi guys, I don't understand why I'm getting a segmentation fault. Can pointers not be accessed by index inside functions? (C++)
You're trying to modify string literal, which leads to undefined behavior.
Attempting to modify a string literal results in undefined behavior: they may be stored in read-only storage (such as .rodata) or combined with other string literals:
const char* pc = "Hello";
char* p = const_cast<char*>(pc);
p[0] = 'M'; // undefined behavior
Also, char* msg1 = "andrew"; is not allowed since C++11:
In C, string literals are of type char[], and can be assigned directly to a (non-const) char*. C++03 allowed it as well (but deprecated it, as literals are const in C++). C++11 no longer allows such assignments without a cast.
You can construct and pass a char array instead.
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
E.g.
int main() {
char msg1[] = "andrew";
changeArray(msg1);
cout << msg1 << endl;
return 0;
}
In int main() you declared msg1 as a pointer to a char, not as an array of chars. Do this: char msg1[] = "andrew";.
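If std::string is an option, the same idea (modify a writable copy, never the literal) might look like the sketch below. changeFirst and changedCopy are hypothetical names, not part of the question's code.

```cpp
#include <string>

// Hypothetical std::string variant of changeArray: modifies the caller's string.
void changeFirst(std::string& s, char c)
{
    if (!s.empty()) {
        s[0] = c;   // writes into memory owned by the string, not into a literal
    }
}

// Copies the literal into a std::string first, then edits the copy.
std::string changedCopy(const char* literal, char c)
{
    std::string copy(literal);
    changeFirst(copy, c);
    return copy;
}
```

With this, changedCopy("andrew", 'f') yields "fndrew" and the literal itself is left untouched.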

c++ does static allocation of string literals apply to any const char *

In Stroustrup C++ 4thEd p176, he states that this code is safe because string literals are allocated statically
const char* f() { return "some literal"; }
I have two questions about that :
First, at which revision of C++ did that become the case? (And is it implementation-dependent?)
Second, does this extend to any "const char*" ?
on this second part, I guess the answer is no, since this caused a run time error:
const char* make_const_char_ptr(){
const char res [] = {'a','b','c', '\0' };
return res;
}
First, at which revision of C++ did that become the case?
Always been the case since C++ was standardised. String literals have static storage duration.
Second, does this extend to any "const char*" ?
Yes, if the returned object has static storage duration. But that's not the case in your example: res has automatic storage duration. Instead you can do:
const char* make_const_char_ptr() {
static const char res [] = { 'a', 'b', 'c', '\0' };
return res;
}
and it's valid.
The statement refers to literals, NOT to any const char*; you have to mind the difference between a const char* and a literal.
The first line of code you mentioned:
const char* f() { return "some literal"; }
contains a literal, "some literal", which the compiler handles by statically allocating the string it represents.
The other piece of code:
const char* make_const_char_ptr(){
const char res [] = {'a','b','c', '\0' };
return res;
}
is interpreted as allocating an array of char and returning a pointer to the beginning of that array, but as soon as the program returns from the function the array is deallocated, so it is no longer safe to access the address held in the returned value. The return type is the same for both functions, but in the first case it refers to a literal (statically allocated, not destroyed on return), while in the second you tell the compiler to allocate an array whose lifetime is confined to the function itself (unlike the statically allocated literal).
To return a reference or a pointer to a variable with the automatic storage duration from a function invokes undefined behavior because after exiting the function the referenced or pointed variable will not be alive.
String literals have static storage duration so you may return a pointer or a reference to a string literal.
Also you can return a reference or a pointer to a function local variable that is declared with the storage specifier static.
For example using this function definition
const char* make_const_char_ptr(){
const char res [] = {'a','b','c', '\0' };
return res;
}
this code snippet
const char *s = make_const_char_ptr();
size_t n = strlen( s );
invokes undefined behavior because the pointed string stored in the array res with the automatic storage duration will not be alive.
However, if the function is defined the following way
const char* make_const_char_ptr(){
const static char res [] = {'a','b','c', '\0' };
return res;
}
then this code snippet
const char *s = make_const_char_ptr();
size_t n = strlen( s );
is correct because the array res is still alive. The const qualifier does not matter in this case; that is, it has no influence on the lifetime of the array.
Note that it does not matter how the array is initialized: with a string literal or with an initializer list of characters.
This declaration within the function
const char res [] = { "abc" };
has automatic storage duration if the storage specifier static is not present.
And below there is a demonstrative program where a string literal is returned from a function by reference.
#include <iostream>
#include <type_traits>
decltype( auto ) f()
{
return ( "hello World" );
}
void g( const char *s )
{
std::cout << s << '\n';
}
int main()
{
decltype( auto ) s = f();
g( s );
std::cout << std::extent<std::remove_reference<decltype( s )>::type>::value << '\n';
return 0;
}
The program output is
hello World
12

Can you safely get a pointer to a string from its c_str() const char*?

I have a const char pointer which I know for sure came from a string. For example:
std::string myString = "Hello World!";
const char* myCstring = myString.c_str();
In my case I know myCstring came from a string, but I no longer have access to that string (I received the const char* from a function call, and I cannot modify the function's argument list).
Given that I know myCstring points to contents of an existing string, is there any way to safely access the pointer of the parent string from which it originated? For example, could I do something like this?
std::string* hackyStringPointer = myCstring - 6; //Along with whatever pointer casting stuff may be needed
My concern is that perhaps the string's contents possibly cannot be guaranteed to be stored in contiguous memory on some or all platforms, etc.
Given that I know myCstring points to contents of an existing string, is there any way to safely access the pointer of the parent string from which it originated?
No, there is no way to obtain a valid std::string* pointer from a const char* pointer to character data that belongs to a std::string.
I received the const char* from a function call, and I cannot modify the function's argument list
Your only option in this situation would be if you can pass a pointer to the std::string itself as the actual const char* pointer, but that will only work if whatever is calling your function does not interpret the const char* in any way (and certainly not as a null-terminated C string), eg:
void doSomething(void (*func)(const char*), const char *data)
{
...
func(data);
...
}
void myFunc(const char *myCstring)
{
// reinterpret_cast cannot cast away const, so the target type must be const too
const std::string* hackyStringPointer = reinterpret_cast<const std::string*>(myCstring);
...
}
...
std::string myString = "Hello World!";
doSomething(&myFunc, reinterpret_cast<char*>(&myString));
You cannot convert a const char* that you get from std::string::c_str() to a std::string*. The reason you can't do this is because c_str() returns a pointer to the string data, not the string object itself.
If you are trying to get a std::string so you can use its member functions, then what you can do is wrap myCstring in a std::string_view. This is a non-copying wrapper that lets you treat a C string much like a std::string. To do that you would need something like
std::string_view sv{myCstring, std::strlen(myCstring)};
// use sv here like it was a std::string
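As a small illustration of what "use sv like a std::string" can mean, here is a sketch that uses find and substr on the view. It assumes C++17 and a null-terminated input; firstWord is a made-up name.

```cpp
#include <string_view>

// Hypothetical helper: returns the part of a C string before the first space,
// without copying any characters.
std::string_view firstWord(const char* cstr)
{
    std::string_view sv{cstr};           // length is found by scanning for '\0'
    // find returns npos when there is no space; substr(0, npos) is the whole view.
    return sv.substr(0, sv.find(' '));
}
```

The returned view is only valid while the characters it points to are alive, so the same lifetime caveats as for the original const char* still apply.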
Yes (it seems), although I agree that if I need to do this it's likely a sign that my code needs reworking in general. Nevertheless, on my platform the string object seems to reside 4 bytes before the const char* which c_str() returns, and I did recover a string* from a const char* belonging to a string.
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
int main()
{
    std::string myString = "Hello World!";
    const char* myCstring = myString.c_str();
    unsigned int strPtrSize = sizeof(std::string*);
    unsigned int cStrPtrSize = sizeof(const char*);
    long strAddress = reinterpret_cast<std::size_t>(&myString);
    long cStrAddress = reinterpret_cast<std::size_t>(myCstring);
    // The offset is measured using &myString itself, so it only describes this particular build
    long addressDifference = strAddress - cStrAddress;
    long estStrAddress = cStrAddress + addressDifference;
    std::string* hackyStringPointer = reinterpret_cast<std::string*>(estStrAddress);
    std::cout << "Size of String* " << strPtrSize << ", Size of const char*: " << cStrPtrSize << "\n";
    std::cout << "String Address: " << strAddress << ", C String Address: " << cStrAddress << "\n";
    std::cout << "Address Difference: " << addressDifference << "\n";
    std::cout << "Estimated String Address: " << estStrAddress << "\n";
    std::cout << "Hacky String: " << *hackyStringPointer << "\n";
    //If any of these asserts trigger on any platform, I may need to re-evaluate my answer
    assert(addressDifference == -4);
    assert(strPtrSize == cStrPtrSize);
    assert(hackyStringPointer == &myString);
    return 0;
}
The output of this is as follows:
Size of String* 4, Size of const char*: 4
String Address: 15725656, C String Address: 15725660
Address Difference: -4
Estimated String Address: 15725656
Hacky String: Hello World!
It seems to work so far. If someone can show that the address difference between a string and its c_str() can change over time on the same platform, or if all members of a string are not guaranteed to reside in contiguous memory, I'll change my answer to "No."
This reference says
The pointer returned may be invalidated by further calls to other member functions that modify the object.
You say you got the char* from a function call; this means you do not know what happens to the string in the meantime, is that right? If you know that the original string is not changed or destroyed (e.g. it goes out of scope and is thus destructed), then you can still use the char*.
Your example code however has multiple problems. You want to do this:
std::string* hackyStringPointer = myCstring - 6;
but I think you meant
const char* hackyStringPointer = myCstring;
First, you cannot cast the const char* to a string*; second, you do not want to go BEFORE the start of the character data. The pointer points to the first character of the string, and you can use it to access the characters up to the terminating null character. But you should not go before the first character or past the terminating null, because you do not know what is in that memory or whether it even exists.

cout char* is different even when declared constant

I have a very simple code snippet:
#include <iostream>
#include <string>
using namespace std;
string getString() {
return "test";
}
int main(){
const char* testString = getString().c_str();
cout << "string 1:" << testString << endl;
string dummy[] = {"1","2","0"};
cout << "string 2:" << testString << endl;
return 0;
}
I expect the two couts will print the same output, but the output I got is
string 1:test
string 2:1
Can anyone explain why is this happening? Also, there are two things that I observed:
1) If dummy[] is of int type, then they will print the exact same strings test as expected.
2) If I first assign getString() to a string variable, then change the first line in main to const char* testString = variable.c_str(); then they will cout the same strings as expected.
The behavior is undefined.
const char* testString = getString().c_str();
getString returns a temporary object which is destroyed when the evaluation of the full expression completes. As a result, testString points at the internals of a destroyed object, causing undefined behavior.
In practice, it may happen that the data is still at that address for some time, that's why the first cout gives illusion of correctness.
You set the pointer to the internals of a temporary object that is destroyed at the end of this declaration
const char* testString = getString().c_str();
So the program has undefined behaviour.
The correct code could look like
const char * getString() {
return "test";
}
int main(){
const char* testString = getString();
//...
because string literals have static storage duration.
When you get a low level character pointer from a string object that manages the memory for that string, the pointer is only good for as long as that specific object is alive.
It's more narrow than that, actually. If you call any non-const members on the string object, it means you can't trust any values you got from previous calls to c_str() to still be good--even if the object destructor has not run.
#include <iostream>
#include <string>
using namespace std;
string getString() {
return "test";
}
int main(){
string testString = getString();
const char * testCstring = testString.c_str();
cout << "string 1:" << testCstring << endl;
string dummy[] = {"1","2","0"};
cout << "string 2:" << testCstring << endl;
return 0;
}
That's legal, but do not rely on the pointer from c_str() after you have modified the string you got it from, or after that string has been destroyed.
Also note there's no need to get a char * from a string in order to output it. Start thinking in terms of using string objects and don't go to char * unless you have a good reason.
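The rule about modifying calls can be sketched as follows: after any non-const operation, fetch the pointer again rather than reusing an old one. lengthAfterAppend is a hypothetical helper written only for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: appends to a string and then measures it through a
// freshly obtained C pointer.
std::size_t lengthAfterAppend(std::string s, const char* suffix)
{
    s += suffix;                     // non-const member call: the buffer may be reallocated,
                                     // so any c_str() result taken earlier must not be used
    const char* fresh = s.c_str();   // fetch the pointer again after the modification
    return std::strlen(fresh);
}
```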

How does a function detect string pointer vs string literal argument?

I have encountered a function, such that it can differentiate between being called as
foo("bar");
vs
const char *bob = "bar";
foo(bob);
Possibilities I have thought of are:
Address of string: both arguments sat in .rdata section of the image. If I do both calls in the same program, both calls receive the same string address.
RTTI: no idea how RTTI can be used to detect such differences.
The only working example I could conjure up is:
void foo(char *msg)
{
printf("string literal");
}
void foo(const char *&msg)
{
printf("string pointer");
}
foo("bar"); // "string literal"
const char *soap = "bar";
foo(soap); // "string pointer"
I do not have access to the function's code, and the declarations in the header file only revealed one function declaration.
Here's another way to distinguish between a string literal and a pointer, based on the fact that string literals have array type, not pointer type:
#include <iostream>
void foo(char *msg)
{
std::cout << "non-const char*\n";
}
void foo(const char *&msg) // & needed, else this is preferred to the
// template function for a string literal
{
std::cout << "const char*\n";
}
template <int N>
void foo(const char (&msg)[N])
{
std::cout << "const char array reference ["<< N << "]\n";
}
int main() {
foo("bar"); // const char array reference [4]
}
But note that all of them (including your original function) can be "fooled" by passing something that isn't a string literal:
const char *soap = 0;
foo(soap);
char *b = 0;
foo(b);
const char a[4] = {};
foo(a);
There is no type in C++ which is unique to string literals. So, you can use the type to tell the difference between an array and a pointer, but not to tell the difference between a string literal and another array. RTTI is no use, because RTTI exists only for classes with at least one virtual member function. Anything else is implementation-dependent: there is no guarantee in the standard that string literals will occupy any particular region of memory, or that the same string literal used twice in a program (or even in a compilation unit) will have the same address. In terms of storage location, anything that an implementation can do with string literals, it is permitted also to do with my array a.
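To make the point about types concrete, here is a minimal sketch that asks only "array or not?" via std::is_array. All names in it are made up; it shows that a string literal and an ordinary const char array give the same answer, while a pointer gives a different one.

```cpp
#include <type_traits>

// Hypothetical helper: true when the argument is an array (of any element type).
// T is deduced as char[4] for both "bar" and a const char[4] object.
template <class T>
constexpr bool isCharArray(const T&)
{
    return std::is_array<T>::value;
}

bool literalLooksLikeArray()
{
    return isCharArray("bar");          // string literal: array type
}

bool pointerLooksLikeArray()
{
    const char* p = "bar";
    return isCharArray(p);              // pointer: not an array
}

bool plainArrayLooksLikeArray()
{
    const char a[4] = {};
    return isCharArray(a);              // not a literal, yet the same answer as one
}
```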
The function foo() in theory could use a macro to determine if the argument was a literal or not.
#define foo(X) (*#X == '"' \
    ? foo_string_literal(X) \
    : foo_not_string_literal(X))
And what happens if you call it as:
const char bob[] = "bar";
foo(bob);
It's probably using some sort of distinction like that to make the determination.
EDIT: If there's only one function declaration in the header I can't conceive of any portable way the library could make that distinction.
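For reference, the stringification trick behind the macro idea above can be tried in isolation. This is only a sketch with a made-up macro name; it inspects the spelling of the argument, so it says nothing about what a pointer actually points to, and prefixed or raw literals (for example L"bar") would not be recognized.

```cpp
// Hypothetical macro: #X turns the argument's source text into a string,
// and the test checks whether that text starts with a double quote.
#define LOOKS_LIKE_LITERAL(X) (*#X == '"')

bool literalArgument()
{
    return LOOKS_LIKE_LITERAL("bar");   // #X is "\"bar\"", first character is '"'
}

bool pointerArgument()
{
    const char* bob = "bar";
    (void)bob;                          // only the name is inspected, not the value
    return LOOKS_LIKE_LITERAL(bob);     // #X is "bob", first character is 'b'
}

bool arrayArgument()
{
    const char bob[] = "bar";
    (void)bob;
    return LOOKS_LIKE_LITERAL(bob);     // also "bob": arrays are not detected as literals
}
```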