C++: does static allocation of string literals apply to any const char*?

In Stroustrup C++ 4thEd p176, he states that this code is safe because string literals are allocated statically
const char* f() { return "some literal"; }
I have two questions about that:
First, in which revision of C++ did that become the case? (And is it implementation-dependent?)
Second, does this extend to any "const char*"?
On this second part, I guess the answer is no, since this caused a run-time error:
const char* make_const_char_ptr(){
const char res [] = {'a','b','c', '\0' };
return res;
}

First, in which revision of C++ did that become the case?
It has been the case ever since C++ was standardised. String literals have static storage duration.
Second, does this extend to any "const char*"?
Yes, if the returned object has static storage duration. But that's not the case in your example: res has automatic storage duration. Instead you can do:
const char* make_const_char_ptr() {
static const char res [] = { 'a', 'b', 'c', '\0' };
return res;
}
and it's valid.
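To make the distinction concrete, here is a minimal sketch of the two forms that are safe to return. The function names literal_ptr, static_array_ptr and both_readable are made up for illustration and do not come from the question.

```cpp
#include <cstring>

// Returns a pointer to a string literal, which has static storage duration.
const char* literal_ptr() { return "abc"; }

// Returns a pointer to a function-local static array, which also has
// static storage duration and therefore outlives the call.
const char* static_array_ptr() {
    static const char res[] = {'a', 'b', 'c', '\0'};
    return res;
}

// True when both returned pointers still refer to readable strings
// with the expected contents after the functions have returned.
bool both_readable() {
    const char* a = literal_ptr();
    const char* b = static_array_ptr();
    return std::strcmp(a, "abc") == 0 && std::strcmp(b, "abc") == 0;
}
```

Every call to static_array_ptr() returns the address of the same array, because a static local is created once and lives until the program ends.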

The statement refers to string literals, NOT to any const char*; you have to mind the difference between a const char* and a literal.
The first line of code you mentioned:
const char* f() { return "some literal"; }
contains the literal "some literal", which the compiler handles by allocating the string statically.
The other piece of code:
const char* make_const_char_ptr(){
const char res [] = {'a','b','c', '\0' };
return res;
}
is interpreted as allocating a local array of char and returning a pointer to the beginning of that array, but as soon as the function returns the array is deallocated, so it is no longer safe to access the address held in the returned value. The return type is the same for both functions, but in the first case the pointer refers to a literal (statically allocated, not destroyed on return), while in the second you tell the compiler to allocate an array whose lifetime is confined to the function call (contrary to the static case of the literal).

Returning a reference or a pointer to a variable with automatic storage duration from a function yields a dangling reference or pointer, and using it invokes undefined behavior, because the variable is no longer alive after the function exits.
String literals have static storage duration, so you may return a pointer or a reference to a string literal.
You can also return a reference or a pointer to a function-local variable that is declared with the storage class specifier static.
For example, with this function definition
const char* make_const_char_ptr(){
const char res [] = {'a','b','c', '\0' };
return res;
}
this code snippet
const char *s = make_const_char_ptr();
size_t n = strlen( s );
invokes undefined behavior because the array res, which has automatic storage duration and holds the pointed-to string, is no longer alive.
However, if the function is defined the following way
const char* make_const_char_ptr(){
const static char res [] = {'a','b','c', '\0' };
return res;
}
then this code snippet
const char *s = make_const_char_ptr();
size_t n = strlen( s );
is correct because the array res is still alive. The qualifier const does not matter in this case; that is, it has no influence on the lifetime of the array.
Note that it does not matter how the array is initialized: whether with a string literal or with an initializer list of characters.
An array declared within the function like this
const char res [] = { "abc" };
has automatic storage duration if the storage class specifier static is not present.
Below is a demonstration program in which a string literal is returned from a function by reference.
#include <iostream>
#include <type_traits>
decltype( auto ) f()
{
return ( "hello World" );
}
void g( const char *s )
{
std::cout << s << '\n';
}
int main()
{
decltype( auto ) s = f();
g( s );
std::cout << std::extent<std::remove_reference<decltype( s )>::type>::value << '\n';
return 0;
}
The program output is
hello World
12

Related

Why can we return char* from function?

Here is a piece of C++ code that shows some very peculiar behavior. Who can tell me why strB can print out the stuff?
char* strA()
{
char str[] = "hello word";
return str;
}
char* strB()
{
char* str = "hello word";
return str;
}
int main()
{
cout<<strA()<<endl;
cout<<strB()<<endl;
}
Why does strB() work?
A string literal (e.g. "a string literal") has static storage duration. That means its lifetime spans the duration of your program's execution. This can be done because the compiler knows every string literal that you are going to use in your program, hence it can store their data directly into the data section of the compiled executable (example: https://godbolt.org/z/7nErYe)
When you obtain a pointer to it, this pointer can be passed around freely (including being returned from a function) and dereferenced as the object it points to is always alive.
Why doesn't strA() work?
However, initializing an array of char from a string literal copies the content of the string literal. The created array is a different object from the original string literal. If such array is a local variable (i.e. has automatic storage duration), as in your strA(), then it is destroyed after the function returns.
When you return from strA(), since the return type is char* an "array-to-pointer-conversion" is performed, creating a pointer to the first element of the array. However, since the array is destroyed when the function returns, the pointer returned becomes invalid. You should not try to dereference such pointers (and avoid creating them in the first place).
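One common way to avoid creating such a pointer is to return the characters by value. The sketch below is only an illustration: strA_fixed and strB_fixed are hypothetical names, and std::string is a substitute the question itself does not use.

```cpp
#include <string>

// Hypothetical replacement for strA(): the characters are copied into a
// std::string that is returned by value, so no pointer to the dead local
// array ever leaves the function.
std::string strA_fixed() {
    char str[] = "hello word";   // local array, destroyed when the function returns
    return std::string(str);     // contents are copied out before that happens
}

// strB() with a const-correct signature: the literal outlives the call,
// so returning a pointer to it is fine.
const char* strB_fixed() {
    return "hello word";
}
```

The caller then owns the std::string returned by strA_fixed() and may call c_str() on it for as long as that object is alive.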
String literals exist for the life of the program.
String literals have static storage duration, and thus exist in memory for the life of the program.
That means cout<<strB()<<endl; is fine, the returned pointer pointing to string literal "hello word" remains valid.
On the other hand, cout<<strA()<<endl; leads to UB. The returned pointer points to the first element of the local array str, which is destroyed when strA() returns, leaving the returned pointer dangling.
BTW: String literals are of type const char[N], so char* str = "hello word"; is ill-formed since C++11. Change it to const char* str = "hello word";, and change the return type of strB() to const char* too.
String literals are not convertible or assignable to non-const CharT*. An explicit cast (e.g. const_cast) must be used if such conversion is wanted. (since C++11)
case 1:
#include <stdio.h>
char *strA() {
char str[] = "hello world";
return str;
}
int main(int argc, char **argv) {
puts(strA());
return 0;
}
The array declared by char str[] = "hello world"; is (probably) placed on the stack when the function is called, and expires once the function exits. If you are naïve enough to assume this is how it works on all target systems, you can write cute code that kind of cheats with a continuation: the continuation is called ON TOP of the existing stack, so the function's data still exists because the function hasn't returned yet:
#include <stdio.h>
void strA(void (*continuation)(char *)) {
char str[] = "hello world";
continuation(str);
}
void myContinuation(char *arg) {
puts(arg);
}
int main(int argc, char **argv) {
strA(myContinuation);
return 0;
}
case 2:
If you use the snippet below, the literal "hello world" is usually stored in protected read-only memory (trying to modify this string will cause a segmentation fault on many systems; this is similar to how your main and strA are stored: C code is basically just a string of instructions, a memory blob, in the same way a string is a string of characters, but I digress). This string is available to the program even if the function is never called, provided you know the address it is supposed to be at on the specific system. In the snippet below, the program prints the string before the function is even called; this will often work on the same platform with roughly the same code and the same compiler. It is undefined behavior, though.
#include <stdio.h>
char *strB() {
char *str = "hello world";
return str;
}
int main(int argc, char **argv) {
char *myStr;
// replace 4231168 in the line below with the value printed
// for &myStr[0] on your system; in my case it is 4231168
printf("is your string: %s.\n", (char *)4231168);
myStr = strB();
printf("str is at: %lld\n", (long long)&myStr[0]);
return 0;
}
You can opt for a strC that uses a struct, for relative safety. The structure is created on the stack and returned in FULL, by value. The return value of strC is 81 bytes in size (an arbitrary number I made up for the structure, which I trust myself to respect).
#include <stdio.h>
typedef struct {
char data[81];
} MY_STRING;
MY_STRING strC() {
MY_STRING str = {"what year is this?"};
return str;
}
int main(int argc, char **argv) {
puts(strC().data);
printf("size of strC's return: %zu.\n", sizeof(strC()));
return 0;
}
tl;dr: the array returned by strA is likely corrupted by the next call (puts, printf) as soon as strA returns, since that call now has its own stack frame, whereas the string used in strB exists outside the function: it's basically a pointer to a global constant available as soon as the program starts (the string is there in memory, no different from how the code is in memory).

c++ change array by index inside function

void changeArray(char* str1) {
str1[0] = 'f';
}
int main() {
char* msg1 = "andrew";
changeArray(msg1);
cout << msg1 << endl;
return 0;
}
Hi guys, I don't understand why I'm getting a segmentation fault. Can't pointers be accessed by index inside functions? (C++)
You're trying to modify a string literal, which leads to undefined behavior.
Attempting to modify a string literal results in undefined behavior: they may be stored in read-only storage (such as .rodata) or combined with other string literals:
const char* pc = "Hello";
char* p = const_cast<char*>(pc);
p[0] = 'M'; // undefined behavior
Also, char* msg1 = "andrew"; is not allowed since C++11:
In C, string literals are of type char[], and can be assigned directly to a (non-const) char*. C++03 allowed it as well (but deprecated it, as literals are const in C++). C++11 no longer allows such assignments without a cast.
You can construct and pass a char array instead.
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
E.g.
int main() {
char msg1[] = "andrew";
changeArray(msg1);
cout << msg1 << endl;
return 0;
}
In int main() you declared msg1 as a pointer to a char, not as an array of chars. Do this: char msg1[] = "andrew";.
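As an alternative to the char array, the same idea can be sketched with std::string, which also owns a writable copy of the literal's characters. This is not what the answers above propose; changeString and changedAndrew are hypothetical names used only for this illustration.

```cpp
#include <string>

// Hypothetical std::string counterpart of changeArray().
void changeString(std::string& str1) {
    if (!str1.empty()) {
        str1[0] = 'f';   // the std::string owns writable storage
    }
}

// Builds a modifiable copy of the literal, edits it, and returns the result.
std::string changedAndrew() {
    std::string msg1 = "andrew";  // copies the literal's characters
    changeString(msg1);
    return msg1;
}
```

Indexing works exactly as with the array; the only requirement is that the indexed storage is writable, which a string literal is not.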

Why can't I make an int array using 'const int*' same as 'const char*'?

Why can I create a string or array of chars in this way:
#include <iostream>
int main() {
const char *string = "Hello, World!";
std::cout << string[1] << std::endl;
}
? and it outputs the second element correctly, while I can't make an array of integer type without the array's subscript notation [ ]? What's the difference between the char's one and this one: const int* intArray={3,54,12,53};.
The "why" is: "Because string literals are special". The string literal is stored in the binary, as a constant part of the program itself, and const char *string = "Hello, World!"; is just treating the literal as an anonymous array stored elsewhere which it then stores a pointer to in string.
There is no equivalent special behavior for other types, but you can get the same basic solution by making a named static constant and using that to initialize the pointer, e.g.
int main() {
static const int intstatic[] = {3,54,12,53};
const int *intptr = intstatic;
std::cout << intptr[1] << std::endl;
}
The effect of the static const array is to allocate the same constant space the string literal would use (though unlike string literals, it's less likely that the compiler will identify duplicate arrays and coalesce the storage), but as a named variable rather than an anonymous one. The string case could be made explicit in the same way:
int main() {
static const char hellostatic[] = "Hello, World!";
const char *string = hellostatic;
std::cout << string[1] << std::endl;
}
but using the literal directly makes things a little cleaner.
You almost can. There are a couple of things at work.
{1,2,3} and "abc" are not the same thing. In fact, if you wanted to draw a comparison, "abc" should rather be compared to {'a', 'b', 'c', '\0'}. Both of them are valid array initializers:
char foo[] = "abc";
char bar[] = {'a', 'b', 'c', '\0'};
However, only "abc" is also a valid expression to initialize a pointer in C++.
In C (and as an extension in some C++ compilers, including Clang and GCC), you can write a compound literal of array type, like this:
static const int* array = (const int[]){1, 2, 3};
However, this is almost never correct. It works at the global scope and as a function argument, but if you try to initialize a variable of automatic storage with it (i.e. a variable within a function), you'll get a pointer to a location that is about to expire, so you won't be able to use it for anything useful.
Such a feature exists in C and is named compound literal.
For example
#include <stdio.h>
int main(void)
{
const int *intArray = ( int[] ){ 3, 54, 12, 53 };
printf( "%d\n", intArray[1] );
return 0;
}
However C++ does not support this feature from C.
There is a difference compared with string literals. String literals have static storage duration regardless of where they appear, while compound literals have either static or automatic storage duration depending on where they appear.
In C++ something that is close to this feature is std::initializer_list . For example
#include <iostream>
#include <initializer_list>
int main()
{
const auto &myArray = { 3, 54, 12, 53 };
std::cout << myArray.begin()[1] << std::endl;
return 0;
}
String literals come from the C language. In C++, any string written in double quotes in the code automatically has the type const char[N].
So this:
const char str[6] = "hello";
Is exactly the same as:
const char str[6] = { 'h', 'e', 'l', 'l', 'o', '\0' };
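The claim about the literal's type can be checked at compile time. The following is a small sketch under the assumption of a C++11 or later compiler; the names from_literal, from_list and same_bytes are invented for the example.

```cpp
#include <cstring>
#include <type_traits>

// A string literal is an lvalue of type "array of N const char",
// where N includes the terminating '\0'.
static_assert(std::is_same<decltype("hello"), const char (&)[6]>::value,
              "\"hello\" is an lvalue of type const char[6]");
static_assert(sizeof("hello") == 6, "five letters plus the terminator");

// Two arrays initialized in the two ways shown above.
const char from_literal[6] = "hello";
const char from_list[6]    = {'h', 'e', 'l', 'l', 'o', '\0'};

// Compares the two arrays byte by byte.
bool same_bytes() {
    return std::memcmp(from_literal, from_list, sizeof(from_literal)) == 0;
}
```

If either static_assert were false the translation unit would not compile, so a successful build already confirms the type and size.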

address of local variable

I have a hard time understanding the difference between these three:
const char * f() {
return "this is a test";
}
const char * g() {
const char * str = "test again";
return str;
}
const double * h() {
const double a = 2.718;
return &a;
}
I get a warning for the h(), as warning: address of local variable ‘a’ returned. Which makes sense, but I do not understand why the compiler (gcc -Wall) is ok with the f() and g() function.
Isn't there a local variable there?
When and how does the pointer returned by f() or g() get deallocated?
String literals are not stored in the local stack frame. They live in a fixed place in your executable. Contrast:
const char * g() {
const char * p = "test again";
return p;
}
with
const char * g() {
const char a[] = "test again";
return a;
}
In the former, the return value points to a fixed place in your executable. In the latter, the return value points to (a now invalid location in) the stack.
It's string literals.
n3337 2.14.5/8
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow
string literal has type “array of n const char”, where n is the size of the string as defined below, and has
static storage duration
const char * g() {
const char * str = "test again";
return str;
}
This doesn't return the address of a local variable. The variable is str, so its address is &str, which is different from the value of str itself:
std::cout << (void*) str << std::endl;
std::cout << (void*) &str << std::endl; //address of str (local variable)
They would print different values!
So a more apt example would be this:
const char ** g() {
const char * str = "test again";
return &str; //difference!
}
Now it returns the address of the local variable. A good compiler may issue a warning for this.
Another example would be this:
const char * g() {
const char str[] = "test again"; //difference!
return str; //same as before
}
Now, even though you return str, which doesn't look like the address of a local variable, the compiler may give a warning, because in this case the values of str and &str are exactly the same! Try printing these now:
std::cout << (void*) str << std::endl;
std::cout << (void*) &str << std::endl; //address of str (local variable)
They would print the same value!
The string literals aren't local variables. The string equivalent of the third function is this
const char * f() {
const char str[] = "this is a test";
return str;
}
In the function h, a is a local variable that won't exist after the function returns. You're returning a pointer to that variable, and so dereferencing the pointer outside the function is incorrect, and undefined behavior.
In f and g you're returning literal strings. Literal strings have static storage: they aren't allocated on the stack, and they'll exist beyond the lifetime of the functions.
In the definition of g:
const char *g()
{
const char *str = "test again";
return str;
}
str is a local variable, but it's a pointer to non-local - statically allocated - memory. It's that address that you're returning, not a reference to the local variable.
Consider another definition of g:
const char *g()
{
const char str[] = "test again";
// incorrect: can't use str after the return:
return str;
}
Now g has the same problem as your function h, and when compiling it you should see the same warning about returning the address of a local variable.
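For comparison, h can be made safe in the same way as a string literal is safe, by giving the variable static storage duration. This is only a sketch; h_static and read_h are hypothetical names, not part of the question.

```cpp
// Hypothetical variant of h(): `static` gives `a` static storage duration,
// so the returned pointer stays valid after the function returns.
const double* h_static() {
    static const double a = 2.718;
    return &a;
}

// Dereferences the pointer after h_static() has returned.
double read_h() {
    return *h_static();
}
```

With this change gcc -Wall should have no reason to warn, since the address being returned no longer belongs to an automatic variable.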
Storage allocation for string literals is static, that is why you don't get a warning.
Try this and you will get undefined behavior:
const char* getFoo()
{
std::string foo("hi");
return foo.c_str();
}
This happens because the std::string foo holds its own copy of the literal, and that copy is destroyed when foo goes out of scope at the return.
These strings are physically and permanently placed inside your data memory, so their addresses are permanent. An automatic variable is on the stack, so it disappears the moment you return from the call.
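A possible repair for the getFoo example is to return the std::string itself and take the pointer only while the caller keeps that object alive. The sketch below assumes this is acceptable for the caller; getFooSafe and fooLength are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical fix: hand the std::string itself back to the caller.
std::string getFooSafe() {
    std::string foo("hi");
    return foo;
}

// Uses c_str() only while the owning std::string is alive.
std::size_t fooLength() {
    const std::string kept = getFooSafe();  // owns the characters
    const char* p = kept.c_str();           // valid while `kept` is alive and unmodified
    return std::strlen(p);
}
```

The pointer obtained from c_str() is tied to the lifetime of kept, not to the literal "hi", which is why the string has to be stored somewhere before the pointer is used.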

How does a function detect string pointer vs string literal argument?

I have encountered a function, such that it can differentiate between being called as
foo("bar");
vs
const char *bob = "bar";
foo(bob);
Possibilities I have thought of are:
Address of string: both arguments sat in .rdata section of the image. If I do both calls in the same program, both calls receive the same string address.
RTTI: no idea how RTTI can be used to detect such differences.
The only working example I could conjure up is:
void foo(char *msg)
{
printf("string literal");
}
void foo(const char *&msg)
{
printf("string pointer");
}
foo("bar"); // "string literal"
const char *soap = "bar";
foo(soap); // "string pointer"
I do not have access to the function's code, and the declarations in the header file only revealed one function declaration.
Here's another way to distinguish between a string literal and a pointer, based on the fact that string literals have array type, not pointer type:
#include <iostream>
void foo(char *msg)
{
std::cout << "non-const char*\n";
}
void foo(const char *&msg) // & needed, else this is preferred to the
// template function for a string literal
{
std::cout << "const char*\n";
}
template <int N>
void foo(const char (&msg)[N])
{
std::cout << "const char array reference ["<< N << "]\n";
}
int main() {
foo("bar"); // const char array reference [4]
}
But note that all of them (including your original function) can be "fooled" by passing something that isn't a string literal:
const char *soap = 0;
foo(soap);
char *b = 0;
foo(b);
const char a[4] = {};
foo(a);
There is no type in C++ which is unique to string literals. So, you can use the type to tell the difference between an array and a pointer, but not to tell the difference between a string literal and another array. RTTI is no use, because RTTI exists only for classes with at least one virtual member function. Anything else is implementation-dependent: there is no guarantee in the standard that string literals will occupy any particular region of memory, or that the same string literal used twice in a program (or even in a compilation unit) will have the same address. In terms of storage location, anything that an implementation can do with string literals, it is permitted also to do with my array a.
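The point that the type separates arrays from pointers, but not literals from other arrays, can be sketched as follows. The function names kind, from_literal, from_named_array and from_pointer are invented for this illustration.

```cpp
#include <cstddef>
#include <string>

// Binds only to an lvalue of type const char*, i.e. a named pointer variable.
std::string kind(const char*&) { return "pointer"; }

// Binds to any array of const char, whether it is a string literal or not.
template <std::size_t N>
std::string kind(const char (&)[N]) {
    return "array[" + std::to_string(N) + "]";
}

// A string literal selects the array overload.
std::string from_literal() { return kind("bar"); }

// A named array that is not a literal selects the very same overload.
std::string from_named_array() {
    const char a[4] = {'b', 'a', 'r', '\0'};
    return kind(a);
}

// A pointer variable selects the pointer overload.
std::string from_pointer() {
    const char* p = "bar";
    return kind(p);
}
```

from_literal() and from_named_array() give the same answer, which is exactly why overloading on the type cannot single out string literals.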
The function foo() in theory could use a macro to determine if the argument was a literal or not.
#define foo(X) (*#X == '"' \
? foo_string_literal(X) \
: foo_not_string_literal(X))
And what happens if you call it as:
const char bob[] = "bar";
foo(bob);
It's probably using some sort of distinction like that to make the determination.
EDIT: If there's only one function declaration in the header I can't conceive of any portable way the library could make that distinction.