String initializer and read only section

Suppose that I have an array(local to a function) and a pointer
char a[]="aesdf" and char *b="asdf"
My question is whether in the former case the string literal "aesdf" is stored in read only section and then copied on to the local array or is it similar to
char a[]={'a','e','s','d','f','\0'}; ?
I think that in this case the characters are directly created on the stack but in the earlier case (char a[]="aesdf") the characters are copied from the read only section to the local array.
Will "aesdf" exist for the entire life of the executable?

From the abstract and formal point of view, each string literal is an independent nameless object with static storage duration. This means, that the initialization char a[] = "aesdf" formally creates literal object "aesdf" and then uses it to initialize the independent array a, i.e. it is not equivalent to char *a = "aesdf", where a pointer is made to point to the string literal directly.
However, since string literals are nameless objects, in the char a[] = "aesdf" variant there's no way to access the independent "aesdf" object before or after the initialization. This means that there's no way for you to "detect" whether this object actually existed. The existence (or non-existence) of that object cannot affect the observable behavior of the program. For this reason, the implementation has all the freedom to eliminate the independent "aesdf" object and initialize the a array in any other way that leads to the expected correct result, i.e. as char a[] = { 'a', 'e', 's', 'd', 'f', '\0' } or as char a[] = { 'a', 'e', "sdf" } or as something else.

First:
char a[]="aesdf";
Assuming this is an automatic local variable, it will allocate 6 bytes on the stack and initialize them with the given characters. How it does this (whether by memcpy from a string literal, by loading a byte at a time with inline store instructions, or in some other way) is entirely up to the implementation. Note that initialization must happen every time the variable comes into scope, so if the contents are never going to change, this is a wasteful construct to use.
If this is a static/global variable, it will produce a 6-byte char array with a unique address/storage whose initial contents are the given characters, and which is writable.
Next:
char *b="asdf";
This initializes the pointer b to point to a string literal "asdf", which might or might not share storage with other string literals, and which produces undefined behavior if you write to it.
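A minimal sketch of the automatic-array case, assuming two hypothetical helper functions (`first_char_then_modify` and `local_array_size` are names invented for this illustration). It shows that the array is a writable copy that is initialized afresh on every call:

```cpp
#include <cstddef>

// Hypothetical helper: reads the first character the array holds on entry,
// then overwrites it to show that the array is writable.
char first_char_then_modify() {
    char a[] = "aesdf";   // automatic array, re-initialized on every call
    char before = a[0];   // always 'a' at this point
    a[0] = 'X';           // fine: a is our own copy, not the literal
    return before;
}

// Hypothetical helper: sizeof sees the whole array, including the '\0'.
std::size_t local_array_size() {
    char a[] = "aesdf";
    return sizeof a;      // 6
}
```

Calling `first_char_then_modify()` repeatedly keeps returning 'a', because the write in one call never survives into the next.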

Both char a[] = "aesdf" and char a[] = {'a','e','s','d','f','\0'} are stored on the function's run-time stack, and that memory is released when the function returns. For char *b = "asdf", however, the string "asdf" is typically stored in a read-only section, and b refers to it there.

char a[] = "aesdf";
char a[] = {'a','e','s','d','f','\0'};
These two lines of code have the same effect. A compiler may choose to implement them the same way, or it may choose to implement them differently.
From the standpoint of a programmer writing code in C, it shouldn't really matter. You can use either and be sure that you will end up with a six element array of char initialized to the specified contents.
Will "aesdf" exist for the entire life of the executable?
Semantically, yes. String literals are char arrays that have static storage duration. An object that has static storage duration has a lifetime of the execution of the program: it is initialized before the program starts, and exists until the program terminates.
However, this doesn't matter at all in your program. That string literal is used to initialize array a. Since you do not obtain a pointer to the string literal itself, it doesn't matter what the actual lifetime of that string literal is or how it is actually stored. The compiler may do whatever it sees fit to do, so long as the array a is correctly initialized.
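The equivalence of the two initializer forms can be checked with a short sketch. The function name `initializers_match` is hypothetical and exists only for this illustration:

```cpp
#include <cstring>

// Hypothetical helper comparing the two initializer forms byte by byte.
bool initializers_match() {
    char from_literal[] = "aesdf";
    char from_braces[]  = {'a', 'e', 's', 'd', 'f', '\0'};

    static_assert(sizeof from_literal == 6, "five characters plus the terminator");
    static_assert(sizeof from_literal == sizeof from_braces, "same array size");

    // Both arrays hold exactly the same six bytes.
    return std::memcmp(from_literal, from_braces, sizeof from_literal) == 0;
}
```

Nothing here reveals how the compiler produced the bytes, which is the point: only the resulting array contents are observable.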

Related

Pointer stores strings?

I recently started learning C++ and came across the concept of a pointer (which is a variable that stores the address of another variable). However, I also came across char* str = "Hello" and I became confused. It looks like the string "Hello" is being assigned to the pointer str (which I thought could only store addresses). So can a pointer also store a string?

For future reference you should only use the language tag of the language you're using. C and C++ are two very different languages, and in this case there is a difference.
First the common part: Literal strings like "Hello" are stored by the compiler as arrays. In the case of "Hello" it's an array of six char elements, including the string null terminator.
Now for the part that's different: In C++ such string literal arrays are constant, they can not be modified. Therefore it's an error to have a non-const pointer to such an array. In C the string literal arrays are not constant, but they are still not modifiable, they are in essence read-only. But it's still allowed to have a non-const pointer to them.
And finally for your question: As with all arrays, using them makes them decay into a pointer to their first element, and that is basically what happens here. You make your variable str point to the first element in the string literal array.
A little simplified it can be seen like this (in C):
char anonymous_literal_array[] = "Hello";
...
char *str = &anonymous_literal_array[0]; // Make str point to first element in array
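A rough C++ counterpart of that picture, as a sketch only: the reference `literal` and the function `decays_to_first_element` are hypothetical names, and binding a reference is just one way to give the literal's array a name so that the decay can be observed.

```cpp
// Hypothetical helper: true when the pointer obtained by array-to-pointer
// decay equals the address of the array's first element.
bool decays_to_first_element() {
    const char (&literal)[6] = "Hello"; // name the literal's array: 5 chars + '\0'
    const char* str = literal;          // array-to-pointer conversion
    return str == &literal[0] && *str == 'H' && str[5] == '\0';
}
```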

The pointer stores the address of the start of the string, that is, of its first character. In this case "Hello" is an immutable literal. (Check the difference: Immutable vs constant)
More precisely, a pointer cannot store a string or any other object; it can only point to an address that holds data of the pointer's type.
Since char* is a pointer to char, it points to exactly one char.

In this example, the pointer holds the address of the first character in the string. This is inherited from C, where a "string" is an array of characters terminated by a null character. In C and C++, arrays and pointers are closely related. When you do your own memory management, you often create an array with a pointer to the first element of the array. That is exactly what is going on here with the array holding the string literal "Hello".

In C/C++ strings are stored as arrays of characters. A string literal like "Hello" actually yields the start of a read-only character array with static storage duration that holds this string.

A char* variable is a pointer to a single byte(char) in memory. The most common way of handling strings is called a c-style string where the char* is a pointer to the first character in the string and is followed by the rest of the characters in memory. The c-string will always end in a '\0' or null character to signify that you've reached the end of the string ( 'H', 'e', 'l', 'l', 'o', '\0' ).
The "Hello" is called a string literal. What happens in memory is at the very beginning of your program, before anything else is run, the program allocates and sets the memory for the "Hello" string where the other static constants are located. When you write char* str = "Hello"; The compiler knows you're using a string literal and sets str to the location of the first character of that string literal.
But be careful though. String literals are typically stored in a portion of memory that you cannot write to, and modifying one is undefined behavior in any case. If you try to modify that string, you might get memory errors. To make sure this doesn't happen, when dealing with c-strings, you should always write const char* str = "Hello"; That way the compiler will reject attempts to modify that memory through str.
To have a modifiable string, you will need to allocate and manage the memory yourself. I would suggest using std::string, or have some fun and make your own string class that handles the memory.
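A minimal sketch of the std::string route, assuming a hypothetical function `modified_greeting`: the literal is copied into a buffer that the std::string owns, so edits touch only the copy.

```cpp
#include <string>

// Hypothetical helper: copies the literal into a std::string and edits the copy.
std::string modified_greeting() {
    const char* literal = "Hello"; // read-only view of the literal
    std::string copy = literal;    // std::string owns its own writable buffer
    copy[0] = 'J';                 // modifies the copy only
    return copy;                   // "Jello"
}
```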

Pointers in CPP, confusion regarding an example program

Working my way through accelerated c++. There's an example where there are multiple things that I do not understand.
double grade = 88;
static const double numbers[] = { 97,94,90,87,84,80,77,74,70,60,0 };
static const char* const letters[] = { "A+","A","A-","B+","B","B-","C+","C","C-","D","F" };
static const size_t ngrades = sizeof(numbers) / sizeof(*numbers);
for (size_t i = 0; i < ngrades; ++i) {
    if (grade >= numbers[i]) {
        cout << letters[i];
        break;
    }
}
I don't understand what's going on with static const char* const letters[] = (...). First of all, I always thought a char was a single character delimited by '. Single or more characters delimited by " are for me a string.
The way I've understood pointers is that they're a value that represents the address of an object, although this would be initialized as int* p=&x;. They have the advantage of being able to be used like an iterator (kind of). But I really do not get what is going on here, we declare a pointer letters that gets assigned to it an array of values (not addresses), what does that mean? What would be a reason for doing this?
I know what static is in java, is the meaning similar in CPP? The author writes it means that the compiler will initialize the static values only once. But isn't that done with every variable within a certain scope? I've noticed in debug that I seem to skip over the values after having executed it the first time. But that would imply that even after my program is finished running these static values are still saved? That doesn't seem logical to me.

Regarding your first question, a string literal (like e.g. "A+") is a (read-only) array of characters, and like all arrays it can decay to a pointer to its first element, i.e. a pointer to char. The variable letters is an array of constant pointers (the pointers in the array can't be changed) to characters that are constant.
For the third question, what static means depends on the scope in which you declare the variable. It's a linkage specifier when used in the global scope, and means that the variable (or function) will not be exported from the translation unit. If used for a variable in local scope (i.e. inside a function), then the variable will be shared between invocations of the function, i.e. all calls to the function will see the same variable, with its value being kept between calls. Declaring a class member as static means that it's shared between all object instances of the class.
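The local-scope meaning of static can be sketched as follows. Both function names (`next_id`, `always_one`) are hypothetical and chosen only for this illustration:

```cpp
// Hypothetical counter: the static local is initialized once and keeps its
// value between calls.
int next_id() {
    static int counter = 0; // initialized only once
    return ++counter;
}

// Contrast: an automatic local is re-initialized on every call.
int always_one() {
    int counter = 0;
    return ++counter;
}
```

Successive calls to `next_id()` yield 1, 2, 3, and so on, while `always_one()` returns 1 every time.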

1) Here
static const char* const letters[] = (...)
letters is actually an array of const pointers to const characters, hence the "".
2) As said above, the variable is an array of pointers, so each element in the array holds the address of a string literal listed in the initializer. For example, letters[0] holds the address of the memory where "A+" is stored.
3) static has various uses in C++. In your case, if it is declared inside a function, its value is preserved between successive calls to that function.
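To make the "array of pointers to string literals" idea concrete, here is a sketch that wraps the question's lookup in a function. The name `letter_grade` and the fallback return value are assumptions of this sketch, not part of the book's example:

```cpp
#include <cstddef>

// Hypothetical wrapper around the question's lookup: returns a pointer to the
// string literal for the first threshold that the grade reaches.
const char* letter_grade(double grade) {
    static const double numbers[] = {97, 94, 90, 87, 84, 80, 77, 74, 70, 60, 0};
    static const char* const letters[] = {"A+", "A", "A-", "B+", "B", "B-",
                                          "C+", "C", "C-", "D", "F"};
    static const std::size_t ngrades = sizeof(numbers) / sizeof(*numbers);

    for (std::size_t i = 0; i < ngrades; ++i) {
        if (grade >= numbers[i]) {
            return letters[i]; // address of the first char of that literal
        }
    }
    return "?"; // assumption of this sketch: not reached for grades >= 0
}
```

Returning `letters[i]` is safe because each element points at a string literal, and those have static storage duration.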

Those are not characters but strings, so your understanding is correct. The letters here are not stored as char but in the form of const char*.
we declare a pointer letters that gets assigned to it an array of values (not addresses)
letters is not a pointer but an array of pointers, and the values are string literals. "A" is a literal whose type is const char[], which decays to const char*, which means it's a pointer.
I know what static is in java, is the meaning similar in CPP
In general yes, but there are differences: you can't use static inside functions in Java while you can in C++, and you can also have global static variables in C++.
The author writes it means that the compiler will initialize the static values only once. But isn't that done with every variable within a certain scope?
Non-static local variables are created on the stack and initialized each time the function runs. Static local variables, on the other hand, are initialized only once.
But that would imply that even after my program is finished running these static values are still saved?
That's not true. After the program is done, they are freed, because your process is dead by then.

static const char* const letters[]
This is an array of pointers to characters. In this case the initializer list sets each pointer to the first character of each of the strings specified in the initializer list.
const ... const
The pointers and the characters the pointers point to are constants.
static
If declared inside a function, similar to a global, but with local scope.
Links:
http://msdn.microsoft.com/en-us/library/s1sb61xd.aspx
http://en.wikipedia.org/wiki/Static_(keyword)

char* Space Allocation

My understanding is that in C and C++, creating a character array by calling:
char *s = "hello";
actually creates two objects: a read-only character array that is created in static space, meaning that it lives for the entire duration of the program, and a pointer to that memory. The pointer is a variable local to its scope and dies when that scope ends.
My question is what happens to the array when the pointer dies? If I execute the code above inside a function, does this mean I have a memory leak after I exit the function?

it lives for the entire duration of the program
Exactly, formally it has static storage duration.
what happens to the array when the pointer dies?
Nothing.
If I execute the code above inside a function, does this mean I have a memory leak after I exit the function?
No, because the array has static storage duration. (The array is only "freed" when the program exits.)
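As a sketch of why nothing leaks or dangles, consider a hypothetical function `greeting` (the name is an assumption for illustration). The local pointer disappears at the end of the function, but the array it pointed to does not:

```cpp
// Hypothetical helper: the pointer s is local, but the array it points to
// has static storage duration, so the returned address stays valid.
const char* greeting() {
    const char* s = "hello";
    return s; // s goes out of scope here; the literal's array does not
}
```

The caller may keep and read the returned pointer for the rest of the program, and there is nothing to free.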

No, there is no leak.
The literal string is stored in the program's data section, which is typically loaded into a read-only memory page. All equivalent string literals will typically point to the same memory location -- it's a singleton, of sorts.
char const *a = "hello";
char const *b = "hello";
printf("%p %p\n", a, b);
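This will typically display identical values for the two pointers, and successive calls to the same function will typically print the same values too, although the standard leaves it unspecified whether identical string literals share storage.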
(Note that you should declare such variables as char const * -- pointer to constant character -- since the data is shared. Modifying a string literal via a pointer is undefined behavior. At best you will crash your program if the memory page is read-only, and at worst you will change the value of every occurrence of that string literal in the entire program.)

const char* s = "Hello"; is part of the code (program) - hence a constant never altered (unless you have some nasty mechanism altering code at runtime)

My question is what happens to the array when the pointer dies? If I
execute the code above inside a function, does this mean I have a
memory leak after I exit the function?
No there will be no memory leak and nothing happens to the array when the pointer dies.
A memory leak is possible only with dynamic allocation, for example via malloc(). When you malloc() something, you have to free() it later. If you don't, there will be a memory leak.
In your case, it's a "static allocation": this memory is allocated and released automatically, and you don't have to handle that.

does this mean I have a memory leak after I exit the function?
No, there is no memory leak, string literals have static duration and will be freed when the program is done. Quote from the C++ draft standard section 2.14.5 String literals subsection 8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration
Section 3.7.1 Static storage duration says:
[...] The storage for these entities shall last for the duration of the program
Note in C++, this line:
char *s = "hello";
uses a conversion that was deprecated in C++03 and removed in C++11 (many compilers still accept it with a warning); see C++ warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings] for more details.
The correct way would be as follows:
const char *s = "hello";

You only have to free memory if you allocated it with malloc or new.
EDIT:
char* string = "a string"; the memory for the literal is allocated statically. This declaration is not good practice (since the data is constant, the declaration should be const char*).
Only the pointer itself is on the stack, so when the function ends it is destroyed along with the rest of the local variables and arguments; the literal is not.
You need explicit malloc/free or new/delete only when you allocate the memory for your variable yourself, like:
char *string = new char[64]; --> delete[] string;
char *string = (char *)malloc(sizeof(char) * 64); --> free(string); // this is not best practice unless you have to use C
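If a heap buffer really is needed in C++, one option (a sketch, not the only approach) is to let std::unique_ptr handle the matching delete[]. The function name `copy_to_heap` is hypothetical:

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: copies text into a heap buffer owned by unique_ptr,
// which calls delete[] automatically when the owner goes out of scope.
std::unique_ptr<char[]> copy_to_heap(const char* text) {
    const std::size_t size = std::strlen(text) + 1; // include the terminator
    std::unique_ptr<char[]> buffer(new char[size]);
    std::memcpy(buffer.get(), text, size);
    return buffer;
}
```

The returned buffer is writable, unlike the literal it was copied from, and no manual delete[] is required.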

How can char * name = "Duncan"; be valid if pointers can only hold addresses?

I thought that pointers can only hold addresses to other variables. So how can the following statement that I came across be valid? It's holding a string.
char * name = "Duncan"
Thanks.

It's holding a pointer to a string. That's not the same. name just contains an address of memory which contains the string.

"Duncan" is a null terminated string and as such an array of char ({'D', 'u', 'n', 'c', 'a', 'n', '\0'}). char*name="Duncan"; sets name to the address of the array.
Your statement is OK in C, but in C++ "Duncan" is a const char array, so you should use const char *name = "Duncan".
BTW, if you do not need to change the pointer variable name, it's better to have const char name[] = "Duncan". This only allocates memory for the string. Your sample code allocates memory for the string and for the pointer variable name. (Of course the compiler might optimize away name.)
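The storage difference between the two declarations can be seen with sizeof. This is a sketch with hypothetical helper names (`array_form_size`, `pointer_form_size`):

```cpp
#include <cstddef>

// Hypothetical helper: the array form stores the characters themselves.
std::size_t array_form_size() {
    const char name[] = "Duncan";
    return sizeof name; // 7: six letters plus '\0'
}

// Hypothetical helper: the pointer form stores only an address.
std::size_t pointer_form_size() {
    const char* name = "Duncan";
    return sizeof name; // size of a pointer; the 7-byte array lives elsewhere
}
```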

It's still pointing to a string. The string gets put in memory first, and name points to that. It's compiled into your program, so it may not be obvious.

pointers can only hold addresses to other variables.
This is incorrect: references hold addresses of other variables; pointers can hold addresses of anything, or even nothing in particular (e.g. NULL).
In this case, name holds an address of a memory block of 7 bytes, containing ASCII codes for D,u,n,c,a,n, and \0.

In this particular case, the compiler will store the array with data Duncan\0 somewhere in the object file and the pointer will point there.
So yes, the pointer is only holding an address. The data are somewhere else.
This brings me to saying that writing code like this is not so good. For example, if you change that string through your pointer, you get undefined behavior.

That's a definition of a char pointer. After the definition, on the right side of "=", you have a constant definition. The constant is stored somewhere in memory and its address is used as first value for "name".
Later on you will be able to assign other value to "name". You are not bound to the first value, in fact "name" is a variable.

Understanding C-strings & string literals in C++

I have a few questions I would like to ask about string literals and C-strings.
So if I have something like this:
char cstr[] = "c-string";
As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA9, and from there the address is returned and/or casted to type char [ ] which then points to the address.
It is then legal to perform this:
for (int i = 0; i < (sizeof(cstr)/sizeof(char)); ++i)
    cstr[i] = 97+i;
So in this sense, are string literals able to be modified as long as they are casted to the type char [ ] ?
But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.
char * p = "const cstring";
*p = 'A'; // illegal memory write
I guess what I'm trying to understand is why aren't char * types allowed to point to string literals like arrays do and modify their constants? Why do the string literals not get casted into char *'s like they do to char [ ]'s? If I have the wrong idea here or am completely off, feel free to correct me.

The bit that you're missing is a little compiler magic where this:
char cstr[] = "c-string";
Actually executes like this:
char *cstr = alloca(strlen("c-string")+1);
memcpy(cstr,"c-string",strlen("c-string")+1);
You don't see that bit, but it's more or less what the code compiles to.

char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...
char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".

In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which in the code is located somewhere else, possibly in a read-only part of the memory) is copied into this variable.
In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.
Thus, since the string literal may be located in read-only memory, it is in general not safe to try to change it via the p pointer (you may get away with it, or you may not). On the other hand, you can do whatever you like with the cstr array, because that is your local copy that just happens to have been initialized from the literal.
(Just one note: the cstr variable is of a type array of char and in most of contexts this translates to pointer to the first element of that array. Exception to this may be e.g. the sizeof operator: this one computes the size of the whole array, not just a pointer to the first element.)
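A sketch of writing into the local copy, assuming a hypothetical function `overwrite_local_copy`. Unlike the loop in the question, it stops one element early so that the terminating '\0' is preserved:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: overwrites every character of the local copy except
// the terminating '\0', then returns the result.
std::string overwrite_local_copy() {
    char cstr[] = "c-string";                          // 8 characters + '\0'
    for (std::size_t i = 0; i + 1 < sizeof cstr; ++i) {
        cstr[i] = static_cast<char>('a' + i);          // 'a', 'b', 'c', ...
    }
    return cstr;                                       // "abcdefgh"
}
```

Here `sizeof cstr` is 9 because cstr is an array; had it been a pointer, sizeof would give the pointer size instead.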

char cstr[] = "c-string";
This copies "c-string" into a char array on the stack. It is legal to write to this memory.
char * p = "const cstring";
*p = 'A'; // illegal memory write
Literal strings like "c-string" and "const cstring" typically live in a read-only data segment of your binary. Above, p points to memory in this area, and writing to that location is undefined behavior. Since C++11 this is enforced more strongly than before, in that you must make it const char* p instead.
