Creation of char array causes memory in static? - c++

for the following code I was wondering if there is just an array created in the stack or if there is also an array created in the static. I'm just kind of confused with array creation from a string.
char str[] = "White";
I'm assuming this creates a pointer in the stack called str which points to an array with the following contents, "White\0", in the static memory. Is this correct as an assumption?

Nope.
"White" is an array of char[6] in static memory somewhere. (or magic land, it's not specified, and is completely irrelevant). Note that it may or may not be the same static array as another "White" elsewhere in the code.
char str[] = "White"; creates a new local char[6] array on the stack, named str, and copies the characters from the static array to the local array. There are no pointers involved whatsoever.
Note that this is just about the only case where an array can be copied. In most situations, arrays will not copy like this.
If you want a pointer to the magic static array, simply use const char* str = "White";
Phonetagger notes that if the line of code is not in a function, then str is not on the stack, but also in the magic static memory, but the copy still (theoretically at least) happens just before execution begins of code in that translation unit.

Wrong. What you described is what happens when you write:
const char * str = "White";
Instead,
char str[] = "White";
creates an array on the stack (large enough to hold that string), and initializes it with that text. "Regular" string literals and the initialization syntax for char arrays are unrelated stuff.
(as for the implementation, often the compiler emits code that looks like
char str[SIZE_OF_THE_STRING];
strcpy(str, "White");
but that's an implementation-specific detail)

Related

char* Space Allocation

My understanding is that in C and C++, creating a character array by calling:
char *s = "hello";
actually creates two objects: a read-only character array that is created in static space, meaning that it lives for the entire duration of the program, and a pointer to that memory. The pointer is a local variable to its scope then dies.
My question is what happens to the array when the pointer dies? If I execute the code above inside a function, does this mean I have a memory leak after I exit the function?
it lives for the entire duration of the program
Exactly, formally it has static storage duration.
what happens to the array when the pointer dies?
Nothing.
If I execute the code above inside a function, does this mean I have a memory leak after I exit the function?
No, because of (1). (The array is only "freed" when the program exits.)
No, there is no leak.
The literal string is stored in the program's data section, which is typically loaded into a read-only memory page. All equivalent string literals will typically point to the same memory location -- it's a singleton, of sorts.
char const *a = "hello";
char const *b = "hello";
printf("%p %p\n", a, b);
This should display identical values for the two pointers, and successive calls to the same function should print the same values too.
(Note that you should declare such variables as char const * -- pointer to constant character -- since the data is shared. Modifying a string literal via a pointer is undefined behavior. At best you will crash your program if the memory page is read-only, and at worst you will change the value of every occurrence of that string literal in the entire program.)
const char* s = "Hello"; is part of the code (program) - hence a constant never altered (unless you have some nasty mechanism altering code at runtime)
My question is what happens to the array when the pointer dies? If I
execute the code above inside a function, does this mean I have a
memory leak after I exit the function?
No there will be no memory leak and nothing happens to the array when the pointer dies.
A memory leak could be possible only with dynamic allocation, via malloc(). When you're malloc() something, you have to free() it later. If you don't, there will be a memory leak.
In your case, it's a "static allocation": the allocation and free of this memory space will be freed automatically and you don't have to handle that.
does this mean I have a memory leak after I exit the function?
No, there is no memory leak, string literals have static duration and will be freed when the program is done. Quote from the C++ draft standard section 2.14.5 String literals subsection 8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration
Section 3.7.1 Static storage duration says:
[...] The storage for these entities shall last for the duration of the program
Note in C++, this line:
char *s = "hello";
uses a deprecated conversion see C++ warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings] for more details.
The correct way would be as follows:
const char *s = "hello";
you only have to free if you use malloc or new
EDIT:
char* string = "a string"; memory allocation is static, and not good practice (if it will be constant the declaration should be a const char*)
because this is in the stack when the function ends it should be destroyed along with the rest of the local variables and arguments.
you need to use specific malloc/free and new/delete when you allocate the memory for your variable like:
char *string = new char[64]; --> delete string;
char *string = malloc(sizeof(char) * 64); --> free(string); //this is not best practice unless you have to use C

Understanding C-strings & string literals in C++

I have a few questions I would like to ask about string literals and C-strings.
So if I have something like this:
char cstr[] = "c-string";
As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA9, and from there the address is returned and/or casted to type char [ ] which then points to the address.
It is then legal to perform this:
for (int i = 0; i < (sizeof(array)/sizeof(char)); ++i)
cstr[i] = 97+i;
So in this sense, are string literals able to be modified as long as they are casted to the type char [ ] ?
But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.
char * p = "const cstring";
*p = 'A'; // illegal memory write
I guess what I'm trying to understand is why aren't char * types allowed to point to string literals like arrays do and modify their constants? Why do the string literals not get casted into char *'s like they do to char [ ]'s? If I have the wrong idea here or am completely off, feel free to correct me.
The bit that you're missing is a little compiler magic where this:
char cstr[] = "c-string";
Actually executes like this:
char *cstr = alloca(strlen("c-string")+1);
memcpy(cstr,"c-string",strlen("c-string")+1);
You don't see that bit, but it's more or less what the code compiles to.
char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...
char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".
In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which in the code is located somewhere else, possibly in a read-only part of the memory) is copied into this variable.
In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.
Thus, since the string literal may be located in a read-only memory, it is in general not safe to try to change it via the p pointer (you may get along with, or you may not). On the other hand, you can do whatever with the cstr array, because that is your local copy that just happens to have been initialized from the literal.
(Just one note: the cstr variable is of a type array of char and in most of contexts this translates to pointer to the first element of that array. Exception to this may be e.g. the sizeof operator: this one computes the size of the whole array, not just a pointer to the first element.)
char cstr[] = "c-string";
This copies "c-string" into a char array on the stack. It is legal to write to this memory.
char * p = "const cstring";
*p = 'A'; // illegal memory write
Literal strings like "c-string" and "const cstring" live in the data segment of your binary. This area is read-only. Above p points to memory in this area and it is illegal to write to that location. Since C++11 this is enforced more strongly than before, in that you must make it const char* p instead.
Related question here.

When to use const char * and when to use const char []

I know they are different, I know how they are different and I read all questions I could find regarding char* vs char[]
But all those answers never tell when they should be used.
So my question is:
When do you use
const char *text = "text";
and when do you use
const char text[] = "text";
Is there any guideline or rule?
As an example, which one is better:
void withPointer()
{
const char *sz = "hello";
std::cout << sz << std::endl;
}
void withArray()
{
const char sz[] = "hello";
std::cout << sz << std::endl;
}
(I know std::string is also an option but I specifically want to know about char pointer/array)
Both are distinctly different, For a start:
The First creates a pointer.
The second creates an array.
Read on for more detailed explanation:
The Array version:
char text[] = "text";
Creates an array that is large enough to hold the string literal "text", including its NULL terminator. The array text is initialized with the string literal "text".The array can be modified at a later time. Also, the array's size is known even at compile time, so sizeof operator can be used to determine its size.
The pointer version:
char *text = "text";
Creates a pointer to point to a string literal "text". This is faster than the array version, but string pointed by the pointer should not be changed, because it is located in an read only implementation defined memory. Modifying such an string literal results in Undefined Behavior.
In fact C++03 deprecates use of string literal without the const keyword. So the declaration should be:
const char*text = "text";
Also,you need to use the strlen() function, and not sizeof to find size of the string since the sizeof operator will just give you the size of the pointer variable.
Which version is better?
Depends on the Usage.
If you do not need to make any changes to the string, use the pointer version.
If you intend to change the data, use the array version.
EDIT: It was just brought to my notice(in comments) that the OP seeks difference between:
const char text[] and const char* text
Well the above differing points still apply except the one regarding modifying the string literal. With the const qualifier the array test is now an array containing elements of the type const char which implies they cannot be modified.
Given that, I would choose the array version over the pointer version because the pointer can be(by mistake)easily reseated to another pointer and the string could be modified through that another pointer resulting in an UB.
Probably the biggest difference is that you cannot use the sizeof operator with the pointer to get the size of the buffer begin pointed to, where-as with the const char[] version you can use sizeof on the array variable to get the memory footprint size of the array in bytes. So it really depends on what you're wanting to-do with the pointer or buffer, and how you want to use it.
For instance, doing:
void withPointer()
{
const char *sz = "hello";
std::cout << sizeof(sz) << std::endl;
}
void withArray()
{
const char sz[] = "hello";
std::cout << sizeof(sz) << std::endl;
}
will give you very different answers.
In general to answer these types of questions, use the one that's most explicit.
In this case, const char[] wins because it contains more detailed information about the data within -- namely, the size of the buffer.
Just a note:
I'd make it static const char sz[] = "hello";. Declaring as such has the nice advantage of making changes to that constant string crash the program by writing to read-only memory. Without static, casting away constness and then changing the content may go unnoticed.
Also, the static lets the array simply lie in the constant data section instead of being created on the stack and copied from the constant data section each time the function is called.
If you use an array, then the data is initialized at runtime. If you use the pointer, the run-time overhead is (probably) less because only the pointer needs to be initialized. (If the data is smaller than the size of a pointer, then the run-time initialization of the data is less than the initialization of the pointer.) So, if you have enough data that it matters and you care about the run-time cost of the initialization, you should use a pointer. You should almost never care about those details.
I was helped a lot by Ulrich Drepper's blog-entries a couple of years ago:
so close but no cigar and more array fun
The gist of the blog is that const char[] should be preferred but only as global or static variable.
Using a pointer const char* has as disadvantages:
An additional variable
pointer is writable
an extra indirection
accessing the string through the pointer require 2 memory-loads
Just to mention one minor point that the expression:
const char chararr[4] = {'t', 'e', 'x', 't'};
Is another way to initialize the array with exactly 4 char.

String initializer and read only section

Suppose that I have an array(local to a function) and a pointer
char a[]="aesdf" and char *b="asdf"
My question is whether in the former case the string literal "aesdf" is stored in read only section and then copied on to the local array or is it similar to
char a[]={'a','e','s','d','f','\0'}; ?
I think that in this case the characters are directly created on the stack but in the earlier case (char a[]="aesdf") the characters are copied from the read only section to the local array.
Will `"aesdf" exist for the entire life of the executable?
From the abstract and formal point of view, each string literal is an independent nameless object with static storage duration. This means, that the initialization char a[] = "aesdf" formally creates literal object "aesdf" and then uses it to initialize the independent array a, i.e. it is not equivalent to char *a = "aesdf", where a pointer is made to point to the string literal directly.
However, since string literals are nameless objects, in the char a[] = "aesdf" variant there's no way to access the independent "aesdf" object before or after the initialization. This means that there's no way for you to "detect" whether this object actually existed. The existence (or non-existence) of that object cannot affect the observable behavior of the program. For this reason, the implementation has all the freedom to eliminate the independent "aesdf" object and initialize the a array in any other way that leads to the expected correct result, i.e. as char a[] = { 'a', 'e', 's', 'd', 'f', '\0' } or as char a[] = { 'a', 'e', "sdf" } or as something else.
First:
char a[]="aesdf";
Assuming this is an automatic local variable, it will allocate 6 bytes on the stack and initialize them with the given characters. How it does this (whether by memcpy from a string literal or loading a byte at a time with inline store instructions, or some other way) is completely implementation-defined. Note that initialization must happen every time the variable comes into scope, so if it's not going to change, this is a very wasteful construct to use.
If this is a static/global variable, it will produce a 6-byte char array with a unique address/storage whose initial contents are the given characters, and which is writable.
Next:
char *b="asdf";
This initializes the pointer b to point to a string literal "asdf", which might or might not share storage with other string literals, and which produces undefined behavior if you write to it.
Both a[] ="aesdf" and char a[]={'a','e','s','d','f','\0'} will be stored in function's run time stack and memory will be released when function returns. but for char* b= "asdf" asdf is stored in readonly section and is referred from there.
char a[] = "aesdf";
char a[] = {'a','e','s','d','f','\0'};
These two lines of code have the same effect. A compiler may choose to implement them the same way, or it may choose to implement them differently.
From the standpoint of a programmer writing code in C, it shouldn't really matter. You can use either and be sure that you will end up with a six element array of char initialized to the specified contents.
Will "aesdf" exist for the entire life of the executable?
Semantically, yes. String literals are char arrays that have static storage duration. An object that has static storage duration has a lifetime of the execution of the program: it is initialized before the program starts, and exists until the program terminates.
However, this doesn't matter at all in your program. That string literal is used to initialize array a. Since you do not obtain a pointer to the string literal itself, it doesn't matter what its actual lifetime of that string literal is or how it is actually stored. The compiler may do whatever it sees fit to do, so long as the array a is correctly initialized.

How safe and reliable are C++ String Literals?

So, I'm wanting to get a better grasp on how string literals in C++ work. I'm mostly concerned with situations where you're assigning the address of a string literal to a pointer, and passing it around. For example:
char* advice = "Don't stick your hands in the toaster.";
Now lets say I just pass this string around by copying pointers for the duration of the program. Sure, it's probably not a good idea, but I'm curious what would actually be going on behind the scenes.
For another example, let's say we make a function that returns a string literal:
char* foo()
{
// function does does stuff
return "Yikes!"; // somebody's feeble attempt at an error message
}
Now lets say this function is called very often, and the string literal is only used about half the time it's called:
// situation #1: it's just randomly called without heed to the return value
foo();
// situation #2: the returned string is kept and used for who knows how long
char* retVal = foo();
In the first situation, what's actually happening? Is the string just created but not used, and never deallocated?
In the second situation, is the string going to be maintained as long as the user finds need for it? What happens when it isn't needed anymore... will that memory be freed up then (assuming nothing points to that space anymore)?
Don't get me wrong, I'm not planning on using string literals like this. I'm planning on using a container to keep my strings in check (probably std::string). I'm mostly just wanting to know if these situations could cause problems either for memory management or corrupted data.
String-literals have the type const char[N] (where N is the length + 1) and are statically allocated. You need not worry about memory issues; if a string is used in your program it is all handled for you, and resides somewhere in program memory (usually read-only).
That is, these are "the same":
static const char str[] = "a string";
"a string"
When you point to a string literal, you are pointing to the first character at the array. In fact, because the type is const char[], it's only safe to point to it via const char*. The conversion from string literal to char* is deprecated, and unsafe.
// the "same"
static const char str[] = "a string";
const char* strPtr = str; // decays
const char* s1 = "a string";
char* s2 = "a string"; // allowed, implicit const_cast
*s1 = 'A'; // not allowed, it's const
*s2 = 'B'; // allowed, it's not const (but leads to undefined behavior)
Firstly, declare the return value of foo as const, because string literals are constants that can't be changed without causing the dreaded "undefined behaviour". This will then force any pointers which use the return value of foo to also be declared as const, and potentially limiting the damage which can be (usually unintentionally) done. String literals are stored in the 'text' area of a binary executable - they're not created as such at run time.