Memory Allocation char* and char[] - c++

What is the difference between these two in terms of memory allocation.
char *p1 = "hello";
char p2[] = "hello";

The first one creates a pointer variable (four or eight bytes of storage depending on the platform) and stores a location of a string literal there, The second one creates an array of six characters (including zero string terminator byte) and copies the literal there.
You should get a compiler warning on the first line since the literal is const.

The first one is a non-const pointer to const (read-only) data, the second is a non-const array.

Since the first one is a non-const pointer to const (read-only) data, the second is a non-const array, as Paul said, you can write:
p2[2]='A'; //changing third character - okay
But you cannot write:
p1[2]='A';//changing third character - runtime error!

Related

How does the std::string constructor handle char[] of fixed size?

How does the string constructor handle char[] of a fixed size when the actual sequence of characters in that char[] could be smaller than the maximum size?
char foo[64];//can hold up to 64
char* bar = "0123456789"; //Much less than 64 chars, terminated with '\0'
strcpy(foo,bar); //Copy shorter into longer
std::string banz(foo);//Make a large string
In this example will the size of the banz objects string be based on the original char* length or the char[] that it is copied into?
First you have to remember (or know) that char strings in C++ are really called null-terminated byte strings. That null-terminated bit is a special character ('\0') that tells the end of the string.
The second thing you have to remember (or know) is that arrays naturally decays to pointers to the arrays first element. In the case of foo from your example, when you use foo the compiler really does &foo[0].
Finally, if we look at e.g. this std::string constructor reference you will see that there is an overload (number 5) that accepts a const CharT* (with CharT being a char for normal char strings).
Putting it all together, with
std::string banz(foo);
you pass a pointer to the first character of foo, and the std::string constructor will treat it as a null-terminated byte string. And from finding the null-terminator it knows the length of the string. The actual size of the array is irrelevant and not used.
If you want to set the size of the std::string object, you need to explicitly do it by passing a length argument (variant 4 in the constructor reference):
std::string banz(foo, sizeof foo);
This will ignore the null-terminator and set the length of banz to the size of the array. Note that the null-terminator will still be stored in the string, so passing a pointer (as retrieved by e.g. the c_str function) to a function which expects a null-terminated string, then the string will seem short. Also note that the data after the null-terminator will be uninitialized and have indeterminate contents. You must initialize that data before you use it, otherwise you will have undefined behavior (and in C++ even reading indeterminate data is UB).
As mentioned in a comment from MSalters, the UB from reading uninitialized and indeterminate data also goes for the construction of the banz object using an explicit size. It will typically work and not lead to any problems, but it does break the rules set out in the C++ specification.
Fixing it is easy though:
char foo[64] = { 0 };//can hold up to 64
The above will initialize all of the array to zero. The following strcpy call will not touch the data of the array beyond the terminator, and as such the remainder of the array will be initialized.
The constructor called is one that takes a const char* as an argument. That constructor attempts to copy the character data pointed to by that pointer, until the first NUL terminator is reached. If there is no such NUL terminator then the behaviour of the constructor is undefined.
Your foo type is converted to a char* by pointer decay, then an implicit conversion to const char* occurs at the calling site.
Perhaps there could have been a templatised std::string constructor taking a const char[N] as an argument, which would have allowed the insertion of more than one NUL character (the std::string class after all does support that), but it was not introduced and to do so now would be a breaking change; using
std::string foo{std::begin(foo), std::end(foo)};
will also copy the entire array foo.

Pointer stores strings?

I recently started learning C++ and came across with the concept of a pointer (which is a variable that stores the address of another variable). However I also came across with char* str = "Hello" and I became confused. So it looks like the of "Hello" is being assigned to the pointer str (which I thought could only store addresses). So can a pointer also store a string?
For future reference you should only use the language tag of the language you're using. C and C++ are two very different languages, and in this case there is a difference.
First the common part: Literal strings like "Hello" are stored by the compiler as arrays. In the case of "Hello" it's an array of six char elements, including the string null terminator.
Now for the part that's different: In C++ such string literal arrays are constant, they can not be modified. Therefore it's an error to have a non-const pointer to such an array. In C the string literal arrays are not constant, but they are still not modifiable, they are in essence read-only. But it's still allowed to have a non-const pointer to them.
And finally for your question: As with all arrays, using them make them decay into a pointer to their first element, and that is basically what happens here. You make your variable str point to the first element in the string literal array.
A little simplified it can be seen like this (in C):
char anonymous_literal_array[] = "Hello";
...
char *str = &anonymous_literal_array[0]; // Make str point to first element in array
The pointer will store the address of the start of the string, therefore the first character. In this case "Hello" is an immutable literal. (Check the difference: Immutable vs constant)
More correctly, a pointer cannot store a string as well as anything, a pointer can point to an address containing data of the pointer's type.
Since char* is a pointer to char, it points exactly to a char.
In this example, the pointer is the address of the first character in the string. This is inherited from C where a "string" is an array of characters terminated by a NULL character. In C and C++, arrays and pointers are closely related. When you do your own memory management, you often create an array with a pointer to the first element of the array. That is exactly what is going on here with the array holding the string literal "Hello".
in c/c++ strings are stored as array of characters. Literal string like "Hello" actually return start of temporary read only character array which hold this string.
A char* variable is a pointer to a single byte(char) in memory. The most common way of handling strings is called a c-style string where the char* is a pointer to the first character in the string and is followed by the rest of the characters in memory. The c-string will always end in a '\0' or null character to signify that you've reached the end of the string ( 'H', 'e', 'l', 'l', 'o', '\0' ).
The "Hello" is called a string literal. What happens in memory is at the very beginning of your program, before anything else is run, the program allocates and sets the memory for the "Hello" string where the other static constants are located. When you write char* str = "Hello"; The compiler knows you're using a string literal and sets str to the location of the first character of that string literal.
But be careful though. All string literals are stored in a portion of memory that you cannot write to. If you try to modify that string, you might get memory errors. To make sure this doesn't happen, when dealing with c-strings, you should always write const char* str = "Hello"; That way the compiler will never allow you to modify that memory.
To have a modifiable string, you will need to allocate and manage the memory yourself. I would suggest using std::string, or have some fun and make your own string class that handles the memory.

Trouble with C++ Pointers

If you have a look at the code
#include <iostream>
enum Type
{
INT,
FLOAT,
STRING,
};
void Print(void *pValue, Type eType)
{
using namespace std;
switch (eType)
{
case INT:
cout << *static_cast<int*>(pValue) << endl;
break;
case FLOAT:
cout << *static_cast<float*>(pValue) << endl;
break;
case STRING:
cout << static_cast<char*>(pValue) << endl;
break;
}
}
int main()
{
int nValue = 5;
float fValue = 7.5;
char *szValue = "Mollie";
Print(&nValue, INT);
Print(&fValue, FLOAT);
Print(szValue, STRING);
return 0;
}
The line char *szValue = "Mollie";is what confuses me. From what I have been learning is that a pointer is a variable that holds the address of another variable. My issues with that line obviously is that
How come this code accepts a string into a char? We have not specified that this char is an array. Then how come?
How come we are assigning a STRING to a pointer? I thought we could only assign other addresses to them. Where is that pointer getting its address from? Where is it storing the value?
I am still new to C++ but any help would be appreciated.
Update: from what I have understood from the answers below is that when we say "it assigns each letter to the memory addresses in the vicinity of szValue". The rest of the Chars in the string are stored in +1 addresses. How does C++ know how many char / addresses are in the original string? Since szvalue only contains the address of the first char. Not the others right?
Source: LearnCPP - Void Pointers
In a nutshell, in C++ pointers and arrays are almost the same thing. For compiler there is no difference when you define *szValue or szValue[].
A string literal is stored by compiler in memory and first symbol address is actually the value of the string. When you assign a string to char * you might get different use of the that block of memory (i.e. just pass this address into some function or iterate over symbols)
Mind examining more pages of the online tutorial you found pointers-arrays-and-pointer-arithmetic. However I consider the best for learning C++ is reading Bjarne Stroustrup
EDIT: (credits to seand)
Pointers and arrays are almost, but not exactly the same. Take, char *x, and char y[5]. 'y' is like a pointer that points to something fixed, but 'x' may be reassigned. sizeof y yields 5. When passing x or y into functions the arrayness of 'y' disappears
How come we are assigning a STRING to a pointer? I thought we could only assign other addresses to them. Where is that pointer getting its address from? Where is it storing the value?
So, two things.
Except when it is the operand of the sizeof or the unary & operators (along with a couple of others), or is a string literal being used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer toT", and the value of the expression will be the address of the first element of the array.
A string literal is an expression of type "N+1 element array of const char" (plain char in C) where N is the number of characters in the string.
Putting that together,
char *szValue = "Mollie";
the string literal "Mollie" is an expression of type "7-element array of const char". Since it isn't the operand of the sizeof or unary & operators, and since it isn't being used to initialize an array, it is converted to an expression of type "pointer of const char", and its value is the address of the first element.
This is an important point - arrays are not pointers. Arrays do not store a pointer value anywhere. Under certain circumstances the compiler will substitute a pointer expression for an array, but otherwise they're two completely different animals.
The previous two answers have covered the gist of the matter. There are strong similarities between pointers and arrays (arrays can be considered pointers to a memory location themselves). For example, when an array name is used as an argument for a function the memory address of the first element is passed to the function as opposed to the value at that location (which would be the case for an ordinary variable).
The code above assigns a string literal "Mollie" to the char pointer szValue. Therefore starting from the "M" (which is treated as a char not a string) in "Mollie" it assigns each letter to the memory addresses in the vicinity of szValue. Therefore the pointer variable szValue would point to the first element of the string, equivalent to saying szValue[0] (if szValue were declared as a char array).
Hope this helps.
Edit:
Just to be more specific szValue points to the memory address of the first element in the string "Mollie" which is equivalent to using &szValue[0].
The line char* s = "Mollie", in C and C++, means this:
s is a pointer to a character.
It is initialized pointing to the first character in a static character array containing the characters M, o, ... e, and a null character \0.
A pointer is a pointer to a piece of memory containing a type.
That piece of memory is not necessarily a variable.
How come this code accepts a string into a char?
Your code isn't actually doing that, instead it's assigning the address of the first character element to the value of the pointer. Viz, the pointer points to the beginning of the string, incrementing the pointer by 1 will now refer to the next character in the array, and so on.
We have not specified that this char is an array. Then how come?
You have char*, not char, that's the difference. Also, semantically char[] and char* are the same.
Where is that pointer getting its address from?
The compiler sticks the literal string "Mollie" in a read-only portion of the program's memory that is loaded when the program is executed by the operating system. The value of szValue is equal to the address of that string literal. You'll find it's read-only and if you attempt to modify it you'll get a segfault (on *nix) or an Access Violation on Windows.
char *foo declares a char pointer, which in C is how arrays are represented.
So:
char *foo = "ABCD";
foo[1] == 'B'
which also means:
*(foo + 1) == 'B'
The actual type of "Mollie" is const char*. You can think about it as about an array of chars. Thats why you can write something like this:
const char c = szValue[4];
Here, szValue is just a pointer to the first character of the string. Hence, szValue + 1 will be a pointer to the second character and so on.
You should know that when you use char *szValue = "Mollie";, it means you assign the address of constant string "Mollie" to szValue and you can't change value through szValue, because "MOllie" is a constant value stored in constant area.

Why can you put multiple characters in C++ char*

I can't figure out how this works.
// This doesn't work (obviously)
char a;
a = "aaa";
// This works
char* a;
a = "aaa";
How come this works ?
Since char type stores only one character or 1 byte number, how can you store more characters in it when you access it through a pointer ?
You're not putting characters into the char*. You're creating an array of characters in a part of memory determined by your compiler, and pointing the char* at the first character of that array.
The array is actually const, so you shouldn't be able to assign it to a non-const pointer. But due to historical reasons, you still can in many C++ implementations. However, it was officially made illegal in C++11.
The second one is a pointer to a string of chars, not a single char. Tutorial.

Understanding C-strings & string literals in C++

I have a few questions I would like to ask about string literals and C-strings.
So if I have something like this:
char cstr[] = "c-string";
As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA9, and from there the address is returned and/or casted to type char [ ] which then points to the address.
It is then legal to perform this:
for (int i = 0; i < (sizeof(array)/sizeof(char)); ++i)
cstr[i] = 97+i;
So in this sense, are string literals able to be modified as long as they are casted to the type char [ ] ?
But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.
char * p = "const cstring";
*p = 'A'; // illegal memory write
I guess what I'm trying to understand is why aren't char * types allowed to point to string literals like arrays do and modify their constants? Why do the string literals not get casted into char *'s like they do to char [ ]'s? If I have the wrong idea here or am completely off, feel free to correct me.
The bit that you're missing is a little compiler magic where this:
char cstr[] = "c-string";
Actually executes like this:
char *cstr = alloca(strlen("c-string")+1);
memcpy(cstr,"c-string",strlen("c-string")+1);
You don't see that bit, but it's more or less what the code compiles to.
char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...
char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".
In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which in the code is located somewhere else, possibly in a read-only part of the memory) is copied into this variable.
In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.
Thus, since the string literal may be located in a read-only memory, it is in general not safe to try to change it via the p pointer (you may get along with, or you may not). On the other hand, you can do whatever with the cstr array, because that is your local copy that just happens to have been initialized from the literal.
(Just one note: the cstr variable is of a type array of char and in most of contexts this translates to pointer to the first element of that array. Exception to this may be e.g. the sizeof operator: this one computes the size of the whole array, not just a pointer to the first element.)
char cstr[] = "c-string";
This copies "c-string" into a char array on the stack. It is legal to write to this memory.
char * p = "const cstring";
*p = 'A'; // illegal memory write
Literal strings like "c-string" and "const cstring" live in the data segment of your binary. This area is read-only. Above p points to memory in this area and it is illegal to write to that location. Since C++11 this is enforced more strongly than before, in that you must make it const char* p instead.
Related question here.