Pointer stores strings? - c++

I recently started learning C++ and came across with the concept of a pointer (which is a variable that stores the address of another variable). However I also came across with char* str = "Hello" and I became confused. So it looks like the of "Hello" is being assigned to the pointer str (which I thought could only store addresses). So can a pointer also store a string?

For future reference you should only use the language tag of the language you're using. C and C++ are two very different languages, and in this case there is a difference.
First the common part: Literal strings like "Hello" are stored by the compiler as arrays. In the case of "Hello" it's an array of six char elements, including the string null terminator.
Now for the part that's different: In C++ such string literal arrays are constant, they can not be modified. Therefore it's an error to have a non-const pointer to such an array. In C the string literal arrays are not constant, but they are still not modifiable, they are in essence read-only. But it's still allowed to have a non-const pointer to them.
And finally for your question: As with all arrays, using them make them decay into a pointer to their first element, and that is basically what happens here. You make your variable str point to the first element in the string literal array.
A little simplified it can be seen like this (in C):
char anonymous_literal_array[] = "Hello";
...
char *str = &anonymous_literal_array[0]; // Make str point to first element in array

The pointer will store the address of the start of the string, therefore the first character. In this case "Hello" is an immutable literal. (Check the difference: Immutable vs constant)
More correctly, a pointer cannot store a string as well as anything, a pointer can point to an address containing data of the pointer's type.
Since char* is a pointer to char, it points exactly to a char.

In this example, the pointer is the address of the first character in the string. This is inherited from C where a "string" is an array of characters terminated by a NULL character. In C and C++, arrays and pointers are closely related. When you do your own memory management, you often create an array with a pointer to the first element of the array. That is exactly what is going on here with the array holding the string literal "Hello".

in c/c++ strings are stored as array of characters. Literal string like "Hello" actually return start of temporary read only character array which hold this string.

A char* variable is a pointer to a single byte(char) in memory. The most common way of handling strings is called a c-style string where the char* is a pointer to the first character in the string and is followed by the rest of the characters in memory. The c-string will always end in a '\0' or null character to signify that you've reached the end of the string ( 'H', 'e', 'l', 'l', 'o', '\0' ).
The "Hello" is called a string literal. What happens in memory is at the very beginning of your program, before anything else is run, the program allocates and sets the memory for the "Hello" string where the other static constants are located. When you write char* str = "Hello"; The compiler knows you're using a string literal and sets str to the location of the first character of that string literal.
But be careful though. All string literals are stored in a portion of memory that you cannot write to. If you try to modify that string, you might get memory errors. To make sure this doesn't happen, when dealing with c-strings, you should always write const char* str = "Hello"; That way the compiler will never allow you to modify that memory.
To have a modifiable string, you will need to allocate and manage the memory yourself. I would suggest using std::string, or have some fun and make your own string class that handles the memory.

Related

Trouble with C++ Pointers

If you have a look at the code
#include <iostream>
enum Type
{
INT,
FLOAT,
STRING,
};
void Print(void *pValue, Type eType)
{
using namespace std;
switch (eType)
{
case INT:
cout << *static_cast<int*>(pValue) << endl;
break;
case FLOAT:
cout << *static_cast<float*>(pValue) << endl;
break;
case STRING:
cout << static_cast<char*>(pValue) << endl;
break;
}
}
int main()
{
int nValue = 5;
float fValue = 7.5;
char *szValue = "Mollie";
Print(&nValue, INT);
Print(&fValue, FLOAT);
Print(szValue, STRING);
return 0;
}
The line char *szValue = "Mollie";is what confuses me. From what I have been learning is that a pointer is a variable that holds the address of another variable. My issues with that line obviously is that
How come this code accepts a string into a char? We have not specified that this char is an array. Then how come?
How come we are assigning a STRING to a pointer? I thought we could only assign other addresses to them. Where is that pointer getting its address from? Where is it storing the value?
I am still new to C++ but any help would be appreciated.
Update: from what I have understood from the answers below is that when we say "it assigns each letter to the memory addresses in the vicinity of szValue". The rest of the Chars in the string are stored in +1 addresses. How does C++ know how many char / addresses are in the original string? Since szvalue only contains the address of the first char. Not the others right?
Source: LearnCPP - Void Pointers
In a nutshell, in C++ pointers and arrays are almost the same thing. For compiler there is no difference when you define *szValue or szValue[].
A string literal is stored by compiler in memory and first symbol address is actually the value of the string. When you assign a string to char * you might get different use of the that block of memory (i.e. just pass this address into some function or iterate over symbols)
Mind examining more pages of the online tutorial you found pointers-arrays-and-pointer-arithmetic. However I consider the best for learning C++ is reading Bjarne Stroustrup
EDIT: (credits to seand)
Pointers and arrays are almost, but not exactly the same. Take, char *x, and char y[5]. 'y' is like a pointer that points to something fixed, but 'x' may be reassigned. sizeof y yields 5. When passing x or y into functions the arrayness of 'y' disappears
How come we are assigning a STRING to a pointer? I thought we could only assign other addresses to them. Where is that pointer getting its address from? Where is it storing the value?
So, two things.
Except when it is the operand of the sizeof or the unary & operators (along with a couple of others), or is a string literal being used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer toT", and the value of the expression will be the address of the first element of the array.
A string literal is an expression of type "N+1 element array of const char" (plain char in C) where N is the number of characters in the string.
Putting that together,
char *szValue = "Mollie";
the string literal "Mollie" is an expression of type "7-element array of const char". Since it isn't the operand of the sizeof or unary & operators, and since it isn't being used to initialize an array, it is converted to an expression of type "pointer of const char", and its value is the address of the first element.
This is an important point - arrays are not pointers. Arrays do not store a pointer value anywhere. Under certain circumstances the compiler will substitute a pointer expression for an array, but otherwise they're two completely different animals.
The previous two answers have covered the gist of the matter. There are strong similarities between pointers and arrays (arrays can be considered pointers to a memory location themselves). For example, when an array name is used as an argument for a function the memory address of the first element is passed to the function as opposed to the value at that location (which would be the case for an ordinary variable).
The code above assigns a string literal "Mollie" to the char pointer szValue. Therefore starting from the "M" (which is treated as a char not a string) in "Mollie" it assigns each letter to the memory addresses in the vicinity of szValue. Therefore the pointer variable szValue would point to the first element of the string, equivalent to saying szValue[0] (if szValue were declared as a char array).
Hope this helps.
Edit:
Just to be more specific szValue points to the memory address of the first element in the string "Mollie" which is equivalent to using &szValue[0].
The line char* s = "Mollie", in C and C++, means this:
s is a pointer to a character.
It is initialized pointing to the first character in a static character array containing the characters M, o, ... e, and a null character \0.
A pointer is a pointer to a piece of memory containing a type.
That piece of memory is not necessarily a variable.
How come this code accepts a string into a char?
Your code isn't actually doing that, instead it's assigning the address of the first character element to the value of the pointer. Viz, the pointer points to the beginning of the string, incrementing the pointer by 1 will now refer to the next character in the array, and so on.
We have not specified that this char is an array. Then how come?
You have char*, not char, that's the difference. Also, semantically char[] and char* are the same.
Where is that pointer getting its address from?
The compiler sticks the literal string "Mollie" in a read-only portion of the program's memory that is loaded when the program is executed by the operating system. The value of szValue is equal to the address of that string literal. You'll find it's read-only and if you attempt to modify it you'll get a segfault (on *nix) or an Access Violation on Windows.
char *foo declares a char pointer, which in C is how arrays are represented.
So:
char *foo = "ABCD";
foo[1] == 'B'
which also means:
*(foo + 1) == 'B'
The actual type of "Mollie" is const char*. You can think about it as about an array of chars. Thats why you can write something like this:
const char c = szValue[4];
Here, szValue is just a pointer to the first character of the string. Hence, szValue + 1 will be a pointer to the second character and so on.
You should know that when you use char *szValue = "Mollie";, it means you assign the address of constant string "Mollie" to szValue and you can't change value through szValue, because "MOllie" is a constant value stored in constant area.

Why can you put multiple characters in C++ char*

I can't figure out how this works.
// This doesn't work (obviously)
char a;
a = "aaa";
// This works
char* a;
a = "aaa";
How come this works ?
Since char type stores only one character or 1 byte number, how can you store more characters in it when you access it through a pointer ?
You're not putting characters into the char*. You're creating an array of characters in a part of memory determined by your compiler, and pointing the char* at the first character of that array.
The array is actually const, so you shouldn't be able to assign it to a non-const pointer. But due to historical reasons, you still can in many C++ implementations. However, it was officially made illegal in C++11.
The second one is a pointer to a string of chars, not a single char. Tutorial.

How can char * name = "Duncan"; be valid if pointers can only hold addresses?

I thought that pointers can only hold addresses to other variables. So how can the following statement that I came across be valid? It's holding a string.
char * name = "Duncan"
Thanks.
It's holding a pointer to a string. That's not the same. name just contains an address of memory which contains the string.
"Duncan" is a null terminated string and as such an array of char ({'D', 'u', 'n', 'c', 'a', 'n', '\0'}). char*name="Duncan"; sets name to the address of the array.
Your statement is OK in C, but in C++ "Duncan" is a const char array, so you should use const char *name = "Duncan".
BTW, if you do not need to change the pointer variable name, it's better to have const char name[] = "Duncan". This only allocates memory for the string. Your sample code allocates memory for the string and for the pointer variable name. (Of course the compiler might optimize away name.)
It's still pointing to a string. The string gets put in memory first, and name points to that. It's compiled into your program, so it may not be obvious.
pointers can only hold addresses to other variables.
This is incorrect: references hold addresses of other variables; pointers can hold addresses of anything, or even nothing in particular (e.g. NULL).
In this case, name holds an address of a memory block of 7 bytes, containing ASCII codes for D,u,n,c,a,n, and \0.
In this particular case, the compiler will store the array with data Duncan\0 somewhere in the object file and the pointer will point there.
So yes, the pointer is only holding an address. The data are somewhere else.
This brings me to saying, writing code like this is not so good. For example, if you change that string through your pointer, you get an undefined behavior.
That's a definition of a char pointer. After the definition, on the right side of "=", you have a constant definition. The constant is stored somewhere in memory and its address is used as first value for "name".
Later on you will be able to assign other value to "name". You are not bound to the first value, in fact "name" is a variable.

Understanding C-strings & string literals in C++

I have a few questions I would like to ask about string literals and C-strings.
So if I have something like this:
char cstr[] = "c-string";
As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA9, and from there the address is returned and/or casted to type char [ ] which then points to the address.
It is then legal to perform this:
for (int i = 0; i < (sizeof(array)/sizeof(char)); ++i)
cstr[i] = 97+i;
So in this sense, are string literals able to be modified as long as they are casted to the type char [ ] ?
But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.
char * p = "const cstring";
*p = 'A'; // illegal memory write
I guess what I'm trying to understand is why aren't char * types allowed to point to string literals like arrays do and modify their constants? Why do the string literals not get casted into char *'s like they do to char [ ]'s? If I have the wrong idea here or am completely off, feel free to correct me.
The bit that you're missing is a little compiler magic where this:
char cstr[] = "c-string";
Actually executes like this:
char *cstr = alloca(strlen("c-string")+1);
memcpy(cstr,"c-string",strlen("c-string")+1);
You don't see that bit, but it's more or less what the code compiles to.
char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...
char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".
In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which in the code is located somewhere else, possibly in a read-only part of the memory) is copied into this variable.
In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.
Thus, since the string literal may be located in a read-only memory, it is in general not safe to try to change it via the p pointer (you may get along with, or you may not). On the other hand, you can do whatever with the cstr array, because that is your local copy that just happens to have been initialized from the literal.
(Just one note: the cstr variable is of a type array of char and in most of contexts this translates to pointer to the first element of that array. Exception to this may be e.g. the sizeof operator: this one computes the size of the whole array, not just a pointer to the first element.)
char cstr[] = "c-string";
This copies "c-string" into a char array on the stack. It is legal to write to this memory.
char * p = "const cstring";
*p = 'A'; // illegal memory write
Literal strings like "c-string" and "const cstring" live in the data segment of your binary. This area is read-only. Above p points to memory in this area and it is illegal to write to that location. Since C++11 this is enforced more strongly than before, in that you must make it const char* p instead.
Related question here.

c++ char array out of scope or not?

I have a method that requires a const char pointer as input (not null terminated). This is a requirement of a library (TinyXML) I'm using in my project. I get the input for this method from a string.c_str() method call.
Does this char pointer need to be deleted? The string goes out of scope immediately after the call completes; so the string should delete it with its destructor call, correct?
Do not delete the memory you get from std::string::c_str. The string is responsible for that (and it is entirely possible it gave you a pointer to its internal buffer, so if you deleted it, that would be a bad thing (tm)).
The char array returned by string.c_str() is null terminated. If tinyXML's function takes a not null terminated char* buffer, then your probably gonna get some unexpected behaviour.
const char* c_str ( ) const;
Get C string equivalent
Generates a null-terminated sequence
of characters (c-string) with the same
content as the string object and
returns it as a pointer to an array of
characters.
A terminating null character is
automatically appended.
No, it does not need to be released. String's destructor does that for you.
The returned array points to an
internal location with the required
storage space for this sequence of
characters plus its terminating
null-character, but the values in this
array should not be modified in the
program and are only granted to remain
unchanged until the next call to a
non-constant member function of the
string object.
Source
On another note,
If you don't need to char pointer to be null terminated then you're better off to used str.data() rather than str.c_str(). The difference is that .data() doesn't grantee that what you get is going to be null terminated. This is useful if the your string just happens to occupy the entire length of the internal buffer allocated by string. In this case, calling .c_str() would force string to reallocate the date to a new bigger buffer, ones that contains enough space to add the '\0' in the end.
In any rate, ofcourse you shouldn't delete the pointer returned. string will take care of that.
std::string.c_str() returns a pointer to a null terminated string. The actual array of characters is still owned by the std::string object, and it is valid as long as:
The std::string object is valid, and
No calls to non-const member functions on the std::string object are made (i.e. modifying the string invalidates any previous C-style string pointed to).
It's up to the string object itself to allocate and release the null terminated array-of-char it returns to you.
You can always use a null-terminated string as a non-null-terminated string. After all, an NTS is just a non-NTS with an extra zero at the end. As long as the string is correctly terminated as the function expects, it'll never see the "extra" null.