Understanding C-strings & string literals in C++

Understanding C-strings & string literals in C++ - c++

I have a few questions I would like to ask about string literals and C-strings.
So if I have something like this:
char cstr[] = "c-string";
As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA9, and from there the address is returned and/or casted to type char [ ] which then points to the address.
It is then legal to perform this:
for (int i = 0; i < (sizeof(array)/sizeof(char)); ++i)
cstr[i] = 97+i;
So in this sense, are string literals able to be modified as long as they are casted to the type char [ ] ?
But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.
char * p = "const cstring";
*p = 'A'; // illegal memory write
I guess what I'm trying to understand is why aren't char * types allowed to point to string literals like arrays do and modify their constants? Why do the string literals not get casted into char *'s like they do to char [ ]'s? If I have the wrong idea here or am completely off, feel free to correct me.

The bit that you're missing is a little compiler magic where this:
char cstr[] = "c-string";
Actually executes like this:
char *cstr = alloca(strlen("c-string")+1);
memcpy(cstr,"c-string",strlen("c-string")+1);
You don't see that bit, but it's more or less what the code compiles to.

char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...
char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".

In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which in the code is located somewhere else, possibly in a read-only part of the memory) is copied into this variable.
In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.
Thus, since the string literal may be located in a read-only memory, it is in general not safe to try to change it via the p pointer (you may get along with, or you may not). On the other hand, you can do whatever with the cstr array, because that is your local copy that just happens to have been initialized from the literal.
(Just one note: the cstr variable is of a type array of char and in most of contexts this translates to pointer to the first element of that array. Exception to this may be e.g. the sizeof operator: this one computes the size of the whole array, not just a pointer to the first element.)

char cstr[] = "c-string";
This copies "c-string" into a char array on the stack. It is legal to write to this memory.
char * p = "const cstring";
*p = 'A'; // illegal memory write
Literal strings like "c-string" and "const cstring" live in the data segment of your binary. This area is read-only. Above p points to memory in this area and it is illegal to write to that location. Since C++11 this is enforced more strongly than before, in that you must make it const char* p instead.
Related question here.

Related

Pointer value to array C++

I'm trying to set a pointer array to a char array in class Tran, however it only applies the first letter of the string. I've tried many other ways but can't get the whole string to go into name.
edit: name is a private variable
char name[MAX_NAME + 1];
Trying to output it using cout << name << endl;
the input is:
setTran("Birth Tran", 1);
help would be appreciated, thank youu

namee[0] == NULL
name[0] = NULL;
These are bugs. NULL is for pointers. name[0] as well as namee[0] is a char. It may work (by work, I mean it will assign the first character to be the null terminator character) on some systems because 0 is both a null pointer constant and an integer literal and thus convertible to char, and NULL may be defined as 0. But NULL may also be defined as nullptr in which case the program will be ill-formed.
Use name[0] = '\0' instead.
name[0] = *namee;
however it only applies the first letter of the string.
Well, you assign only the first character, so this is to be expected.
If you would like to copy the entire string, you need to assign all of the characters. That can be implemented with a loop. There are standard functions for copying a string though; You can use std::strncpy.
That said, constant length arrays are usually problematic because it is rarely possible to correctly predict the maximum required size. std::string is a more robust alternative.

The underlying issue you are trying to assign a const char* to an char* const. When declaring
char name[MAX_NAME + 1];
You are declaring a constant memory address containing mutable char data (char* const). When you are passing a const char* to your function, you are passing a mutable pointer containing constant data. This will not compile. You should be doing a deep copy of the char array by using:
strcpy_s(dst, buffer_size, src);
This copy function will make sure that your array does not overflow, and that it is null terminated.
In order to be able to assign a pointer to a char array, it would need to be allocated on the heap with
char* name = new char[MAX_NAME + 1];
This would allow assigning a char* or char* const to it afterwards. You however need to manage the memory dynamically at this point, and I would advise against this in your case, as passing "Birth Tran" would lead to undefined behaviours as soon as char* const namee goes out of scope.

Pointer stores strings?

I recently started learning C++ and came across with the concept of a pointer (which is a variable that stores the address of another variable). However I also came across with char* str = "Hello" and I became confused. So it looks like the of "Hello" is being assigned to the pointer str (which I thought could only store addresses). So can a pointer also store a string?

For future reference you should only use the language tag of the language you're using. C and C++ are two very different languages, and in this case there is a difference.
First the common part: Literal strings like "Hello" are stored by the compiler as arrays. In the case of "Hello" it's an array of six char elements, including the string null terminator.
Now for the part that's different: In C++ such string literal arrays are constant, they can not be modified. Therefore it's an error to have a non-const pointer to such an array. In C the string literal arrays are not constant, but they are still not modifiable, they are in essence read-only. But it's still allowed to have a non-const pointer to them.
And finally for your question: As with all arrays, using them make them decay into a pointer to their first element, and that is basically what happens here. You make your variable str point to the first element in the string literal array.
A little simplified it can be seen like this (in C):
char anonymous_literal_array[] = "Hello";
...
char *str = &anonymous_literal_array[0]; // Make str point to first element in array

The pointer will store the address of the start of the string, therefore the first character. In this case "Hello" is an immutable literal. (Check the difference: Immutable vs constant)
More correctly, a pointer cannot store a string as well as anything, a pointer can point to an address containing data of the pointer's type.
Since char* is a pointer to char, it points exactly to a char.

In this example, the pointer is the address of the first character in the string. This is inherited from C where a "string" is an array of characters terminated by a NULL character. In C and C++, arrays and pointers are closely related. When you do your own memory management, you often create an array with a pointer to the first element of the array. That is exactly what is going on here with the array holding the string literal "Hello".

in c/c++ strings are stored as array of characters. Literal string like "Hello" actually return start of temporary read only character array which hold this string.

A char* variable is a pointer to a single byte(char) in memory. The most common way of handling strings is called a c-style string where the char* is a pointer to the first character in the string and is followed by the rest of the characters in memory. The c-string will always end in a '\0' or null character to signify that you've reached the end of the string ( 'H', 'e', 'l', 'l', 'o', '\0' ).
The "Hello" is called a string literal. What happens in memory is at the very beginning of your program, before anything else is run, the program allocates and sets the memory for the "Hello" string where the other static constants are located. When you write char* str = "Hello"; The compiler knows you're using a string literal and sets str to the location of the first character of that string literal.
But be careful though. All string literals are stored in a portion of memory that you cannot write to. If you try to modify that string, you might get memory errors. To make sure this doesn't happen, when dealing with c-strings, you should always write const char* str = "Hello"; That way the compiler will never allow you to modify that memory.
To have a modifiable string, you will need to allocate and manage the memory yourself. I would suggest using std::string, or have some fun and make your own string class that handles the memory.

C++ Array Inquiry

Why can we do:
char* array = "String";
but not
int* array = 1;
To my understanding * means address, so I don't really understand why we can give a non-address value, like "String." to char* array.

char* array means that array is a variable that can hold the address of another object (such as another variable or constant).
If the program has "String", it means that there is actually an array of 7 characters that exists in memory somewhere and it holds the contents "String".
When you write array = "String"; then the variable array is made to hold the address of the letter 'S' in that string.
This is because C++ has a rule, sometimes called array-pointer decay which means that if you try to use an array (such as "String") in a context where a value is expected, then it gets automatically converted to a pointer to the first element of that array.
Without that rule you'd have to write array = &("String"[0]); . The rule was included in C originally to avoid having to write &....[0] all over the place when working with arrays, although in hindsight it seems to have generated more pain than pleasure.
Moving onto int* i = 1. You have said that i can hold the address of an int, but you have not provided any such address. Variables thare aren't arrays don't automatically get converted to their address. In fact 1 isn't even a variable. We call it a prvalue , it doesn't have any memory storage area associated with it, so it does not have an address. To point at an instance of a 1 you would have to make a variable, for example:
int j = 1; int* i = &j;

* does not mean adress. Its meaning is context sensitive, but most of the time it means pointer.
The reason for this, not to work is because "String" is an array of characters, or a pointer to an character. In contrast to this, 1 is a literal which is not a valid adress. You should write int array = 1 instead, and after that you could do int *brray = &array.

* means a pointer, not an address. You can retrieve an address using the & operator.
char* array = "String";
Actually declares array as a pointer to a character, and the = sign after the declaration tells the compiler what value should the pointer posses. In this case it's an address of "String" in the string pool somewhere in the memory of the run program.
int* array = 1;
Doesn't put the address of 1 to array as you may expect. However, with a little adjustment
int* array = (int*)1;
... it could point to an integer at address 1, which is unfortunately unaccessible.

This assignment:
char* array = "String";
Assign "String" to an available location in memory and returns the memory address of the first position of "String" in memory. "array" stores the address of "S".
This assignment:
int* array = 1;
Doesn't work because you are trying to assign an integer to a pointer to integer. The types are different

"I don't really understand why we can give a non-address value, like "String." "
That's because a character string literal actually is const char[] array, which decays to a pointer when assigned to a char*, while 1 isn't one and you can't take it's address in any way.

Query on Pointer dereferencing

I was doing some some string manipulation and encountered with a problem.
char *arr1 = "HELLO";
(*arr1)++;
this throws an error "Access violation writing to location"!
however, below code works fine.
char arr1[] = "HELLO";
(*arr1)++;
and what are their memory segments on which both the char* arr1 and char arr1[] are stored ?

char *arr1 = "HELLO";
The above definition implies that the memory for arr1 could be allocated in a read-only part of memory (it is implementation-defined behaviour, actually), and thus could cause 'Access violation' when you are trying to change the value at the memory location is pointing to.
char arr1[] = "HELLO";
In this case, the memory for arr1 is allocated in stack - which is writable. Hence, the expression (*arr1)++ works fine without any issue.

In the first case, arr1 is a local variable which holds a pointer to a read-only memory segment containing the string "HELLO". The statement (*arr1)++ tries to modify the first byte of the segment (which contains the character 'H'), which leads to access violation.
In the second case, arr1 is a local variable which holds an array of six bytes, which are initialized with {'H', 'E', 'L', 'L', 'O', 0}. Local variables are in read-write memory, so modifying them doesn't lead to an error.

There is no pointer arithmetic. You are adding one to the first charachter of the string. Since string constants are not writable, the first version gives an error. In the second version you change the content of on char array, so there is no problem.

You didn't do any pointer arithmetic. You dereferenced the pointer then incremented the thing it points to (the first character).
Unfortunately for you, string literals cannot be modified (and you should have been warned by your compiler that char* is bad and const char* is good, for pointing to string literals).
It works in the second case because initialising a local array from a string literal creates your own copy of the data.

Literals like "HELLO" are of type const char[] which decays to const char*, compiler can put it in memory which actually cannot be modified, modification results in undefined behaviour. g++ gives you warning like below:
main.cpp:7:14: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
char *arr1 = "HELLO";
Compiler allows this because as your can read here (http://en.cppreference.com/w/cpp/language/string_literal):
In C, string literals are of type char[], and can be assigned directly
to a (non-const) char*. C++03 allowed it as well (but deprecated it,
as literals are const in C++). C++11 no longer allows such assignments
without a cast.
but actually I dont get errors on g++ even with C++11 enabled.
arrays of course you can modify, so your second example works fine. This code:
char arr1[] = "HELLO";
creates on stack an array of length 6 and initializes it with "HELLO\0"

initializing char and char pointers

What's the difference between these:
This one works:
char* pEmpty = new char;
*pEmpty = 'x';
However if I try doing:
char* pEmpty = NULL;
*pEmpty = 'x'; // <---- doesn't work!
and:
char* pEmpty = "x"; // putting in double quotes works! why??
EDIT: Thank you for all the comments:
I corrected it. it was supposed to be pEmpty ='x',
So, this line doesn't even compile: char pEmpty ='x';
wheras this line works: char* pEmpty ="x"; //double quotes.

Your second line doesn't work because you are trying to assign 'x' to pEmpty rather than *pEmpty.
Edit: Thanks to Chuck for the correction. It ALSO doesn't work because you need to allocate some memory to hold the value 'x'. See the example below.
The third line does work because you are using an initalizer rather than a regular assignment statement.
In general, you should understand how pointers and dereferencing work.
char *p = new char(); // Now I have a variable named p that contains
// the memory address of a single piece of character
// data.
*p = 'x'; // Here I assign the letter 'x' to the dereferenced value of p;
// that is, I look up the location of the memory address contained
// in p and put 'x' there.
p = 'x'; // This is illegal because p contains a memory address,
// not a character.
char q = 'x'; // Now I have a char variable named q containing the
// character 'x'.
p = &q; // Now I assign the address of q (obtained with the reference
// operator &) to p. This is legal because p contains a memory
// address.

You need to keep in mind what a pointer is — it's just a normal variable that holds an address, much like a char holds a character value. This address can be used to look up another variable (with the * operator).
When you do char* pEmpty = new char, you're giving pEmpty the value returned by new char, which is the address of a chunk of memory large enough to hold a char value. Then you use *pEmpty to access this memory and assign it the char value 'x'.
In the second example, you write pEmpty = 'x' — but remember that pEmpty is a pointer, which means it's supposed to hold an address. Is 'x' an address? No, it's a character literal! So that line isn't really meaningful.
In the third example, you're assigning pEmpty the string literal "x". Is this an address? Yes, it is. The literal evaluates to the address of that constant string.
Remember, pointers are a completely different thing from the type that they point to. They can be used to access a value of that type, but they are a completely different type of their own.

The difference is that string literals are stored in a memory location that may be accessed by the program at runtime, while character literals are just values. C++ is designed so that character literals, such as the one you have in the example, may be inlined as part of the machine code and never really stored in a memory location at all.
To do what you seem to be trying to do, you must define a static variable of type char that is initialized to 'x', then set pEmpty to refer to that variable.

The second example doesn't work for a few reasons. The first one would be that you have made the pointer point to, well, nowhere in particular. The second is that you didn't actually dereference it, so you are telling the pointer to point to the address of a character literal. As it has no address, the compiler will complain.
EDIT:
Just to be clear, the asterisk (*) is the operator that dereferences pointers.

pEmpty = 'x'; assigns pEmpty (rather than the memory pointed by it) the value of 'x'.
*pEmpty = 'x'; //this is probably what you want

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Understanding C-strings & string literals in C++ - c++

The bit that you're missing is a little compiler magic where this: char cstr[] = "c-string"; Actually executes like this: char *cstr = alloca(strlen("c-string")+1); memcpy(cstr,"c-string",strlen("c-string")+1); You don't see that bit, but it's more or less what the code compiles to.

char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ... char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".

Related

Pointer value to array C++

Pointer stores strings?

C++ Array Inquiry

Query on Pointer dereferencing

initializing char and char pointers

Categories

Resources