Modifying a char *const string - c++

I know that const char * is a pointer to a const char, while char *const is a constant pointer to a char.
I am testing this in the following code:
const char *s = "hello"; // Not permitted to modify the string "hello"
char *const t = "world"; // Not permitted to modify the pointer t
s = "hello2"; // Valid
// t = "world2"; // Invalid, gives compilation error
// *(s + 1) = 'a'; // Invalid, gives compilation error
*(t + 1) = 'a'; // Why does this not work?
The last line does not give any error, but causes the program to terminate unexpectedly. Why is modifying the string pointed to by t not allowed?

t points to a string literal, and modifying a string literal is undefined behavior. The C++ draft standard, section 2.14.5 String literals, paragraph 12, says:
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
The relevant section of the C99 draft standard is 6.4.5 String literals, paragraph 6, which says:
It is unspecified whether these arrays are distinct provided their elements have the
appropriate values. If the program attempts to modify such an array, the behavior is
undefined.
On a typical modern Unix platform, string literals are placed in a read-only segment, so an attempt to modify one results in an access violation. We can use objdump to inspect the read-only section as follows:
objdump -s -j .rodata
The output shows that the string literal is indeed stored in the read-only section. Note that I had to add a printf; otherwise the compiler would optimize out the string literal. Sample objdump output:
Contents of section .rodata:
400668 01000200 776f726c 64002573 0a00 ....world.%s..
An alternative approach would be to have t point to an array with a copy of a string literal like so:
char r[] = "world";
char *const t = r ;

Although string literals in C officially have a type of char[] (array of char, not const), the C standard specifically states that they must be treated as non-modifiable. Compilers tend to put string literals in a read-only segment, so attempting to modify them results in an access violation.
String literals are described in section 6.4.5 of the C11 standard (ISO/IEC 9899:2011).

You can bypass the compiler error by casting to char*, as in *((char*)s + 1) = 'a';, but as already stated in other answers, this is undefined behaviour and will probably result in a segmentation fault because you are modifying a string literal.

If you want to test it properly, initialize the strings in a function so the initialization can be dynamic and use strdup() for that.
int
main(int argc, char **argv)
{
char *d1 = strdup("hello");
char *d2 = strdup("world");
const char *s = d1;
char *const t = d2;
...
free(d1);
free(d2);
}
The d1 and d2 variables are mainly used so that the dynamic allocations can be properly freed using free() at the end. Also, as other answers suggest, always treat string literals as const char *.

Related

why it cannot convert char* to char

In c++ we can write
1 char *s="hello"
but the below lines of program produces an error ( cannot convert char* to char)
2 char *s;
*s="hello";
I am confused here, what is difference between 1 and 2
why this error is coming?
In C++, a string literal is a constant array of characters, not just an array of characters as in C. To assign a literal to such a pointer (which is best avoided), you do not dereference the pointer. Dereferencing it accesses the first element, which is a single char, and a char cannot hold an array of characters, hence the error. This is one more reason to use std::string.
Some compilers, such as GCC, accept such code as an extension even though it is not standards compliant. It would look like:
char* s = "hello";
s = "new string";
This generates the following warning in GCC (But still gets the expected result):
warning: ISO C++ forbids converting a string constant to 'char*' [-Wwrite-strings]
Clang also has the same behavior with the same output (Also generating a warning)
A string literal is an array of characters, so a pointer to its first character has type const char *.
Therefore to reference a string, you can use const char * s = "hello";
However if you dereference a const char*, you get a const char. This isn't a string i.e. *s gives you 'h'.
In your code *s="hello";, you are saying "assign the value "hello" to the dereferenced s". Dereferencing s yields a single character, to which you are trying to assign a string.
The problem is the second asterisk in your second example.
The first code is this
char *s="hello";
The equivalent code is this
char *s;
s="hello";
No * before s in the second line.
Now as everyone is pointing out neither of these are legal C++. The correct code is
const char *s="hello";
or
const char *s;
s="hello";
Because string literals are constant, and so you need a pointer to const char.
I am confused here, what is difference between 1 and 2 why this error is coming?
As with many other symbols, * in C++ means different things in different contexts:
char *s; // * here means that s type is a pointer to char, not just char
*s; // in this context * means dereference s, result of expression is char
int a = 5 * 2; // in this context * means multiply
so case 1 and 2 may look similar to you but they mean very different things hence the error.

modify const char * vs char * content in easy way

I'm really confused about const char * and char *.
I know that with char *, when we want to modify the content, we need to do something like this
const char * temp = "Hello world";
char * str = new char[strlen(temp) + 1];
memcpy(str, temp, strlen(temp));
str[strlen(temp) + 1] = '\0';
and if we want to use something like this
char * str = "xxx";
char * str2 = "xts";
str = str2;
we get a compiler warning. That's OK; I know that when I want to change a char * I have to use something like a memory copy. But I'm really confused about const char *. With const char * I can use this
const char * str = "Hello";
const char * str2 = "World";
str = str2; // and now str is World
and I get no compiler error! Why? Why do we use a memory copy when it is not const, yet with const we only use the assignment operator and we're done? How is that possible? Is it OK to just use assignment with const? Will no problem happen later?
As other answers say, you should distinguish pointers and bytes they point to.
Both types of pointers, char * and const char *, can be changed, that is, "redirected" to point to different bytes. However, if you want to change the bytes (characters) of the strings, you cannot use const char *.
So, if you have string literals "Hello" and "World" in your program, you can assign them to pointers, and printing the pointer will print the corresponding literal. However, to do anything non-trivial (e.g. change Hello to HELLO), you will need non-const pointers.
Another example: with some pointer manipulation, you can remove leading bytes from a string literal:
const char* str = "Hello";
std::cout << str; // Hello
str = str + 2;
std::cout << str; // llo
However, if you want to extract a substring, or do any other transformation on a string, you should reallocate it, and for that you need a non-const pointer.
BTW since you are using C++, you can use std::string, which makes it easier to work with strings. It reallocates strings without your intervention:
#include <string>
std::string str("Hello");
str = str.substr(1, 3);
std::cout << str; // ell
This is a confusing hangover from the days of early C. Early C didn't have const, so string literals were "char *". They remained char * to avoid breaking old code, but they became non-modifiable, so const char * in all but name. So modern C++ either warns or gives an error (to be strictly conforming) when the const is omitted.
Your memcpy missed the trailing nul byte, incidentally. Use strcpy() to copy a string, that's the right function with the right name. You can create a string in read/write memory by use of the
char rwstring[] = "I am writeable";
syntax.
That is because your variables are just pointers. You're not modifying their contents, only where they point.
char * a = "asd";
char * b = "qwe";
a = b;
Now you have thrown away the address that a held. a and b now point to the same place, so if the characters there were modified through one pointer, the change would be visible through the other.
In other words, in const char * the pointer itself is not constant. The const qualifier there says nothing about the pointer.
The real difference is that the pointer (which is not const) points to const characters, and when you change the pointer it simply points to another set of const characters. That is why const has no effect on reassigning such pointers.
Note: more complex declarations combine pointers and const differently (for example, char *const makes the pointer itself constant). In a simple case like this one, const has no effect on reassignment.
Citing Malcolm McLean:
This is a confusing hangover from the days of early C. Early C didn't have const, so string literals were "char *". They remained char * to avoid breaking old code, but they became non-modifiable, so const char * in all but name.
Actually, string literals are not pointers but arrays, which is why sizeof("hello world") works like a charm (it yields 12, since the terminating null character is included, in contrast to strlen). Apart from this small detail, the statement above is correct for good old C even these days.
In C++, though, string literals have been arrays of constant characters (char const[]) right from the start:
C++ standard, 5.13.5.8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration.
(Emphasis mine.) In general, you are not allowed to assign a pointer to const to a pointer to non-const:
char const* s = "hello";
char* ss = s;
This will fail to compile. Assigning string literals to pointer to non-const should normally fail, too, as the standard explicitly states in C.1.1, subclause 5.13.5:
Change: String literals made const.
The type of a string literal is changed from “array of char” to “array of const char”.
[...]char* p = "abc"; // valid in C, invalid in C++
Still, assigning a string literal to a pointer to non-const is commonly accepted by compilers (as an extension!), probably to retain compatibility with C. As this is invalid according to the standard, the compiler at least issues a warning.

Replacing characters in a char*

How would one go to replace characters in a char*?
For example:
int main() {
char* hello = "hello";
int i;
for (i = 0; i < 5; i++) {
hello[i] = 'a';
}
cout << hello;
}
No output at all. Just pauses on me and says that the program isn't responding.
Expected output: aaaaa
The problem here is that you have a pointer to a string literal, and string literals in C++ are constant arrays of characters. Attempting to modify constant data leads to undefined behavior.
You can solve this by making hello an array:
char hello[] = "hello";
char* hello = "hello"; should be char hello[] = "hello";
The former is a string literal which you are not allowed to change. The latter is an array from which you can change any character in it.
Reason:
char* hello = "hello";
Actually this is a string literal, and the linker stores the "hello" string in a separate section of the program called the read-only memory area (check the linker-generated memory map file, possibly with a .map extension, to see the program memory map).
char* hello
hello is a pointer variable and it will be stored on the stack area of the program.
Now the pointer variable hello holds an address in read-only memory (the base address of the string literal).
for (i = 0; i < 5; i++) {
hello[i] = 'a';
}
You are trying to modify read-only memory. In that case it depends on the OS which exception is raised (in some cases you will get a segmentation fault).
Solution:
Define the array on the stack (local to the function) or in data memory (global).
char hello[] = "hello";
With this declaration the string "hello" is copied into an array on the stack (local to the function) or placed in writable data memory (global).
Recommendation
Use the const keyword with string literals to avoid accidental modification of read-only memory. With const, the compiler reports an error if any part of the code tries to modify the read-only area.
const char* hello = "hello";
Read below.
From the C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal
or literals. The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the
array elements have type char, and are initialized with the
individual bytes of the multibyte character sequence; for wide string
literals, the array elements have type wchar_t, and are initialized
with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.

c++ char * initialization in constructor

I'm just curious, I want to know what's going on here:
class Test
{
char * name;
public:
Test(char * c) : name(c){}
};
1) Why won't Test(const char * c) : name(c){} work? Because char * name isn't const? But what about this:
main(){
char * name = "Peter";
}
name is char*, but "Peter" is const char*, right? So how does that initialization work?
2) Test(char * c) : name(c){ c[0] = 'a'; } - this crashes the program. Why?
Sorry for my ignorance.
Why won't Test(const char * c) : name(c) {} work? Because char * name isn't const?
Correct.
how does this initialization work: char * name = "Peter";
A C++ string literal is of type char const[] (as opposed to just char[] in C, which originally didn't have the const keyword; see footnote 1). This assignment is considered deprecated in C++, yet it is still allowed (see footnote 2) for backward compatibility with C.
Test(char * c) : name(c) { c[0] = 'a'; } crashes the program. Why?
What are you passing to Test when initializing it? If you're passing a string literal or an illegal pointer, doing c[0] = 'a' is not allowed.
1 The old version of the C programming language (as described in the K&R book published in 1978) did not include the const keyword. Since then, ANSI C borrowed the idea of const from C++.
2 Valid in C++03, no longer valid in C++11.
A conversion to const is a one-way street, so to speak.
You can convert from T * to T const * implicitly.
Conversion from T const * to T * requires an explicit cast. Even if you started from T *, then converted to T const *, converting back to T * requires an explicit cast, even though it's really just "restoring" the access you had to start with.
Note that throughout, T const * and const T * are precisely equivalent, and T stands for "some arbitrary type" (char in your example, but could just as easily be something else like int or my_user_defined_type).
Initializing a char * from a string literal (e.g., char *s = "whatever";) is allowed even though it violates this general rule (the literal itself is basically const, but you're creating a non-const pointer to it). This is simply because there's lots of code that depends on doing this, and nobody's been willing to break that code, so they have a rule to allow it. That rule's been deprecated, though, so at least in theory some future compiler could reject code that depends on it.
Since the string literal itself is basically const, any attempt at modifying it results in undefined behavior. On most modern systems, this will result in the process being terminated, because the memory storing the string literal will be marked as 'read only'. That's not the only possible result. Just for example, back in the days of MS-DOS, it would often succeed. It could still have bizarre side-effects though. For one example, many compilers "knew" that string literals were supposed to be read-only, so they'd "merge" identical string literals. Therefore if you had something like:
char *a = "Peter"; a[1] = 'a';
char *b = "Peter";
cout << b;
The compiler would have "merged" a and b to actually point at the same memory -- so when you modified a, that change would also affect b, so it would print out "Pater" instead of "Peter".
Note that the string literals didn't need to be entirely identical for this to happen either. As long as one was identical to the end of another, they could be merged:
char *a = "this?";
char *b = "What's this?";
a[2] = 'a';
a[3] = 't';
cout << b; // could print "What's that?"
Mandating one behavior didn't make sense, so the result was (and is) simply undefined.
First of all this is C++, you have std::string. You should really consider using it.
Regarding your question, "Peter" is a string literal, hence it is unmodifiable and you can't write to it. You can:
have a const char * member variable and initialize it as you are doing with name(c), taking the parameter as const char *
have a char * member variable and copy the content, e.g. name(strdup(c)) (and remember to release it in the destructor).
Correct.
"Peter" is typically stored in a read-only memory location (actually, it depends on what type of device we are on) because it is a string literal. It is undefined what happens when you attempt to modify a string literal (but you can probably guess that you shouldn't).
You should use std::string anyways.
1a) Right
1b) "Peter" is not a const char*; it is an array of characters that may not be modified, and it has historically been allowed to convert to char*. The reason is compatibility with the time before const existed in the language. A lot of code already existed that said char* p = "fred"; and they couldn't just make that code illegal overnight.
2) Can't say why that would crash the program without seeing how you are using that constructor.

need help changing single character in char*

I'm getting back into c++ and have the hang of pointers and whatnot, however, I was hoping I could get some help understanding why this code segment gives a bus error.
char * str1 = "Hello World";
*str1 = '5';
ERROR: Bus error :(
And more generally, I am wondering how to change the value of a single character in a cstring. Because my understanding is that *str = '5' should change the value that str points to from 'H' to '5'. So if I were to print out str it would read: "5ello World".
In an attempt to understand I wrote this code snippet too, which works as expected;
char test2[] = "Hello World";
char *testpa2 = &test2[0];
*testpa2 = '5';
This gives the desired output. So then what is the difference between testpa2 and str1? Don't they both point to the start of a series of null-terminated characters?
When you say char *str = "Hello World"; you are making a pointer to a literal string which is not changeable. It should be required to assign the literal to a const char* instead, but for historical reasons this is not the case (oops).
When you say char str[] = "Hello World"; you are making an array which is initialized to (and sized by) a string known at compile time. This is OK to modify.
Not so simple. :-)
The first one creates a pointer to the given string literal, which is allowed to be placed in read-only memory.
The second one creates an array (on the stack, usually, and thus read-write) that is initialised to the contents of the given string literal.
In the first example you try to modify a string literal, this results in undefined behavior.
As per the language standard in 2.13.4.2
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
In your second example you used string-literal initialization, defined in 8.5.2.1
A char array (whether plain char, signed char, or unsigned char) can be initialized by a string-literal (optionally enclosed in braces); a wchar_t array can be initialized by a wide string-literal (optionally enclosed in braces); successive characters of the string-literal initialize the members of the array.