I understand that the syntax char * = "stringLiteral"; has been deprecated and may not even work in the future. What I don't understand is WHY.
I searched the net and stack and although there are many echos confirming that char * = "stringLiteral"; is wrong and that const char * = "stringLiteral"; is corect, I have yet to find information about WHY said syntax is wrong. In other words, I'd like to know what the issue really is under the hood.
ILLUSTATING MY CONFUSION
CODE SEGMENT 1 - EVIL WAY (Deprecated)
char* szA = "stringLiteralA"; //Works fine as expected. Auto null terminated.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Works, so change by something same length OK.
std::cout << szA << std::endl;
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;
Ouput:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah
So what exactly is the problem here? Seems to work just fine.
CODE SEGMENT 2 (The "OK" way)
const char* szA = "stringLiteralA"; //Works fine as expected. Auto null term.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Works, so change by something same length OK.
std::cout << szA << std::endl;
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;
Ouput:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah
Also works fine. No difference. What is the point of adding const?
CODE SEGMENT 3
const char* const szA = "stringLiteralA"; //Works. Auto null term.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Breaks here. Can't reasign.
I am only illustrating here that in order to read only protect the variable content you have to const char* const szA = "something"; .
I don't see the point for deprecation or any issues. Why is this syntax deprecated and considered an issue?
const char * is a pointer (*) to a constant (const) char (pointer definitions are easily read from right to left). The point here is to protect the content, since, as the standard says, modifying the content of such a pointer results in undefined behavior.
This has its roots in the fact that typically (C/C++) compilers group the strings used throughout the program in a single memory zone, and are allowed to use the same memory locations for instances of the same string used in unrelated parts of the program (to minimize executable size/memory footprint). If it was allowed to modify string literals you could affect with one change other, unrelated instances of the same literal, which obviously isn't a great idea.
In facts, with most modern compilers (on hardware that supports memory protection) the memory area of the string table is read-only, so if you attempt to modify a string literal your program crashes. Adding const to pointers that refer to string literals makes these mistakes immediately evident as compilation errors instead of crashes.
By the way, notice that the fact that a string literal can decay implicitly to a non-const char * is just a concession to backwards compatibility with pre-standard libraries (written when const wasn't part of the C language yet), as said above the standard always said that changing string literals is UB.
The idea behind the deprecation is to help the compiler catch errors that would otherwise cause crashes at runtime.
char *hello = "hello";
strcpy(hello, "world"); // Compiles but crashes
as opposed to
const char *hello = "hello";
strcpy(hello, "world"); // Does not compile
This is a relatively cheap way of catching an entire class of very nasty runtime errors, so deprecation of the conversion is very much in line with the general philosophy of C++ as "a better C".
In addition, your code segment 2 does not invalidate the fact that the content of the pointer is protected. It is the pointer itself that gets written over, not its content. There is a difference between const char *ptr and char * const ptr: the former protects the content; the later protects the pointer itself. The two can be combined to protect the pointer and its content as const char * const ptr.
"abc" is a static array that points to possibly immutable memory. In C, modifying the content of a string literal is undefined behavior (UB).
But C99 did not make "abc" an object of type const char [n]. In fact, this is quite the opposite, as to keep compatibility with C89 (and ANSI C), which specifies (§3.1.4/3):
A character string literal has static storage duration and type array of char, and is initialized with the given characters.
That is, the declaration
char* c = "12345";
is not deprecated in C, even up to C11.
From http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf, we can see the rationale in C99 of making the string literal modification UB, while keeping the type to be char [n]:
String literals are not required to be modifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform ertain optimizations. However, string literals do not have the type array of const char in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the C89 Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see §J.5.5)
where C99 §J.5.5 is:
J.5.5 Writable string literals
String literals are modifiable (in which case, identical string literals should denote distinct objects) (6.4.5).
On the other hand, as your code is C++, this should actually be wrong in standard C++, because it requires (C++03 §2.13.4/1)
... An ordinary string literal has type “array of n const char” and static storage duration ...
and assigning a const char[n] to a char* shouldn't compile. The compiler warns about "deprecation", because existing implementation at that time allowed the conversion (because C allows it), so it went into Annex D: Compatibility features:
D.4 Implicit conversion from const strings
The implicit conversion from const to non-const qualification for string literals (4.2) is deprecated.
The syntax is wrong because there is not implicit conversion from char const * to char * .
The type of a string literal has been char const * for ever in C and C++. (Might be wrong about very old C.)
The change in the rules has nothing to do with the type of string literals but with allowed conversions between pointer types.
The conversion is a mistake because of a pointer-to-const-thing is that thing is immutable. A string literal, which is a value known to be constant at compile and link time, might be put in read only memory segments.
Related
I've recently been learning C++ and have realised that string literals in C++ have to be constants, whereas in C, they do not. Here is an example. The following code would be valid in C, but not in C++:
char* str = "Hello, World!";
In order to do the same thing in C++, the following statement has to be used:
const char* str = "Hello, World!";
Why is there a difference?
Expanding on Christian Gibbons' answer a bit...
In C, string literals, like "Hello, World!", are stored in arrays of char such that they are visible over the lifetime of the program. String literals are supposed to be immutable, and some implementations will store them in a read-only memory segment (such that attempting to modify the literal's contents will trigger a runtime error). Some implementations don't, and attempting to modify the literal's contents may not trigger a runtime error (it may even appear to work as intended). The C language definition leaves the behavior "undefined" so that the compiler is free to handle the situation however it sees fit.
In C++, string literals are stored in arrays of const char, so that any attempt to modify the literal's contents will trigger a diagnostic at compile time.
As Christian points out, the const keyword was not originally a part of C. It was, however, originally part of C++, and it makes using string literals a little safer.
Remember that the const keyword does not mean "store this in read-only memory", it only means "this thing may not be the target of an assignment."
Also remember that, unless it is the operand of the sizeof or unary * operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T" and the value of the expression will be the address of the first element of the array.
In C++, when you write
const char *str = "Hello, world";
the address of the first character of the string is stored to str. You can set str to point to a different string literal:
str = "Goodbye cruel world";
but what you cannot do is modify the contents of the string, something like
str[0] = 'h';
or
strcpy( str, "Something else" );
C didn't initially have the const keyword, so it would break legacy code if they changed literals to require const-qualification after introduction of the keyword. C's string-literals are immutable, though, so changing the contents is undefined behavior even if it's not const-qualified.
C++, on the other hand, was designed with the const keyword. Initially, C++ did allow for string literals to be assigned to non const-qualified char *s presumably for compatibility with existing C code. As of the C++03 standard, however, they decided to deprecate this functionality rather than allowing the dissonance to continue into perpetuity. I would guess the amount of legacy C++ code relying on non-const qualified char *s pointing to string literals to be small enough that it was a worthy trade-off.
I have few doubts about string literals in c++.
char *strPtr ="Hello" ;
char strArray[] ="Hello";
Now strPtr and strArray are considered to be string literals.
As per my understanding string literals are stored in read only memory so we cannot modify their values.
We cannot do
strPtr[2] ='a';
and strArray[2]='a';
Both the above statements should be illegal.
compiler should throw errors in both cases.
Compiler keeps string literals in read only memory , so if we try to modify them compiler throws errors.
Also const data is also considered as readonly.
Is it that both string literals and const data are treated same way ?
Can I remove constantness using const_cast from string literal can change its value?
Where exactly do string literals are stored ? (in data section of program)
Now strPtr and strArray are considered to be string literals.
No, they aren't. String literals are the things you see in your code. For example, the "Hello". strPtr is a pointer to the literal (which is now compiled in the executable). Note that it should be const char *; you cannot legally remove the const per the C standard and expect defined behavior when using it. strArray is an array containing a copy of the literal (compiled in the execuable).
Both the above statements should be illegal. compiler should throw errors in both cases.
No, it shouldn't. The two statements are completely legal. Due to circumstance, the first one is undefined. It would be an error if they were pointers to const chars, though.
As far as I know, string literals may be defined the same way as other literals and constants. However, there are differences:
// These copy from ROM to RAM at run-time:
char myString[] = "hello";
const int myInt = 42;
float myFloats[] = { 3.1, 4.1, 5.9 };
// These copy a pointer to some data in ROM at run-time:
const char *myString2 = "hello";
const float *myFloats2 = { 3.1, 4.1, 5.9 };
char *myString3 = "hello"; // Legal, but...
myString3[0] = 'j'; // Undefined behavior! (Most likely segfaults.)
My use of ROM and RAM here are general. If the platform is only RAM (e.g. most Nintendo DS programs) then const data may be in RAM. Writes are still undefined, though. The location of const data shouldn't matter for a normal C++ programmer.
char *strPtr ="Hello" ;
Defines strPtr a pointer to char pointing to a string literal "Hello" -- the effective type of this pointer is const char *. No modification allowed through strPtr to the pointee (invokes UB if you try to do so). This is a backward compatibility feature for older C code. This convention is deprecated in C++0x. See Annex C:
Change: String literals made const
The type of a string literal is changed from “array of char” to “array of const char.” [...]
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to modify its argument.
Effect on original feature: Change to semantics of well-defined feature. Difficulty of converting: Simple syntactic transformation, because string literals can be converted to char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially modifiable memory are probably rare.
char strArray[] ="Hello";
The declared type of strPtr is -- it is an array of characters of unspecified size containing the string Hello including the null terminator i.e. 6 characters. However, the initialization makes it a complete type and it's type is array of 6 characters. Modification via strPtr is okay.
Where exactly do string literals are stored ?
Implementation defined.
The older C and C++ compilers were purely based on low level coding where higher standards of data protection were not available, and they can not even be enforced, typically in C and C++ you can write anything you want..
You can even write a code to access and modify your const pointers as well, if you know how to play with the addresses.
Although C++ does enforce some compile level protection, but there is no protection on runtime. You can certainly access your own stack, and use its values to manipulate any data that came in const pointer as well.
That is the reason C# was invented where little higher level standards are enforced because whatever you access is reference, it is a fixed structure governing all rules of data protection and it has hidden pointer which can not be accessed and nor modified.
The major difference is, C++ can only give you compile time protection, but C# will give you protection even at runtime.
The C++11 Standard (ISO/IEC 14882:2011) says in § C.1.1:
char* p = "abc"; // valid in C, invalid in C++
For the C++ it's OK as a pointer to a String Literal is harmful since any attempt to modify it leads to a crash. But why is it valid in C?
The C++11 says also:
char* p = (char*)"abc"; // OK: cast added
Which means that if a cast is added to the first statement it becomes valid.
Why does the casting makes the second statement valid in C++ and how is it different from the first one? Isn't it still harmful? If it's the case, why did the standard said that it's OK?
Up through C++03, your first example was valid, but used a deprecated implicit conversion--a string literal should be treated as being of type char const *, since you can't modify its contents (without causing undefined behavior).
As of C++11, the implicit conversion that had been deprecated was officially removed, so code that depends on it (like your first example) should no longer compile.
You've noted one way to allow the code to compile: although the implicit conversion has been removed, an explicit conversion still works, so you can add a cast. I would not, however, consider this "fixing" the code.
Truly fixing the code requires changing the type of the pointer to the correct type:
char const *p = "abc"; // valid and safe in either C or C++.
As to why it was allowed in C++ (and still is in C): simply because there's a lot of existing code that depends on that implicit conversion, and breaking that code (at least without some official warning) apparently seemed to the standard committees like a bad idea.
It's valid in C for historical reasons. C traditionally specified that the type of a string literal was char * rather than const char *, although it qualified it by saying that you're not actually allowed to modify it.
When you use a cast, you're essentially telling the compiler that you know better than the default type matching rules, and it makes the assignment OK.
You can also use strdup:
char* p = strdup("abc");
or
char p[] = "abc";
as pointed here
You can declare like one of the below options:
char data[] = "Testing String";
or
const char* data = "Testing String";
or
char* data = (char*) "Testing String";
And where are literals in memory exactly? (see examples below)
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Whereas an implicit cast of a const char* type to a char* type gives me a warning, see below (tested on GCC, but it behaves similarly on VC++2010).
Also, if I modify the value of a const char (with a trick below where GCC would better give me a warning for), it gives no error and I can even modify and display it on GCC (even though I guess it is still an undefined behavior, I wonder why it did not do the same with the literal). That is why I am asking where those literal are stored, and where are more common const supposedly stored?
const char* a = "test";
char* b = a; /* warning: initialization discards qualifiers
from pointer target type (on gcc), error on VC++2k10 */
char *c = "test"; // no compile errors
c[0] = 'p'; /* bus error when execution (we are not supposed to
modify const anyway, so why can I and with no errors? And where is the
literal stored for I have a "bus error"?
I have 'access violation writing' on VC++2010 */
const char d = 'a';
*(char*)&d = 'b'; // no warnings (why not?)
printf("%c", d); /* displays 'b' (why doesn't it do the same
behavior as modifying a literal? It displays 'a' on VC++2010 */
The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.
Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++ allows conversions from const char * to char *. That feature has been deprecated, though.
I'm not certain about what C/C++ standards stand for about strings. But I can tell exactly what actually happens with string literals in MSVC. And, I believe, other compilers behave similarly.
String literals reside in a const data section. Their memory is mapped into the process address space. However the memory pages they're stored in are ead-only (unless explicitly modified during the run).
But there's something more you should know. Not all the C/C++ expressions containing quotes have the same meaning. Let's clarify everything.
const char* a = "test";
The above statement makes the compiler create a string literal "test". The linker makes sure it'll be in the executable file.
In the function body the compiler generates a code that declares a variable a on the stack, which gets initialized by the address of the string literal "test.
char* b = a;
Here you declare another variable b on the stack which gets the value of a. Since a pointed to a read-only address - so would b. The even fact b has no const semantics doesn't mean you may modify what it points on.
char *c = "test"; // no compile errors
c[0] = 'p';
The above generates an access violation. Again, the lack of const doesn't mean anything at the machine level
const char d = 'a';
*(char*)&d = 'b';
First of all - the above is not related to string literals. 'a' is not a string. It's a character. It's just a number. It's like writing the following:
const int d = 55;
*(int*)&d = 56;
The above code makes a fool out of compiler. You say the variable is const, however you manage to modify it. But this is not related to the processor exception, since d resides in the read/write memory nevertheless.
I'd like to add one more case:
char b[] = "test";
b[2] = 'o';
The above declares an array on the stack, and initializes it with the string "test". It resides in the read/write memory, and can be modified. There's no problem here.
Mostly historical reasons. But keep in mind that they are somewhat justified: String literals don't have type char *, but char [N] where N denotes the size of the buffer (otherwise, sizeof wouldn't work as expected on string literals) and can be used to initialize non-const arrays. You can only assign them to const pointers because of the implicit conversions of arrays to pointers and non-const to const.
It would be more consistent if string literals exhibited the same behaviour as compound literals, but as these are a C99 construct and backwards-compatibility had to be maintained, this wasn't an option, so string literals stay an exceptional case.
And where are literals in memory exactly? (see examples below)
Initialized data segment. On Linux it is either .data or .rodata.
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Historical as it was already explained by others. Most compilers allow you tell whether the string literals should be read-only or modifiable with a command line option.
The reason it is generally desired to have string literals read-only is that the segment with read-only data in memory can be (and normally is) shared between all the processes started from the executable. That obviously frees some RAM from being wasted to keep redundant copies of the same information.
I have no warnings even with most of the compiler flags
Really? When I compile the following code snippet:
int main()
{
char* p = "some literal";
}
on g++ 4.5.0 even without any flags, I get the following warning:
warning: deprecated conversion from string constant to 'char*'
You can write to c because you didn't make it const. Defining c as const would be correct practice since the right hand side has type const char*.
It generates an error at runtime because the "test" value is probably allocated to the code segment which is read-only. See here and here.
I have few doubts about string literals in c++.
char *strPtr ="Hello" ;
char strArray[] ="Hello";
Now strPtr and strArray are considered to be string literals.
As per my understanding string literals are stored in read only memory so we cannot modify their values.
We cannot do
strPtr[2] ='a';
and strArray[2]='a';
Both the above statements should be illegal.
compiler should throw errors in both cases.
Compiler keeps string literals in read only memory , so if we try to modify them compiler throws errors.
Also const data is also considered as readonly.
Is it that both string literals and const data are treated same way ?
Can I remove constantness using const_cast from string literal can change its value?
Where exactly do string literals are stored ? (in data section of program)
Now strPtr and strArray are considered to be string literals.
No, they aren't. String literals are the things you see in your code. For example, the "Hello". strPtr is a pointer to the literal (which is now compiled in the executable). Note that it should be const char *; you cannot legally remove the const per the C standard and expect defined behavior when using it. strArray is an array containing a copy of the literal (compiled in the execuable).
Both the above statements should be illegal. compiler should throw errors in both cases.
No, it shouldn't. The two statements are completely legal. Due to circumstance, the first one is undefined. It would be an error if they were pointers to const chars, though.
As far as I know, string literals may be defined the same way as other literals and constants. However, there are differences:
// These copy from ROM to RAM at run-time:
char myString[] = "hello";
const int myInt = 42;
float myFloats[] = { 3.1, 4.1, 5.9 };
// These copy a pointer to some data in ROM at run-time:
const char *myString2 = "hello";
const float *myFloats2 = { 3.1, 4.1, 5.9 };
char *myString3 = "hello"; // Legal, but...
myString3[0] = 'j'; // Undefined behavior! (Most likely segfaults.)
My use of ROM and RAM here are general. If the platform is only RAM (e.g. most Nintendo DS programs) then const data may be in RAM. Writes are still undefined, though. The location of const data shouldn't matter for a normal C++ programmer.
char *strPtr ="Hello" ;
Defines strPtr a pointer to char pointing to a string literal "Hello" -- the effective type of this pointer is const char *. No modification allowed through strPtr to the pointee (invokes UB if you try to do so). This is a backward compatibility feature for older C code. This convention is deprecated in C++0x. See Annex C:
Change: String literals made const
The type of a string literal is changed from “array of char” to “array of const char.” [...]
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to modify its argument.
Effect on original feature: Change to semantics of well-defined feature. Difficulty of converting: Simple syntactic transformation, because string literals can be converted to char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially modifiable memory are probably rare.
char strArray[] ="Hello";
The declared type of strPtr is -- it is an array of characters of unspecified size containing the string Hello including the null terminator i.e. 6 characters. However, the initialization makes it a complete type and it's type is array of 6 characters. Modification via strPtr is okay.
Where exactly do string literals are stored ?
Implementation defined.
The older C and C++ compilers were purely based on low level coding where higher standards of data protection were not available, and they can not even be enforced, typically in C and C++ you can write anything you want..
You can even write a code to access and modify your const pointers as well, if you know how to play with the addresses.
Although C++ does enforce some compile level protection, but there is no protection on runtime. You can certainly access your own stack, and use its values to manipulate any data that came in const pointer as well.
That is the reason C# was invented where little higher level standards are enforced because whatever you access is reference, it is a fixed structure governing all rules of data protection and it has hidden pointer which can not be accessed and nor modified.
The major difference is, C++ can only give you compile time protection, but C# will give you protection even at runtime.