And where are literals in memory exactly? (see examples below)
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Whereas an implicit cast of a const char* type to a char* type gives me a warning, see below (tested on GCC, but it behaves similarly on VC++2010).
Also, if I modify the value of a const char (using the trick below, which I would have expected GCC to warn about), I get no error, and on GCC I can even modify it and display the new value (I guess it is still undefined behavior, but I wonder why it does not behave the same way as with the literal). That is why I am asking where those literals are stored, and where more common const objects are supposedly stored.
const char* a = "test";
char* b = a; /* warning: initialization discards qualifiers
from pointer target type (on gcc), error on VC++2k10 */
char *c = "test"; // no compile errors
c[0] = 'p'; /* bus error when execution (we are not supposed to
modify const anyway, so why can I and with no errors? And where is the
literal stored for I have a "bus error"?
I have 'access violation writing' on VC++2010 */
const char d = 'a';
*(char*)&d = 'b'; // no warnings (why not?)
printf("%c", d); /* displays 'b' (why doesn't it do the same
behavior as modifying a literal? It displays 'a' on VC++2010 */
The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.
Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++ allows the conversion of a string literal to char *. That feature has been deprecated, though.
I'm not certain what the C/C++ standards say about strings. But I can tell exactly what actually happens with string literals in MSVC. And, I believe, other compilers behave similarly.
String literals reside in a const data section. Their memory is mapped into the process address space. However, the memory pages they're stored in are read-only (unless explicitly modified during the run).
But there's something more you should know. Not all the C/C++ expressions containing quotes have the same meaning. Let's clarify everything.
const char* a = "test";
The above statement makes the compiler create a string literal "test". The linker makes sure it'll be in the executable file.
In the function body the compiler generates code that declares a variable a on the stack, which gets initialized with the address of the string literal "test".
char* b = a;
Here you declare another variable b on the stack which gets the value of a. Since a pointed to a read-only address, so does b. Even the fact that b has no const semantics doesn't mean you may modify what it points to.
char *c = "test"; // no compile errors
c[0] = 'p';
The above generates an access violation. Again, the lack of const doesn't mean anything at the machine level.
const char d = 'a';
*(char*)&d = 'b';
First of all - the above is not related to string literals. 'a' is not a string. It's a character. It's just a number. It's like writing the following:
const int d = 55;
*(int*)&d = 56;
The above code makes a fool out of the compiler. You say the variable is const, yet you manage to modify it. But this does not trigger a processor exception, since d nevertheless resides in read/write memory. It is still undefined behavior, which is why different compilers may print different values.
I'd like to add one more case:
char b[] = "test";
b[2] = 'o';
The above declares an array on the stack, and initializes it with the string "test". It resides in the read/write memory, and can be modified. There's no problem here.
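The array case can be sketched in C++ as follows. This is only a minimal illustration; the helper name `patched_copy` is made up for the example and is not part of the answer above.

```cpp
#include <string>

// Hypothetical helper: copies the literal into a local array, then edits the copy.
std::string patched_copy() {
    char b[] = "test";   // array of 5 chars (4 letters + '\0'), initialised from the literal
    b[2] = 'o';          // fine: b is ordinary writable storage, not the literal itself
    return std::string(b);
}
```

Calling `patched_copy()` yields "teot"; the literal "test" itself is never written to.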
Mostly historical reasons. But keep in mind that they are somewhat justified: String literals don't have type char *, but char [N] where N denotes the size of the buffer (otherwise, sizeof wouldn't work as expected on string literals) and can be used to initialize non-const arrays. You can only assign them to const pointers because of the implicit conversions of arrays to pointers and non-const to const.
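The array type of a literal can be checked at compile time. The sketch below is written in C++, where the element type is const char (in C it is plain char, as described above); the helper name `pointer_size` is an assumption made for the example.

```cpp
#include <cstddef>
#include <type_traits>

// The literal "test" is an array of 5 elements: four letters plus the terminator.
static_assert(sizeof("test") == 5, "four letters plus the terminator");

// In C++ a string literal is an lvalue, so decltype gives a reference to the array type.
static_assert(std::is_same<decltype("test"), const char (&)[5]>::value,
              "the literal is an array, not a pointer");

// Hypothetical helper: once converted to a pointer, sizeof no longer sees the array.
std::size_t pointer_size() {
    const char* p = "test";   // array-to-pointer conversion happens here
    return sizeof(p);         // size of the pointer, not of the string
}
```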
It would be more consistent if string literals exhibited the same behaviour as compound literals, but as these are a C99 construct and backwards-compatibility had to be maintained, this wasn't an option, so string literals stay an exceptional case.
And where are literals in memory exactly? (see examples below)
Initialized data segment. On Linux it is either .data or .rodata.
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Historical, as was already explained by others. Most compilers allow you to tell whether string literals should be read-only or modifiable with a command line option.
The reason it is generally desired to have string literals read-only is that the segment with read-only data in memory can be (and normally is) shared between all the processes started from the executable. That obviously frees some RAM from being wasted to keep redundant copies of the same information.
I have no warnings even with most of the compiler flags
Really? When I compile the following code snippet:
int main()
{
char* p = "some literal";
}
on g++ 4.5.0 even without any flags, I get the following warning:
warning: deprecated conversion from string constant to 'char*'
You can write to c because you didn't make it const. Defining c as const would be correct practice since the right hand side has type const char*.
It generates an error at runtime because the "test" value is probably allocated to the code segment, which is read-only.
I was trying strcpy like this:
int main()
{
char *c="hello";
const char *d="mello";
strcpy(c,d);
cout<<c<<endl;
return 0;
}
Compiling this gives a warning and running the code produces a Segmentation fault.
The warning is:
warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
char *c="hello";
The declaration of strcpy is: char * strcpy ( char * destination, const char * source );
So where am I wrong (regarding the warning)? IMO I have used a const char and a char, same as the function's declaration.
Does c* or d* not allocate memory to hold "hello" and "mello", due to
which it is throwing segmentation fault? How does the
initialization/definition of a variable like c* work?
Both variables are pointing to constant literal text.
Modifying constant literal text is undefined behavior.
Place the text into an array if you want to modify it:
char e[] = "fred";
e[0] = 'd';
Compiler warnings (and errors) always refer to a particular line of code (and a portion of that line as well). It's a good idea to pay close attention to where the problems are.
In your case, the warning is not about calling strcpy, but about initialising c. You're making a char *, a pointer to modifiable char, point to "hello", a read-only string literal. That's what the compiler is warning you about.
This used to be possible in C++, but with the caveat that you must never actually modify the memory pointed to (since string literals are immutable). Since C++11, converting a string literal to a char * is explicitly forbidden, so a compiler is entitled to reject it with an error instead of a warning (some still only warn).
The reason for the segfault should be clear now: you're trying to overwrite a piece of read-only memory with the strcpy call.
The correct solution is to use writable memory for storing the destination string. You could do it like this:
char c[] = "hello";
This creates a new char array, which is writable normally (just like any other local variable), and initialises the array with the contents of the string literal (effectively a copy thereof).
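A minimal sketch of the corrected program, assuming the same two strings as in the question. The helper name `copy_into_array` is made up for the example, and the copy is only safe here because both strings have the same length.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper mirroring the question's code, but with a writable destination.
std::string copy_into_array() {
    char c[] = "hello";        // writable array of 6 chars
    const char* d = "mello";   // read-only source, same length as "hello"
    std::strcpy(c, d);         // OK only because d fits in c, terminator included
    return std::string(c);
}
```

If the source were longer than the destination array, `strcpy` would write past the end of `c`, which is again undefined behavior.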
Of course, since this is C++, you should be using std::string to store strings and not bother with char*, strcpy, and other C-isms at all.
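For comparison, here is roughly what the same copy looks like with std::string. This is just a sketch; the function name `copy_with_string` is an assumption for the example.

```cpp
#include <string>

// Hypothetical helper: std::string owns its storage, so no manual buffer sizing is needed.
std::string copy_with_string() {
    std::string c = "hello";
    const std::string d = "mello";
    c = d;        // copies the characters; c grows or shrinks as needed
    c[0] = 'j';   // modifies c only, d and the literals are untouched
    return c + " " + d;
}
```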
The code you are trying is for TurboC++ it seems. As drescherjm pointed out in the comments, the char *c="hello"; is no longer legal in c++.
If you try out your code in Turbo C++, it'll work just as you expected, although this isn't the same case with the modern c++.
So where am I wrong (regarding the warning)? IMO I have used a const char and a char, same as the function's declaration.
The warning is due to the reason mentioned above, plus the warning isn't on the use of strcpy, it's on the declaration.
Does c* or d* not allocate memory to hold "hello" and "mello", due to which it is throwing segmentation fault? How does the initialization/definition of a variable like c* work?
Well, the compiler just puts the strings somewhere in memory of its own choosing and lets the c/d pointers point to them. It's risky, as you can lose access to the data if you make the pointer point to something else by accident.
What happen when constant string assigned to constant character pointer(or character pointer)? ex:
const char* p="String";
how and where the compiler take this array .. heap memory ?
and what different from it and :
char* p="String";
thanks.
What happen when constant string assigned to constant character pointer(or character pointer)?
Nothing happens to the const string itself: a pointer to it is assigned to p, that's all.
how and where the compiler take this array .. heap memory?
It does not take it anywhere. String's data remains where it was, which is a compiler-specific thing.
and what different from it and : char* p="String";
The compiler is going to reject a program with the assignment of a literal to non-const, or warn you of a deprecated conversion, depending on the C++ version and/or compiler settings.
If you try to modify p[...]'s content using the const declaration, the compiler is going to stop you. If you try doing the same without const, the program may compile, but it would cause undefined behavior at runtime.
The string literal "String" is a static array of const char somewhere in your program, probably placed into a read-only part of the address space when the executable is set up by your OS.
When you assign const char *p = "String", then p is initialized with a pointer to that array of const char. So *p is 'S' and p[1] is 't', etc.
When you assign char *p = "String", then your compiler should reject that (perhaps you have insufficient diagnostic level set?). If you tell the compiler to accept it regardless, then you have a pointer to (modifiable) char pointing at the string literal. If you subsequently attempt to write through this pointer, you'll get no compiler error, and instead you are likely to see one of two problems at runtime:
(If the compiler/linker has placed the string literal into read-only memory) a signal is raised indicating memory access violation (SIGSEGV on Unix-like systems).
(If the string literal is in writeable memory) other uses of the same string literal get modified, because the compiler is permitted to point them all at the same storage.
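The safe pattern is to read through the const pointer and write only to a copy. A minimal sketch, with a made-up helper name `read_then_copy`:

```cpp
#include <string>

// Hypothetical helper: reads through the pointer, writes only to an independent copy.
std::string read_then_copy() {
    const char* p = "String";   // points at the first character of the literal
    char first = *p;            // 'S'
    char second = p[1];         // 't'
    std::string writable(p);    // independent, modifiable copy of the characters
    writable[0] = 's';          // the literal itself is not touched
    return std::string{first, second} + ":" + writable;
}
```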
I have few doubts about string literals in c++.
char *strPtr ="Hello" ;
char strArray[] ="Hello";
Now strPtr and strArray are considered to be string literals.
As per my understanding string literals are stored in read only memory so we cannot modify their values.
We cannot do
strPtr[2] ='a';
and strArray[2]='a';
Both the above statements should be illegal.
compiler should throw errors in both cases.
Compiler keeps string literals in read only memory , so if we try to modify them compiler throws errors.
Also const data is also considered as readonly.
Is it that both string literals and const data are treated same way ?
Can I remove constantness using const_cast from string literal can change its value?
Where exactly do string literals are stored ? (in data section of program)
Now strPtr and strArray are considered to be string literals.
No, they aren't. String literals are the things you see in your code. For example, the "Hello". strPtr is a pointer to the literal (which is now compiled in the executable). Note that it should be const char *; you cannot legally remove the const per the C standard and expect defined behavior when using it. strArray is an array containing a copy of the literal (compiled in the executable).
Both the above statements should be illegal. compiler should throw errors in both cases.
No, it shouldn't. The two statements are completely legal. Due to circumstance, the first one is undefined. It would be an error if they were pointers to const chars, though.
As far as I know, string literals may be defined the same way as other literals and constants. However, there are differences:
// These copy from ROM to RAM at run-time:
char myString[] = "hello";
const int myInt = 42;
float myFloats[] = { 3.1, 4.1, 5.9 };
// These copy a pointer to some data in ROM at run-time:
const char *myString2 = "hello";
const float *myFloats2 = { 3.1, 4.1, 5.9 };
char *myString3 = "hello"; // Legal, but...
myString3[0] = 'j'; // Undefined behavior! (Most likely segfaults.)
My use of ROM and RAM here is general. If the platform is only RAM (e.g. most Nintendo DS programs) then const data may be in RAM. Writes are still undefined, though. The location of const data shouldn't matter for a normal C++ programmer.
char *strPtr ="Hello" ;
Defines strPtr a pointer to char pointing to a string literal "Hello" -- the effective type of this pointer is const char *. No modification allowed through strPtr to the pointee (invokes UB if you try to do so). This is a backward compatibility feature for older C code. This convention is deprecated in C++0x. See Annex C:
Change: String literals made const
The type of a string literal is changed from “array of char” to “array of const char.” [...]
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to modify its argument.
Effect on original feature: Change to semantics of well-defined feature. Difficulty of converting: Simple syntactic transformation, because string literals can be converted to char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially modifiable memory are probably rare.
char strArray[] ="Hello";
The declared type of strArray is an array of characters of unspecified size, initialized with the string Hello including the null terminator, i.e. 6 characters. The initialization makes it a complete type, and its type is array of 6 characters. Modification via strArray is okay.
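The deduced size and the legality of the write can be checked with a few lines. This is a sketch only; the function name `edit_array` is an assumption for the example.

```cpp
#include <string>

// Hypothetical helper: the array is a writable copy of the literal.
std::string edit_array() {
    char strArray[] = "Hello";   // size deduced from the initialiser
    static_assert(sizeof(strArray) == 6, "five letters plus the null terminator");
    strArray[2] = 'a';           // fine: modifies the array, not the literal
    return std::string(strArray);
}
```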
Where exactly do string literals are stored ?
Implementation defined.
The older C and C++ compilers were built for low-level coding, where stronger data protection was not available and could not even be enforced; in C and C++ you can typically write anything you want.
You can even write code to access and modify the data behind your const pointers, if you know how to play with the addresses.
Although C++ does enforce some compile-time protection, there is no protection at runtime. You can certainly access your own stack and use its values to manipulate any data that came in through a const pointer as well.
That is part of the motivation for C#, where somewhat higher-level rules are enforced: whatever you access is a reference, a fixed structure governing all rules of data protection, and it has a hidden pointer that can be neither accessed nor modified.
The major difference is that C++ can only give you compile-time protection, whereas C# also gives you protection at runtime.
I understand that the syntax char * = "stringLiteral"; has been deprecated and may not even work in the future. What I don't understand is WHY.
I searched the net and stack and although there are many echoes confirming that char * = "stringLiteral"; is wrong and that const char * = "stringLiteral"; is correct, I have yet to find information about WHY said syntax is wrong. In other words, I'd like to know what the issue really is under the hood.
ILLUSTRATING MY CONFUSION
CODE SEGMENT 1 - EVIL WAY (Deprecated)
char* szA = "stringLiteralA"; //Works fine as expected. Auto null terminated.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Works, so change by something same length OK.
std::cout << szA << std::endl;
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;
Output:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah
So what exactly is the problem here? Seems to work just fine.
CODE SEGMENT 2 (The "OK" way)
const char* szA = "stringLiteralA"; //Works fine as expected. Auto null term.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Works, so change by something same length OK.
std::cout << szA << std::endl;
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;
Output:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah
Also works fine. No difference. What is the point of adding const?
CODE SEGMENT 3
const char* const szA = "stringLiteralA"; //Works. Auto null term.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Breaks here. Can't reassign.
I am only illustrating here that, in order to make the variable itself read-only, you have to write const char* const szA = "something"; .
I don't see the point for deprecation or any issues. Why is this syntax deprecated and considered an issue?
const char * is a pointer (*) to a constant (const) char (pointer definitions are easily read from right to left). The point here is to protect the content, since, as the standard says, modifying the content of such a pointer results in undefined behavior.
This has its roots in the fact that typically (C/C++) compilers group the strings used throughout the program in a single memory zone, and are allowed to use the same memory locations for instances of the same string used in unrelated parts of the program (to minimize executable size/memory footprint). If it was allowed to modify string literals you could affect with one change other, unrelated instances of the same literal, which obviously isn't a great idea.
In fact, with most modern compilers (on hardware that supports memory protection) the memory area of the string table is read-only, so if you attempt to modify a string literal your program crashes. Adding const to pointers that refer to string literals makes these mistakes immediately evident as compilation errors instead of crashes.
By the way, notice that the implicit conversion of a string literal to a non-const char * is just a concession to backwards compatibility with pre-standard libraries (written when const wasn't part of the C language yet); as said above, the standard has always said that changing string literals is UB.
The idea behind the deprecation is to help the compiler catch errors that would otherwise cause crashes at runtime.
char *hello = "hello";
strcpy(hello, "world"); // Compiles but crashes
as opposed to
const char *hello = "hello";
strcpy(hello, "world"); // Does not compile
This is a relatively cheap way of catching an entire class of very nasty runtime errors, so deprecation of the conversion is very much in line with the general philosophy of C++ as "a better C".
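The rule that makes the second snippet fail to compile can be stated as a pair of compile-time checks. This is a sketch using standard type traits; the helper name `const_can_be_dropped_implicitly` is made up so the result can also be observed at run time.

```cpp
#include <type_traits>

// Adding const to the pointee is an implicit conversion...
static_assert(std::is_convertible<char*, const char*>::value,
              "char* converts to const char*");

// ...but dropping it is not, which is why strcpy(const char*, ...) is rejected.
static_assert(!std::is_convertible<const char*, char*>::value,
              "const char* does not convert to char*");

// Hypothetical helper exposing the second check as a run-time value.
bool const_can_be_dropped_implicitly() {
    return std::is_convertible<const char*, char*>::value;
}
```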
In addition, your code segment 2 does not invalidate the fact that the content of the pointer is protected. It is the pointer itself that gets written over, not its content. There is a difference between const char *ptr and char * const ptr: the former protects the content; the latter protects the pointer itself. The two can be combined to protect the pointer and its content as const char * const ptr.
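The three declarations can be compared side by side. A minimal sketch, assuming a writable local buffer so that every permitted write is well defined; the function name `pointer_const_demo` is invented for the example, and the commented-out lines are the ones a compiler rejects.

```cpp
#include <string>

// Hypothetical helper showing what each placement of const protects.
std::string pointer_const_demo() {
    char buffer[] = "hello";               // writable storage

    const char* ptr_to_const = buffer;     // content is protected through this pointer
    char* const const_ptr = buffer;        // the pointer itself is protected
    const char* const both = buffer;       // both are protected

    ptr_to_const = "world";                // OK: the pointer may be reseated
    // ptr_to_const[0] = 'W';              // error: content is read-only via this pointer

    const_ptr[0] = 'j';                    // OK: writes to buffer, which is writable
    // const_ptr = buffer + 1;             // error: the pointer cannot be reseated

    // both = "x";  both[0] = 'x';         // errors: neither is allowed

    return std::string(ptr_to_const) + " " + both;
}
```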
"abc" is a static array that points to possibly immutable memory. In C, modifying the content of a string literal is undefined behavior (UB).
But C99 did not make "abc" an object of type const char [n]. In fact, this is quite the opposite, as to keep compatibility with C89 (and ANSI C), which specifies (§3.1.4/3):
A character string literal has static storage duration and type array of char, and is initialized with the given characters.
That is, the declaration
char* c = "12345";
is not deprecated in C, even up to C11.
From http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf, we can see the rationale in C99 of making the string literal modification UB, while keeping the type to be char [n]:
String literals are not required to be modifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations. However, string literals do not have the type array of const char in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the C89 Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see §J.5.5)
where C99 §J.5.5 is:
J.5.5 Writable string literals
String literals are modifiable (in which case, identical string literals should denote distinct objects) (6.4.5).
On the other hand, as your code is C++, this should actually be wrong in standard C++, because it requires (C++03 §2.13.4/1)
... An ordinary string literal has type “array of n const char” and static storage duration ...
and assigning a const char[n] to a char* shouldn't compile. The compiler warns about "deprecation" because existing implementations at that time allowed the conversion (because C allows it), so it went into Annex D: Compatibility features:
D.4 Implicit conversion from const strings
The implicit conversion from const to non-const qualification for string literals (4.2) is deprecated.
The syntax is wrong because there is no implicit conversion from char const * to char * .
In C++, a string literal has type array of const char, which decays to char const *; in C it is still an array of plain char. (Might be wrong about very old C.)
The change in the rules has nothing to do with the type of string literals but with allowed conversions between pointer types.
The conversion is a mistake because the whole point of a pointer-to-const-thing is that the thing is immutable. A string literal, which is a value known to be constant at compile and link time, might be put in read-only memory segments.