char* const p = "world";
p[2] = 'l';
The first statement creates a string pointed by a const pointer p, and the second statement tries to modify the string, and it is accepted by the compiler, while in the running time, an access violation exception is poped, and could anyone explain why?
So your question is two-fold:
Why does it give an access violation: character literal strings are stored as literals in the CODE pages of your executable program; most modern operating system do not allow changes to the these pages (including MS-windows) thus the protection fault.
Why does the compiler allow it: the const keyword in this context refers to the pointer and not the thing it points at. Code such as p="Hello"; will cause a compiler error as you have declared p as constant (not *p). If you wanted to declare the thing it points to as constant then your declaration should be const char *p.
In the
char* const p = "World";
P points to const character array, which resides in the .rodata memory area. So, one can not modify the data pointed by the p variable and one cannot change the p to point to some other string as well.
char* const p = "world";
This is illegal in the current C++ standard (C++11). Most compilers still accept it because they use the previous C++ standard (C++03) by default, but even there the code is deprecated, and a good compiler with the right warning level should warn about this.
The reason is that the type of the literal "world" is char const[6]. In other words, a literal is always constant and cannot be changed. When you say …
char* const p = "world";
… then the compiler converts the literal into a pointer. This is done implicitly by an operation called “array decay”: a C array can be implicitly converted into a pointer that points to its beginning.
So "world" is converted into a value of type char const*. Notice the const – we still are not allowed to change the literal, even when accessed through the pointer.
Alas, C++03 also allows that literals be assigned to a non-const pointer to provide backwards compatibility with C.
Since this is an interview question, the correct answer is thus: the code is illegal and the compiler shouldn’t allow it. Here’s the corrected code:
char const* const p = "world";
//p[2] = 'l'; // Not allowed!
We’ve used two consts here: the first is required for the literal. The second makes the pointer itself (rather than the pointed-to value) const.
The thing is if you define a string literal like this:
char * p = "some text"
the compiler will prepare memory for pointer only, and the text location
will be by default set to .text section
in the other hand if you define it as:
char p[] = "some text"
the compiler will know that you want the memory for entire character array.
In the first case you can't read the value as the MMU is set to readonly access for the .text section of memory, in the 2nd case you can freely access and modify memory.
Another think (to prevent the runtime error) would be correct describing of the memory the pointer points to.
For the constant pointer it would be:
const char * const p = "blaaaah"
and for the normal one:
const char * p = "blaah"
char* const p = "world";
Here p is a pointer with constant memory address. So this will compile fine as there is no syntax error as per compiler and it throws exception in runtime bcoz as its a constant pointer you can change its value.
C++ reference constant pointers
Const correctness
Related
What happen when constant string assigned to constant character pointer(or character pointer)? ex:
const char* p="String";
how and where the compiler take this array .. heap memory ?
and what different from it and :
char* p="String";
thanks.
What happen when constant string assigned to constant character pointer(or character pointer)?
Nothing happens to the const string itself: a pointer to it is assigned to p, that's all.
how and where the compiler take this array .. heap memory?
It does not take it anywhere. String's data remains where it was, which is a compiler-specific thing.
and what different from it and : char* p="String";
The compiler is going to reject a program with the assignment of a literal to non-const, or warn you of a deprecated conversion, depending on the C++ version and/or compiler settings.
If you try to modify p[...]'s content using the const declaration, the compiler is going to stop you. If you try doing the same without const, the program may compile, bit it would cause undefined behavior at runtime.
The string literal "String" is a static array of const char somewhere in your program, probably placed into a read-only part of the address space when the executable is set up by your OS.
When you assign const char *p = "String", then p is initialized with a pointer to that array of const char. So *p is 'S' and p[1] is 't', etc.
When you assign char *p = "String", then your compiler should reject that (perhaps you have insufficient diagnostic level set?). If you tell the compiler to accept it regardless, then you have a pointer to (modifiable) char pointing at the string literal. If you subsequently attempt to write through this pointer, you'll get no compiler error, and instead you are likely to see one of two problems runtime:
(If the compiler/linker has placed the string literal into read-only memory) a signal is raised indicating memory access violation (SIGSEGV on Unix-like systems).
(If the string literal is in writeable memory) other uses of the same string literal get modified, because the compiler is permitted to point them all at the same storage.
static const char* const test_script = "test_script";
When and why would you use a statement such as the one above?
Does it have any benefits?
Why use a char* instead of a constant character?
A "constant Character pointer" (const char*) is a constant already and cannot be changed; Then why use the word static in the front of it? What good does it do?
A const char *p is not a constant pointer. It's a modifiable pointer to a const char, that is, to a constant character. You can make the pointer point to something else, but you cannot change the character(s) to which it points. In other words, p = x is allowed, but *p = y is not.
A char * const is the opposite: a constant pointer to a mutable character. *p = y is allowed, p = x is not.
A const char * const is both: a constant pointer to a constant character.
Regarding static: this gives the declared variable internal linkage (cannot be accessed by name from outside the source file). Since you're asking about both C and C++, note that this is where they differ.
In C++, a variable declared const and not explicitly declared extern has internal linkage by default. Since the pointer in question is const (I'm talking about the second const), the static is superfluous in C++ and doesn't do anything.
That's not the case in C, where const variables cannot be used as constant expressions and do not have internal linkage by default. So in C, the static is necessary to give test_script internal linkage.
The above discussion of static assumes the declaration is at file scope (C) or namespace scope (C++). If it's inside a function, the meaning of static changes. Without static, it would a normal local variable in the function—each invocation would have its own copy. With static, it receives static storage duration and thus persists between function calls—all invocations share that one copy. Since you're asking about both C and C++, I am not going to discuss class scope.
Finally, you asked "why pointer instead of characters." This way, the pointer points to the actual string literal (probably somewhere in the read-only part of the process's memory). One reason to do that would be if you even need to pass test_script somewhere where a const char * const * (a pointer to a constant pointer to constant character(s)) is expected. Also, if the same string literal appears multiple times in source code, it can be shared.
An alternative would be to declare an array:
const char test_script[] = "test_script";
This would copy the string literal into test_script, guaranteeing that it has its own copy of the data. Then you can learn the length at compile-time from sizeof test_script (which includes the terminating NUL). It will also consume slightly less memory (no need for the pointer) if it's the only occurence of that string literal. However, as it will have its own copy of the data, it cannot share the storage of the string literal (if that is used elsewhere in the code as well).
I have few doubts about string literals in c++.
char *strPtr ="Hello" ;
char strArray[] ="Hello";
Now strPtr and strArray are considered to be string literals.
As per my understanding string literals are stored in read only memory so we cannot modify their values.
We cannot do
strPtr[2] ='a';
and strArray[2]='a';
Both the above statements should be illegal.
compiler should throw errors in both cases.
Compiler keeps string literals in read only memory , so if we try to modify them compiler throws errors.
Also const data is also considered as readonly.
Is it that both string literals and const data are treated same way ?
Can I remove constantness using const_cast from string literal can change its value?
Where exactly do string literals are stored ? (in data section of program)
Now strPtr and strArray are considered to be string literals.
No, they aren't. String literals are the things you see in your code. For example, the "Hello". strPtr is a pointer to the literal (which is now compiled in the executable). Note that it should be const char *; you cannot legally remove the const per the C standard and expect defined behavior when using it. strArray is an array containing a copy of the literal (compiled in the execuable).
Both the above statements should be illegal. compiler should throw errors in both cases.
No, it shouldn't. The two statements are completely legal. Due to circumstance, the first one is undefined. It would be an error if they were pointers to const chars, though.
As far as I know, string literals may be defined the same way as other literals and constants. However, there are differences:
// These copy from ROM to RAM at run-time:
char myString[] = "hello";
const int myInt = 42;
float myFloats[] = { 3.1, 4.1, 5.9 };
// These copy a pointer to some data in ROM at run-time:
const char *myString2 = "hello";
const float *myFloats2 = { 3.1, 4.1, 5.9 };
char *myString3 = "hello"; // Legal, but...
myString3[0] = 'j'; // Undefined behavior! (Most likely segfaults.)
My use of ROM and RAM here are general. If the platform is only RAM (e.g. most Nintendo DS programs) then const data may be in RAM. Writes are still undefined, though. The location of const data shouldn't matter for a normal C++ programmer.
char *strPtr ="Hello" ;
Defines strPtr a pointer to char pointing to a string literal "Hello" -- the effective type of this pointer is const char *. No modification allowed through strPtr to the pointee (invokes UB if you try to do so). This is a backward compatibility feature for older C code. This convention is deprecated in C++0x. See Annex C:
Change: String literals made const
The type of a string literal is changed from “array of char” to “array of const char.” [...]
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to modify its argument.
Effect on original feature: Change to semantics of well-defined feature. Difficulty of converting: Simple syntactic transformation, because string literals can be converted to char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially modifiable memory are probably rare.
char strArray[] ="Hello";
The declared type of strPtr is -- it is an array of characters of unspecified size containing the string Hello including the null terminator i.e. 6 characters. However, the initialization makes it a complete type and it's type is array of 6 characters. Modification via strPtr is okay.
Where exactly do string literals are stored ?
Implementation defined.
The older C and C++ compilers were purely based on low level coding where higher standards of data protection were not available, and they can not even be enforced, typically in C and C++ you can write anything you want..
You can even write a code to access and modify your const pointers as well, if you know how to play with the addresses.
Although C++ does enforce some compile level protection, but there is no protection on runtime. You can certainly access your own stack, and use its values to manipulate any data that came in const pointer as well.
That is the reason C# was invented where little higher level standards are enforced because whatever you access is reference, it is a fixed structure governing all rules of data protection and it has hidden pointer which can not be accessed and nor modified.
The major difference is, C++ can only give you compile time protection, but C# will give you protection even at runtime.
I understand that the syntax char * = "stringLiteral"; has been deprecated and may not even work in the future. What I don't understand is WHY.
I searched the net and stack and although there are many echos confirming that char * = "stringLiteral"; is wrong and that const char * = "stringLiteral"; is corect, I have yet to find information about WHY said syntax is wrong. In other words, I'd like to know what the issue really is under the hood.
ILLUSTATING MY CONFUSION
CODE SEGMENT 1 - EVIL WAY (Deprecated)
char* szA = "stringLiteralA"; //Works fine as expected. Auto null terminated.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Works, so change by something same length OK.
std::cout << szA << std::endl;
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;
Ouput:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah
So what exactly is the problem here? Seems to work just fine.
CODE SEGMENT 2 (The "OK" way)
const char* szA = "stringLiteralA"; //Works fine as expected. Auto null term.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Works, so change by something same length OK.
std::cout << szA << std::endl;
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;
Ouput:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah
Also works fine. No difference. What is the point of adding const?
CODE SEGMENT 3
const char* const szA = "stringLiteralA"; //Works. Auto null term.
std::cout << szA << std::endl;
szA = "stringLiteralB"; //Breaks here. Can't reasign.
I am only illustrating here that in order to read only protect the variable content you have to const char* const szA = "something"; .
I don't see the point for deprecation or any issues. Why is this syntax deprecated and considered an issue?
const char * is a pointer (*) to a constant (const) char (pointer definitions are easily read from right to left). The point here is to protect the content, since, as the standard says, modifying the content of such a pointer results in undefined behavior.
This has its roots in the fact that typically (C/C++) compilers group the strings used throughout the program in a single memory zone, and are allowed to use the same memory locations for instances of the same string used in unrelated parts of the program (to minimize executable size/memory footprint). If it was allowed to modify string literals you could affect with one change other, unrelated instances of the same literal, which obviously isn't a great idea.
In facts, with most modern compilers (on hardware that supports memory protection) the memory area of the string table is read-only, so if you attempt to modify a string literal your program crashes. Adding const to pointers that refer to string literals makes these mistakes immediately evident as compilation errors instead of crashes.
By the way, notice that the fact that a string literal can decay implicitly to a non-const char * is just a concession to backwards compatibility with pre-standard libraries (written when const wasn't part of the C language yet), as said above the standard always said that changing string literals is UB.
The idea behind the deprecation is to help the compiler catch errors that would otherwise cause crashes at runtime.
char *hello = "hello";
strcpy(hello, "world"); // Compiles but crashes
as opposed to
const char *hello = "hello";
strcpy(hello, "world"); // Does not compile
This is a relatively cheap way of catching an entire class of very nasty runtime errors, so deprecation of the conversion is very much in line with the general philosophy of C++ as "a better C".
In addition, your code segment 2 does not invalidate the fact that the content of the pointer is protected. It is the pointer itself that gets written over, not its content. There is a difference between const char *ptr and char * const ptr: the former protects the content; the later protects the pointer itself. The two can be combined to protect the pointer and its content as const char * const ptr.
"abc" is a static array that points to possibly immutable memory. In C, modifying the content of a string literal is undefined behavior (UB).
But C99 did not make "abc" an object of type const char [n]. In fact, this is quite the opposite, as to keep compatibility with C89 (and ANSI C), which specifies (§3.1.4/3):
A character string literal has static storage duration and type array of char, and is initialized with the given characters.
That is, the declaration
char* c = "12345";
is not deprecated in C, even up to C11.
From http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf, we can see the rationale in C99 of making the string literal modification UB, while keeping the type to be char [n]:
String literals are not required to be modifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform ertain optimizations. However, string literals do not have the type array of const char in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the C89 Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see §J.5.5)
where C99 §J.5.5 is:
J.5.5 Writable string literals
String literals are modifiable (in which case, identical string literals should denote distinct objects) (6.4.5).
On the other hand, as your code is C++, this should actually be wrong in standard C++, because it requires (C++03 §2.13.4/1)
... An ordinary string literal has type “array of n const char” and static storage duration ...
and assigning a const char[n] to a char* shouldn't compile. The compiler warns about "deprecation", because existing implementation at that time allowed the conversion (because C allows it), so it went into Annex D: Compatibility features:
D.4 Implicit conversion from const strings
The implicit conversion from const to non-const qualification for string literals (4.2) is deprecated.
The syntax is wrong because there is not implicit conversion from char const * to char * .
The type of a string literal has been char const * for ever in C and C++. (Might be wrong about very old C.)
The change in the rules has nothing to do with the type of string literals but with allowed conversions between pointer types.
The conversion is a mistake because of a pointer-to-const-thing is that thing is immutable. A string literal, which is a value known to be constant at compile and link time, might be put in read only memory segments.
And where are literals in memory exactly? (see examples below)
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Whereas an implicit cast of a const char* type to a char* type gives me a warning, see below (tested on GCC, but it behaves similarly on VC++2010).
Also, if I modify the value of a const char (with a trick below where GCC would better give me a warning for), it gives no error and I can even modify and display it on GCC (even though I guess it is still an undefined behavior, I wonder why it did not do the same with the literal). That is why I am asking where those literal are stored, and where are more common const supposedly stored?
const char* a = "test";
char* b = a; /* warning: initialization discards qualifiers
from pointer target type (on gcc), error on VC++2k10 */
char *c = "test"; // no compile errors
c[0] = 'p'; /* bus error when execution (we are not supposed to
modify const anyway, so why can I and with no errors? And where is the
literal stored for I have a "bus error"?
I have 'access violation writing' on VC++2010 */
const char d = 'a';
*(char*)&d = 'b'; // no warnings (why not?)
printf("%c", d); /* displays 'b' (why doesn't it do the same
behavior as modifying a literal? It displays 'a' on VC++2010 */
The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.
Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++ allows conversions from const char * to char *. That feature has been deprecated, though.
I'm not certain about what C/C++ standards stand for about strings. But I can tell exactly what actually happens with string literals in MSVC. And, I believe, other compilers behave similarly.
String literals reside in a const data section. Their memory is mapped into the process address space. However the memory pages they're stored in are ead-only (unless explicitly modified during the run).
But there's something more you should know. Not all the C/C++ expressions containing quotes have the same meaning. Let's clarify everything.
const char* a = "test";
The above statement makes the compiler create a string literal "test". The linker makes sure it'll be in the executable file.
In the function body the compiler generates a code that declares a variable a on the stack, which gets initialized by the address of the string literal "test.
char* b = a;
Here you declare another variable b on the stack which gets the value of a. Since a pointed to a read-only address - so would b. The even fact b has no const semantics doesn't mean you may modify what it points on.
char *c = "test"; // no compile errors
c[0] = 'p';
The above generates an access violation. Again, the lack of const doesn't mean anything at the machine level
const char d = 'a';
*(char*)&d = 'b';
First of all - the above is not related to string literals. 'a' is not a string. It's a character. It's just a number. It's like writing the following:
const int d = 55;
*(int*)&d = 56;
The above code makes a fool out of compiler. You say the variable is const, however you manage to modify it. But this is not related to the processor exception, since d resides in the read/write memory nevertheless.
I'd like to add one more case:
char b[] = "test";
b[2] = 'o';
The above declares an array on the stack, and initializes it with the string "test". It resides in the read/write memory, and can be modified. There's no problem here.
Mostly historical reasons. But keep in mind that they are somewhat justified: String literals don't have type char *, but char [N] where N denotes the size of the buffer (otherwise, sizeof wouldn't work as expected on string literals) and can be used to initialize non-const arrays. You can only assign them to const pointers because of the implicit conversions of arrays to pointers and non-const to const.
It would be more consistent if string literals exhibited the same behaviour as compound literals, but as these are a C99 construct and backwards-compatibility had to be maintained, this wasn't an option, so string literals stay an exceptional case.
And where are literals in memory exactly? (see examples below)
Initialized data segment. On Linux it is either .data or .rodata.
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Historical as it was already explained by others. Most compilers allow you tell whether the string literals should be read-only or modifiable with a command line option.
The reason it is generally desired to have string literals read-only is that the segment with read-only data in memory can be (and normally is) shared between all the processes started from the executable. That obviously frees some RAM from being wasted to keep redundant copies of the same information.
I have no warnings even with most of the compiler flags
Really? When I compile the following code snippet:
int main()
{
char* p = "some literal";
}
on g++ 4.5.0 even without any flags, I get the following warning:
warning: deprecated conversion from string constant to 'char*'
You can write to c because you didn't make it const. Defining c as const would be correct practice since the right hand side has type const char*.
It generates an error at runtime because the "test" value is probably allocated to the code segment which is read-only. See here and here.