I am getting a segmentation fault while running the following code :-
char *p ="Hello";
*p = 'M';
What I intended was to replace the first character of the string "Hello" with 'M', but I'm getting a segmentation fault. What could be the reason?
It's undefined behaviour. For compatibility with old C code, C++ compilers have let you point a non-const pointer at a string literal (e.g. your "Hello"), but you can not write through them portably.
It's best to use:
const char* p = "Hello"; // if you really need a pointer, probably so you
// can move it within the text, point it at other
// text, set it to a NULL sentinel after use...
const char hello[] = "Hello"; // if you're really only interested in the text
In C++, any string literal (for example "Hello" in your code) is an array of const char, here const char [6] (in C the type is char [6], but writing to it is still undefined), and can implicitly be assigned to any const char * variable:
const char * str="Hello";
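In practice the literal typically resides in memory marked as read-only by the operating system (you should have gotten a compiler warning for the char * assignment). Writing to that location therefore usually triggers a memory access fault, such as the segmentation fault you saw, rather than changing the text.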
The reason why the compiler puts this in read-only memory is because you may be using another identical (or even similar) string literal "Hello" in a different part of your code. By marking the memory location of the string literal as read-only, the compiler only needs to store the string literal once in memory.
Note also that the C++ standard does not require the compiler to put string literals into read-only memory; it just says that modifying a string literal is undefined behaviour. In practice, however, string literals are stored in read-only memory by most modern compilers and operating systems.
The compiler is putting the string for "Hello" into a read-only memory segment and giving you a pointer to it. The fact that you're allowed to assign the pointer to a char* is done for backwards compatibility. C++ states that it's undefined behaviour.
If you want to alter the string then declare it like this :
char p[]="Hello";
What happens when a constant string is assigned to a constant character pointer (or a character pointer)? For example:
const char* p="String";
How and where does the compiler store this array? In heap memory?
And what is the difference between it and:
char* p="String";
thanks.
What happens when a constant string is assigned to a constant character pointer (or a character pointer)?
Nothing happens to the const string itself: a pointer to it is assigned to p, that's all.
How and where does the compiler store this array? In heap memory?
It does not move it anywhere. The string's data stays where the compiler placed it, which is implementation-specific.
And what is the difference between it and char* p="String";?
The compiler is going to reject a program with the assignment of a literal to non-const, or warn you of a deprecated conversion, depending on the C++ version and/or compiler settings.
If you try to modify p[...]'s content using the const declaration, the compiler is going to stop you. If you try doing the same without const, the program may compile, but it would cause undefined behavior at runtime.
The string literal "String" is a static array of const char somewhere in your program, probably placed into a read-only part of the address space when the executable is set up by your OS.
When you assign const char *p = "String", then p is initialized with a pointer to that array of const char. So *p is 'S' and p[1] is 't', etc.
When you write char *p = "String", your compiler should reject that (perhaps you have insufficient diagnostic level set?). If you tell the compiler to accept it regardless, then you have a pointer to (modifiable) char pointing at the string literal. If you subsequently attempt to write through this pointer, you'll get no compiler error, and instead you are likely to see one of two problems at run time:
(If the compiler/linker has placed the string literal into read-only memory) a signal is raised indicating memory access violation (SIGSEGV on Unix-like systems).
(If the string literal is in writeable memory) other uses of the same string literal get modified, because the compiler is permitted to point them all at the same storage.
I have a few doubts about string literals in C++.
char *strPtr ="Hello" ;
char strArray[] ="Hello";
Now strPtr and strArray are considered to be string literals.
As per my understanding string literals are stored in read only memory so we cannot modify their values.
We cannot do
strPtr[2] ='a';
and strArray[2]='a';
Both the above statements should be illegal.
compiler should throw errors in both cases.
Compiler keeps string literals in read only memory , so if we try to modify them compiler throws errors.
Also const data is also considered as readonly.
Is it that both string literals and const data are treated same way ?
Can I remove the constness of a string literal with const_cast and then change its value?
Where exactly are string literals stored? (In the data section of the program?)
Now strPtr and strArray are considered to be string literals.
No, they aren't. String literals are the things you see in your code. For example, the "Hello". strPtr is a pointer to the literal (which is now compiled in the executable). Note that it should be const char *; you cannot legally remove the const per the C standard and expect defined behavior when using it. strArray is an array containing a copy of the literal (compiled in the executable).
Both the above statements should be illegal. compiler should throw errors in both cases.
No, it shouldn't. Both statements compile. The first one has undefined behavior because strPtr points at a string literal; the second is fine because strArray is an ordinary writable array. It would be a compile error if they were pointers to const chars, though.
As far as I know, string literals may be defined the same way as other literals and constants. However, there are differences:
// These copy from ROM to RAM at run-time:
char myString[] = "hello";
const int myInt = 42;
float myFloats[] = { 3.1, 4.1, 5.9 };
// These copy a pointer to some data in ROM at run-time:
const char *myString2 = "hello";
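static const float floatData[] = { 3.1f, 4.1f, 5.9f }; // a pointer cannot be brace-initialized from three values,
const float *myFloats2 = floatData; // so point it at a const array instead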
char *myString3 = "hello"; // Legal, but...
myString3[0] = 'j'; // Undefined behavior! (Most likely segfaults.)
My use of ROM and RAM here is general. If the platform is only RAM (e.g. most Nintendo DS programs) then const data may be in RAM. Writes are still undefined, though. The location of const data shouldn't matter for a normal C++ programmer.
char *strPtr ="Hello" ;
Defines strPtr as a pointer to char pointing to the string literal "Hello"; the pointee is effectively const char. No modification is allowed through strPtr (attempting it invokes UB). Allowing this initialization is a backward-compatibility feature for older C code, and the conversion is deprecated in C++0x. See Annex C:
Change: String literals made const
The type of a string literal is changed from “array of char” to “array of const char.” [...]
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to modify its argument.
Effect on original feature: Change to semantics of well-defined feature. Difficulty of converting: Simple syntactic transformation, because string literals can be converted to char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially modifiable memory are probably rare.
char strArray[] ="Hello";
The declared type of strArray is an array of characters of unspecified size, initialized with the string Hello including the null terminator, i.e. 6 characters. The initialization makes it a complete type, and its type is array of 6 characters. Modification via strArray is okay.
Where exactly are string literals stored?
Implementation defined.
Older C and C++ compilers targeted low-level code, where stronger data protection was neither available nor enforceable; in C and C++ you can generally write to anything you can address.
You can even write code that accesses and modifies data behind const pointers, if you know how to play with the addresses.
C++ does enforce some protection at compile time, but there is none at run time. You can certainly access your own stack and use its values to manipulate any data that came in through a const pointer as well.
That is part of the reason C# enforces somewhat higher-level standards: whatever you access is a reference with a fixed structure governing the rules of data protection, and its hidden pointer can be neither accessed nor modified.
The major difference is that C++ can only give you compile-time protection, whereas C# also gives you protection at run time.
#include <stdio.h>

struct st{
int a;
char *ptr;
}obj;
int main(void)
{
obj.a=10;
obj.ptr="Hello World"; // (1) memory allocation?
printf("%d,%s",obj.a,obj.ptr);
return 0;
}
ptr is declared in the struct. When "Hello World" is assigned to it, no memory has been allocated, yet the program works fine and prints the expected output. Shouldn't it fail or crash at the assignment marked (1)?
"Hello World" is a string literal residing in a read-only memory section (.rodata) of your program. You point to this section then print the contents. The program behavior is 100% well-defined and should not crash.
It is however good practice to always declare pointers to string literals as const char*, because you are not allowed to modify string literals.
It is perfectly valid.
String literals are placed in the text or .rodata segment of the executable when it is built (compiler optimizations aside). In case you are not familiar with the layout of an executable in memory (also known as the runtime environment), it consists of the Text, Data, BSS, Heap and Stack segments.
For example, the assignment
obj.ptr="Hello World";
places Hello World in a read-only part of memory and makes obj.ptr point to it, so any write to that memory through the pointer is illegal.
The literal is an unnamed array with static storage duration (meaning that it lives for the entire life of the program); obj.ptr is a variable of type pointer-to-char that is set to the location of the first character of that unnamed, read-only array.
At run time, the pointer (which here lives in static storage, because obj is a global object, not on the stack) is set to point to the area in memory where the string Hello World is.
First, the answer differs somewhat depending on the language (C
or C++), and in the case of C++, the version of the standard.
In both cases, however, "Hello World" is a string literal: in
C, it has type char [12], and in C++, type char const [12],
and the array has static lifetime, so exists for the lifetime of
the program.
In C, when you assign it to a char*, you have the standard
array to pointer conversion; in C++ pre-C++11, you have
a deprecated char const[] to char* conversion, which is only
valid if the char const[] is a string literal—the
compiler should warn; in C++11, the conversion is illegal, and
your program shouldn't compile (but I'm willing to bet that it
will for many years hence).
My compiler (g++ 4.7.2) gives a warning:
warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
If you don't get this warning, you can try passing -Wall to your compiler (IMO always a good idea).
When you assign a string literal to a char pointer, no memory is allocated for you!
The compiler stores the literal in the program image (it may reuse an identical literal that is already there) and the pointer simply points to the start of it. Either way, no memory is allocated, and what the pointer refers to is effectively a read-only constant: just try it and you will see that you can't change it. Sometimes that is good, if you don't want the string to change, and sometimes it is not; it depends on what you want to do :)
And where are literals in memory exactly? (see examples below)
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Whereas an implicit cast of a const char* type to a char* type gives me a warning, see below (tested on GCC, but it behaves similarly on VC++2010).
Also, if I modify the value of a const char (with the trick below, for which GCC ought to give me a warning), I get no error and can even modify and display it on GCC (I guess it is still undefined behavior, but I wonder why it does not behave the same way as with the literal). That is why I am asking where those literals are stored, and where ordinary const objects are usually stored.
const char* a = "test";
char* b = a; /* warning: initialization discards qualifiers
from pointer target type (on gcc), error on VC++2k10 */
char *c = "test"; // no compile errors
c[0] = 'p'; /* bus error when execution (we are not supposed to
modify const anyway, so why can I and with no errors? And where is the
literal stored for I have a "bus error"?
I have 'access violation writing' on VC++2010 */
const char d = 'a';
*(char*)&d = 'b'; // no warnings (why not?)
printf("%c", d); /* displays 'b' (why doesn't it do the same
behavior as modifying a literal? It displays 'a' on VC++2010 */
The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.
Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++ before C++11 allowed a string literal to be converted to char *. That conversion was deprecated and was removed in C++11.
I'm not certain what the C and C++ standards say about strings. But I can tell exactly what actually happens with string literals in MSVC. And, I believe, other compilers behave similarly.
String literals reside in a const data section. Their memory is mapped into the process address space. However the memory pages they're stored in are read-only (unless explicitly modified during the run).
But there's something more you should know. Not all the C/C++ expressions containing quotes have the same meaning. Let's clarify everything.
const char* a = "test";
The above statement makes the compiler create a string literal "test". The linker makes sure it'll be in the executable file.
In the function body the compiler generates code that declares a variable a on the stack, which gets initialized with the address of the string literal "test".
char* b = a;
Here you declare another variable b on the stack, which gets the value of a. Since a points to a read-only address, so does b. The fact that b has no const qualifier doesn't mean you may modify what it points to.
char *c = "test"; // no compile errors
c[0] = 'p';
The above generates an access violation. Again, the lack of const doesn't mean anything at the machine level.
const char d = 'a';
*(char*)&d = 'b';
First of all - the above is not related to string literals. 'a' is not a string. It's a character. It's just a number. It's like writing the following:
const int d = 55;
*(int*)&d = 56;
The above code makes a fool of the compiler: you say the variable is const, yet you manage to modify it (which is still undefined behavior). It does not trigger a processor exception, though, since d resides in read/write memory nevertheless.
I'd like to add one more case:
char b[] = "test";
b[2] = 'o';
The above declares an array on the stack, and initializes it with the string "test". It resides in the read/write memory, and can be modified. There's no problem here.
Mostly historical reasons. But keep in mind that they are somewhat justified: string literals don't have type char *, but char [N], where N denotes the size of the buffer (otherwise, sizeof wouldn't work as expected on string literals), and they can be used to initialize non-const arrays. They can be assigned to const pointers only by virtue of the implicit conversions from array to pointer and from non-const to const.
It would be more consistent if string literals exhibited the same behaviour as compound literals, but as these are a C99 construct and backwards-compatibility had to be maintained, this wasn't an option, so string literals stay an exceptional case.
And where are literals in memory exactly? (see examples below)
Initialized data segment. On Linux it is either .data or .rodata.
I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.
Historical reasons, as others have already explained. Most compilers let you choose whether string literals should be read-only or modifiable with a command-line option.
The reason it is generally desired to have string literals read-only is that the segment with read-only data in memory can be (and normally is) shared between all the processes started from the executable. That obviously frees some RAM from being wasted to keep redundant copies of the same information.
I have no warnings even with most of the compiler flags
Really? When I compile the following code snippet:
int main()
{
char* p = "some literal";
}
on g++ 4.5.0 even without any flags, I get the following warning:
warning: deprecated conversion from string constant to 'char*'
You can write to c because you didn't make it const. Defining c as const would be correct practice since the right hand side has type const char*.
It generates an error at runtime because the "test" value is probably allocated to the code segment, which is read-only.