I'm getting back into C++ and have the hang of pointers and whatnot. However, I was hoping I could get some help understanding why this code segment gives a bus error.
char * str1 = "Hello World";
*str1 = '5';
ERROR: Bus error :(
And more generally, I am wondering how to change the value of a single character in a cstring. Because my understanding is that *str = '5' should change the value that str points to from 'H' to '5'. So if I were to print out str it would read: "5ello World".
In an attempt to understand I wrote this code snippet too, which works as expected;
char test2[] = "Hello World";
char *testpa2 = &test2[0];
*testpa2 = '5';
This gives the desired output. So then what is the difference between testpa2 and str1? Don't they both point to the start of a series of null-terminated characters?
When you say char *str = "Hello World"; you are making a pointer to a literal string which is not changeable. It should be required to assign the literal to a const char* instead, but for historical reasons this is not the case (oops).
When you say char str[] = "Hello World"; you are making an array which is initialized to (and sized by) a string known at compile time. This is OK to modify.
Not so simple. :-)
The first one creates a pointer to the given string literal, which is allowed to be placed in read-only memory.
The second one creates an array (on the stack, usually, and thus read-write) that is initialised to the contents of the given string literal.
In the first example you try to modify a string literal, which results in undefined behavior.
As per the language standard in 2.13.4.2
Whether all string literals are
distinct (that is, are stored in
nonoverlapping objects) is
implementation-defined. The effect of
attempting to modify a string literal
is undefined.
In your second example you used string-literal initialization, defined in 8.5.2.1
A char array (whether plain char,
signed char, or unsigned char) can be
initialized by a string-literal
(optionally enclosed in braces); a
wchar_t array can be initialized by a
wide string-literal (optionally
enclosed in braces); successive
characters of the string-literal
initialize the members of the
array.
I am learning C++. In the program shown here, as far as I know, str1 and str2 store the addresses of first characters of each of the relevant strings:
#include <iostream>
using namespace std;
int main()
{
char str1[]="hello";
char *str2="world";
cout<<str1<<endl;
cout<<str2<<endl;
}
However, str1 is not giving any warnings, while with str2 I get this warning:
warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
char *str2="world";
What's different between these two declarations that causes the warning in the second case but not the first?
When you write
char str1[] = "hello";
you are saying "please make me an array of chars that holds the string "hello", and please choose the size of the array str1 to be the size of the string initializing it." This means that str1 ends up storing its own unique copy of the string "hello". The ultimate type of str1 is char[6] - five for hello and one for the null terminator.
When you write
char *str2 = "world";
you are saying "please make me a pointer of type char * that points to the string literal "world"." The string literal "world" has type const char[6] - it's an array of six characters (five for world and one for the null terminator), and importantly those characters are const and can't be modified. Since you're pointing at that array with a char * pointer, you're losing the const modifier, which means that you now (unsafely) have a non-const pointer to a const bit of data.
The reason that things are different here is that in the first case, you are getting a copy of the string "hello", so the fact that your array isn't const isn't a problem. In the second case, you are not getting a copy of "world" and are instead getting a pointer to it, and since you're getting a pointer to it there's a concern that modifying it could be a real problem.
Stated differently, in the first case, you're getting an honest-to-goodness array of six characters that have a copy of hello in them, so there's no problem if you then decide to go and mutate those characters. In the second case, you're getting a pointer to an array of six characters that you're not supposed to modify, but you're using a pointer that permits you to mutate things.
So why is it that "world" is a const char[6]? As an optimization on many systems, the compiler will only put one copy of "world" into the program and have all copies of the literal "world" point to the exact same string in memory. This is great, as long as you don't change the contents of that string. The C++ language enforces this by saying that those characters are const, so mutating them leads to undefined behavior. On some systems, that undefined behavior leads to things like "whoa, my string literal has the wrong value in it!," and in others it might just segfault.
The problem is that you are trying to convert a string literal (which has type const char[6] and decays to const char*) to char*.
How would one go to replace characters in a char*?
For example:
int main() {
char* hello = "hello";
int i;
for (i = 0; i < 5; i++) {
hello[i] = 'a';
}
cout << hello;
}
No output at all. Just pauses on me and says that the program isn't responding.
Expected output: aaaaa
The problem here is that you have a pointer to a string literal, and string literals in C++ are constant arrays of characters. Attempting to modify constant data leads to undefined behavior.
You can solve this by making hello an array:
char hello[] = "hello";
char* hello = "hello"; should be char hello[] = "hello";
The former is a string literal which you are not allowed to change. The latter is an array from which you can change any character in it.
Reason:
char* hello = "hello";
Actually this is a string literal, and linker stores this "hello" string on a separate memory section of the program called Read Only memory area (check the linker generated memory map file (possibly .map extension) to see the program memory map).
char* hello
hello is a pointer variable and it will be stored on the stack area of the program.
Now pointer variable hello keeps the address of the read only memory (base address of the string literals).
for (i = 0; i < 5; i++) {
hello[i] = 'a';
}
You are trying to modify Read Only memory. In such a case it depends on the OS what exception is generated (in some cases you will also see a segmentation fault).
Solution:
Define the array on the stack (local to the function) or in data memory (global).
char hello[] = "hello";
With this declaration, the array hello itself lives on the stack (local to the function) or in data memory (global), and it is initialized with a copy of the string "hello".
Recommendation
Use the keyword const when pointing at string literals to avoid accidental modification of Read Only memory. With const in place, the compiler will report an error if any part of the code tries to modify the read-only area.
const char* hello = "hello";
Read below.
From the C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal
or literals. The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the
array elements have type char, and are initialized with the
individual bytes of the multibyte character sequence; for wide string
literals, the array elements have type wchar_t, and are initialized
with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
I'm completely new to the C++ language (pointers in particular, experience is mainly in PHP) and would love some explanation to the following (I've tried searching for answers).
How are both lines of code able to do exactly the same job in my program? The second line seems to go against everything I've learnt & understood so far about pointers.
char disk[3] = "D:";
char* disk = "D:";
How am I able to initialize a pointer to anything other than a memory address? Not only that, in the second line I'm not declaring the array properly either - but it's still working?
The usual way to initialize an array in C and C++ is:
int a[3] = { 0, 1, 2 };
Aside: And you can optionally leave out the array bound and have it deduced from the initializer list, or have a larger bound than there are initializers:
int aa[] = { 0, 1, 2 }; // another array of three ints
int aaa[5] = { 0, 1, 2 }; // equivalent to { 0, 1, 2, 0, 0}
For arrays of characters there is a special rule that allows an array to be initialized from a string literal, with each element of the array being initialized from the corresponding character in the string literal.
Your first example uses the string literal "D:" so each element of the array will be initialized to a character from that string, equivalent to:
char disk[3] = { 'D', ':', '\0' };
(The third character is the null terminator, which is implicitly present in all string literals).
Aside: Here too you can optionally leave out the array bound and have it deduced from the string literal, or have a larger bound than the string length:
char dd[] = "D:"; // another array of three chars
char ddd[5] = "D:"; // equivalent to { 'D', ':', '\0', '\0', '\0'}
Just like the aaa example above, the extra elements in ddd that don't have a corresponding character in the string will be zero-initialized.
Your second example works because the string literal "D:" will be output by the compiler and stored somewhere in the executable as an array of three chars. When the executable is run the segment that contains the array (and other constants) will be mapped into the process' address space. So your char* pointer is then initialized to point to the location of that array, wherever that happens to be. Conceptually it's similar to:
const char __some_array_created_by_the_compiler[3] = "D:";
const char* disk = __some_array_created_by_the_compiler;
For historical reasons (mostly that const didn't exist in the early days of C) it was legal to use a non-const char* to point to that array, even though the array is actually read-only, so C and the first C++ standard allow you to use a non-const char* pointer to point to a string literal, even though the array that it refers to is really const:
const char __some_array_created_by_the_compiler[3] = "D:";
char* disk = (char*)__some_array_created_by_the_compiler;
This means that despite appearances your two examples are not exactly the same, because this is only allowed for the first one:
disk[0] = 'C';
For the first example that is OK, it alters the first element of the array.
For the second example it might compile, but it results in undefined behaviour, because what it's actually doing is modifying the first element of the __some_array_created_by_the_compiler which is read-only. In practice what will probably happen is that the process will crash, because trying to write to a read-only page of memory will raise a segmentation fault.
It's important to understand that there are lots of things in C++ (and even more in C) which the compiler will happily compile, but which cause Very Bad Things to happen when the code is executed.
char disk[3] = "D:";
Is treated as
char disk[3] = {'D',':','\0'};
Whereas in C++11 and above
char* disk = "D:";
Is an error as a string literal is of type const char[] and cannot be assigned to a char *. You can assign it to a const char * though.
String literals are actually read-only, zero-terminated arrays of characters, and using a string literal gives you a pointer to the first character in the array.
So in the second example
char* disk = "D:";
you initialize disk to point to the first character of an array of three characters.
Note in my first paragraph above that I said that string literals are read-only arrays, that means that having a plain char* pointing to this array could make you think that it's okay to modify this array when it's not (attempting to modify a string literal leads to undefined behavior). This is the reason that const char* is usually used:
const char* disk = "D:";
Since C++11 it's actually an error to not use a const char*, though most compilers still only warn about it instead of producing an error.
You are absolutely right to say that pointers can store only memory address. Then how is the second statement valid? Let me explain.
When you put a sequence of characters in double quotes, what happens behind the scenes is that the string gets stored in read-only memory and the address of the location where the string is stored is returned. So at run-time, the expression is evaluated, the string evaluates to the memory address, which is a character pointer. It is this pointer that is assigned to your pointer variable.
So what is the difference between the two statements? The string in the second case is a constant, while the string declared by the first statement can be changed.
I started my adventure with C++ one week back. I have read a lot about C++.
I was experimenting with the following:
char * String1 = "abcdefgh";
I, then, tried to modify its value in the following way:
String1[2] = 'f';
This resulted in an UNHANDLED EXCEPTION.
But the following results in proper execution:
char String2[9]="abcdefgh";
String2[7]='s';
I tried to extract information about the binary generated using above code using DUMPBIN.
DUMPBIN is a Visual Studio Tool. I used the /ALL option to extract every information contained in the binary.
I could see two instances of "abcdefgh" in the RAWDATA section. And I understand why.
My questions are as follows:
1) Although both String1 and String2 are essentially pointers to two different instances of the same character sequence, why is the String1 manipulation not a legal one?
2) I know the compiler generates a SYMBOL TABLE for mapping variable names and their values. IS there any tool to visualize the SYMBOL TABLE in Windows OS?
3) If I have an array of integers instead of the character sequence, can it be found in the RAWDATA?
I could also see the following in RAWDATA:
Unknown Runtime Check Error.........
Stack memory around _alloca was corrupted.......
....A local variable was used before it was initialized.........
....Stack memory was corrupted..
........A cast to a smaller data type has caused a loss of data.
If this was intentional, you should mask the source of the cast with the appropriate bitmask.
How do these things get into the binary executable? What is the purpose of having these messages in the binary(which obviously is not readable)?
EDIT:
My question 1) has a word INSTANCES, which is used to mean the following:
The character sequence "abcdefgh" is derived from a set of non-capitalized ENGLISH ALPHABETS, i.e., {a,b,...,y,z}. This sequence is INSTANTIATED twice and stored at two memory locations, say A and B. String1 points to A (assumption) and String2 points to B. There is no conceptual mix-up in the question.
What I wanted to comprehend was the difference in the attributes of the memory locations A and B, i.e., why one of them was immutable.
Note: all of the code below refers to a scope within a function.
The code below initializes a writeable buffer string2 with data. The compiler generates initialization code to copy from the read-only compiler generated string to this buffer.
char string2[] = "abcdefgh";
The code below stores a pointer to a read-only, compiler-generated string in string1. The string's contents are in a read-only section of the executable image. That's why modifying it will fail.
char * string1 = "abcdefgh";
You can make it work by having string1 point to a writeable buffer. This can be achieved by copying the string:
char * string1 = strdup("abcdefgh");
....
free(string1); // don't forget to free the buffer!
char * String1 = "abcdefgh";
In C (and C++) the string literal is effectively const data. The compiler is allowed to store fixed const data however it likes: it may have a separate DATA segment, or it might have a completely const program store (in a Harvard architecture).
char String2[9]="abcdefgh";
Allocates a 9-element array of chars and just happens to initialise it with some string. You can do what you want with the array. Arrays of any other type would be stored in the same way.
The error messages for some runtime errors are stored in the program data segment(in the same way as your original char* string). Some of them like "this program needs windows" must obviously be in there rather than in the OS because DOS wouldn't know a program needed a later version of Windows. But I'm not sure why these particular runtime errors aren't created by the OS
You cannot modify a string literal. The type of a string literal is
char const[], and any attempt to modify one is undefined behavior.
And given a statement like:
char* s1 = "a litteral";
, the compiler really should generate a warning. The implicit
conversion to non-const here is deprecated, and was only introduced into
the language to avoid breaking existing code (dating from an epoch when
C didn't have const).
In the case:
char s2[] = "init";
, there isn't really a string literal. The "string literal" is in fact an
initialization specification, and unlike string literals, doesn't appear
anywhere in memory; it is used by the compiler to determine how s2
should be initialized, and is the exact equivalent of:
char s2[] = { 'i', 'n', 'i', 't', '\0' };
(It is a bit more convenient to write.)
--
A short historical sidelight: early C didn't have const. The type of
a string literal was char[], and modifying it was legal. This led
to some very horrible code:
char* f() { return "abcd"; }
/* ... */
f()[1] = 'x';
and the next time you called f, it returned "axcd". A literal
which doesn't have the value which appears in the source listing is
not the way to readable code, and the C standards committee decided
that this was one feature it was better not to keep.
char string[] = "foo"
This allocates a char array, and initializes it with the values {'f', 'o', 'o', '\0'}. You get "your own" storage for the chars, and you can modify the array.
char* strptr = "foo"
This allocates a pointer, and sets the value of that pointer to the address of a char array which contains {'f', 'o', 'o', '\0'}. The pointer is yours to do with as you wish, but the char array is not. In fact, the type of the array is not char[], but const char[], and strptr really ought to be declared as const char* so that you do not mistakenly attempt to modify the const array.
In the first case, "foo" is an array initializer. In the second, "foo" is a string literal.
More specific details about exactly where the memory for each situation is located tend to be unspecified by the standard. However, generally speaking, char string[] = "foo" allocates a char array on the stack, while char* strptr = "foo" allocates a char pointer on the stack and (statically) allocates a const char array in the data section of the executable.
1) As pointed in the c++ standard (2003) (http://www.iso.org/iso/catalogue_detail.htm?csnumber=38110)
1 A string literal is a sequence of characters surrounded by
double quotes, optionally beginning with the letter L, as in "..."
or L"...". A string literal that does not begin with L is an
ordinary string literal, also referred to as a narrow string
literal. An ordinary string literal has type "array of n const
char" and static storage duration (basic.stc), where n is the size
of the string as defined below, and is initialized with the given
characters. A string literal that begins with L, such as L"asdf", is
a wide string literal. A wide string literal has type "array of n
const wchar_t" and has static storage duration, where n is the size of
the string as defined below, and is initialized with the given
characters.
2 Whether all string literals are distinct (that is, are stored
in nonoverlapping objects) is implementation-defined. The
effect of attempting to modify a string literal is undefined.
As stated above, it's not illegal; it is undefined behavior. With VS you typically get an exception on Windows, and with g++ you will typically get a segmentation fault on Linux (they are basically the same thing, though).
2) You can use a Disassembly program and check for the data section of the exe file (check this wiki for more info about several exe file structures x86 Disassembly/Windows Executable Files)
3) Yes, it should be in the .data section of the exe file
Why do we need the *?
char* test = "testing";
From what I understood, we only apply * onto addresses.
This is a char:
char c = 't';
It can only hold one character!
This is a C-string:
char s[] = "test";
It can hold multiple characters. Another way to write the above is:
char s[] = {'t', 'e', 's', 't', 0};
The 0 at the end is called the NUL terminator. It denotes the end of a C-string.
A char* stores the starting memory location of a C-string.1 For example, we can use it to refer to the same array s that we defined above. We do this by setting our char* to the memory location of the first element of s:
char* p = &(s[0]);
The & operator gives us the memory location of s[0].
Here is a shorter way to write the above:
char* p = s;
Notice:
*(p + 0) == 't'
*(p + 1) == 'e'
*(p + 2) == 's'
*(p + 3) == 't'
*(p + 4) == 0 // NUL
Or, alternatively:
p[0] == 't'
p[1] == 'e'
p[2] == 's'
p[3] == 't'
p[4] == 0 // NUL
Another common usage of char* is to refer to the memory location of a string literal:
const char* myStringLiteral = "test";
Warning: This string literal should not be changed at runtime. We use const to warn the programmer (and compiler) not to modify myStringLiteral in the following illegal manner:
myStringLiteral[0] = 'b'; // Illegal! Do not do this for const char*!
This is different from the array s above, which we are allowed to modify. This is because the string literal "test" is automatically copied into the array at initialization phase. But with myStringLiteral, no such copying occurs. (Where would we copy to, anyways? There's no array to hold our data... just a lonely char*!)
1 Technical note: char* merely stores a memory location to things of type char. It can certainly refer to just a single char. However, it is much more common to use char* to refer to C-strings, which are NUL-terminated character sequences, as shown above.
The char type can only represent a single character. When you have a sequence of characters, they are piled next to each other in memory, and the location of the first character in that sequence is returned (assigned to test). Test is nothing more than a pointer to the memory location of the first character in "testing", saying that the type it points to is a char.
You can do one of two things:
char *test = "testing";
or:
char test[] = "testing";
Or, a few variations on those themes like:
char const *test = "testing";
I mention this primarily because it's the one you usually really want.
The bottom line, however, is that char x; will only define a single character. If you want a string of characters, you have to define an array of char or a pointer to char (which you'll initialize with a string literal, as above, more often than not).
There are real differences between the first two options though. char *test=... defines a pointer named test, which is initialized to point to a string literal. The string literal itself is allocated statically (typically right along with the code for your program), and you're not supposed to (attempt to) modify it -- thus the preference for char const *.
The char test[] = .. allocates an array. If it's a global, it's pretty similar to the previous except that it does not allocate a separate space for a pointer to the string literal -- rather, test is the name of the array itself, whose storage holds the characters directly.
If you do this as a local variable, test still names the array directly - but since it's a local variable, it gets "auto" storage (typically on the stack), which is initialized (usually from a normal, statically allocated copy of the string literal) on every entry to the block/scope where it's defined.
The latter versions (with an array of char) can act deceptively similar to a pointer, because the name of an array will decay to the address of the beginning of the array anytime you pass it to a function. There are differences though. You can modify the array, but modifying a string literal gives undefined behavior. Conversely, you can change the pointer to point at some other chars, so something like:
char *test = "testing";
if (whatever)
test = "not testing any more";
...is perfectly fine, but trying to do the same with an array won't work (arrays aren't assignable).
The main thing people forgot to mention is that "testing" is an array of chars in memory; there's no such thing as a primitive string type in C++. Therefore, as with any other array, you can't reference it as if it were a single element.
char* represents the address of the beginning of the contiguous block of memory of char's. You need it as you are not using a single char variable you are addressing a whole array of char's
When accessing this, functions will take the address of the first char and step through the memory. This is possible as arrays use contiguous memory (i.e. all of the memory is consecutive in memory).
Hope this clears things up! :)
Using a * says that this variable points to a location in memory. In this case, it is pointing to the location of the string "testing". With a char pointer, you are not limited to just a single character, because the pointer can refer to the first of many consecutive characters.
In C, an array is not itself a pointer, but in most expressions its name decays to a pointer to its first element.