Replacing characters in a char*

How would one go about replacing characters in a char*?
For example:
int main() {
char* hello = "hello";
int i;
for (i = 0; i < 5; i++) {
hello[i] = 'a';
}
cout << hello;
}
No output at all. Just pauses on me and says that the program isn't responding.
Expected output: aaaaa

The problem here is that you have a pointer to a string literal, and string literals in C++ are constant arrays of characters. Attempting to modify constant data leads to undefined behavior.
You can solve this by making hello an array:
char hello[] = "hello";

char* hello = "hello"; should be char hello[] = "hello";
The former is a pointer to a string literal, which you are not allowed to change. The latter is an array holding a writable copy of the literal, in which you can change any character.
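A minimal sketch of the array-based fix, for illustration only. The helper names fill_with and demo_fill are made up here and do not come from the question; the sketch assumes the buffer is writable and null-terminated.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): overwrite every
// character of a writable, null-terminated buffer with `c`.
void fill_with(char* buffer, char c) {
    assert(buffer != nullptr);
    for (std::size_t i = 0, n = std::strlen(buffer); i < n; ++i) {
        buffer[i] = c;
    }
}

// Builds a local array copy of the literal, modifies it, and returns the result.
std::string demo_fill() {
    char hello[] = "hello";  // array: a private, writable copy of the literal
    fill_with(hello, 'a');   // safe, because the array belongs to this function
    return hello;            // "aaaaa"
}
```

Calling demo_fill() from main and printing the result with std::cout should give the aaaaa the question expected.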

Reason:
char* hello = "hello";
Here "hello" is a string literal. The toolchain typically stores it in a separate, read-only section of the program image (check the linker-generated memory map file, possibly with a .map extension, to see the program memory map).
char* hello
hello is a pointer variable, and it will be stored in the stack area of the program when it is local to a function.
The pointer variable hello holds the address of that read-only memory (the base address of the string literal).
for (i = 0; i < 5; i++) {
hello[i] = 'a';
}
You are trying to modify read-only memory. What happens then depends on the OS and the exception it generates (in some cases you will get a segmentation fault).
Solution:
Define the array on the stack (local to the function) or in data memory (global).
char hello[] = "hello";
With this form the string "hello" is copied into an array on the stack (local to the function) or in data memory (global).
Recommendation
Use the const keyword when pointing to string literals to avoid accidental modification of read-only memory. With const, the compiler will report an error if any part of the code tries to modify the read-only area.
const char* hello = "hello";
Read below.
From the C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to
each multibyte character sequence that results from a string literal
or literals. The multibyte character sequence is then used to
initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the
array elements have type char, and are initialized with the
individual bytes of the multibyte character sequence; for wide string
literals, the array elements have type wchar_t, and are initialized
with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
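To illustrate the const recommendation above, here is a small compile-time check. It is only a sketch: the name writable_through is invented for this example, and it relies on the standard <type_traits> and <utility> headers.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper: can a char be assigned through a dereferenced
// pointer of type Ptr? Evaluated entirely at compile time.
template <typename Ptr>
constexpr bool writable_through() {
    return std::is_assignable<decltype(*std::declval<Ptr>()), char>::value;
}

// char* lets the write compile, even when the target is a read-only literal.
static_assert(writable_through<char*>(), "char* permits writes");

// const char* makes the compiler reject the write before the program ever runs.
static_assert(!writable_through<const char*>(), "const char* rejects writes");
```

With const char* hello = "hello"; a statement such as hello[0] = 'a'; fails to compile, which is the kind of early error this recommendation is after.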

Related

Is it possible for separately initialized string variables to overlap?

If I initialize several string(character array) variables in the following ways:
const char* myString1 = "string content 1";
const char* myString2 = "string content 2";
Since const char* is simply a pointer to a specific char object, it does not contain any size or range information about the character array it is pointing to.
So, is it possible for two string literals to overlap each other? (The newly allocated overlap the old one)
By overlap, I mean the following behaviour;
// Continue from the code block above
std::cout << myString1 << std::endl;
std::cout << myString2 << std::endl;
It outputs
string costring content 2
string content 2
So the start of myString2 is somewhere in the middle of myString1. Because const char* does not "protect"("possess") a range of memory locations but only that one it points to, I do not see how C++ can prevent other string literals from "landing" on the memory locations of the older ones.
How does C++/compiler avoid such problem?
If I change const char* to const char[], is it still the same?

Yes, string literals are allowed to overlap in general. From lex.string#9
... Whether all string-literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
So it's up to the compiler to decide whether any string literals overlap in memory. You can write a program to check whether the string literals overlap, but since it's unspecified whether this happens, you may get different results with different compilers or build settings.
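One possible way to write such a check is sketched below. The function names are invented for illustration, and the result of literals_overlap() is unspecified, so no particular answer should be expected from it.

```cpp
#include <cstring>
#include <functional>

// Hypothetical helper: do the null-terminated strings at `a` and `b`
// share any bytes (terminators included)?
bool strings_overlap(const char* a, const char* b) {
    const char* a_end = a + std::strlen(a) + 1;  // one past the terminator
    const char* b_end = b + std::strlen(b) + 1;
    // std::less gives a total order even for pointers into unrelated objects.
    std::less<const char*> before;
    return before(a, b_end) && before(b, a_end);
}

// Applies the check to the two literals from the question.
// The standard leaves the result unspecified.
bool literals_overlap() {
    const char* myString1 = "string content 1";
    const char* myString2 = "string content 2";
    return strings_overlap(myString1, myString2);
}
```

A pointer into the middle of a string always overlaps that string, while two separately declared arrays never do, which makes for a quick sanity check of the helper.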

A string is required to end with a null character having a value of 0, and can't have such a character in the middle. So the only case where this is even possible is when two strings are equal from the start of one to the end of both. That is not the case in the example you gave, so those two particular strings would never overlap.
Edit: sorry, I didn't mean to mislead anybody. It's actually easy to put a null character in the middle of a string with \0. But most string handling functions, particularly those in the standard library, will treat that as the end of a string - so your strings will get truncated. Not very practical. Because of that the compiler won't try to construct such a string unless you explicitly ask it to.
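A short sketch of what an embedded \0 does in practice. The function names are made up for this example; the point is that std::strlen stops at the first null character while sizeof still sees the whole array.

```cpp
#include <cstring>
#include <string>

// Length as seen by C string functions: counting stops at the first '\0'.
std::size_t visible_length() {
    return std::strlen("Hello\0 there.");  // 5
}

// Size of the array the compiler actually stores: 13 written characters
// (the embedded '\0' included) plus the implicit terminator.
std::size_t stored_size() {
    return sizeof("Hello\0 there.");  // 14
}

// Passing an explicit length to std::string keeps the bytes after the '\0'.
std::string full_contents() {
    const char text[] = "Hello\0 there.";
    return std::string(text, sizeof(text) - 1);  // 13 characters
}
```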

The compiler knows the size of each string, because it can "see" it in your code.
Additionally, they are not allocated the same way that you would allocate them at run-time. Instead, if the strings are constant and defined globally, they are most likely located in a read-only section of the object file (such as .rodata or .text), not on the heap.
And since the compiler knows the size of a constant string at compile-time, it can simply place its value in free space in that section. The specifics depend on the compiler you use, but rest assured that the people who wrote it are smart enough to avoid this issue.
If you define these strings inside some function instead, the compiler can choose between the first option and allocating space on the stack.
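As for const char[], the array is its own object initialized from the literal, so it cannot overlap another variable, although a compiler may place a constant array in the same read-only section as the literals.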

Two string literals are unlikely to overlap unless they are the same. In that case, though, the pointers will usually point to the same thing. (This isn't guaranteed by the standard, but I believe any modern compiler should make this happen.)
const char *a = "Hello there.";
const char *b = "Hello there.";
cout << (a == b);
// typically prints "1", which means they point to the same thing
Two const char * pointers can share a string, though.
const char *a = "Hello there.";
const char *b = a + 6;
cout << a;
// prints "Hello there."
cout << b;
// prints "there."
I think to answer your second question an explanation of c-style strings is useful.
A const char * is just a pointer to a string of characters. The const means that the characters themselves are immutable. (They are stored as part of the executable itself and you wouldn't want your program to change itself like this. You can use the strings command on Unix to see all the strings in an executable easily, e.g. strings a.out. You will see many more strings than the ones you coded, as many exist as part of the standard library and other things an executable requires.)
So how does it know to just print the string and then stop at the end? Well, a C-style string is required to end with a null byte (\0). The compiler implicitly puts it there when you declare a string. So "string content 1" is actually "string content 1\0".
const char *a = "Hello\0 there.";
cout << a;
// prints "Hello"
For the most part const char *a and const char a[] are the same.
// These are both valid
const char *a = "Hello";
const char b[] = "there.";
// This is valid
const char *c = b + 3; // c points to "re."
// This, however, is not valid
const char d[] = b + 3;
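One place where the array form and the pointer form do differ is sketched below. The function names are invented for illustration: sizeof reports the whole array for const char b[], and each array is its own object with its own address.

```cpp
#include <cstddef>

// For an array, sizeof covers every element, terminator included.
std::size_t array_bytes() {
    const char b[] = "there.";
    return sizeof(b);  // 7: six characters plus '\0'
}

// For a pointer, sizeof is just the size of the pointer itself.
bool pointer_has_pointer_size() {
    const char* a = "Hello";
    return sizeof(a) == sizeof(const char*);
}

// Two arrays initialized from identical literals are still separate objects.
bool arrays_are_distinct() {
    const char x[] = "same";
    const char y[] = "same";
    return &x[0] != &y[0];
}
```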

Why C++ variable doesn't need defining properly when it's a pointer?

I'm completely new to the C++ language (pointers in particular, experience is mainly in PHP) and would love some explanation to the following (I've tried searching for answers).
How are both lines of code able to do exactly the same job in my program? The second line seems to go against everything I've learnt & understood so far about pointers.
char disk[3] = "D:";
char* disk = "D:";
How am I able to initialize a pointer to anything other than a memory address? Not only that, in the second line I'm not declaring the array properly either - but it's still working?

The usual way to initialize an array in C and C++ is:
int a[3] = { 0, 1, 2 };
Aside: And you can optionally leave out the array bound and have it deduced from the initializer list, or have a larger bound than there are initializers:
int aa[] = { 0, 1, 2 }; // another array of three ints
int aaa[5] = { 0, 1, 2 }; // equivalent to { 0, 1, 2, 0, 0}
For arrays of characters there is a special rule that allows an array to be initialized from a string literal, with each element of the array being initialized from the corresponding character in the string literal.
Your first example uses the string literal "D:" so each element of the array will be initialized to a character from that string, equivalent to:
char disk[3] = { 'D', ':', '\0' };
(The third character is the null terminator, which is implicitly present in all string literals).
Aside: Here too you can optionally leave out the array bound and have it deduced from the string literal, or have a larger bound than the string length:
char dd[] = "D:"; // another array of three chars
char ddd[5] = "D:"; // equivalent to { 'D', ':', '\0', '\0', '\0'}
Just like the aaa example above, the extra elements in ddd that don't have a corresponding character in the string will be zero-initialized.
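The two array rules above can be checked with a small sketch like this one (the function names are made up for the example):

```cpp
#include <cstddef>

// Bound deduced from the literal: 'D', ':' and the implicit '\0'.
std::size_t deduced_bound() {
    char dd[] = "D:";
    return sizeof(dd);  // 3
}

// Bound larger than the literal: the remaining elements are zero-initialized.
std::size_t count_zero_elements() {
    char ddd[5] = "D:";
    std::size_t zeros = 0;
    for (char c : ddd) {
        if (c == '\0') {
            ++zeros;
        }
    }
    return zeros;  // 3: the terminator plus two padding elements
}
```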
Your second example works because the string literal "D:" will be output by the compiler and stored somewhere in the executable as an array of three chars. When the executable is run the segment that contains the array (and other constants) will be mapped into the process' address space. So your char* pointer is then initialized to point to the location of that array, wherever that happens to be. Conceptually it's similar to:
const char __some_array_created_by_the_compiler[3] = "D:";
const char* disk = __some_array_created_by_the_compiler;
For historical reasons (mostly that const didn't exist in the early days of C) it was legal to use a non-const char* to point to that array, even though the array is actually read-only, so C and the first C++ standard allow you to use a non-const char* pointer to point to a string literal, even though the array that it refers to is really const:
const char __some_array_created_by_the_compiler[3] = "D:";
char* disk = (char*)__some_array_created_by_the_compiler;
This means that despite appearances your two examples are not exactly the same, because this is only allowed for the first one:
disk[0] = 'C';
For the first example that is OK, it alters the first element of the array.
For the second example it might compile, but it results in undefined behaviour, because what it's actually doing is modifying the first element of the __some_array_created_by_the_compiler which is read-only. In practice what will probably happen is that the process will crash, because trying to write to a read-only page of memory will raise a segmentation fault.
It's important to understand that there are lots of things in C++ (and even more in C) which the compiler will happily compile, but which cause Very Bad Things to happen when the code is executed.

char disk[3] = "D:";
Is treated as
char disk[3] = {'D',':','\0'};
Whereas in C++11 and above
char* disk = "D:";
Is an error as a string literal is of type const char[] and cannot be assigned to a char *. You can assign it to a const char * though.
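The type of the literal can be inspected at compile time. This is only a sketch with invented function names; it relies on decltype yielding an lvalue reference for a string literal, since a string literal is an lvalue.

```cpp
#include <type_traits>

// "D:" has type "array of 3 const char": 'D', ':' and the terminator.
constexpr bool literal_is_const_array() {
    return std::is_same<decltype("D:"), const char (&)[3]>::value;
}

// It is not an array of plain (modifiable) char.
constexpr bool literal_is_mutable_array() {
    return std::is_same<decltype("D:"), char (&)[3]>::value;
}

static_assert(literal_is_const_array(), "string literals are const arrays");
static_assert(!literal_is_mutable_array(), "string literals are not writable arrays");
```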

String literals are actually read-only, zero-terminated arrays of characters, and using a string literal gives you a pointer to the first character in the array.
So in the second example
char* disk = "D:";
you initialize disk to point to the first character of an array of three characters.
Note in my first paragraph above that I said that string literals are read-only arrays, that means that having a plain char* pointing to this array could make you think that it's okay to modify this array when it's not (attempting to modify a string literal leads to undefined behavior). This is the reason that const char* is usually used:
const char* disk = "D:";
Since C++11 it's actually an error to not use a const char*, though most compilers still only warn about it instead of producing an error.

You are absolutely right to say that pointers can store only memory address. Then how is the second statement valid? Let me explain.
When you put a sequence of characters in double quotes, what happens behind the scenes is that the string gets stored in read-only memory and the address of the location where the string is stored is returned. So at run-time, when the expression is evaluated, the string evaluates to that memory address, which is a character pointer. It is this pointer that is assigned to your pointer variable.
So what is the difference between the two statements? The string in the second case is a constant, while the string declared by the first statement can be changed.

How are variables of various data types stored in C++(native) binary?

I started my adventure with C++ one week back. I have read a lot about C++.
I was experimenting with the following:
char * String1 = "abcdefgh";
I, then, tried to modify its value in the following way:
String1[2] = 'f';
This resulted in an UNHANDLED EXCEPTION.
But the following results in proper execution:
char String2[9]="abcdefgh";
String2[7]='s';
I tried to extract information about the binary generated using above code using DUMPBIN.
DUMPBIN is a Visual Studio Tool. I used the /ALL option to extract every information contained in the binary.
I could see two instances of "abcdefgh" in the RAWDATA section. And I understand why.
My questions are as follows:
1) Although both String1 and String2 are essentially pointers to two different instances of the same character sequence, why is the String1 manipulation not a legal one?
2) I know the compiler generates a SYMBOL TABLE for mapping variable names and their values. IS there any tool to visualize the SYMBOL TABLE in Windows OS?
3) If I have an array of integers instead of the character sequence, can it be found in the RAWDATA?
I could also see the following in RAWDATA:
Unknown Runtime Check Error.........
Stack memory around _alloca was corrupted.......
....A local variable was used before it was initialized.........
....Stack memory was corrupted..
........A cast to a smaller data type has caused a loss of data.
If this was intentional, you should mask the source of the cast with the appropriate bitmask.
How do these things get into the binary executable? What is the purpose of having these messages in the binary(which obviously is not readable)?
EDIT:
My question 1) has a word INSTANCES, which is used to mean the following:
The character sequence "abcdefgh" is derived from the set of lowercase English letters, i.e., {a,b,...,y,z}. This sequence is INSTANTIATED twice and stored at two memory locations, say A and B. String1 points to A (assumption) and String2 points to B. There is no conceptual mix-up in the question.
What I wanted to comprehend was the difference in the attributes of the memory locations A and B, i.e., why one of them was immutable.

Note: all of the code below refers to a scope within a function.
The code below initializes a writeable buffer string2 with data. The compiler generates initialization code to copy from the read-only compiler generated string to this buffer.
char string2[] = "abcdefgh";
The code below stores a pointer to a read-only, compiler-generated string in string1. The string's contents are in a read-only section of the executable image. That's why modifying it will fail.
char * string1 = "abcdefgh";
You can make it work by having string1 point to a writeable buffer. This can be achieved by copying the string:
char * string1 = strdup("abcdefgh");
....
free(string1); // don't forget to free the buffer!
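In C++ a std::string is a possible alternative to strdup, since it owns a writable copy and releases it automatically. A minimal sketch, with an invented function name:

```cpp
#include <string>

// Copies the literal into a std::string, then modifies the copy.
std::string modified_copy() {
    std::string string1 = "abcdefgh";  // owns a writable copy; no free() needed
    string1[2] = 'f';                  // "abfdefgh"

    char* raw = &string1[0];           // writable char* into the string's buffer
    raw[7] = 's';                      // "abfdefgs"

    return string1;
}
```

The raw pointer stays valid only while the string is alive and not resized, so it should not be stored beyond that.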

char * String1 = "abcdefgh";
In C (and C++) this string is const. The compiler is allowed to store fixed const data however it likes: it may have a separate DATA segment, or it might have a completely const program store (in a Harvard architecture).
char String2[9]="abcdefgh";
Allocates a 9-element array of chars and just happens to initialise it with some string. You can do what you want with the array. Arrays of any other type would be stored in the same way.
The error messages for some runtime errors are stored in the program data segment (in the same way as your original char* string). Some of them, like "this program needs windows", must obviously be in there rather than in the OS, because DOS wouldn't know a program needed a later version of Windows. But I'm not sure why these particular runtime errors aren't created by the OS.

You cannot modify a string literal. The type of a string literal is
char const[], and any attempt to modify one is undefined behavior.
And given a statement like:
char* s1 = "a litteral";
the compiler really should generate a warning. The implicit
conversion to non-const here is deprecated, and was only introduced into
the language to avoid breaking existing code (dating from an epoch when
C didn't have const).
In the case:
char s2[] = "init";
there isn't really a string literal. The "string literal" is in fact an
initialization specification, and unlike string literals, doesn't appear
anywhere in memory; it is used by the compiler to determine how s2
should be initialized, and is the exact equivalent of:
char s2[] = { 'i', 'n', 'i', 't', '\0' };
(It is a bit more convenient to write.)
--
A short historical sidelight: early C didn't have const. The type of
a string literal was char[], and modifying it was legal. This led
to some very horrible code:
char* f() { return "abcd"; }
/* ... */
f()[1] = 'x';
and the next time you called f, it returned "axcd". A literal
which doesn't have the value which appears in the source listing is
not the way to readable code, and the C standards committee decided
that this was one feature it was better not to keep.

char string[] = "foo";
This allocates a char array, and initializes it with the values {'f', 'o', 'o', '\0'}. You get "your own" storage for the chars, and you can modify the array.
char* strptr = "foo";
This allocates a pointer, and sets the value of that pointer to the address of a char array which contains {'f', 'o', 'o', '\0'}. The pointer is yours to do with as you wish, but the char array is not. In fact, the type of the array is not char[], but const char[], and strptr really ought to be declared as const char* so that you do not mistakenly attempt to modify the const array.
In the first case, "foo" is an array initializer. In the second, "foo" is a string literal.
More specific details about exactly where the memory for each situation is located tend to be unspecified by the standard. However, generally speaking, char string[] = "foo" allocates a char array on the stack, while char* strptr = "foo" allocates a char pointer on the stack and (statically) allocates a const char array in the data section of the executable.

1) As pointed out in the C++ standard (2003) (http://www.iso.org/iso/catalogue_detail.htm?csnumber=38110)
1 A string literal is a sequence of characters surrounded by
double quotes, optionally beginning with the letter L, as in "..."
or L"...". A string literal that does not begin with L is an
ordinary string literal, also referred to as a narrow string
literal. An ordinary string literal has type "array of n const
char" and static storage duration (basic.stc), where n is the size
of the string as defined below, and is initialized with the given
characters. A string literal that begins with L, such as L"asdf", is
a wide string literal. A wide string literal has type "array of n
const wchar_t" and has static storage duration, where n is the size of
the string as defined below, and is initialized with the given characters.
2 Whether all string literals are distinct (that is, are stored
in nonoverlapping objects) is implementation-defined. The
effect of attempting to modify a string literal is undefined.
As stated above, this is not rejected by the compiler; it is undefined behavior. With VS you typically get an exception on Windows, and with g++ you typically get a segmentation fault on Linux (they are essentially the same thing).
2) You can use a disassembler and inspect the data section of the exe file (see the wiki page x86 Disassembly/Windows Executable Files for more information about several exe file structures)
3) Yes, it should be in the .data section of the exe file

Why can't I write to a string literal while I can write to a string object?

If I define something like below,
char *s1 = "Hello";
why can't I do something like below,
*s1 = 'w'; // gives segmentation fault ...why???
What if I do something like below,
string s1 = "hello";
Can I do something like below,
*s1 = 'w';

Because "Hello" creates a const char[]. This decays to a const char* not a char*. In C++ string literals are read-only. You've created a pointer to such a literal and are trying to write to it.
But when you do
string s1 = "hello";
You copy the const char* "hello" into s1. The difference being in the first example s1 points to read-only "hello" and in the second example read-only "hello" is copied into non-const s1, allowing you to access the elements in the copied string to do what you wish with them.
If you want to do the same with a char* you need to allocate space for char data and copy hello into it
char hello[] = "hello"; // creates a char array big enough to hold "hello"
hello[0] = 'w'; // writes to the 0th char in the array

String literals are usually allocated in a read-only data segment.

Because "Hello" resides in read-only memory. Your declaration should actually be
const char* s1 = "Hello";
If you want a mutable buffer then declare s1 as a char[]. std::string overloads operator [], so you can index into it, i.e., s1[index] = 'w'.
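A short sketch of the std::string route. The function name is invented for this example; note that *s1 = 'w' does not compile for a std::string, because a string object is not a pointer, whereas indexing does.

```cpp
#include <string>

// Returns a copy of `s` with its first character replaced by `c`.
// The empty-string check avoids writing to an element that does not exist.
std::string replace_first_char(std::string s, char c) {
    if (!s.empty()) {
        s[0] = c;  // operator[]: no bounds check; s.at(0) would throw if out of range
    }
    return s;
}
```

For example, replace_first_char("hello", 'w') yields "wello", and the literal passed in is never touched because std::string copies it first.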

Time to confuse matters:
char s0[] = "Hello";
s0[0] = 'w';
This is perfectly valid! Of course, this doesn't answer the original question, so here we go: string literals are created in read-only memory. That is, their type is char const[n] where n is the size of the string including the terminating null character, i.e. n == 6 for the string literal "Hello". But why, oh why, can this type be used to initialize a char*? The answer is simply backward compatibility, in particular compatibility with [old] C code: by the time const made it into the language, lots of places already initialized char* with string literals. Any decent compiler should warn about this abuse, however.

need help changing single character in char*

I'm getting back into c++ and have the hang of pointers and whatnot, however, I was hoping I could get some help understanding why this code segment gives a bus error.
char * str1 = "Hello World";
*str1 = '5';
ERROR: Bus error :(
And more generally, I am wondering how to change the value of a single character in a cstring. Because my understanding is that *str = '5' should change the value that str points to from 'H' to '5'. So if I were to print out str it would read: "5ello World".
In an attempt to understand I wrote this code snippet too, which works as expected;
char test2[] = "Hello World";
char *testpa2 = &test2[0];
*testpa2 = '5';
This gives the desired output. So then what is the difference between testpa2 and str1? Don't they both point to the start of a series of null-terminated characters?

When you say char *str = "Hello World"; you are making a pointer to a literal string which is not changeable. It should be required to assign the literal to a const char* instead, but for historical reasons this is not the case (oops).
When you say char str[] = "Hello World"; you are making an array which is initialized to (and sized by) a string known at compile time. This is OK to modify.
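As a sketch of changing characters in such an array, both through a pointer and with a standard algorithm (the function name is invented for this example):

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Modifies a local array copy of the literal and returns the result.
std::string demo_replace() {
    char str[] = "Hello World";  // writable array, sized by the literal

    char* p = str;               // points at the array's first element
    *p = '5';                    // "5ello World"

    // Replace every 'o' with '0' across the characters before the terminator.
    std::replace(str, str + std::strlen(str), 'o', '0');

    return str;                  // "5ell0 W0rld"
}
```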

Not so simple. :-)
The first one creates a pointer to the given string literal, which is allowed to be placed in read-only memory.
The second one creates an array (on the stack, usually, and thus read-write) that is initialised to the contents of the given string literal.

In the first example you try to modify a string literal, this results in undefined behavior.
As per the language standard in 2.13.4/2
Whether all string literals are distinct (that is, are stored in
nonoverlapping objects) is implementation-defined. The effect of
attempting to modify a string literal is undefined.
In your second example you used string-literal initialization, defined in 8.5.2/1
A char array (whether plain char, signed char, or unsigned char) can be
initialized by a string-literal (optionally enclosed in braces); a
wchar_t array can be initialized by a wide string-literal (optionally
enclosed in braces); successive characters of the string-literal
initialize the members of the array.
