C++: Changing the value of pointer strings

There is something that has been bugging me for a while, and I need an answer for it:
const char *p = "hello world"; // string literals are arrays of const char in C++
p = "wazzup";
p = "Hey";
Here I declare a pointer that points to a string (in other words, I made a string by using a pointer).
I have had some strange results with this that I normally wouldn't have gotten if I had used a char array string:
cout << p << endl;    // "Hey" gets printed
cout << p+8 << endl;  // I kept adding numbers till "wazzup" got printed
cout << p+29 << endl; // No matter how much I increment, I can't print "Hello World"
So my question is: when I change the value that a char pointer is pointing to, does it
- overwrite the original data, as it would with a char array;
- create a new string right before it in memory and point to it;
- add the new string at the beginning of the old one (including the null); or
- create a new string in a new place in memory, so that I was able to print "wazzup" only by chance?

It does none of the above. Changing the value of a pointer merely changes the address in memory it points to. Each assignment to p sets it to point to the first character of a (different) string literal, which is stored in memory.
The behaviour of using a pointer that points beyond the end of a string literal, such as
cout <<p+8<< endl
is undefined. This is why using pointers is fraught with danger.
The behaviour you are seeing is implementation-dependent: your compiler happens to store the string literals adjacent in memory, so running off the end of one runs into another. Your program might equally have crashed when compiled with a different compiler.

By adding to the pointer, you are increasing the address value, so when printing, cout prints whatever bytes happen to be stored from that memory location up to the next null byte, and nothing else.
If you could print
"hello world"
"wazzup"
that would be a fluke :)

All three of these are string literal constants. They appear directly in your executable's binary, and each time you assign p, you point to these locations in memory. They are completely separate memory; reassigning p to another one does not modify any string data.

You are only making the pointer point to something else each time you assign to it, so there is no scope for data being overwritten anywhere. What happens under the hood is implementation-dependent, so what you see is pure chance. When you do this:
cout <<p+8<< endl;
you are going beyond the bounds of the string literal, invoking undefined behaviour.

When I change the value that a char pointer is pointing to, does it
- overwrite the original data like it would do with a char array?
No, the data part of your address space contains all three strings: "hello world", "wazzup", and "Hey". When you change the pointer, you are just setting the value in p to the starting address of one of those strings. Changing the address a pointer points to and changing the value a pointer points to are two different things.
- or does it create a new string right before it in memory and point to it?
The compiler creates the strings (character bytes) at compile time, NOT at run time.
- or does it create a new string in a new place in memory, and I was able to print "wazzup" only by chance?
I think the answer above covers this question.
- or does it add the new string at the beginning of the old one (including the null)?
It depends on the compiler.

The three strings are not located at the same memory position. They may be stored sequentially or at unrelated locations. If the compiler places them next to each other, you can find them by adding or subtracting some offset from p; this depends entirely on the compiler. C does not check pointer arithmetic, so you won't get a compile-time error from p+n or p-n, although reading through such a pointer is still undefined behaviour.

Since the three are different strings, they are located in different parts of memory. I think they could be separated by 64-byte alignment; try p+64 :)
Only identical strings can share the same memory location, and only if the compiler supports it (string pooling).
There is a chance that "wazzup" is at p+64 and "Hey" at p+128 (if you use VC++ 2010 Express on a Pentium M CPU running Windows XP SP3):
cout << (p+64) << endl;  // prints the string starting there (still undefined behaviour)
cout << (p+128) << endl;

Related

Memory in a char[]

So I was working in C++11 and this occurred to me:
char s[100];
strcpy(s, "Andrei");
int n = strlen(s); // (it would be 6)
I do this:
s[n]=' ';
s[n+9]='\0';
What happens with s[n+1], ... s[n+8]?
If I write
cout <<s;
The array will be displayed as normal.
But if I specify
cout <<s[n+2]
it will print odd characters
The array elements s[n+1] ... s[n+8] are uninitialized. They are said to have indeterminate value. Trying to output an indeterminate value causes undefined behaviour which means anything can happen.
So, you have a buffer of 100 chars. Because C++ does not automatically initialize a local array like this, you get a buffer of BIG SOMETHING. The contents of that "big something" can differ between app launches and between build modes (debug/release), etc. Querying that "something" is like tuning a radio to a random frequency and hearing noise in return (it could be different each time you do it).
Your strcpy initializes only the first 7 chars ("Andrei" plus its terminating '\0'); the rest of the buffer is still "something".

Why does an unassigned int have a value?

If I run the following code, it shows a long number.
int i;
int *p;
p= &i;
cout<<*p;
Why does an unassigned int have a value? And what is that value?
The value of the pointer p is the address of the int i; you assigned it with the address-of operator &: p = &i. The int i itself is not initialized (it is default-initialized, which for a local int leaves its value indeterminate). When you dereference your pointer with *p, you get the value of your uninitialized int i, which could be anything.
The value of your int i is whatever bytes are in that memory, interpreted as an int. Reading an uninitialized variable is undefined behaviour.
Also you would have the same behaviour without a pointer by simply doing:
int i;
cout << i;
Because this is what "undefined behavior" means in C++.
"Undefined behavior" means "anything is possible". This includes:
You getting some random value for the object. It can be always the same, or different every time you run the code.
The program crashes.
Your computer starts playing the latest Justin Bieber video, all by itself, with no way to stop it.
The universe, as you know it, comes to an end.
etc... That's what "undefined behavior" means.
Imagine you want to buy a plot of land on which you intend to build a house. To buy it, you contact the local land seller.
You need to tell him how many units of land you need. In return, he will tell you the location of the land.
Done: your land is ready for use. But did you notice something? The land seller only told you the coordinates of the land. He didn't say anything about what is on it. There could already be a house on it. There could even be a hotel, or an airport. Who knows what is there? If you try to use the land without building your house first, you have no guarantee of what you will find. It is your responsibility, as the land owner, to build something on the land and use it as appropriate.
C/C++ is the same as the above example. Asking for an int is like asking for a plot of land of sizeof(int) units (typically 4 bytes). C/C++ will give you the land and tell you its coordinates, but it won't tell you what the land contains. You're responsible for putting a house on it. If you don't, and you try to enter "the house", you might end up in an airport. Hope it's clearer now :).
Simply because the memory location where i is has some value (whatever value it is). As Sam pointed out, it is a good example of undefined (and unwanted) behavior.
Because variables can't be empty.
Every byte of computer memory always contains something.
Computer programs usually don't clean up memory when they are done with it (for speed reasons), so when you leave a variable uninitialized, it will have some (more or less) random value that was left at that location earlier, typically by your own program's code (modern operating systems clear memory before handing it to a new process).
Usually it is 0 or a value of some other variable that was recently destructed or some internal pointer.
The current contents of the memory location (on the stack) of the variable i.

C++ strings and pointers confusion

string * sptemp = (string *) 0x000353E0;
What does this code exactly want to say ?
I know that in the left side we define a string pointer but I couldn't understand the right part.
It means take a numeric value, convert it to a pointer with that value as the address it points to, and then use that value to initialise the variable sptemp.
If the memory at that address contains a valid string object, then you can use the pointer to access it. If not, trying to do so will give undefined behaviour.
It says, treat the data located at address 0x000353E0 as though it holds a string and assign the address to the variable sptemp. The data can be accessed through the pointer variable sptemp after that.
These comments are mostly right, but not completely. We don't actually know that string is std::string here. It could be that string is a bit of memory-mapped hardware whose address on the OP's embedded SBC is fixed by the hardware at 0x000353E0. In that case this is completely sensible, and it is what people do all the time: the pointer "string *sptemp" is set to point to the hardware interface.
But it's probably nonsense.

Does buffer overflow happen in C++ strings?

This is concerning strings in C++. I have not touched C/C++ for a very long time; in fact, I programmed in those languages only during my first year of college, about 7 years ago.
In C, to hold strings I had to create character arrays (whether static or dynamic is not of concern). That meant I had to guess well in advance the size of the string that an array would contain. I applied the same approach in C++. I was aware that there was a std::string class, but I never got around to using it.
My question is: since we never declare the size of a std::string, can a buffer overflow occur when writing to it? In C, if the array's size was 10 and I typed more than 10 characters on the console, the extra data would be written into the memory of some other object adjacent to the array. Can something similar happen with std::string when using the cin object?
Do I have to guess the size of the string before hand in C++ when using std::string?
Well! Thanks to you all. There is no one right answer on this page (a lot of different explanations provided), so I am not selecting any single one as such. I am satisfied with the first 5. Take Care!
Depending on the member(s) you are using to access the string object, yes. So, for example, if you use reference operator[](size_type pos) where pos > size(), yes, you would.
Assuming no bugs in the standard library implementation, no. A std::string always manages its own memory.
Unless, of course, you subvert the accessor methods that std::string provides, and do something like:
std::string str = "foo";
char *p = (char *)str.c_str();
strcpy(p, "blah");
You have no protection here, and are invoking undefined behaviour.
std::string generally protects against buffer overflow, but there are still situations in which programming errors can lead to buffer overflows. While member functions such as std::string::at() throw a std::out_of_range exception when an operation references a position outside the bounds of the string, the subscript operator [] (which does not perform bounds checking) does not.
Another problem occurs when converting std::string objects to C-style strings. If you use string::c_str() to do the conversion, you get a properly null-terminated C-style string. Before C++11, however, string::data() returned a pointer to the string's characters that was not guaranteed to be null-terminated, and that was the only difference between c_str() and data(). Since C++11, both return the same null-terminated buffer.
Finally, many existing C++ programs and libraries have their own string classes. To use these libraries, you may have to use these string types or constantly convert back and forth. Such libraries are of varying quality when it comes to security. It is generally best to use the standard library (when possible) or to understand the semantics of the selected library. Generally speaking, libraries should be evaluated based on how easy or complex they are to use, the type of errors that can be made, how easy these errors are to make, and what the potential consequences may be.
Refer to https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/coding/295-BSI.html
In C, the cause can be explained as follows:
#include <string.h>

void function (char *str) {
    char buffer[16];
    strcpy (buffer, str); /* no bounds check: writes past the end of buffer */
}

int main () {
    char *str = "I am greater than 16 bytes"; // length of str = 27 bytes (26 chars + '\0')
    function (str);
}
This program is guaranteed to cause undefined behavior, because a string (str) of 27 bytes is copied to a location (buffer) that has been allocated for only 16 bytes. The extra bytes run past the buffer and overwrite the space allocated for the saved frame pointer (FP), the return address, and so on. This, in turn, corrupts the process stack. The function used to copy the string is strcpy, which performs no bounds checking. Using strncpy (and adding the null terminator yourself, since strncpy does not add one when the source is too long) would have prevented this corruption of the stack. This classic example shows that a buffer overflow can overwrite a function's return address, which in turn can alter the program's execution path. Recall that a function's return address is the address of the next instruction in memory, which is executed immediately after the function returns.
Here is a good tutorial that should answer your question satisfactorily.
In C++, a std::string object starts out with some initial capacity (or you can request a starting capacity with reserve()). If that capacity is exceeded, std::string allocates more dynamic memory.
Assuming the library providing std::string is correctly written, you cannot cause a buffer overflow by adding characters to a std::string object.
Of course, bugs in the library are not impossible.
"Do buffer overflows occur in C++ code?"
To the extent that C programs are legal C++ code (they almost all are), and C programs have buffer overflows, C++ programs can have buffer overflows.
Being richer than C, I'm sure C++ can have buffer overflows in ways that C cannot :-}

Difference between Array initializations

Please see the following statements:
char a[5]="jgkl"; // let's call this Statement A
char *b="jhdfjnfnsfnnkjdf"; // let's call this Statement B, and yes I know this is not an array
char c[5]={'j','g','k','l','\0'}; // let's call this Statement C
Now, is there any difference between Statements A and C?
I mean, both should be on the stack, shouldn't they? Only b's string will be at a static location.
So wouldn't that make "jgkl" exist at the static location for the entire life of the program? Since it is supposed to be read-only/constant?
Please clarify.
No, because the characters "jgkl" from Statement A are used to initialize a, it does not create storage in the executable for a character string (other than the storage you created by declaring a). This declaration creates an array of characters in read-write memory which contain the bytes {'j','g','k','l','\0'}, but the string which was used to initialize it is otherwise not present in the executable result.
In Statement B, the string literal's address is used as an initializer. The variable char *b is a pointer stored in read-write memory. It points to the character string "jhdfjnfnsfnnkjdf". This string is present in your executable image in a static-data segment (named ".sdata", ".rodata" or similar, depending on the toolchain). The string is usually stored in read-only memory, as allowed by the C standard.
That is one key difference between declaring an array of characters and a string constant: Even if you have a pointer to the string constant, you should not modify the contents.
Attempting to modify the string constant is "undefined behavior" according to ANSI C standard section 6.5.7 on initialization.
If a[] is static then so is c[] - the two are equivalent, and neither is a string literal. The two could equally well be declared so that they were on the stack - it depends where and how they are declared, not the syntax used to specify their contents.
The value "jgkl" may never be loaded into working memory. Before main is ever called, a function (often called cinit) is run. One of the things this function does is initialize static and file-scope variables. On the DSP compiler I use, the initial values are stored in a table which is part of the program image. The table's format is unrelated to the format of the variables that are being initialized. The initializer table remains part of the program image and is never copied to RAM. Simply put, there is nowhere in memory that I can reliably access "jgkl".
Small strings like a might not even be stored in that table at all. The optimizer may reduce the initialization to the equivalent of (pseudo-instruction) store reg const('j'<<24 | 'g'<<16 | 'k'<<8 | 'l')
I suspect most compilers are similar.
A and C are exactly equivalent. The syntax used in A is an abbreviation for the syntax in C.
Each of the objects named a and c is an array of bytes of length 5, stored at a certain location in memory which is fixed during execution. The program can change the element bytes at any time. It is the compiler's responsibility to decide how to initialize the objects. The compiler might generate something similar to a[0] = 'j'; a[1] = 'g'; ..., or something similar to memcpy(a, static_read_only_initialization_data[1729], 5), or whatever it chooses. The array is on the (conceptual) stack if the declaration occurs in a function, or in global writable memory if the declaration occurs at file scope.
The object named b is a pointer to a byte. Its initial value is a pointer to string literal memory, which is read-only on many implementations that have read-only memory, but not all. The value of b could change (for example to point to a different string, or to NULL). The program is not allowed to change the contents of "jhdfjnfnsfnnkjdf", though as usual in C the implementation may not enforce this.
C string literals are always read-only (modifying one is undefined behaviour).
a) allocates 5 bytes of memory and stores the CONTENT of the literal, including '\0', in it
b) allocates sizeof(char *) bytes of memory and stores the literal's address in it
c) allocates 5 bytes of memory and stores the 5 character values in it