strange behavior of std::string assign,clear and operator[] - c++

I am observing some strange behavior with string operations.
Example:
#include <iostream>
#include <string>
int main()
{
    std::string name("ABCDEFGHIJ");
    std::cout << "Hello, " << name << "!\n";
    name.clear();
    std::cout << "Hello, " << name << "!\n";
    name.assign("ABCDEF",6);
    std::cout << "Hello, " << name << "!\n";
    std::cout << "Hello, " << name[8] << "!\n";  // index 8 is out of bounds here
}
Output:
Hello, ABCDEFGHIJ!
Hello, !
Hello, ABCDEF!
Hello, I!
string::clear does not actually seem to clear anything, because I am able to access the data even after clear. As per the documentation, when we access something out of bounds the result is undefined. But here I am getting the same result every time.
Can somebody explain how it works at the memory level when we call clear or operator[]?

Welcome to C++'s amazing attraction called "undefined behavior".
When name contains a six-character string, "ABCDEF", name[8] attempts to access a nonexistent member of the string, which is undefined behavior.
This means that the results of this operation are completely meaningless.
The C++ standard does not define the result of accessing a nonexistent member character of the string; hence the undefined behavior. The potential results of this operation can be:
1. Some previous value that was in the string, at the given position.
2. Some garbage, random character.
3. Your program crashes.
4. Anything else.
5. A result that's different every time you execute the program, selected from options 1 through 4.

name.assign("ABCDEF",6);
Now the string has length 6. So you may legally only access elements 0 through 5.
std::cout << "Hello, " << name[8] << "!\n";
Therefore this is Undefined Behaviour. The compiler is free to do whatever the hell it pleases. Not just with the statement, but with the whole program, even the preceding lines!
This time, it returned the character that used to be at that position earlier. It could have returned anything else, it could have crashed, it could have skipped that statement altogether, it could have skipped the assignment, or it could have done many other funny things (up to and including making daemons fly out of your nose!).
And I am saying that because all that behaviour (except the daemons) can be actually observed in the wild in various circumstances.

As others said, accessing a std::string outside its logical boundaries (i.e. [0, size()], notice that size() is included) is undefined behavior, so the compiler can make anything happen.
Now, the particular flavor of UB you are seeing is nothing particularly unexpected.
clear() just zeroes the logical length of the string, but the memory that it used is retained (the standard does not strictly require this for std::string, but existing implementations behave this way, and quite some code would run much slower otherwise).
Given that there's no good reason to waste time in zeroing out the old data, by accessing the string out of bounds you are seeing what was at that index previously.
This may change if you, for example, call the shrink_to_fit() method after clear(), which asks the string (as a non-binding request) to free all the extra memory it's keeping.
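The retained-memory behavior described above can be observed without touching anything out of bounds. The following is a minimal sketch; the ClearReport struct and the function names are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type: size and capacity around a call to clear().
struct ClearReport {
    std::size_t size_after;
    std::size_t capacity_before;
    std::size_t capacity_after;
};

// Takes the string by value, clears the copy, and reports what changed.
ClearReport clear_and_report(std::string s) {
    ClearReport r{};
    r.capacity_before = s.capacity();
    s.clear();                        // logical length becomes 0
    r.size_after = s.size();
    r.capacity_after = s.capacity();  // typically unchanged
    return r;
}

// Self-check using the string from the question.
void check_clear_report() {
    ClearReport r = clear_and_report("ABCDEFGHIJ");
    assert(r.size_after == 0);
}
```

Calling check_clear_report() from main and comparing capacity_before with capacity_after should show equal values on mainstream standard library implementations, which is consistent with the old characters still sitting in the buffer.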

I'd like to add to the other answers that you can use std::string::at instead of using the operator[].
std::string::at does boundary checking and will throw a std::out_of_range when you try to access an element that is out of range.
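As a rough sketch of how at() differs from operator[], the helper below (char_at_or is a hypothetical name, as is its fallback parameter) catches the exception and returns a placeholder instead of reading out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns the character at `index`, or `fallback` if the index is out of range.
char char_at_or(const std::string& s, std::size_t index, char fallback) {
    try {
        return s.at(index);  // throws std::out_of_range when index >= s.size()
    } catch (const std::out_of_range&) {
        return fallback;
    }
}

// Self-check that replays the sequence of operations from the question.
void check_char_at_or() {
    std::string name("ABCDEFGHIJ");
    name.clear();
    name.assign("ABCDEF", 6);
    assert(char_at_or(name, 5, '?') == 'F');  // last valid index
    assert(char_at_or(name, 8, '?') == '?');  // out of range, no UB
}
```

Calling check_char_at_or() from main exercises both the in-range and the out-of-range case.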

I ran your code through a debugger. Take note of the capacity of the string: it is still 15, because assign did not change the capacity. So you won't get a "garbage" value, as others are saying; you're getting the exact same data, which is still stored in the same location. As stated, the string is essentially a pointer to a memory address, and indexing goes over a fixed number of bytes from it to reach the element. Since the index in name[8] is a constant, it goes to the exact same memory location every time.
Here is a picture of the string in debugger

Related

Buffer overflow - The changes of variables

void go()
{
//{1}
char buffer[2];
gets(buffer);
//{2}
cout << allow;
}
I tried to run the procedure above in 2 cases:
- 1st: I declare "int allow;" at position 1
- 2nd: I declare "int allow;" at position 2
In both cases, when I entered the string "123" (without the quotation marks), the value of allow was 51.
However, from what I read about the memory layout, only in the first case is "allow" positioned before buffer on the stack, which means that when the string is longer than the buffer, the value of "allow" is changed.
Then, I tried to declare "char sth[10]" in both positions. This time, its value was changed only when I declared sth in the first position.
Can anyone explain what happened?
Since changing allow via overflow is Undefined Behavior, the compiler might not even have a variable allow at all and might change your code to cout << 0 instead when compiling with optimization. This is not a valid way to check for overflow, regardless of where you put allow.
To emphasize: All changes of allow you observe are the result of UB. There are no guarantees about this in the standard whatsoever. You can go ahead and speculate on why you see this output today, on your system, with this very toolchain, but the outcome might change to anything (like your program mowing your lawn or stealing the crown jewels) for any reason.
Indeed, there is no way to use gets safely. This is why it has been removed from both the current C and C++ standards.
You can use std::string and std::getline instead:
std::string buffer;
std::getline(std::cin, buffer);
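To make the suggestion above a bit more concrete, here is a minimal sketch that wraps std::getline in a function taking any input stream, so it can be tried with std::cin or with an in-memory stream. The names read_line and check_read_line are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Reads one line from `in` into a std::string, which grows as needed,
// so there is no fixed-size buffer to overflow.
// Returns false if no line could be read (for example at end of input).
bool read_line(std::istream& in, std::string& out) {
    return static_cast<bool>(std::getline(in, out));
}

// Self-check using an in-memory stream in place of std::cin.
void check_read_line() {
    std::istringstream input("123 and much more than two characters\nnext");
    std::string buffer;
    assert(read_line(input, buffer));
    assert(buffer == "123 and much more than two characters");
    assert(read_line(input, buffer));
    assert(buffer == "next");
    assert(!read_line(input, buffer));  // nothing left to read
}
```

With std::cin, the call would be read_line(std::cin, buffer); input longer than two characters is simply stored in full.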

Why does that work with C++ char arrays? [duplicate]

I have the following code:
char arr[6] = "Hello";
strcpy(arr, "Hello mellow");
cout << strlen(arr) << ", " << arr << endl; // 12, Hello mellow
char arr1[] = "Hello";
strcpy(arr1, "Hello mellow");
cout << strlen(arr1) << ", " << arr1 << endl; // 12, Hello mellow
So, why does that work? Why doesn't it get limited somehow? Whatever I put instead of "Hello mellow", it works and prints it out.
It works because strcpy doesn't check that the destination array is at least as large as the source one. Your code invokes undefined behavior as you call strcpy with invalid arguments, and because the behavior is undefined, anything can happen; in your case, the memory is silently overwritten. Your program could crash as well.
In general, C and C++ don't check for boundaries (unlike other higher-level languages: Java, PHP, Python, JavaScript, etc).
This means that if you try to strcpy, say, a 13-byte string such as "Hello mellow" to a character array, it won't check whether or not the given array has been instantiated with enough memory to contain the string. It will just copy the given string, character by character, to the given memory pointer.
What happens here, is that you write at some places in memory you are not supposed to access; once in a while, this program might just crash, with no other indication than: segmentation fault.
If you happen to try this...
char arr1[8];
char arr2[8];
strcpy(arr1,"Hello mellow");
printf("%s\n", arr1);
printf("%s\n", arr2);
...it is very likely (but not 100% sure, see comments) you would get the following output:
Hello mellow
llow
Why? Because the second char[] would have been overwritten by the data you tried to put in the first one, without it having enough reserved space for it.
See: http://en.wikipedia.org/wiki/Stack_buffer_overflow
Native arrays in C/C++ are very low-level abstractions that are treated as pointers to memory locations in many use cases. So, when passing arr to strcpy, all strcpy knows is the address of arr[0]. As a result, there is no possibility of bounds checking. This is a very good thing for performance reasons. It is up to the programmer to ensure that he/she uses these low-level constructs safely, for instance by using strncpy and giving an appropriate bound, or using std::vector and checking for bounds explicitly or using std::vector::at to check bounds when accessing a location.
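One way to pass an explicit bound, sketched below, uses std::snprintf rather than the strncpy mentioned above, because snprintf always null-terminates the destination when the size is nonzero, whereas strncpy does not terminate on truncation. The name bounded_copy is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Copies `src` into `dest` without writing more than `dest_size` bytes.
// Returns true if the whole string fit, false if it was truncated.
bool bounded_copy(char* dest, std::size_t dest_size, const char* src) {
    if (dest_size == 0) return false;
    // snprintf writes at most dest_size - 1 characters plus a terminator
    // and returns the length the full string would have needed.
    int needed = std::snprintf(dest, dest_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dest_size;
}

// Self-check using the array from the question.
void check_bounded_copy() {
    char arr[6] = "Hello";
    bool fit = bounded_copy(arr, sizeof arr, "Hello mellow");
    assert(!fit);                            // 12 characters do not fit in 6 bytes
    assert(std::strcmp(arr, "Hello") == 0);  // truncated, still null-terminated
}
```

The caller still has to pass the right size; sizeof only works here because arr is a real array in scope, not a pointer.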
I think that's because there's no check at run time or at compile time, and if you're lucky, you won't get a segmentation fault ;)

Pointer increment and decrement

I was solving a question my teacher gave me and hit a little snag.
I am supposed to give the output of the following code:(It's written in Turbo C++)
#include<iostream.h>
void main()
{
char *p = "School";
char c;
c=++(*(p++));
cout<<c<<","<<p<<endl;
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
}
The output the program gives is:
T,chool
ijool,j,i
I got the part where the pointer itself increments and then increments the value which the pointer points to. But I don't get the part where the string prints out ijool.
Can someone help me out?
The program you showed is non-standard and ill-formed (and should not compile).
"Small" problems:
The proper header for input/output streams in C++ is <iostream>, not <iostream.h>
main() returns an int, not a void.
cout and endl cannot be used without a using namespace std; at the beginning of the file, or better: use std::cout and std::endl.
"Core" problems:
char* p = "School"; is a pointer to a string literal. This conversion is deprecated in C++03 and ill-formed since C++11. Aside from that, string literals are normally read-only, and attempts to modify them often result in segfaults (and modifying a string literal is undefined behavior by the standard). So, you have undefined behavior every time you modify what p points to, which is the string literal.
More subtle (and the practical explanation): you are modifying p several times in the line std::cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<std::endl;. It is undefined behavior. The order used for the operations on p is not defined; here it seems the compiler starts from the right. See sequence points and the sequenced-before/after rules for a better explanation.
You might be interested in the live code here, which is more like what you seemed to expect from your program.
Let's assume you correct:
the header to <iostream> - there is no iostream.h header
your uses of cout and endl with std::cout and std::endl respectively
the return type of main to int
Okay,
char *p = "School";
The string literal "School" is of type "array of 7 const char." The conversion to char* was deprecated in C++03. In C++11, this is invalid.
c=++(*(p++));
Here we hit undefined behaviour. As I said before, the chars in a string literal are const. You simply can't modify them. The prefix ++ here will attempt to modify the S character in the string literal.
So from this point onwards, there's no use making conjectures about what should happen. You have undefined behaviour. Anything can happen.
Even if the preceding lines were legal, this line is also undefined behavior, which means that you cannot accurately predict what the output will be:
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
Notice how it modifies the value of p multiple times on that line (really between sequence points)? That's not allowed. At best you can say "on this compiler with this run-time library and this environment at this moment of execution I observed the following behavior", but because it is undefined behavior you can't count on it to do the same thing every time you run the program, or even if the same code is encountered multiple times within the same run of the program.
There are at least three problems with this code (and maybe more; I'm not a C++ expert).
The first problem is that string constants like "School" should not be modified, as they can be placed in read-only parts of the program memory that the OS maps directly to the exe file on disk (the OS may, for example, share them between several running instances of the same program, or avoid writing those parts of memory to the swap file when RAM is low, as it knows it can get the original from the exe). The example crashes on my compiler, for instance. To modify the string you should allocate a modifiable duplicate of the string, such as with strdup.
The second problem is it's using cout and endl from the std namespace without declaring that. You should prefix their accesses with std:: or add a using namespace std; declaration.
The third problem is that the order in which the operations on the second cout line happen is undefined behavior, leading to the apparently mysterious change of the string between the time it was displayed at the end of the first cout line and the next line.
Since this code is not intended to do anything in particular, there are different, valid ways you could fix it. This will probably run:
#include <iostream>
#include <string.h>
#include <stdlib.h>
using namespace std;
int main()
{
char *string = strdup("School");
char *p = string;
char c;
c=++(*(p++));
cout<<c<<","<<p<<endl;
cout<<p<<","<<++(*(p--))<<","<<++(*(p++))<<endl;
free(string);
}
(On my compiler this outputs: T,chool, diool,i,d.)
It still has undefined behavior though. To fix that, rework the second cout line as follows:
cout << p << ",";
cout << ++(*(p--)) << ",";
cout << ++(*(p++)) << endl;
That should give T,chool, chool,d,U (assuming a character set that has A to Z in order).
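As a variation on the fix above, the sketch below uses a writable char array instead of strdup (which is POSIX rather than standard C++) and collects the output in a string so it can be checked. The function name run_school_example is hypothetical, and an ASCII-like character set is assumed.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Runs the corrected sequence on a writable copy and returns what it prints.
std::string run_school_example() {
    char text[] = "School";     // writable array initialized from the literal
    char* p = text;
    std::ostringstream out;

    char c = ++(*(p++));        // 'S' becomes 'T'; p now points at "chool"
    out << c << "," << p << "\n";

    out << p << ",";            // each step is its own statement
    out << ++(*(p--)) << ",";   // 'c' becomes 'd'; p moves back to the first char
    out << ++(*(p++)) << "\n";  // 'T' becomes 'U'; p moves forward again
    return out.str();
}

// Self-check against the output quoted in the answer.
void check_school_example() {
    assert(run_school_example() == "T,chool\nchool,d,U\n");
}
```

Because every modification of p happens in its own statement, the result no longer depends on the order in which the compiler evaluates the operands of one long << chain.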
p++ moves the position of p from "School" to "chool". Because it is p++ and not ++p, the expression yields the old position, so the char that gets incremented is the first one: 'S' becomes 'T', and that value is assigned to c.
When you output p, you output the remainder of p, which we identified before as "chool".
Since it is best to learn from trial and error, run this code with a debugger. That is a great tool which will follow you forever. That will help for the second set of cout statements. If you need help with gdb or VS debugger, we can walk through it.

use of std::ends() and freeze(false)

I am looking over some legacy code and there is a fair amount of stringstream usage. The code generates messages from values of various types, which is fine so far. However, in some cases it does the following:
std::ostringstream f1;
f1 << sDirectory << mFileName << sFileExtension << '\0';
and in others doing (Just illustration)
std::ostringstream f1;
f1 << sDirectory << mFileName << sFileExtension << std::ends;
I believe these calls are there because further on the code accesses f1.str().c_str() and needs it to be null-terminated.
Is there any difference between these calls? I see from http://en.cppreference.com/w/cpp/io/manip/ends that std::ends doesn't flush. Is std::ends different across platforms (Linux/Windows/Mac)? Should I prefer one over the other?
Further to that, I read that there should be a call to freeze(false) on the stringstream later in the scope (after str() is used) to allow the buffer to be deallocated (http://en.cppreference.com/w/cpp/io/ostrstream/freeze). Possibly I misread or misunderstood, but there is no call to freeze(false), so does that indicate that every stream above is leaking?
N.B. FYI This is Visual Studio 2005/Windows 7 but I don't know if that has any bearing.
Apologies if I'm being dense...
std::ends is defined as having the following effect:
Inserts a null character into the output sequence: calls os.put(charT()).
When charT is char, it is value initialized to have the value 0, which is equivalent to the character literal \0. So when charT is char, which it usually is, the two lines of code are exactly the same.
However, using std::ends will work well even when the character type of your stream is not char.
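The equivalence for char streams can be checked with a small sketch like the one below; the function names are hypothetical. It also shows that the inserted null becomes part of the string returned by str(), which matters because c_str() already returns a null-terminated array on its own.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Appends a null character using the character literal.
std::string build_with_literal(const std::string& name) {
    std::ostringstream out;
    out << name << '\0';
    return out.str();
}

// Appends a null character using the std::ends manipulator.
std::string build_with_ends(const std::string& name) {
    std::ostringstream out;
    out << name << std::ends;
    return out.str();
}

// Self-check: both spellings produce the same contents for a char stream.
void check_ends_equivalence() {
    std::string a = build_with_literal("file.txt");
    std::string b = build_with_ends("file.txt");
    assert(a == b);
    assert(a.size() == 9);  // 8 visible characters plus the embedded null
    assert(a[8] == '\0');
}
```

If the only goal is to pass f1.str().c_str() to a C API, a sketch like this suggests the explicit terminator is redundant and only adds an extra embedded null to the string.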

Confusing std::string::c_str() behavior in VS2010

I'm sure I've done something wrong, but for the life of me I can't figure out what! Please consider the following code:
cerr<<el.getText()<<endl;
cerr<<el.getText().c_str()<<endl;
cerr<<"---"<<endl;
const char *value = el.getText().c_str();
cerr<<"\""<<value<<"\""<<endl;
field.cdata = el.getText().c_str();
cerr<<"\""<<field.cdata<<"\""<<endl;
el is an XML element and getText returns a std::string. As expected, el.getText() and el.getText().c_str() print the same value. However, value is set to "" (that is, the empty string) when it is assigned the result of c_str(). This code had been written to set field.cdata=value, and so was clearing it out. After changing it to the supposedly-identical expression that value is set from, it works fine and the final line prints the expected value.
Since el is on the stack, I thought I might have been clobbering it - but even after value is set, the underlying value in el is still correct.
My next thought was that there was some weird compiler-specific issue with assigning things to const pointers, so I wrote the following:
std::string thing = "test";
std::cout << thing << std::endl;
std::cout << thing.c_str() << std::endl;
const char* value = thing.c_str();
std::cout << value << std::endl;
As expected, I get 'test' three times.
So now I have no clue what is going on. It would seem obvious that there is something strange going on in my program that's not happening in the sample, but I don't know what it is and I'm out of ideas about how to keep looking. Can somebody enlighten me, or at least point me in the right direction?
I assume that el.getText() is returning a temporary string object. When that object is destroyed the pointer returned by c_str() is no longer valid (keep in mind that there are other ways the pointer returned by c_str() can be invalidated, too).
The temporary object will be destroyed at the end of the full expression it's created in (which is generally at the semi-colon in your example above).
You may be able to solve your problem with something like the following:
const char *value = strdup(el.getText().c_str());
which creates a copy of the string as a raw char array in dynamically allocated memory. You then become responsible for calling free() on that pointer at some point when that data is no longer needed.
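An alternative to strdup, sketched here under the assumption that getText returns std::string by value, is to store the result in a named std::string and take c_str() from that, so no manual free() is needed. The Element struct below is a hypothetical stand-in for the XML element in the question, and text_length_via_c_str is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the XML element from the question.
struct Element {
    std::string text;
    std::string getText() const { return text; }  // returns a temporary copy
};

// Keeps a named copy of the string alive and only then takes c_str().
std::size_t text_length_via_c_str(const Element& el) {
    const std::string text = el.getText();  // lives until the end of this function
    const char* value = text.c_str();       // valid while `text` is alive and unmodified
    return std::strlen(value);
}

// Self-check with a sample element.
void check_text_length() {
    Element el{"hello"};
    assert(text_length_via_c_str(el) == 5);
}
```

The pointer must not outlive the named string, so if field.cdata has to remain valid for longer, it needs its own copy of the data.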