Effects of modifying std::string using op[] beyond its size?

I'm a bit puzzled by how modifying a std::string beyond its size is handled. In an example I tried, it allowed me to modify the string beyond its size using op[] (and I'm aware that the standard doesn't stop you from doing it). However, when I print the string using cout it prints the original string, but when I print what's returned by c_str(), it prints the modified version. How does it keep track of both sizes (3 and 5)?
#include <string>
#include <iostream>
using namespace std;
int main(void) {
std::string a = "abc";
cout << "str before : " << a << endl;
const char * charPtr = a.c_str ();
cout << "c_str before : " << charPtr << endl;
cout << "str size / capacity : " << a.size () << ", " << a.capacity () << endl;
a[3] = 'd';
a[4] = 'e';
cout << "str after : " << a << endl;
const char * charPtr2 = a.c_str ();
cout << "c_str after : " << charPtr2 << endl;
cout << "str size / capacity : " << a.size () << ", " << a.capacity () << endl;
return 0;
}
output :
str before : abc
c_str before : abc
str size / capacity : 3, 3
str after : abc
c_str after : abcde
str size / capacity : 3, 3

Although you already got a correct comment saying the behaviour is undefined, there is something worthy of an actual answer too.
A C++ string object can contain any sequence of characters you like. A C-style string is terminated by the first '\0'. Consequently, a C++ string object must store the size somewhere other than by searching for the '\0': it may contain embedded '\0' characters.
#include <string>
#include <iostream>
int main() {
std::string s = "abc";
s += '\0';
s += "def";
std::cout << s << std::endl;
std::cout << s.c_str() << std::endl;
}
Running this, and piping the output through cat -v to make control characters visible, I see:
abc^@def
abc
This explains what you're seeing: you're overwriting the '\0' terminator, but you're not overwriting the size, which is stored separately.
As pointed out by kec, you might have seen garbage instead; you were just lucky that there happened to be another zero byte after your extra characters.


Is it possible to have memory problems that don’t crash a program?

I wrote a text cipher program. It seems to work on text strings a few characters long but does not work on longer ones. It gets the input text by reading from a text file. On longer text strings, it still runs without crashing, but it doesn't seem to work properly.
Below I have isolated the code that performs the text scrambling. In case it is useful, I am running this in a virtual machine running Ubuntu 19.04. When running the code, enter auto when prompted. I removed the rest of the code so it wasn't too long.
#include <iostream>
#include <string>
#include <sstream>
#include <random>
#include <cmath>
#include <cctype>
#include <chrono>
#include <fstream>
#include <new>
bool run_cypher(char (&a)[27],char (&b)[27],char (&c)[11],char (&aa)[27],char (&bb)[27],char (&cc)[11]) {
//lowercase cypher, uppercase cypher, number cypher, lowercase original sequence, uppercase original sequence, number original sequence
std::ifstream out_buffer("text.txt",std::ios::in);
std::ofstream file_buffer("text_out.txt",std::ios::out);
//out_buffer.open();
out_buffer.seekg(0,out_buffer.end);
std::cout << "size of text: " << out_buffer.tellg() << std::endl;//debug
const int size = out_buffer.tellg();
std::cout << "size: " << size << std::endl;//debug
out_buffer.seekg(0,out_buffer.beg);
char *out_array = new char[size + 1];
std::cout << "size of out array: " << sizeof(out_array) << std::endl;//debug
for (int u = 0;u <= size;u = u + 1) {
out_array[u] = 0;
}
out_buffer.read(out_array,size);
out_buffer.close();
char original[size + 1];//debug
for (int bn = 0;bn <= size;bn = bn + 1) {//debug
original[bn] = out_array[bn];//debug
}//debug
for (int y = 0;y <= size - 1;y = y + 1) {
std::cout << "- - - - - - - -" << std::endl;
std::cout << "out_array[" << y << "]: " << out_array[y] << std::endl;//debug
int match;
int case_n; //0 = lowercase, 1 = uppercase
if (isalpha(out_array[y])) {
if (islower(out_array[y])) {
//std::cout << "out_array[" << y << "]: " << out_array[y] << std::endl;//debug
//int match;
for (int ab = 0;ab <= size - 1;ab = ab + 1) {
if (out_array[y] == aa[ab]) {
match = ab;
case_n = 0;
std::cout << "matched letter: " << aa[match] << std::endl;//debug
std::cout << "letter index: " << match << std::endl;//debug
std::cout << "case_n: " << case_n << std::endl;//debug
}
}
}
if (isupper(out_array[y])) {
for (int cv = 0;cv <= size - 1;cv = cv + 1) {
if (out_array[y] == bb[cv]) {
case_n = 1;
match = cv;
std::cout << "matched letter: " << bb[match] << std::endl;//debug
std::cout << "letter index: " << match << std::endl;//debug
std::cout << "case_n: " << case_n << std::endl;//debug
}
}
}
if (case_n == 0) {
out_array[y] = a[match];
std::cout << "replacement letter: " << a[match] << " | new character: " << out_array[y] << std::endl;//debug
}
if (case_n == 1) {
std::cout << "replacement letter: " << b[match] << " | new character: " << out_array[y] << std::endl;//debug
out_array[y] = b[match];
}
}
if (isdigit(out_array[y])) {
for (int o = 0;o <= size - 1;o = o + 1) {
if (out_array[y] == cc[o]) {
match = o;
std::cout << "matched letter: " << cc[match] << std::endl;//debug
std::cout << "letter index: " << match << std::endl;//debug
}
}
out_array[y] = c[match];
std::cout << "replacement number: " << c[match] << " | new character: " << out_array[y] << std::endl;//debug
}
std::cout << "- - - - - - - -" << std::endl;
}
std::cout << "original text: " << "\n" << original << "\n" << std::endl;
std::cout << "encrypted text: " << "\n" << out_array << std::endl;
delete[] out_array;
return 0;
}
int main() {
const int alpha_size = 27;
const int num_size = 11;
char l_a_set[] = "abcdefghijklmnopqrstuvwxyz";
char cap_a_set[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char n_a_set[] = "0123456789";
std::cout << "sizeof alpha_set: " << std::endl;//debug
char lower[alpha_size] = "mnbvcxzasdfghjklpoiuytrewq";
char upper[alpha_size] = "POIUYTREWQASDFGHJKLMNBVCXZ";
char num[num_size] = "9876543210";
int p_run; //control variable. 1 == running, 0 == not running
int b[alpha_size]; //array with values expressed as index numbers
std::string mode;
int m_set = 1;
while (m_set == 1) {
std::cout << "Enter 'auto' for automatic cypher generation." << std::endl;
std::cout << "Enter 'manual' to manually enter in a cypher. " << std::endl;
std::cin >> mode;
std::cin.ignore(1);
std::cin.clear();
if (mode == "auto") {
p_run = 2;
m_set = 0;
}
if (mode == "manual") {
p_run = 3;
m_set = 0;
}
}
if (p_run == 2) { //automatic mode
std::cout <<"lower cypher: " << lower << "\n" << "upper cypher: " << upper << "\n" << "number cypher: " << num << std::endl;//debug
run_cypher(lower,upper,num,l_a_set,cap_a_set,n_a_set);
return 0;//debug
}
while (p_run == 3) {//manual mode
return 0;//debug
}
return 0;
}
For example, using an array containing “mnbvcxzasdfghjklpoiuytrewq” as the cipher for lower case letters, I get “mnbv” if the input is “abcd”. This is correct.
If the input is “a long word”, I get “m gggz zzzv” as the output when it should be “m gkjz rkov”. Sort of correct but still wrong. If I use “this is a very very long sentence that will result in the program failing” as the input, I get "uas” as the output, which is completely wrong. The program still runs, but it fails to function as intended. So as you can see, it does work, but not on any text strings that are remotely long. Is this a memory problem, or did I make a horrible mistake somewhere?
For your specific code, you should run it through a memory checking tool such as valgrind, or compile with AddressSanitizer (-fsanitize=address in GCC and Clang).
Here are some examples of memory problems that most likely won't crash your program:
Forgetting to delete a small object that is allocated only once in the program. A memory leak like this can remain undetected for decades if it does not make the program run out of memory.
Reading from allocated but uninitialized memory. This may still crash if the system allocates memory lazily at the first write.
Writing slightly out of bounds past the end of a heap object whose size is not a multiple of 8 (sizeof(obj) % 8 != 0). This often goes unnoticed because heap allocations are usually rounded up to multiples of 8 or 16 bytes. You can read about it in the answers to this SO question.
Dereferencing a nullptr does not crash on some systems. For example AIX used to put zeros at and near address 0x0. Newer AIX might still do it.
On many systems without memory management, address zero is either a regular memory address, or a memory mapped register. This memory can be accessed without crashing.
On every system I have tried (POSIX-based), it was possible to allocate valid memory at address zero through memory mapping. Doing so can even make writing through nullptr work without crashing.
This is only a partial list.
Note: these memory problems are undefined behavior. This means that even if the program does not crash in a debug build, the compiler may assume during optimization that the undefined behavior never happens, and the optimized code built on that assumption can crash.
For example, most compilers will optimize this:
int a = *p; // implies that p != nullptr
if (p)
boom(p);
Into this:
int a = *p;
boom(p);
If a system allows dereferencing nullptr, then this code might crash after optimization. It will not crash due to the dereferencing, but because the optimization did something the programmer did not foresee.

Why does string size not change if I add an additional character in it?

In the program below, when I add one more character to the string, its size remains the same (as evident from the str1.size() function). Why is that?
#include <iostream>
#include <string>
using std::cout;
using std::endl;
int main() {
std::string str1 = "hello";
cout << "std::string str1 = \"hello\""<< endl;
cout << "string is " << str1 << " with length " << str1.size() << endl;
str1[5] = 'a';
cout << "string is " << str1 << " with length " << str1.size() << endl;
for (int i = 0 ; i < 7; i++) {
cout << "str["<<i<<"] = " << str1[i] << " (int)(str[i])" << (int)str1[i] << endl;
}
}
Output
std::string str1 = "hello"
string is hello with length 5
string is hello with length 5 //expected 6
str[0] = h (int)(str[i])104
str[1] = e (int)(str[i])101
str[2] = l (int)(str[i])108
str[3] = l (int)(str[i])108
str[4] = o (int)(str[i])111
str[5] = a (int)(str[i])97
str[6] = (int)(str[i])0
The operation str1[5] = 'a'; does not "add" anything to a string; it sets the value at a particular position, and the position must be in the range 0..(length()-1). Index length() may be read (it yields '\0') but must not be changed to another character; otherwise, the behaviour is undefined.
To append something to a string, use
str1 += "a";
or
str1.push_back('a');
Note that a std::string, in contrast to plain C-style strings, stores its length separately rather than computing it by searching for a terminating '\0'.

What's the difference between returning a const object reference (getter), and just the string?

I was going through the C++ website tutorials as a nice complement to a college course I'm taking this semester (beginner). While learning about copy constructors and destructors, I came across this section of code:
// destructors
#include <iostream>
#include <string>
using namespace std;
class Example4 {
string* ptr;
public:
// constructors:
Example4() : ptr(new string) {}
Example4 (const string& str) : ptr(new string(str)) {}
// destructor:
~Example4 () {delete ptr;}
// access content:
const string& content() const {return *ptr;}
};
int main () {
Example4 foo;
Example4 bar ("Example");
cout << "bar's content: " << bar.content() << '\n';
return 0;
}
Now, I understand the destructor part, but the getter for the string member confused me. Why return a reference (alias) to the object (string in this case)?
// access content:
const string& content() const {return *ptr;}
What is the difference between that, and just returning the string?
string content() const {
return *ptr;
}
Is returning a const alias more efficient? Are you returning just the address of the string, or the string itself? What about when just returning the string, are you returning the entire string? Thanks.
Returning a string would be undesirable for two reasons:
It means an unnecessary copy of the string is performed, which is bad for performance
It also means that someone might attempt to modify the returned string, thinking they are modifying the actual member of the class; a const reference does not allow this and triggers a compilation error.
the getter for the string member confused me. Why return a reference (alias) to the object (string in this case)?
const string& content() const {return *ptr;}
What is the difference between that [return a reference], and just returning the string?
string content() const { return *ptr;}
And you might ask if there is a difference between that and returning only the pointer:
const string* content() const { return ptr;}
I find no advantage of one over the other(s).
Well, maybe consider the scenario where the string contains 26 million chars; you probably want to avoid copying that.
But there is another issue (or maybe 2) you should be aware of, if only to evaluate what you have learned here.
On Lubuntu 18.04, using g++ (Ubuntu 7.3.0-27), a string s, with no data,
std::string s;
cout << sizeof(s) << " " << s.size() << endl;
Reports the numbers "32 0".
std::string s ("01234567890123456789");
cout << sizeof(s) << " " << s.size() << endl;
Reports values "32 20"
{
std::string s;
for (int i=0; i<1000000; i++)
{
for (char j='A'; j<='Z'; j++)
s.push_back(j);
}
cout << " " << sizeof(s) << " " << s.size() << endl;
}
This reports the values "32 26000000"
That is 1 million copies of the alphabet, yet s is still only 32 bytes.
From this, you may conclude that (a) with this implementation, an instance of std::string occupies 32 bytes regardless of its data, (b) because the characters of a long string reside elsewhere, so (c) some of those 32 bytes hold a pointer to the place in dynamic memory where the chars reside.
Hmmm.
If the string object itself is only 32 bytes, you might ask why Example4 uses a pointer to place this small object (the string instance) in dynamic memory: it spends 8 bytes on a pointer to find 32, and then needs a second indirection (through a pointer inside the string instance) to reach the chars of Example4's string.
In the same way, a std::vector is 24 bytes with this compiler, regardless of how many elements it holds and how big the elements are. std::vector takes care of memory management so that you don't have to.
Perhaps this lesson is meant to help you discover and evaluate what is in dynamic memory, and what is in automatic memory, to improve your choices.
The key idea is that STL library containers handle dynamic memory for you, which greatly simplifies your effort.
Or, perhaps the professor wants you to know more about the tools you are using. The standard containers, in some ways, isolate you from how this stuff works. Perhaps this assignment is to get a glimpse into what std::string does.
// here is some "g++ -std=c++17" code to single step through, illustrating several of the ideas
#include <iostream>
using std::cout, std::endl;
#include <sstream>
using std::stringstream;
#include <iomanip>
using std::setfill, std::setw;
#include <string>
using std::string;
#include <cstring>
using std::strlen;
class Example4
{
string* ptr;
public:
Example4() : ptr(new string) {}
Example4 (const string& str) : ptr(new string(str)) {}
~Example4 () {delete ptr;}
// access content:
const string& content() const {return *ptr;}
const string* contentP() const {return ptr;}
string show(string lbl)
{
stringstream ss;
ss << "\n " << lbl
<< " . 5 4 3 2 1"
<< "\n . '09876543210987654321098765432109876543210987654321'"
<< "\n " << "*ptr : '" << *ptr << "'"
<< "\n " << "(*ptr).size() : " << (*ptr).size()
<< "\n " << " ptr->size() : " << ptr->size()
<< "\n " << "strlen((*ptr).c_str()) : " << strlen((*ptr).c_str())
<< "\n " << "strlen(ptr->c_str()) : " << strlen(ptr->c_str())
<< "\n\n " << "sizeof(*ptr) : " << sizeof(*ptr)
<< " # 0x" << ptr << ',' // where ptr points to
<< "\n " << "sizeof (ptr) : " << sizeof(ptr)
<< "\n\n";
return ss.str();
}
};
class T996_t
{
public:
int operator()() { return exec(); }
private: // methods
int exec()
{
Example4 e4("Now is the time to answer all questions01234567890");
cout << "\n " << e4.show("Example4")
<< "\n '" << e4.content() << "'"
<< "\n '" << *e4.contentP() << "'\n\n"
<< endl;
{
std::string s;
cout << " " << sizeof(s) << " " << s.size() << endl;
}
{
std::string s("01234567890123456789");
cout << " " << sizeof(s) << " " << s.size() << endl;
}
{
std::string s;
for (int i=0; i<1000000; i++)
{
for (char j='A'; j<='Z'; j++)
s.push_back(j);
}
cout << " " << sizeof(s) << " " << s.size() << endl;
}
return 0;
}
}; // class T996_t
int main(int, char**) { return T996_t()(); }
This code compiles and runs on my Lubuntu. The compile command built by my make file starts with:
g++ -std=c++17 -m64 -ggdb

C++ string pointer doesn't have the same memory address as the string it points to

I wrote a simple program to see how C++ handles pointers to string objects (I'm new to OOP), and I was surprised to see that the string* as, which was assigned the memory address of string a, didn't store a value equal to &a. Also, the console didn't print the value of *as. Could this be an error on my end or the system's, or am I missing something fundamental here?
#include <iostream>
#include <string>
using std::cout;
using std::cin;
using std::endl;
using std::string;
string a = "asdf";
string* as = &a;
string* as_holder = &a;
int main()
{
cout << "a = " << a << "\t" << "&a = " << &a << " *as = " << *as << endl
<< "as = " << as << endl
<< "++as = " << ++as << endl
<< "*as = " << *as << endl << endl;
return 0;
}
output:
a = asdf &a = 011ff68C *as =
as = 011FF6A8
++as = 011FF6A8
*as =
In my test of the valid portion of your program (the first two lines of cout), the printout showed the same address:
a = asdf &a = 0x8049c90 *as = asdf
as = 0x8049c90
(link to a demo)
Lines three and four, however, amount to undefined behavior: once you do ++as, you are moving the pointer to the next std::string in an "array of strings" (which does not exist). Therefore, the subsequent attempt at dereferencing as is undefined behavior.
If you would like to obtain a pointer to the data of your string, such that you could move to the next character by incrementing the pointer, you could use c_str() member function, like this:
const char *as = a.c_str();
as++;
cout << as << endl; // This would print "sdf"

Strcpy does not copy an allocated character array

Why does this not work:
SomeClass::SomeClass(char *lit) //Ctor
{
str = new char[strlen(lit)+1]; // str is a pointer to char in SomeClass
strcpy(str,"have");
cout << str << " " << "In Ctor" << " +Size=" << strlen(str)<< endl;
}
The above code shows a string with length 0. But this code works:
SomeClass::SomeClass(char *lit)
{
char newstr[strlen(lit)+1];
strcpy(newstr,"have");
cout << newstr << " " << "In Ctor" << " +Size=" << strlen(newstr)<< endl;
}
Here is the complete code.
EDIT:
Added the link to Ideone which OP removed after I answered the Question.
Without the link to source code, this Q & answer to it is useless.
There is no problem with strcpy; you are just messing up your pointer.
The problem is here:
str = new char[strlen(lit)+1];
strcpy(str,lit);
length=leng(); // <-- str points to '\0' after this call
cout << str << " " << "In Ctor" << " +Size=" << strlen(lit)<< endl;
str is your class member, and you move the pointer str to point to the '\0' in the function leng(). Naturally, you don't see any output in the next statement.
The solution is to hold the starting address in a separate pointer inside the function.
int String :: leng()
{
int length=0;
char *tempPtr= str; // store the starting address in a temporary pointer
while(*str)
{
length++;
str++;
}
str = tempPtr; // point the member back to the start of the string
return length;
}
Another way to write String::leng():
int String::leng()
{
char *endPtr = str;
while(*endPtr)
endPtr++;
return endPtr - str;
}