Difference between "size" and "capacity" in C++ string?

I have this snippet from Thinking in C++.
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string bigNews("I saw Elvis in a UFO. ");
cout << bigNews << endl;
bigNews.insert(0, " thought I ");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
bigNews.append("I've been working too hard.");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
return 0;
}
And I get output as shown below:
I saw Elvis in a UFO.
thought I I saw Elvis in a UFO.
Size = 33
Capacity = 44
thought I I saw Elvis in a UFO. I've been working too hard.
Size = 60
Capacity = 88
I can see why the size increases, but I can't work out how the capacity increases.
What I know is that the capacity is the string buffer we can push_back into, but how is that space allocated?

capacity is the maximum number of characters that the string can currently hold without having to grow. size is how many characters actually exist in the string. The reason they're separate concepts is that allocating memory is relatively expensive, so you try to allocate as rarely as possible by grabbing more memory than you actually need at one time. (Many data structures use a "doubling" method where, if they hit their capacity of N and need more space, they allocate 2*N, to avoid having to reallocate again any time soon.)
capacity will increase automatically as you use the string and require more space. You can also manually increase it using the reserve function.

From the documentation:
capacity()
returns the number of characters that can be held in currently allocated storage
(public member function)
So, it is the allocation size of the internal buffer. What you see is the buffer doubling in size whenever it is exhausted. This is a common technique for using dynamically sized buffers efficiently, and it's called "exponential storage expansion". In essence, it boils down to this:
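#include <cstdlib>
void resize_buffer(char **buf, std::size_t *cap, std::size_t newsize)
{
    if (*cap == 0)
        *cap = 1;                    // doubling zero would loop forever
    while (newsize > *cap)
        *cap *= 2;                   // grow geometrically until newsize fits
    *buf = static_cast<char *>(std::realloc(*buf, *cap));  // no error handling
}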
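(Of course, this is heavily simplified; don't use it for actual reallocation code in production.) Your implementation of std::string is probably using this trick, which is why you see the buffer size going up by 100%. The standard does not mandate a particular growth factor, so other implementations may grow by a different ratio.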

Related

How do I tell if I am using VLA (Variable Length Array)?

I am on a project where we have to read in from a file, temporarily store them in dynamically allocated memory, do sorting and stuff, and deallocate the memory.
As the project is testing our knowledge of dynamic memory and memory leaks, one of the instructions is not to use VLAs.
I am not sure what our instructor means by "do not use VLAs". Are we not allowed to use the [] bracket syntax? Or can we use it as long as the memory comes from the heap and we deallocate it properly once it is no longer needed?
Here is my main.cpp, it is not complete yet, so please excuse some typos and possible errors, but if you have something to suggest or correct, those are more than welcome as well.
Thank you and do have a good weekend y'all.
#include "proj2-arrayFunctions.h"
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main() {
ifstream file;
int size = 0;
int* numberArray;
int counter = 0;
file.open("arrays.txt");
if (!file)
{
cout << "error: file is not opened! " << endl;
return 1;
}
while (file >> size) // read the next size; the loop stops as soon as a read fails
{
numberArray = new int[size];
for (int i = 0; i < size; i++)
{
file >> numberArray[i];
}
bubbleSort(numberArray, size);
cout << "the largest value from this array is: " << largestValue(numberArray, size) << endl;
cout << "the smallest value from this array is: " << smallestValue(numberArray, size) << endl;
cout << "the average value of this array is: " << averageValue(numberArray, size) << endl;
cout << "the median value of this array is: " << medianValue(numberArray, size) << endl;
delete[] numberArray;
}
return 0;
}
int *numberArray; numberArray = new int[size]; is not a variable-length array, it's a dynamically allocated array. That's fine. Note that you have to delete[] it when done, which you do.
A VLA declaration would look like int numberArray[size]; where size is not a constant. It gets automatically deallocated when it goes out of scope, so you don't use delete on it. They are typically allocated on the stack and so can be created and deallocated very fast, but have various pitfalls. The main one is that there is no way to check if enough stack space is available, and your program will simply crash if there isn't; there is no way to safely detect or handle that error. So you would have to be very careful about checking that the value of size is reasonable.
VLAs are not part of standard C++, but some compilers support them anyway.
If you want to make sure you don't use VLAs, use the appropriate compiler flag to treat them as errors: -Werror=vla for GNU-compatible compilers (GCC, Clang, ICC, etc.). MSVC doesn't support VLAs at all.

Decreasing string capacity in C++

Why doesn't the string capacity adjust according to the resized number (15)?
#include <iostream>
#include <string>
using namespace std;
int main()
{
string str = "Hello";
cout << "Original: " << str.capacity() << endl;
str.resize(15);
cout << "New: " << str.capacity() << endl;
return 0;
}
Result:
Original: 22
New: 22
I'm new to programming, so your simple explanation will be much appreciated. Thanks!
The only requirement is that the capacity is at least as large as the size. If you want to increase the capacity, use reserve; if you want the smallest possible capacity for the current string size, use shrink_to_fit (a non-binding request that the implementation is allowed to ignore). Bear in mind that if you're doing this for performance reasons, it's probably unnecessary.
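In this case the capacity is always going to be at least 22 characters because of the small string optimisation: short strings are stored inside the string object itself, which gives every string a minimum capacity. The exact minimum is implementation-specific.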

Deep understanding of strcat and strlen functions

We know that strcat() receives a pointer to a destination array and a source string, and appends the source to the destination. The destination array should be large enough to hold the concatenated result. Recently I found out that strcat() can still appear to work as expected in small programs, even when the destination array is not large enough to hold the second string. I searched Stack Overflow and found a couple of answers to this question. I want to go deeper and understand what exactly happens at the hardware level when I run the code below.
#include<iostream>
#include<iomanip>
#include<cmath>
#include<cstring>
using namespace std;
int main(){
char p[6] = "Hello";
cout << "Length of p before = " << strlen(p) << endl;
cout << "Size of p before = " << sizeof(p) << endl;
char as[8] = "_World!";
cout << "Length of as before = " << strlen(as) << endl;
cout << "Size of as before = " << sizeof(as) << endl;
cout << strcat(p,as) << endl;
cout << "After concatenation:" << endl;
cout << "Length of p after = " << strlen(p) << endl;
cout << "Size of p after = " << sizeof(p) << endl;
cout << "Length of as after = " << strlen(as) << endl;
cout << "Size of as after = " << sizeof(as) << endl;
return 0;
}
After running this code, the length of array p[] is 12 and the size of p[] is 6. How can such a length physically be stored in an array of that size? The number of bytes in this array is limited, so does that mean strlen(p) only looks for the null terminator, keeps counting until it finds it, and ignores the actual allocated size of the array? And does sizeof not care whether the last element of the array, reserved for the null character, actually stores a null character?
The array p is allocated in the function's stack frame, so strcat "overflows" the buffer p and continues writing into another area of the stack, typically overwriting other local variables, the function's return address, etc. (keep in mind that on x86 platforms the stack usually grows "downwards", i.e. towards lower addresses). This is the well-known "buffer overflow" vulnerability.
strlen cannot know the actual size of your buffer; it just looks for the 0 terminator. sizeof, on the other hand, is a compile-time operator that returns the size of the array in bytes.
You are writing outside the bounds of p and the behavior of your program is therefore undefined.
While the behavior is totally undefined, there are a couple of common behaviors that occur:
You overwrite some unrelated data. This could be other local variables, the function return address, etc. It's impossible to guess exactly what will get overwritten without examining the assembly generated by the compiler for that specific program. This can result in a severe security vulnerability since it can allow an attacker to inject their own code into your program's memory space and let them overwrite a function's return address to cause the program to execute their injected code.
The program crashes. This can happen if you write far enough past the end of the array to pass a memory page boundary. The program can try to write to a virtual memory address that the OS hasn't mapped to physical memory for your application. This results in the OS killing your application (with a SIGSEGV on Linux, for example). This will usually happen more often with dynamically allocated arrays than function-local arrays.

How does std::vector deal with variable length std::string?

We can construct a vector to store a bunch of strings by writing vector<string>, but a string can be variable length; how can vector deal with that?
I also tested a demo: test[0] begins at 0x2508cb0 and test[1] begins at 0x2508cb8, but the difference between the two addresses does not match the capacity of test[0].
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
vector<string> test;
test.push_back("tes3235235et");
test.push_back("135125151241241241");
cout << test[0].capacity() << endl;
cout << test[1].capacity() << endl;
cout << &(test[0]) << endl;
cout << &(test[1]) << endl;
return 0;
}
Output:
12
18
0x2508cb0
0x2508cb8
The vector doesn't need to deal with that, because the string deals with that. Just like std::vector, std::string stores its elements in dynamically allocated memory. The characters are not part of the string object itself (except in the case of small string optimization), but are instead just referred to via a pointer. The actual size of the string object is set at compile time and is the same for all strings (and can be obtained by sizeof(std::string)), regardless of the number of characters.

C++ sizeof C-style string / char array - optimization

I'm a student at university. I work mostly with Java, C++ is very new to me, so I probably make many silly mistakes and I have upcoming exams to cope with. Don't be too harsh with me.
Note: I can NOT use C++ std::string because I need to work with C-strings due to university tasks!
Referring to my studies and the question I asked about pointers and const arguments (which you can find here), I tried messing around with memory management, but it seems to have no effect, or I just misunderstood some aspects of sizeof or the actual sizes of certain elements.
This is my class Person:
Person.cpp
using namespace std;
Person::Person()
{
Person::name = new (char[64]);
Person::adress = new (char[64]);
Person::phone = new (char[64]);
cout << "standard constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
}
Person::Person(const char *name, const char *adress , const char *phone)
{
Person::name = new (char[strlen(name)]);
Person::adress = new (char[strlen(adress)]);
Person::phone = new (char[strlen(phone)]);
setName(name);
setAdress(adress);
setPhone(phone);
cout << "general constructor called; object created, allocated " << sizeof(this->name) << "+" << sizeof(this->adress) << "+" << sizeof(this->phone) << "bytes" << endl;
};
Person::Person(Person const &other)
{
Person::name = new (char[strlen(other.getName())]);
Person::adress = new (char[strlen(other.getAdress())]);
Person::phone = new (char[strlen(other.getPhone())]);
setName(other.getName());
setAdress(other.getAdress());
setPhone(other.getPhone());
cout << "copy constructor called; object created, allocated " << sizeof(name) << "+" << sizeof(adress) << "+" << sizeof(phone) << "bytes" << endl;
};
Person::~Person()
{
delete [] name;
delete [] adress;
delete [] phone;
cout << "destructor called; object removed" << endl;
};
I tried to spare memory with creating a C-string with a string length of the given parameters.
Thinking that a C-string is a char array, sparing chars would result in sparing memory, e.g. a C-string of "John" takes up less memory than a C-string of "Jonathan".
So now I'm not sure if I just got the wrong concept of C-strings or char arrays, or my implementation is just faulty.
In my main I create the following objects:
int main()
{
Person t;
t.printPerson();
cout << "size of t: " << sizeof(t) << endl;
Person p("John", "some street", "0736182");
p.printPerson();
cout << "size of p: " << sizeof(p) << endl;
Person x(p);
x.printPerson();
cout << "size of x: " << sizeof(x) << endl;
Person y("Jonathan", "Lancaster Ave 53", "3584695364");
y.printPerson();
cout << "size of y: " << sizeof(y) << endl;
cin.get();
};
But I always get a size of 24 per object, so 8 for each member variable. Why is that?
Thanks in advance.
I think you are expecting the sizeof operator to behave differently than it actually does. Let's take this code, for example:
const char* str = new char[137];
Here, if you write sizeof(str) you'll probably either get 4 or 8, depending on your system, because sizeof(str) measures the number of bytes of the pointer str itself rather than the number of bytes in the array pointed at by str. So, on a 32-bit system, you'd probably get 4, and on a 64-bit system you'd probably get 8, independently of how many characters you allocated.
Unfortunately, C++ doesn't have a way for you to get the number of characters or the memory used up by a dynamically allocated array. You just have to track that yourself.
Similarly, in your main function, when you write sizeof(p), you're measuring the number of bytes used by the object p, not the total number of bytes used by p and the arrays it points at. You'll always get back the same value for sizeof(p) regardless of what strings it points at.
If you're planning on working with strings in C++, I strongly recommend using std::string over raw C-style strings. They're much easier to use, they remember their length (so it's harder to mix up strlen and sizeof), and if you have a class holding a bunch of std::strings, you don't need a copy constructor or assignment operator to handle the logic of shuffling them around. That would significantly clean up your code and eliminate most of the memory errors in it.
sizeof gives you the number of bytes C/C++ needs to keep the object itself in memory. In your case (though you have not shown it) it looks like name, address, and phone are pointers to char:
struct Person {
    char *name, *address, *phone;
};
A pointer is a variable that holds the address of another object. So, depending on the underlying system, it could occupy 32 bits (4 bytes), 64 bits (8 bytes), or some other amount. On a 64-bit system, sizeof(Person) will therefore be 24 (3 pointers of 8 bytes each). This corresponds to your results.
sizeof gives you a shallow size calculation. Your strings are pointed to by those pointers, and their lengths are not included. So you may need to write a member function that calculates them for you, e.g.
#include <cstring>
struct Person {
    char *name, *address, *phone;
    std::size_t getSize() const {
        return std::strlen(name) + std::strlen(address) + std::strlen(phone);
    }
};
And as mentioned in the comments before, every C-style string (char *) in C/C++ must end with a terminating character ('\0') that tells the program where the string ends. So, if you allocate space for a string, you must allocate space for the terminator as well (+1 to the length), and make sure it is actually written as '\0'. If you use library functions to copy strings, they will take care of it; otherwise you need to do it manually.
void setName(const char *n) {
    delete[] name;                   // free the old buffer (name must be nullptr or from new[])
    name = new char[strlen(n) + 1];  // +1 for the terminating '\0'
    strcpy(name, n);                 // copies the string including its '\0'
}
If you use a loop to copy the characters instead of strcpy, you need to add the terminator manually:
name[strlen(n)] = 0;