Decreasing string capacity in C++ - c++

Why doesn't the string capacity adjust according to the resized number (15)?
#include <iostream>
#include <string>
using namespace std;
int main()
{
string str = "Hello";
cout << "Original: " << str.capacity() << endl;
str.resize(15);
cout << "New: " << str.capacity() << endl;
return 0;
}
Result:
Original: 22
New: 22
I'm new to programming, so your simple explanation will be much appreciated. Thanks!

The only requirement is that the capacity is at least as large as the size. If you want to increase the capacity, use reserve, and if you want to request the smallest possible capacity for the current string size, use shrink_to_fit (the library is allowed to ignore that request). Bear in mind, though, that if you're doing this for performance reasons, it's probably unnecessary.
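In this case the capacity will never drop below 22 characters on your implementation because of small string optimisation: a string that short is stored inside the std::string object itself, and that internal buffer already holds 22 characters.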

Related

Why is the capacity of std::string 15? Why is my reserve() ignored?

#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;
s.reserve(5);
cout << s.capacity() << endl;
}
reserve is the std::string member function that sets the capacity, and capacity() reports the size of the character buffer held inside the std::string.
But the result is 15, not 5. I don't know why.
From https://en.cppreference.com/w/cpp/string/basic_string/reserve
If new_cap is greater than the current capacity(), new storage is allocated, and capacity() is made equal or greater than new_cap.
If new_cap is less than the current capacity(), this is a non-binding shrink request.
If new_cap is less than the current size(), this is a non-binding shrink-to-fit request equivalent to shrink_to_fit() (since C++11).
(until C++20)
If new_cap is less than or equal to the current capacity(), there is no effect. (since C++20)
Regardless of the C++ version you are using, there is no guarantee that calling reserve will leave you with a capacity exactly equal to the value you provided.
Note also that
#include <iostream>
#include <string>
int main()
{
std::string s;
std::cout << s.capacity() << "\n";
s.reserve(5);
std::cout << s.capacity() << "\n";
}
would print 15 twice on the same implementation.
Consider the output of this:
#include <iostream>
#include <string>
int main()
{
std::string s;
std::cout << s.capacity() << "\n";
std::cout << sizeof(std::string) << "\n";
std::cout << sizeof(char) << "\n";
std::cout << sizeof(char*) << "\n";
std::cout << sizeof(size_t) << "\n";
}
Possible output:
15
32
1
8
8
A std::string has to store the character array, its size, its capacity and perhaps some more bookkeeping. The obvious way to store a character array is with a char* plus a size_t for the size (a null terminator alone is not sufficient for a constant-time size() and similar operations). However, that's already 16 bytes, while a single character is only 1 byte. Hence an empty std::string, before allocating any dynamic memory, has enough space to store some characters by reusing memory that is used for other purposes once the string grows.
That is short string optimization (SSO). I only outlined the general idea; for details, refer to other resources or to your library's implementation.
Furthermore, reserve(n) only makes sure that the string has enough space for at least n characters. An empty string already has enough space to store 5 characters. When n is smaller than the current capacity, the string might or might not shrink (since C++20 it does not).
TL;DR: Your call to reserve is not ignored. After the call the string has enough capacity to hold 5 characters. The initial capacity on your implementation seems to be 15. It could be some other value as well.

char array initialization using cin.get()

I am trying something different with cin.get(), as shown below:
char chArray[30]="character array with size "; //the literal is 26 characters, 27 with the null character
cout<< chArray << sizeof(chArray)<< endl;
cout<< "Now we will try to enter more than 30 character in chArray using cin.get()";
cin.get(chArray,100);
cout<< chArray << endl << sizeof(chArray)<< endl;
output of above code is very strange as given below:
character array with size 30
Now we will try to enter more than 30 character in chArray using cin.get().
The character array size is 30, and we enter more than 30 characters using cin.get(), but the size is still 30.
Why doesn't the size of chArray change from 30 to the size of the string we entered using cin.get()?
Please explain.
A fixed array is not dynamically sizable. Once the array is declared, it cannot change size (sizeof() is fixed at compile time). Your code has a buffer overflow that will corrupt surrounding memory if you try to enter more characters than the array can hold. In your example, your array can only hold 30 chars max (29 characters plus the null terminator), but you are telling cin that it can read up to 100 chars (well, 99, plus a null terminator) into the array.
For what you are trying to do, you need to read into a std::string instead of a char[] array. The size() of a std::string can change dynamically at runtime, eg:
#include <iostream>
#include <string>
int main()
{
std::string str = "character string with size ";
std::cout << str << str.size() << std::endl;
std::cout << "Now we will try to enter more than 30 characters in str using cin" << std::endl;
std::cin >> str; // reads one word; to read a whole line use: std::getline(std::cin, str);
std::cout << str << std::endl << str.size() << std::endl;
}
How is size of chArray not changing from 30 to the size of the string we entered using cin.get()?
Arrays in C++ have a fixed size. A local array like yours is created on the stack with a size given by the programmer, which means the size is known to the compiler at compile time. This size does not change. Ever.
If you write more characters into the array than it can hold, for example 100 characters into an array of size 30, that is called a buffer overflow or buffer overrun: you have crossed the boundary set by the fixed size, which is 30 in this case.
The extra characters (beyond the limit of 30) overwrite whatever happens to lie next to the array in memory, and the behavior of the program is undefined. Depending on the compiler and its settings, the program may then terminate with an error such as:
*** stack smashing detected ***: terminated
This particular error comes from a stack protector: when the function returned, the program detected that data had been written past the end of a stack buffer.
However, C++ has std::string, which you can use if you want a container that changes its size as required. Example:
#include <iostream>
#include <string>
int main() {
std::string mystr;
std::cout << "Mystr size before: " << mystr.size() << '\n';
std::getline (std::cin, mystr);
std::cout << "Mystr size after: " << mystr.size() << '\n';
}

Deep understanding of strcat and strlen functions

We know that strcat() receives a pointer to a destination array and a source string as parameters, and appends the source string to the destination. The destination array should be large enough to store the concatenated result. Recently I found out that, for small programs, strcat() can still appear to work as expected even when the destination array is not large enough to hold the second string. I started searching Stack Overflow and found a couple of answers to this question. I want to go deeper and understand what exactly happens at the hardware level when I run the code below:
#include<iostream>
#include<iomanip>
#include<cmath>
#include<cstring>
using namespace std;
int main(){
char p[6] = "Hello";
cout << "Length of p before = " << strlen(p) << endl;
cout << "Size of p before = " << sizeof(p) << endl;
char as[8] = "_World!";
cout << "Length of as before = " << strlen(as) << endl;
cout << "Size of as before = " << sizeof(as) << endl;
cout << strcat(p,as) << endl;
cout << "After concatenation:" << endl;
cout << "Length of p after = " << strlen(p) << endl;
cout << "Size of p after = " << sizeof(p) << endl;
cout << "Length of as after = " << strlen(as) << endl;
cout << "Size of as after = " << sizeof(as) << endl;
return 0;
}
After running this code, the length of p[] is 12 and the size of p[] is 6. How can such a length physically be stored in an array of that size? The number of bytes in this array is limited, so does this mean that strlen(p) looks only for the null terminator, keeps counting until it finds it, and ignores the actual allocated size of the array? And that sizeof() doesn't care whether the last element of the array, reserved for the null character, actually stores a null character?
The array p is allocated in the function's stack frame, so strcat overflows the buffer p and continues writing to some other area of the stack; typically it overwrites other local variables, the function return address, etc. (keep in mind that on the x86 platform the stack usually grows "downwards", i.e. towards lower addresses). This is the well-known "buffer overflow" vulnerability.
strlen cannot know the actual size of your buffer; it just looks for the 0-terminator. On the other hand, sizeof is an operator evaluated at compile time that yields the array's size in bytes.
You are writing outside the bounds of p and the behavior of your program is therefore undefined.
While the behavior is totally undefined, there are a couple of common behaviors that occur:
You overwrite some unrelated data. This could be other local variables, the function return address, etc. It's impossible to guess exactly what will get overwritten without examining the assembly generated by the compiler for that specific program. This can result in a severe security vulnerability since it can allow an attacker to inject their own code into your program's memory space and let them overwrite a function's return address to cause the program to execute their injected code.
The program crashes. This can happen if you write far enough past the end of the array to pass a memory page boundary. The program can try to write to a virtual memory address that the OS hasn't mapped to physical memory for your application. This results in the OS killing your application (with a SIGSEGV on Linux, for example). This will usually happen more often with dynamically allocated arrays than function-local arrays.

How does std::vector deal with variable length std::string?

We can construct a vector to store a bunch of strings by writing vector<string>, but a string can be of variable length, so how can vector deal with that?
I also tested a demo: test[0] begins at 0x2508cb0 and test[1] begins at 0x2508cb8, but the difference between the two addresses does not match the capacity of test[0].
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main()
{
vector<string> test;
test.push_back("tes3235235et");
test.push_back("135125151241241241");
cout << test[0].capacity() << endl;
cout << test[1].capacity() << endl;
cout << &(test[0]) << endl;
cout << &(test[1]) << endl;
return 0;
}
Output:
12
18
0x2508cb0
0x2508cb8
The vector doesn't need to deal with that, because the string deals with that. Just like std::vector, std::string stores its elements in dynamically allocated memory. The characters are not part of the string object itself (except in the case of small string optimization), but are instead just referred to via a pointer. The actual size of the string object is set at compile time and is the same for all strings (and can be obtained by sizeof(std::string)), regardless of the number of characters.

Difference between "size" and "capacity" in c++ string?

I have this snippet from Thinking in C++.
#include <iostream>
#include <string>
using namespace std;
int main ()
{
string bigNews("I saw Elvis in a UFO. ");
cout << bigNews << endl;
bigNews.insert(0, " thought I ");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
bigNews.append("I've been working too hard.");
cout << bigNews << endl;
cout << "Size = " << bigNews.size() << endl;
cout << "Capacity = "
<< bigNews.capacity() << endl;
return 0;
}
And I get output as shown below:
I saw Elvis in a UFO.
thought I I saw Elvis in a UFO.
Size = 33
Capacity = 44
thought I I saw Elvis in a UFO. I've been working too hard.
Size = 60
Capacity = 88
I can see why the size increases, but I can't work out how the capacity increases.
What I know is that the capacity is the string buffer into which we can push_back, but how is that space allocated?
capacity is the maximum number of characters that the string can currently hold without having to grow. size is how many characters actually exist in the string. The reason they're separate concepts is that allocating memory is generally inefficient, so you try to allocate as rarely as possible by grabbing more memory than you actually need at one time. (Many data structures use a "doubling" method where, if they hit their capacity of N and need more space, they will allocate 2*N space, to avoid having to reallocate again any time soon.)
capacity will increase automatically as you use the string and require more space. You can also manually increase it using the reserve function.
From the documentation:
capacity()
returns the number of characters that can be held in currently allocated storage
(public member function)
So, it is the allocation size of the internal buffer. What you see is its size doubling when it's exhausted -- this is a common technique for using dynamically-sized buffers efficiently, and it's called "exponential storage expansion". What it boils down to is basically this:
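#include <cstdlib>
// Grow the buffer geometrically (doubling) until it can hold newsize bytes.
void resize_buffer(char **buf, std::size_t *cap, std::size_t newsize)
{
if (*cap == 0)
*cap = 1; // doubling 0 would loop forever
while (newsize > *cap)
*cap *= 2;
*buf = static_cast<char *>(std::realloc(*buf, *cap)); // no error handling
}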
(Of course, this is greatly simplified; don't use it for actual reallocation code in production.) Your implementation of std::string is probably using this trick, which is why you see the buffer size going up by 100% each time.