Trimming C++ string in constant time - c++

Is there a STL/library method to reduce the string size (trim it) in constant time.
In C this this can be done in constant time by just adding the '\0' past the last index.
C++ resize compexity is undefined and mostly likely be O(N)
http://www.cplusplus.com/reference/string/string/resize/

#SamVarshavchik is being coy in the comments, but it's worth spelling out: in many implementations, including libstdc++, std::string::resize() will reduce the size of a string in constant time, by reducing the length of the string and not reallocating/copying the data:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/basic_string.tcc . The estimate of O(n) is if you increase the size, forcing the string to be reallocated and copied over.
Alternatively, in C++17, std::string_view is rough immutable counterpart to "trimming" a C string by slapping a null byte. It takes a slice of a string (pointer + size) without copying its content, and you can then pass it around to various STL functions or print it from a stream.
std::string hello_world = "hello world";
auto hello = std::string_view(hello_world.data(), 5);
std::cout << hello; // prints hello
The caveat is that the string_view doesn't own the data, so you can't use it once the original string goes out of scope, and you don't want to modify the original string in a way that might cause it to reallocate.

The C++17 way, we can achieve the substr operation in O(1).
https://www.modernescpp.com/index.php/c-17-avoid-copying-with-std-string-view
std::string_view do not allocate the memory on heap for large string as well.
std::string allocate memory on heap but exception is for std::string size is 15 for MSVC and GCC and 23 for Clang. std::string below above mentioned size are not allocated memory on heap.

Related

std::string capacity remains same after deleting elements, so is it holding up some memory?

The following piece of code:
string a = "abc";
cout << a.capacity();
a.erase(a.begin() + 1, a.end());
cout << a.capacity();
...outputs:
33
Even if I remove some elements from the string, the capacity remains the same. So my questions are:
Is some memory being held up because of capacity? What if I have not explicitly reserve()'d?
If I use reserve() and don't end up using the entire capacity, am I wasting memory?
Will this extra memory (which I am not using) be allocated to something else if required?
EDIT:
Suppose i have
string a= "something";
a = "ab";
Now I know that a won't ever have more than two characters. So is it wise to call reserve(2) so that memory is not wasted?
I'll answer your questions first:
The memory belongs to the string, but isn't used entirely. If you don't reserve, you can't control the capacity. You just know it is sufficiently large.
Correct.
As said in 1): no. It belongs to the string. Nothing else can use this memory. The string itself could use it for additional characters though.
For further details I'd recommend the documentation of string::reserve.
One final remark: If you don't ever reserve, everything will work fine - it might be unnecessarily slow though. That is only ever the case if you were to frequently add few characters and the string has to re-alloc alot (much like a vector). Reserving is basically intended to bypass this situation.
On the addendum: Calling reserve can help to save memory. If you call reserve(n), this ensures the string has an internal capacity for at least n characters. Note that reserve is not required to set the capacity to exactly n nor to reduce the capacity at all for small n (cf. reserve documentation).
Back to your example: If you call reserve it can never do any harm. It's the best you can do in general. (In case you have C++11 features, I'd recommend shrink_to_fit).
I tested with (older) versions of gcc / clang in which cases the capacity of a got changed to exactly 2. Since I'm not 100% sure what the added question referes to, here is what I ran:
auto a = string{"something"};
a = "ab";
cout << a << " " << a.capacity() << endl;
a.reserve(2);
cout << a << " " << a.capacity() << endl;
Which produces:
ab 9
ab 2
1) Is some memory being held up because of capacity? What if i have not
explicitly reserve()'d?
Even if you did not call reserve, the std::string object can still hold up some memory1 for (even a default constructed) std::string. And this is true with std::string implementations that uses Short String Optimization
2)If i use std::reserve() and dont end up using the entire capcity
then am i wasting memory?
It depends on what you mean by wasting memory; std::string dynamically resizes its buffer to accommodate changes in the size of the string. Well, in the case of Short String Optimized std::string implementations, there is nothing you can do about it. The memory is in the string object itself.
3)Will this extra memory which i am not using be allocated to
something else, if required?
No. A std::string object manages the memory it allocated, and it may or may not give it up2 wholly or partly, until its destroyed. See David Schwartz's comment
EDIT: Suppose i have
string a= "something";
Now i edit the string and know that a won't have more than two
characters.So is it wise to call a.reserve(2) so that memory is not
wasted?
If you modified a in a way that changes a.size(), such as calling resize method or assigning it to a new string of length 2, then the proceeding reserve call can2 be beneficial.
Note that, calling reserve would not reduce the string's contents. std::string::reserve is not permitted to truncate the string. It is only permitted to work on unused memory. If you call std::string::reserve(new_capacity_intended) with new_capacity_intended less than the size() of the string, the best that could possibly happen is the same effect of std::string::shrink_to_fit.
To reduce the string's memory (if the implementation does a binding shrink_to_fit request) and shrink it to the first two characters:
string a= "something";
//resize it first
a.resize(2); //or by some assignment such as a = "so";
//then
a.reserve(2); // or better still a.shrink_to_fit();
1: by memory, I assume Virtual Memory
2: std::string::reserve or std::string::shrink_to_fit may or may not give up the string's unused memory.

Can std::string capacity be changed for optimisation?

Can the std::string capacity be changed to optimise it?
For example:
std::string name0 = "ABCDEABCDEABCDEF";
int cap = name0.capacity(); //cap = 31
int size = name0.size(); //size = 16
Okay, this is perfectly fine for a couple of strings in memory, but what if there are thousands? This wastes a lot of memory. Isn't it then better to use char* so you can control how much memory is allocated for the specific string?
(I know some people will ask why are there thousands of strings in memory, but I would like to stick to my question of asking if the string capacity can be optimised?)
If you're asking how to reduce capacity() so that it matches size(), then C++11 added shrink_to_fit() for this purpose, but be aware that it is a non-binding request, so implementations are allowed to ignore it.
name0.shrink_to_fit();
Or there's the trick of creating a temporary string and swapping:
std::string(name0.begin(), name0.end()).swap(name0);
However, neither of these are guaranteed to give you a capacity() that matches size(). From GotW #54:
Some implementations may choose to round up the capacity slightly to their next larger internal "chunk size," with the result that the capacity actually ends up being slightly larger than the size.

Efficient const char* concatenations and output to std::string [duplicate]

This question already has answers here:
Most optimized way of concatenation in strings
(9 answers)
Closed 9 years ago.
Consider first that the amount of total data that will be stored in the output string will almost certainly be small and so I doubt any of these have a noticeable affect on performance. My primary goal is to find a way to concatenate a series of const char*'s of unknown size that doesn't look terrible while also keeping efficiency in mind. Below are the results of my search:
Method 1:
std::string str = std::string(array1) + array2 + array3;
Method 2:
std::string str(array1);
str += array2;
str += array3;
I decided on the first method as it is short and concise. If I'm not mistaken, both methods will invoke the same series of operations. the unoptimized compiler would first create a temporary string and internally allocate some amount of space for its buffer >= sizeof(array1). If that buffer is sufficiently large, the additional + operations will not require any new allocations. Finally, if move semantics are supported, then the buffers of the temporary and named str are swapped.
Are there any other ways to perform such an operation that also look nice and don't incur terrible overhead?
Remember, that, in case of arrays, sizeof(array) returns actual size (aka length) of it's parameter, if it has been declared as an array of explicit size (and you wrote 'series of const char*'s of unknown size'). So, assuming you want to create universal solution, strlen() should come under consideration instead.
I don't think you can avoid all additional operations. In case of many concatenations, the best solution would be to allocate buffer, that is large enough to store all concatenated strings.
We can easily deduce, that the most optimal version of append() in this case is:
string& append (const char* s, size_t n);
Why? Because reference says: 'If s does not point to an array long enough (...), it causes undefined behavior'. So we can assume, that internally no additional checks take place (especially additional strlen() calls). Which is good, since you are completely sure, that values passed to append() are correct and you can avoid unnecesary overhead.
Now, the actual concatenation can be done like this:
len_1 = strlen(array_1);
len_2 = strlen(array_2);
len_3 = strlen(array_3);
//Preallocate enough space for all arrays. Only one reallocation takes place.
target_string.reserve(len_1 + len_2 + len_3);
target_string.append(array_1, len_1);
target_string.append(array_2, len_2);
target_string.append(array_3, len_3);
I do not know if this solution 'looks good' in your opinion, but it's definitely clear and is optimized for this use case.

std::string capacity size

Is the capacity size of string always a multiple value of 15?
for example: In all cases the capacity is 15
string s1 = "Hello";
string s2 = "Hi";
string s3 = "Hey";
or is it random?
Is the capacity size of string always a multiple value of 15?
No; the only guarantee about the capacity of a std::string is that s.capacity() >= s.size().
A good implementation will probably grow the capacity exponentially so that it doubles in size each time a reallocation of the underlying array is required. This is required for std::vector so that push_back can have amortized constant time complexity, but there is no such requirement for std::string.
In addition, a std::string implementation can perform small string optimizations where strings smaller than some number of characters are stored in the std::string object itself, not in a dynamically allocated array. This is useful because many strings are short and dynamic allocation can be expensive. Usually a small string optimization is performed if the number of bytes required to store the string is smaller than the number of bytes required to store the pointers into a dynamically allocated buffer.
Whether or not your particular implementation performs small string optimizations, I don't know.
Implementation specific - std::String usually allocates a small starting string, 16bytes is common.
It's a compromise between not having to do a realloc and move for very short strings and not wasting space
It's an implementation detail on which you're not supposed to rely on; to see exactly how std::string grow in your implementation you can have a look at the sources of its CRT. In general it has an exponential grow.

C++ string literals vs. const strings

I know that string literals in C/C++ have static storage duration, meaning that they live "forever", i.e. as long as the program runs.
Thus, if I have a function that is being called very frequently and uses a string literal like so:
void foo(int val)
{
std::stringstream s;
s << val;
lbl->set_label("Value: " + s.str());
}
where the set_label function takes a const std::string& as a parameter.
Should I be using a const std::string here instead of the string literal or would it make no difference?
I need to minimise as much runtime memory consumption as possible.
edit:
I meant to compare the string literal with a const std::string prefix("Value: "); that is initialized in some sort of a constants header file.
Also, the concatenation here returns a temporary (let us call it Value: 42 and a const reference to this temporary is being passed to the function set_text(), am I correct in this?
Thank you again!
Your program operates on the same literal every time. There is no more efficient form of storage. A std::string would be constructed, duplicated on the heap, then freed every time the function runs, which would be a total waste.
This will use less memory and run much faster (use snprintf if your compiler supports it):
void foo(int val)
{
char msg[32];
lbl->set_label(std::string(msg, sprintf(msg, "Value: %d", val)));
}
For even faster implementations, check out C++ performance challenge: integer to std::string conversion
How will you build your const std::string ? If you do it from some string literral, in the end it will just be worse (or identical if compiler does a good job). A string literal does not consumes much memory, and also static memory, that may not be the kind of memory you are low of.
If you can read all your string literals from, say a file, and give the memory back to OS when the strings are not used any more, there may be some way to reduce memory footprint (but it will probably slow the program much).
But there is probably many other ways to reduce memory consumption before doing that kind of thing.
Store them in some kind of resource and load/unload them as necessary.