Space complexity of str.substr() in C++


What is the space complexity of the str.substr() function and how does it compare to str.erase()?
I'm wondering because I was running code on LeetCode, and it used 150MB of memory when I used the substr function:
num = num.substr(1,num.size());
As soon as I removed this function and instead used the erase function, while changing nothing else in my code, the memory usage fell to 6.8MB. Updated code with erase function:
num = num.erase(0,1);

num = num.substr(1,num.size());
substr creates a copy of the string without the first character, so just before the assignment you have (almost) twice the initial string in memory: the original plus a copy only 1 character shorter.
The strings are not shared (1), so after the assignment the original buffer is released, but before the assignment both versions occupy memory.
num = num.erase(0,1);
modifies the string in place, so only one version of the string is needed during execution.
Note that it is the same as doing:
num.erase(0,1);
(1): as Pete Becker remarked, since C++11 the internal representation of std::basic_string is explicitly not allowed to be shared.
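Rather than inferring the difference from LeetCode's memory figure, one can count heap allocations directly. The sketch below replaces the global operator new with a counting version (an illustration-only technique; g_alloc_count, AllocCounts and compare_substr_and_erase are hypothetical names) and compares one substr assignment with one erase on a 1000-character string. On mainstream standard libraries the substr line makes at least one allocation while erase(0, 1) makes none, though the standard does not spell out allocation counts.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Illustration only: count every call to the replaceable global operator new.
static std::size_t g_alloc_count = 0;

void* operator new(std::size_t size) {
    ++g_alloc_count;
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct AllocCounts {
    std::size_t substr_allocs = 0;  // allocations made by num = num.substr(1)
    std::size_t erase_allocs = 0;   // allocations made by num.erase(0, 1)
    std::string remaining;          // final contents, kept so the work is observable
};

AllocCounts compare_substr_and_erase(std::size_t length) {
    AllocCounts counts;
    std::string num(length, '7');   // long enough to bypass any small-string buffer

    std::size_t before = g_alloc_count;
    num = num.substr(1);            // builds a second string, then moves it into num
    counts.substr_allocs = g_alloc_count - before;

    before = g_alloc_count;
    num.erase(0, 1);                // shifts characters left inside the existing buffer
    counts.erase_allocs = g_alloc_count - before;

    counts.remaining = std::move(num);
    return counts;
}
```

Note that the peak memory of the substr version is roughly two copies of the string at once, which matches the explanation above.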

Technically, it depends on the type of str.
Reasonably, there should be no overhead on top of the size of the output. The space complexity of std::string is linear in relation to the length of the string.

Related

C++17: what is the fastest way to convert a list (or other container) to a string, cross-platform?

I have a list of strings:
std::list<std::string> list = { "--version", "--setcolor", "--setint" };
list.push_back("--command1");
list.push_back("--command2");
list.push_back("--command3");
Basically it is a list of commands, which in the end I'd like to convert to a string.
The list can contain up to 100 commands.
What is the fastest way to do it in C++17?
Update:
It can also be an array or another container.
The end result should look like this:
std::string s = "--version --setcolor --setint --command1 --command2 --command3";
Thanks

Using std::string as an accumulator (e.g., "string buffer", as seen in Java or C#) is standard practice in C++ and there's nothing C++11/14/17/20 adds to make it "faster".
Simply add up the length of the substrings - and the delimiters between them - string::reserve enough capacity in your target string, then string::append away ...
[#Justin answered in a comment, but putting it here as well because comments are ephemeral]
In C++ strings are modifiable (unlike other languages like Java and C# where they're immutable). So you can change them, including adding characters to them (e.g., via string::append).
There are two relevant concepts: the string's size - which is how many characters it has right now, and the string's capacity - which is how much it could hold given the storage already allocated to it. If the capacity is the same as the size then adding anything to the string will require a new allocation of storage from the heap, and a copy of the current contents of the string into it (and a deallocation of the old storage back to the heap.) (*) If the capacity is larger than the size then appending new characters will not require that allocation/copy/deallocation if the new characters will fit in the capacity available.
If you're doing multiple appends to a string, this might mean multiple alloc/copy/dealloc cycles, which is expensive.
So instead you reserve additional capacity up front: By calling string::reserve the capacity is increased to the amount specified in one shot. Then you append all your stuff bit by bit (byte by byte ..., substring by substring) and it'll all fit (if you calculated the necessary capacity correctly) without any additional alloc/copy/dealloc.
string::reserve is discussed at cppreference. While you're there, check out string::capacity, to find the current capacity of the string, and why not also look at string::shrink_to_fit, which requests that the capacity be reduced to just what the current contents of the string need.
(*) Yes I'm leaving out the small string optimization ...
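The reserve-then-append approach can be sketched as follows, assuming a standard container of std::string (std::list, std::vector, std::array, ...) and a single space as separator; join_with_spaces and build_command_line are hypothetical names.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Joins the strings in a standard container, separated by single spaces.
// Pass 1 computes the exact final length so reserve() can allocate once;
// pass 2 appends into the already-reserved buffer.
template <typename Container>
std::string join_with_spaces(const Container& parts) {
    std::size_t total = 0;
    for (const std::string& part : parts) {
        total += part.size();
    }
    if (!parts.empty()) {
        total += parts.size() - 1;  // one separator between neighbours
    }

    std::string result;
    result.reserve(total);
    bool first = true;
    for (const std::string& part : parts) {
        if (!first) {
            result += ' ';
        }
        result.append(part);
        first = false;
    }
    return result;
}

// Usage mirroring the question's list of commands.
std::string build_command_line() {
    std::list<std::string> commands = { "--version", "--setcolor", "--setint" };
    commands.push_back("--command1");
    commands.push_back("--command2");
    commands.push_back("--command3");
    return join_with_spaces(commands);
}
```

The first pass costs one extra walk over the container, which for about 100 short strings is cheap compared with repeated reallocations.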

c++14 - Is this a good way to prepend a char to a string?

If I wanted to add char c onto the beginning of string s, is the following good practice?
string s = "oo";
char c = 'f';
s = c + s;
In the Question "Prepend std::string" on SO the answers that suggested doing this were less well received than the top answer, which suggested using the member-function .insert().
Is there a reason besides efficiency (s = c + s is not efficient since all the contents of string s must be copied)?

Since both perform the same operation, what reason could there be besides efficiency? c+s will create a temporary string, thus requiring a copy of every character in both c and s, and potentially a heap allocation. The temporary will then be moved into the given object, which will have its current memory deallocated (if any). These are not cheap operations.
By contrast, insert will only perform a heap allocation if there is insufficient capacity for the new character. You'll still have the copying going on, since you're inserting at the beginning. But that's about it. It is as efficient as insertion at the head of a contiguous array gets.
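The two forms from the question can be written side by side. A minimal sketch (prepend_insert and prepend_concat are hypothetical names) using the insert(const_iterator, CharT) overload:

```cpp
#include <string>

// Prepends c in place: existing characters are shifted right by one,
// and a reallocation happens only if capacity() is too small.
void prepend_insert(std::string& s, char c) {
    s.insert(s.begin(), c);
}

// Builds a brand-new temporary string (c + s) and assigns it back to s.
void prepend_concat(std::string& s, char c) {
    s = c + s;
}
```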

The s = c + s operation would create a temporary object, probably allocating memory dynamically on the heap, do the required concatenation, and then assign the result back to the string variable. More instructions and memory operations are involved.
Memory operations like allocating and deallocating memory are costly.
Insert would reallocate memory only if the string's existing capacity is not enough for the extra character. In the worst case it would still match the s = c + s approach.
Although it is not much of a performance issue (considering the worst case), it is more elegant and easier to understand from a programmer's perspective.

Note also that there is nothing to stop an implementation of string from allowing limited appends at both ends without needing to move the contents. The default implementations do not do this, but some implementation might reserve extra space at the front of the string the first time you prepend, so that a subsequent prepend is "free". There are vector implementations that do this out there.

Difference between std::string(size, '\0') and s.resize(size)?

Unlike std::vector, std::string does not provide a unary constructor that takes a size:
std::string s(size); // ERROR
Is there any difference between:
std::string s(size, '\0');
and
std::string s;
s.resize(size);
in terms of their performance on common implementations?
Will resize initialize the string to all zero characters or will it leave them an unspecified value?
If all zero, is there any way to construct a string of a given size, but leave the characters with an unspecified value?

There is a difference: with std::string s(size, '\0');, all of the memory needed for the string can be allocated at once. With the second example, if size is greater than the number of characters stored by the small string optimization, an extra allocation may have to be performed. This is implementation-defined, but the second form will definitely not be faster in that regard in a standard-compliant C++17 implementation. The first example is also more concise and may be faster, so it is probably preferable.
When calling s.resize(size);, all new characters are value-initialized (char()), i.e. set to '\0'. There is no way to initialize a string with unspecified values.
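Both forms from the question yield the same contents. A minimal sketch with hypothetical helper names:

```cpp
#include <cstddef>
#include <string>

// Count + fill-character constructor: n copies of '\0'.
std::string make_with_ctor(std::size_t n) {
    return std::string(n, '\0');
}

// Default-construct, then resize: the n appended chars are value-initialized,
// which for char means '\0'.
std::string make_with_resize(std::size_t n) {
    std::string s;
    s.resize(n);
    return s;
}
```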

The actual answer would be implementation-based, but I'm fairly sure that std::string s(size, '\0'); is faster.
std::string s;
s.resize(size);
According to the documentation for std::string:
1) Default constructor. Constructs empty string (zero size and unspecified capacity).
The default constructor will create a string with an "unspecified capacity". My sense here is that the implementation is free to determine a default capacity, probably in the realm of 10-15 characters (totally speculation).
Then in the next line, you will reallocate the memory (resize) with the new size if the size is greater than the current capacity. This is probably not what you want!
If you really want to find out definitively, you can run a profiler on the two methods.

There is already a good answer from DeepCoder.
For the record, however, I'd like to point out that for strings (as for vectors) there are two distinct notions:
the size(): it's the number of actual (i.e. meaningful) characters in the string. You can change it using resize() (to which you can provide a second parameter to say what char you want to use as filler if it should be other than '\0')
the capacity(): it's the number of characters allocated to the string. It's at least the size but can be more. You can increase it with reserve()
If you're worried about allocation performance, I believe it's better to play with the capacity. The size should really be kept for real chars in the string not for padding chars.
By the way, more generally, s.resize(n) is the same as s.resize(n, char()). So if you'd like to fill it in the same way at construction, you could consider string s(n, char()). But as long as you don't use basic_string<T> with T being something other than a character type, your '\0' does the trick.
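The size/capacity distinction from this answer can be checked directly. A sketch with hypothetical helper names (after_reserve, padded):

```cpp
#include <cstddef>
#include <string>

struct SizeAndCapacity {
    std::size_t size;
    std::size_t capacity;
};

// reserve() changes only the capacity; the string stays empty.
SizeAndCapacity after_reserve(std::size_t n) {
    std::string s;
    s.reserve(n);
    return {s.size(), s.capacity()};
}

// resize() changes the size: it pads with `filler` when growing
// and truncates when shrinking.
std::string padded(std::size_t n, char filler) {
    std::string s = "ab";
    s.resize(n, filler);
    return s;
}
```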

Resize does not leave elements uninitialized. According to the documentation: http://en.cppreference.com/w/cpp/string/basic_string/resize
s.resize(size) will value-initialize each appended character. That will cause each element of the resized string to be initialized to '\0'.
You would have to measure the performance difference of your specific C++ implementation to really decide if there's a worthwhile difference or not.
After looking at the machine code generated by Visual C++ for an optimized build, I can tell you the amount of code for either version is similar. What seems counterintuitive is that the resize() version measured faster for me. Still, you should check your own compiler and standard library.

Construct std::string from up to X characters, stopping at null char

I am reading strings from a structure in a file where each string has a fixed length, with '\0' padding. They are not zero-terminated if the stored string needs the whole length.
I'm currently constructing std::strings out of those like this:
// char MyString[1000];
std::string stdmystring(MyString, ARRAYSIZE(MyString));
However, this copies the padding, too. I could trim the string now, but is there an elegant and quick way to prevent the copying in the first place?
Speed is more important than space, because this runs in a loop.

Simple solutions are:
1. Just calculate the correct length first:
   - either use strnlen, as Dieter suggested,
   - or std::find(MyString,MyString+ARRAYSIZE(MyString),'\0'), which in my experience isn't any slower;
   - note that if your string fits in cache, that will likely dominate the extra loop cost.
2. Reserve the max string size (you did say space was less important), and write a loop appending characters until you exhaust the width or hit a nul (like copy_until).
3. Actually create a max-size string initialized with nuls, strncpy into it, and optionally erase the unused nuls if you want the size to be correct.
The second option uses only a single loop, while the third notionally uses two (one in the string ctor, and then one in the copy). However, the push_back of each character seems more expensive than the simple character assignment, so I wouldn't be surprised if #3 were faster in reality. Profile and see!
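The first option can be sketched with std::find, which keeps it within standard C++ (strnlen is POSIX rather than ISO C++). from_padded_field is a hypothetical name; in the question's code the width argument would be ARRAYSIZE(MyString).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Builds a std::string from a fixed-width, '\0'-padded field without copying
// the padding: find the first '\0' (or the end of the field), then construct
// from the [field, end) range in one go.
std::string from_padded_field(const char* field, std::size_t width) {
    const char* end = std::find(field, field + width, '\0');
    return std::string(field, end);
}
```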

Well, if size is not a problem, one potential way to do it is to create an empty std::string, use reserve() to pre-allocate the space potentially needed, and then add each char until you come across '\0'.
std::string stdmystring;
stdmystring.reserve(MyString_MAX_SIZE);
// append one char per iteration; stop at the first '\0' or at the field width
for (size_t i = 0; i < MyString_MAX_SIZE && MyString[i] != '\0'; ++i)
    stdmystring += MyString[i];
reserve() guarantees the capacity is at least the max size, so one memory allocation suffices, since the string will never grow larger than that.
The calls to operator+= will probably be inlined, but each one still has to check that the string has the needed capacity, which is wasteful in your case. In fact, this could be the same as or worse than simply using strnlen to find the exact length of the string first, so you have to test it.

I think the most straightforward way is to overallocate your internal MyString array by one byte, always null terminate that final byte, and use the C-string constructor of std::string. (Keep in mind that most likely your process will be I/O bound on the file so whatever algorithm the C-string constructor uses should be fine).
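A sketch of this approach, assuming a hypothetical in-memory record whose field is kFieldWidth bytes in the file plus one spare byte in memory (Record, kFieldWidth and name_of are illustrative names, not from the question):

```cpp
#include <cstddef>
#include <string>

// Hypothetical layout: kFieldWidth bytes come from the file,
// and the extra byte is reserved for a terminator.
constexpr std::size_t kFieldWidth = 8;

struct Record {
    char name[kFieldWidth + 1];
};

// Forces termination in the spare byte, then uses the C-string constructor,
// which stops at the first '\0' (either padding or the forced terminator).
std::string name_of(Record& r) {
    r.name[kFieldWidth] = '\0';
    return std::string(r.name);
}
```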

Does the compiler copy a std::string into stack while passing it to a function in C++?

I have a simple question. I have a long std::string that I want to pass to a function.
I want to know: will this string be copied onto the stack and the copy passed, or will something like a pointer be passed, so that no additional space is required?
(C++)
I have another little question: how much memory does an element of a string take? Just like a char?

Yes, it will be deep copied, so passing by const reference is recommended:
void fun(const std::string & arg)
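Typically std::string holds a pointer to dynamically allocated memory and the length (mainstream implementations also store the capacity), so on 64-bit machines the object itself is typically 24 or 32 bytes, plus the actual characters on the heap for strings too long for the small string optimization.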
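One way to observe the copy is to compare the parameter's data() pointer with the caller's. A minimal sketch (same_buffer_by_ref and same_buffer_by_value are hypothetical names), using a string long enough that the small string optimization does not apply:

```cpp
#include <string>

// A const reference parameter is the caller's object: no characters are copied.
bool same_buffer_by_ref(const std::string& s, const char* caller_data) {
    return s.data() == caller_data;
}

// A by-value parameter copy-constructed from an lvalue gets its own buffer
// (C++11 and later forbid sharing the representation).
bool same_buffer_by_value(std::string s, const char* caller_data) {
    return s.data() == caller_data;
}
```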

Spoiler alert: my answer won't be that relevant, just an optimization technique.
If you don't want to duplicate the string, write your own customized string class, which holds two pointers or one pointer with a size. In the past this has saved me a lot of duplicates. It works only for read-only use and does copy-on-write, i.e. it duplicates only when you encounter a write.
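Since C++17 the standard library ships a read-only type built on the same pointer-plus-size idea: std::string_view. It is not the copy-on-write class described above (it never owns or copies the characters, and the viewed string must outlive the view), but it covers the read-only case. A sketch with hypothetical helper names:

```cpp
#include <string>
#include <string_view>

// A std::string_view is just (pointer, length): passing it by value copies
// those two members and never the characters it refers to.
bool views_caller_data(std::string_view text, const char* caller_data) {
    return text.data() == caller_data;
}

// string_view::substr returns another view into the same characters,
// unlike std::string::substr, which allocates and copies.
// Assumes text is non-empty (substr(1) throws std::out_of_range otherwise).
std::string_view drop_first(std::string_view text) {
    return text.substr(1);
}
```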

When passing an argument by value in C++, it is conceptually copied. Whether this copy really happens is another question, though, and depends on how the argument is passed and, to some extent, on the compiler: the compiler is explicitly allowed to elide certain copies, in particular copies of temporary objects. For example, when you return an object from a function and it is clear that the object will be returned, the copy is likely to be elided. Similarly, when passing the result of a function directly on to another function, it is likely not to be copied.
Beyond this, C++ 2011 added another dimension of possibilities by supporting move constructors. These cover, to some extent, similar ground but also give you better control: you can explicitly indicate that it is acceptable for an object to be moved rather than copied. Still, in no event will an object be passed by reference.
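The explicit move mentioned here can be sketched as follows (receives_buffer is a hypothetical name). On mainstream implementations, move-constructing a long string hands its heap buffer to the new object, while copying allocates a fresh one:

```cpp
#include <string>
#include <utility>  // std::move, used by callers

// Takes its parameter by value; whether that parameter is copy- or
// move-constructed is decided by the caller's argument expression.
bool receives_buffer(std::string s, const char* caller_data) {
    return s.data() == caller_data;
}
```

Calling receives_buffer(std::move(big), original) leaves big in a valid but unspecified state, so it should not be read afterwards without reassigning it.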
With respect to the bytes used per element, std::string uses just sizeof(cT) bytes (where cT is the character template argument of std::basic_string). However, the string will overallocate space in many cases, and certainly when characters are added to the string. You can determine the overallocation by comparing size() and capacity(), and control it to some extent with reserve(), although this function isn't required to get rid of any overallocation; capacity() only has to be at least as much as was last reserve()d. If the string is small (e.g. at most 15 characters), modern implementations won't make any allocation. This is called the small string optimization.
With respect to the actual representation of the string: unless it is small, it will use one word for the address of the storage, one word each for the size and the capacity, and, for strings with stateful allocators, the size of the allocator (typically another word). Given alignment requirements, this effectively means that in most cases the string will take four words in addition to the elements. Typically the small string optimization uses these words to store characters if the string fits there, unless, of course, it needs to store a stateful allocator.
