Why is std::string::append() less powerful than std::string::operator+()? - c++

I noticed that
std::string str;
str += 'b'; // works
str.append('b'); // does not work
str.append(1, 'b'); // works, but not as nice as the previous
Is there any reason why the append method does not support a single character to be appended? I assumed that the operator+= is actually a wrapper for the append method, but this does not seem to be the case.

I figure that operator+=() is intended to handle all the simple cases (taking only one parameter), while append() is for things that require more than one parameter.
I am actually more surprised about the existence of the single-parameter append( const basic_string& str ) and append( const CharT* s ) than about the absence of append( CharT c ).
Also note that char is just an integer type. Adding a single-parameter, integer-type overload to append() (or to the constructor, which by design already has several integer-type overloads) might introduce ambiguity.
Unless somebody finds some written rationale, or some committee members post here what they remember about the discussion, that's probably as good an explanation as any.

It is interesting to note that this form of append:
string& append( size_type count, CharT ch );
mirrors the constructor taking similar input:
basic_string( size_type count,
              CharT ch,
              const Allocator& alloc = Allocator() );
So do some other methods that take a count with a character, such as resize( size_type count, CharT ch ).
The string class is large and it is possible that the particular use case (and overload) for str.append('b'); was not considered, or the alternatives were considered sufficient.
Simply introducing a single overload for this could introduce ambiguity if the integral types int and char correspond (on some platforms this may be the case).
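To make the "char is just an integer type" point concrete, here is a small sketch of how easily the existing two-parameter overload accepts swapped arguments. The function names are hypothetical and only serve the illustration; the calls themselves use the standard append(size_type, CharT) overload.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Intended call: append three copies of 'b'.
std::string append_count_then_char() {
    std::string s;
    s.append(3, 'b');
    return s;
}

// Arguments swapped by mistake. This still compiles, because 'b' converts
// implicitly to size_type and 3 converts implicitly to char. The result is
// a string of size_type('b') characters, each with the value 3.
std::string append_swapped() {
    std::string s;
    s.append('b', 3);
    return s;
}
```

Both calls resolve to the same overload, so the compiler has no way to tell the intended call from the accidental one. Any new integer-like overload would have to coexist with conversions like these.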
There are several alternatives to append for adding a single character.
Appending a string containing a single character works: str.append("b");. This is not exactly the same operation, but it has the same effect.
As mentioned, there is operator+=.
There is also push_back(), which is consistent with other standard containers.
The point is that it was probably never considered as a use case (or not a strong enough one), so a suitable overload was not added to append to cater for it.
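For reference, the alternatives listed above can be put side by side. This is only an illustrative sketch; the function name is made up for the example.

```cpp
#include <cassert>
#include <string>

// Builds "abcd" using four different ways of appending one character.
std::string build_with_single_chars() {
    std::string str;
    str += 'a';          // operator+= has a CharT overload
    str.push_back('b');  // same spelling as on other standard containers
    str.append(1, 'c');  // count-and-character form of append
    str.append("d");     // one-character C string, same effect
    return str;
}
```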
Alternative designs could be debated, but given the maturity of the standard and this class, it is unlikely they will be changed soon - it could very well break a lot of code.
Alternate signatures for append could also be considered. One possible solution would have been to reverse the order of the count and the char (possibly adding a default):
string& append(CharT ch, size_type count = 1);
Another, as described in some of the critiques of basic_string, is to remove append altogether, since there are many other methods that achieve what it does.

Related

is a += a safe for std::string in c++ [duplicate]

Consider code like this:
std::string str = "abcdef";
const size_t num = 50;
const size_t baselen = str.length();
while (str.length() < num)
    str.append(str, 0, baselen);
Is it safe to call std::basic_string<T>::append() on itself like this? Can't the source memory be invalidated when the string is enlarged, before the copy operation takes place?
I could not find anything in the standard specific to that method. It says the above is equivalent to str.append(str.data(), baselen), which I think might not be entirely safe unless there is another detection of such cases inside append(const char*, size_t).
I checked a few implementations and they seemed safe one way or another, but my question is if this behavior is guaranteed. E.g. "Appending std::vector to itself, undefined behavior?" says it's not for std::vector.
According to §21.4.6.2/§21.4.6.3:
The function [basic_string& append(const charT* s, size_type n);] replaces the string controlled by *this with a string of length size() + n whose first size() elements are a copy of the original string controlled by *this and whose remaining elements are a copy of the initial n elements of s.
Note: This applies to every append call, as every append can be implemented in terms of append(const charT*, size_type), as defined by the standard (§21.4.6.2/§21.4.6.3).
So basically, read literally, append makes a copy of str (let's call the copy strtemp), appends n characters of the source string (call it str2) to strtemp, and then replaces str with strtemp.
When str2 is str itself, nothing changes, as the string is enlarged when the temporary copy is assigned, not before.
Even though it is not explicitly stated in the standard, it is guaranteed (if the implementation is exactly as stated in the standard) by the definition of std::basic_string<T>::append.
Thus, this is not undefined behavior.
This is complicated.
One thing can be said for certain. If you use iterators:
std::string str = "abcdef";
str.append(str.begin(), str.end());
then you are guaranteed to be safe. Yes, really. Why? Because the specification states that the behavior of the iterator functions is equivalent to calling append(basic_string(first, last)). That obviously creates a temporary copy of the string. So if you need to insert a string into itself, you're guaranteed to be able to do it with the iterator form.
Granted, implementations don't have to actually copy it. But they do need to respect the standard specified behavior. An implementation could choose to make a copy only if the iterator range is inside of itself, but the implementation would still have to check.
All of the other forms of append are defined to be equivalent to calling append(const charT *s, size_t len). That is, your call to append above is equivalent to you doing append(str.data(), str.size()). So what does the standard say about what happens if s is inside of *this?
Nothing at all.
The only requirement of s is:
s points to an array of at least n elements of charT.
Since it does not expressly forbid s pointing into *this, it must be allowed. It would also be exceedingly strange if the iterator version allowed appending a string to itself but the pointer-and-size version did not.
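If you would rather not depend on how the wording is read, one way to sidestep the question is to append from an independent copy of the original contents. This is a minimal sketch under that assumption; repeat_to_length is a hypothetical helper, not something from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Repeats the original contents of `str` until the result holds at least
// `min_len` characters. The source of every append is a separate copy, so
// it never aliases the buffer that is being grown.
std::string repeat_to_length(std::string str, std::size_t min_len) {
    const std::string base = str;  // independent copy of the original
    if (base.empty()) {
        return str;                // nothing to repeat, avoid looping forever
    }
    while (str.length() < min_len) {
        str.append(base);
    }
    return str;
}
```

With "abcdef" and a minimum length of 50 this produces nine repetitions (54 characters), matching what the loop in the question does, at the cost of one extra copy of the base string.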

Should I compare a std::string to "string" or "string"s?

Consider this code snippet:
bool foo(const std::string& s) {
    return s == "hello"; // comparing against a const char* literal
}
bool bar(const std::string& s) {
    return s == "hello"s; // comparing against a std::string literal
}
At first sight, it looks like comparing against a const char* needs fewer assembly instructions [1], as using a "hello"s literal will lead to an in-place construction of a std::string.
(EDIT: As pointed out in the answers, I forgot about the fact that effectively s.compare(const char*) will be called in foo(), so of course no in-place construction takes place in this case. Therefore striking out some lines below.)
However, looking at the operator==(const char*, const std::string&) reference:
All comparisons are done via the compare() member function.
From my understanding, this means that we will need to construct a std::string anyway in order to perform the comparison, so I suspect the overhead will be the same in the end (although hidden by the call to operator==).
Which of the comparisons should I prefer?
Does one version have advantages over the other (maybe in specific situations)?
[1] I'm aware that fewer assembly instructions doesn't necessarily mean faster code, but I don't want to go into micro-benchmarking here.
Neither.
If you want to be clever, compare to "string"sv, which returns a std::string_view.
While comparing against a literal like "string" does not result in any allocation overhead, it's treated as a null-terminated string, with all the concomitant disadvantages: no tolerance for embedded nulls, and users must heed the null terminator.
"string"s does an allocation, barring small-string-optimisation or allocation elision. Also, the operator gets passed the length of the literal, no need to count, and it allows for embedded nulls.
And finally using "string"sv combines the advantages of both other approaches, avoiding their individual disadvantages. Also, a std::string_view is a far simpler beast than a std::string, especially if the latter uses SSO as all modern ones do.
At least since C++14 (which generally allowed eliding allocations), compilers could in theory optimise all options to the last one, given sufficient information (generally available for the example) and effort, under the as-if rule. We aren't there yet though.
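The embedded-null difference between the three spellings can be seen in a short sketch. The function names are hypothetical, and the string_view literal requires C++17.

```cpp
#include <cassert>
#include <string>
#include <string_view>

using namespace std::string_literals;
using namespace std::string_view_literals;

// The literal decays to const char*, so the comparison stops at the
// first '\0' and effectively compares against "a".
bool eq_cstr(const std::string& s) { return s == "a\0b"; }

// operator""s receives the literal's length, so all three characters
// are compared. A temporary std::string is constructed.
bool eq_str(const std::string& s) { return s == "a\0b"s; }

// operator""sv also receives the length, but only wraps a pointer and
// a size, so no string object is constructed.
bool eq_sv(const std::string& s) { return s == "a\0b"sv; }
```

For a std::string holding the three characters 'a', '\0', 'b', only the last two functions report equality.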
No, compare() does not require construction of a std::string for const char* operands.
You're using overload #4 here.
The comparison to string literal is the "free" version you're looking for. Instantiating a std::string here is completely unnecessary.
From my understanding, this means that we will need to construct a std::string anyway in order to perform the comparison, so I suspect the overhead will be the same in the end (although hidden by the call to operator==).
This is where that reasoning goes wrong. std::string::compare does not need to construct a std::string from a C-style null-terminated operand in order to function. According to one of the overloads:
int compare( const CharT* s ) const; // (4)
4) Compares this string to the null-terminated character sequence beginning at the character pointed to by s with length Traits::length(s).
Although whether to allocate or not is an implementation detail, it does not seem reasonable that a sequence comparison would do so.
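As a rough illustration of why no allocation is needed, an equality check against a C string can be written directly in terms of the character traits. This is a sketch of the idea, not how any particular standard library implements it; equals_cstr is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Compares `s` with the null-terminated sequence at `p` without building
// a second std::string: measure the C string, then compare in place.
bool equals_cstr(const std::string& s, const char* p) {
    using traits = std::char_traits<char>;
    const std::size_t n = traits::length(p);  // scans for the terminator
    return s.size() == n && traits::compare(s.data(), p, n) == 0;
}
```

The only extra work compared to a length-aware operand is the scan for the terminator in Traits::length.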

Xcode C++ development, clarification needed

I absolutely love the way Xcode offers insight into the available member functions of the language, and would prefer to use it over, say, TextMate, if not for an oddity I noticed today.
When string s = "Test string"; the only available substr signature is as shown
From what I understand, however, and from what I see online, the signature should be
string substr ( size_t pos = 0, size_t n = npos ) const;
Indeed s.substr(1,2); is both understood and works in Xcode.
Why does it not show when I try to method complete? (Ctrl-Space)
Xcode is performing the completion correctly, but it's not what you expect. You've actually answered the question yourself unknowingly. The function signature for string's substr() method, just as you said, is:
string substr ( size_t pos = 0, size_t n = npos ) const;
All arguments to substr() have default assignments, therefore to Xcode, s.substr() (with no arguments) is the valid code completion to insert because it's really s.substr(0, s.npos). You can confirm this with any number of standard C++ functions with default arguments. The easiest place to see this is with any STL container constructor.
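The three ways of calling substr() that the defaulted parameters allow can be sketched as follows. The helper name is made up for the example, and it assumes the input has at least five characters, since substr throws std::out_of_range when pos exceeds the size.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the results of calling substr() with zero, one and two arguments.
std::vector<std::string> substr_forms(const std::string& s) {
    return {
        s.substr(),      // pos = 0, n = npos: the whole string
        s.substr(5),     // n = npos: from index 5 to the end
        s.substr(1, 2),  // two characters starting at index 1
    };
}
```

For "Test string" these are "Test string", "string" and "es".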
Take for instance a vector. We all know that vectors can take an Allocator, but the default argument assigned Allocator is "good enough" for most casual uses. Sure enough, two of the signatures for vector constructors are:
explicit vector ( const Allocator& = Allocator() );
explicit vector ( size_type n, const T& value= T(), const Allocator& = Allocator() );
In both cases, the Allocator argument has a default assignment, and in the second, the T default value has a default assignment. Now, take a look at what Xcode suggests when constructing a vector:
The suggestion with no argument list is actually the constructor that takes just an Allocator. The suggestion that takes just a size_type is actually the constructor that takes a size_type, T, and Allocator.
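The same effect in code, as a small sketch with hypothetical function names: each call below omits one or more defaulted arguments, which is why the completions look shorter than the declared signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// No arguments: the allocator is defaulted.
std::vector<int> default_constructed() { return std::vector<int>(); }

// Size only: the element value and the allocator are defaulted, so the
// elements are value-initialized (zero for int).
std::vector<int> sized(std::size_t n) { return std::vector<int>(n); }

// Size and value: only the allocator is defaulted.
std::vector<int> sized_with_value(std::size_t n, int value) {
    return std::vector<int>(n, value);
}
```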
Depending on how you think about this, it may or may not be an Xcode bug. Ideally, you want to see completions with default arguments for simpler functions like substr(), but for STL container constructors, you probably almost never want to see them. Perhaps it could be an option, but I wouldn't expect to see this corrected. I'd happily dup a radar with you though.

Can std::string overload "substr" for rvalue *this and steal resources?

It just occurred to me that std::string's substr operation could be much more efficient for rvalues if it could steal the allocated memory from *this.
The Standard library of N3225 contains the following member function declaration of std::string
basic_string substr(size_type pos = 0, size_type n = npos) const;
Can an implementation that could implement an optimized substr for rvalues overload that and provide two versions, one of which could reuse the buffer for rvalue strings?
basic_string substr(size_type pos = 0) &&;
basic_string substr(size_type pos, size_type n) const;
I imagine the rvalue version could be implemented as follows, reusing the memory of *this and setting *this to a moved-from state.
basic_string substr(size_type pos = 0) && {
    basic_string __r;
    __r.__internal_share_buf(pos, __start + pos, __size - pos);
    __start = 0; // or whatever the 'empty' state is
    return __r;
}
Does this work in an efficient fashion on common string implementations or would this take too much housekeeping?
Firstly, an implementation cannot add an overload that steals the source, since that would be detectable:
std::string s="some random string";
std::string s2=std::move(s).substr(5,5);
assert(s=="some random string");
assert(s2=="rando");
The first assert would fail if the implementation stole the data from s, and the C++0x wording essentially outlaws copy on write.
Secondly, this wouldn't necessarily be an optimization anyway: you'd have to add additional housekeeping in std::string to handle the case that it's a substring of a larger string, and it would mean keeping large blocks around when there were no longer any strings referencing the whole large string, just some substring of it.
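When the caller really is done with the source string, a similar effect can be had in user code without any change to substr. The sketch below uses a hypothetical free function, take_suffix, that trims the prefix in place and then moves the buffer out. Unlike substr, it does not throw when pos is past the end; erase simply clamps the count and the result is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Consumes `s` and returns everything from index `pos` onwards, reusing
// the existing buffer instead of copying into a new one.
std::string take_suffix(std::string&& s, std::size_t pos) {
    s.erase(0, pos);      // shifts the tail to the front, no reallocation
    return std::move(s);  // hands the buffer to the result
}
```

Because the caller opts in explicitly, the source being left in a moved-from state is expected rather than a surprising side effect.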
Yes, and maybe it should be proposed to the standards committee, or maybe implemented in a library. I don't really know how valuable the optimization would be. And that would be an interesting study all on its own.
When gcc grows support for r-value this, someone ought to try it and report how useful it is.
There are a few string classes out there implementing copy-on-write. But I wouldn't recommend adding yet another string type to your project unless really justified.
Check out the discussion in Memory-efficient C++ strings (interning, ropes, copy-on-write, etc)