Why call basic_string::substr with no arguments? - c++

If s1 and s2 are strings, then (as far as I can tell)
s1 = s2.substr();
means exactly the same as
s1 = s2;
Why would somebody want to call substr() without any arguments?
Edit: Another way to phrase the same question:
Why does the standard define substr thus:
basic_string substr( size_type pos = 0,
size_type count = npos ) const;
rather than thus:
basic_string substr( size_type pos,
size_type count = npos ) const;

The answer is, just for the heck of it.
As you rightly noticed, it has no advantage (and sometimes a speed disadvantage) to just creating a copy.
Speculating why the first argument is defaulted at all, I guess that it was meant as a way to force un-sharing of ancient COW strings (not allowed by current standards). Or someone was over-zealous when adding default arguments. It happens to the best of us.

In some cases, the logic might flow better with substr().
Imagine a case where you have a buffer as a string pointer and a pointer to some string metrics object.
if (metrics) {
substring = buffer->substr(metrics->pos(), metrics->len());
} else {
substring = buffer->substr();
}
reads better than
if (metrics) {
substring = buffer->substr(metrics->pos(), metrics->len());
} else {
substring = *buffer;
}

Related

What is the correct way to “clear" a std::string_view?

I see this post: c++ - Why doesn't std::string_view have assign() and clear() methods? - Stack Overflow, so string_view does not contain clear function.
But in my case, I have a string_view as a class member variable, and sometimes, I would like to reset it to an empty string. Currently, I'm using this way:
sv = "";
Which looks OK, but I'd see other suggestions, thanks!
An empty "" string is still a string of length zero. While that will work, data() will not return a nullptr the way it would for a truly empty string_view. If you do want to reset the string_view, it would be best to assign an empty string_view to it: sv = string_view{};.
If what you want is to set the size of the string_view to zero, then what you're doing is OK (at the cost of a byte somewhere in your binary).
There are alternatives in the post that you referenced:
sv = {}
sv = sv.substr(0, 0)
Yet another way:
sv.remove_prefix(sv.size())

Make C++ std::string end at specified character?

Let's say I am traversing a string of length n. I want it to end at a specific character that fulfils some conditions. I know that C style strings can be terminated at the i'th position by simply assigning the character '\0' at position i in the character array.
Is there any way to achieve the same result in an std::string (C++ style string)? I can think of substr, erase, etc. but all of them are linear in their complexity, which I cannot afford to use.
TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?
You can use resize:
std::string s = /* ... */;
if (auto n = s.find(c); n != s.npos) {
s.resize(n);
}
The logical answer here is basic_string::resize. What the standard says about this function is:
Effects: Alters the length of the string designated by *this as follows:
If n <= size(), the function replaces the string designated by *this with a string of length n whose elements are a copy of the initial elements of the original string designated by *this.
If n > size(), the function replaces the string designated by *this with a string of length n whose first size() elements are a copy of the original string designated by *this, and whose remaining elements are all initialized to c.
Now, that looks very much like linear time. However, the standard does not specifically state that things will happen this way. They only state that it will be "as if" things happen this way. Therefore, an implementation is completely free to implement the shrinking version of resize by shifting one pointer and writing a NUL character. Nothing in the standard would forbid such an implementation.
So the real question is... are standard library implementations written by complete morons? It's certainly possible that they are. But it's probably wise not to assume so.
Personally, I'd just use resize on the assumption that the library implementers know what they're doing. After all, if they can't write an optimization as simple as that, then who knows what other things they're doing wrong? If you can't trust your standard library implementation not to do stupid things, then you shouldn't be using it in performance-critical code.
is there any "end" character for an std::string?
No. It is possible to define a std::string that is not null terminated. You won't be able to do a few things for such strings, such as treat the return value of std::string:data() as a null terminated C string 1, but a std::string can be constructed that way.
Can I make the end iterator point to the current character somehow?
To get a std::string::iterator point to a certain character, you'll have to traverse the string.
E.g.
std::string str = "This is a string";
auto iter = str.begin();
auto end = iter;
while ( end != str.end() && *end != 'r' )
++end;
After that, the range defined by iter and end contains the string "This is a st".
If that is not acceptable, you'll have to adapt your code to check the value of the character for every step.
std::string str = "This is a string";
auto iter = str.begin();
// Break when 'r' is encountered or end of string is reached.
while ( iter != str.end() && *iter != 'r' )
{
// Use *iter
...
}
1 Thanks are due to #Cubbi for pointing out an error in what I stated. std::string::data() can return a char const* that is not null terminated if using a version of C++ earlier than C++11. If using C++11 or later, std::string::data() is required to return a null terminated char const*.
std::string does not have an "end character" like c style strings. You can have many null terminators inside a single std::string. If you want to the string to end after a certain character then you need to erase the rest of the characters in the string after that last character.
In your case that would give you something like
string_variable.erase(pos_of_last_character + 1)
TL;DR, is there any "end" character for an std::string? Can I make the end iterator point to the current character somehow?
Not really. std::string uses the std::string::size() function to keep track of the number of characters stored and maintained independently of any sentinel characters like '\0'.
Though these are considered when a std::string is initialized from a const char*.

C++primer 5th about func parameter

Its question is" Give the second parameter of make_plural (§ 6.3.2, p. 224) a default argument of 's'. Test your program by printing singular and plural versions of the words success and failure"
here is the make_plural.
string make_plural(size_t ctr, const string& word, const string& ending )
{
return (ctr > 1) ? word + ending : word;
}
Does it mean that change the 'ending', but ending is the third parameter, isn't it?
This question worries me a lot!
Regards!
That must be a typo.
Looking at the code:
string make_plural(size_t ctr, const string& word, const string& ending )
{
return (ctr > 1) ? word + ending : word;
}
the most reasonable thing would be to have "s" as default for ending, as this is how you make the plural by default (not always, but with "bee" -> "bees" e.g. it works).
A much stronger argument is that in C++ it is not possible (unless you find a magic workaround (*)) to have a default argument for the n-th parameter if the (n+1)-th has no default argument:
foo(int first = 0,int second) // not possible !!
With this example it is maybe not so clear why this isnt allowed, but consider having multiple default values. Lets say you would write:
foo(int first = 0,int second,int third = 0); // actually still not allowed
Then there would be no way to know if
foo(1,2);
is supposed to call
foo(0,1,2);
or
foo(1,2,0);
To resolve this ambiguity some rule had to be invented and for C++ the rule is that default arguments have to be provided from right to left.
(*) If you can change the function and are willing to write some extra code, the workaround is rather trivial. You just have to encapsulate all parameters in a struct that provides creation of parameters with whatever combination of defaults you like.

std::string without null-character

Is a std::string without a null-character in the end valid and can it be acquired like this?:
std::string str = "Hello World";
str.resize(str.size() - 1);
For those who are curious:
I have a 3rd party function taking a string and iterating over the chars (using iterators). Unfortunately the function is buggy (as its a dev-version) and cannot deal with null-characters. I dont have another signature to chose from, I cant modify the function (as I said, 3rd party and we dont want to fork) and at the same time I dont want to reinvent the wheel. As far as I can tell, the function should work as desired without the null-character so I want atleast to give it a try.
The iteration takes place like this:
bool nextChar(CharIntType& c)
{
if (_it == _end) return false;
c = *_it;
++_it;
return true;
}
where _it is initialized to std::string::begin() and _end to std::string::end()
Until C++11, std::string was not required to include a trailing nul until you called c_str().
http://en.cppreference.com/w/cpp/string/basic_string/data
std::string::data()
Returns pointer to the underlying array serving as character storage. The pointer is such that the range [data(); data() + size()) is valid and the values in it correspond to the values stored in the string.
The returned array is not required to be null-terminated.
If empty() returns true, the pointer is a non-null pointer that should not be dereferenced. (until c++11)
The returned array is null-terminated, that is, data() and c_str() perform the same function.
If empty() returns true, the pointer points to a single null character. (since c++11)
From this we can confirm that std::string::size does not include any nul terminator, and that std::string::begin() and std::string::end() describe the ranges you are actually looking for.
We can also determine this by the simple fact that std::string::back() doesn't return a nul character.
#include <iostream>
#include <string>
int main() {
std::string s("hello, world");
std::cout << "s.front = " << s.front() << " s.back = " << s.back() << '\n';
return 0;
}
http://ideone.com/nUX0AB
While it is possible to have non null terminated strings I would not recommend it, strings are null terminated for a good reason, i would actually recommend in this instance that you either go ahead and write the function properly or get in touch with the third party and have them fix it.
To answer your questions yes a std::string is valid if it is not null terminated, to achieve this you can use the overload of string copy with a maximum length loaded, once again i do not recommend this.
See this page for more information:
http://c2.com/cgi/wiki?NonNullTerminatedString
This is a very late answer but I just post it so that anyone who comes later can use it for their reference. If you write a null terminated string into the string.data() array, it will terminate the string and would not let you to continue concatenate the string if you need to. The way to solve it is already answer in the question.
str.resize(str.size() - 1);
This would solve the problem, I have tested out in my code.

Casting string type with GetDlgItemText() for use as string buffer in C++

I am stumped by the behaviour of the following in my Win32 (ANSI) function:
(Multi-Byte Character Set NOT UNICODE)
void sOut( HWND hwnd, string sText ) // Add new text to EDIT Control
{
int len;
string sBuf, sDisplay;
len = GetWindowTextLength( GetDlgItem(hwnd, IDC_EDIT_RESULTS) );
if(len > 0)
{
// HERE:
sBuf.resize(len+1, 0); // Create a string big enough for the data
GetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sBuf.data(), len+1 );
} // MessageBox(hwnd, (LPSTR)sBuf.c_str(), "Debug", MB_OK);
sDisplay = sBuf + sText;
sDisplay = sDisplay + "\n\0"; // terminate the string
SetDlgItemText( hwnd, IDC_EDIT_RESULTS, (LPSTR)sDisplay.c_str() );
}
This should append text to the control with each call.
Instead, all string concatenation fails after the call to GetDlgItemText(), I am assuming because of the typecast?
I have used three string variables to make it really obvious. If sBuf is affected then sDisplay should not be affected.
(Also, why is len 1 char less than the length in the buffer?)
GetDlgItemText() corretly returns the content of the EDIT control, and SetDlgItemText() will correctly set any text in sDisplay, but the concatenation in between is just not happening.
Is this a "hidden feature" of the string class?
Added:
Yes it looks like the problem is a terminating NUL in the middle. Now I understand why the len +1. The function ensures the last char is a NUL.
Using sBuf.resize(len); will chop it off and all is good.
Added:
Charles,
Leaving aside the quirky return length of this particular function, and talking about using a string as a buffer:
The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself.
That's precisely what's needed isn't it?
Further, it requires that the program must not alter any of the values of that array.
As I understand it that is going to change along with the guarantee that all bytes are contiguous. I forget where I read a long article on this, but MS already implements this it asserted.
What I don't like about using a vector is that the bytes are copied twice before I can return them: once into the vector and again into the string. I also need to instantiate a vector object and a string object. That is a lot of overhead. If there were some string friendly of working with vectors (or CStrings) without resorting to old C functions or sopying characters one by one, I would use them. The string is very syntax friendly in that way.
The data() function on a std::string returns a const char*. You are not allowed to right into the buffer returned by it, it may be a duplicated buffer.
What you could do instead is to used a std::vector<char> as a temporary buffer.
E.g. (untested)
std::vector<char> sBuf( len + 1 );
GetDlgItemText( /* ... */, &sBuf[0], len + 1 );
std::string newText( &sBuf[0] );
newText += sText;
Also, the string you pass to SetDlgItemText should be \0 terminated so you should used c_str() not data() for this.
SetDlgItemText( /* ... */, newText.c_str() );
Edit:
OK, I've just checked the contract for GetWindowTextLength and GetDlgItemText. Check my edits above. Both will include the space for a null terminator so you need to chop it off the end of your string otherwise concatenation of the two strings will include a null terminator in the middle of the string and the SetDlgItemText call will only use the first part of the string.
There is a further complication in that GetWindowTextLength isn't guaranteed to be accurate, it only guarantees to return a number big enough for a program to create a buffer for storing the result. It is extremely unlikely that this will actually affect a dialog box item owned by the calling code but in other situations the actual text may be shorter than the returned length. For this reason you should search for the first \0 in the returned text in any case.
I've opted to just use the std::string constructor that takes a const char* so that it finds the first \0 correctly.
The standard describes the return value of basic_string::data() to be a pointer to an array whose members equal the elements of the string itself. Further, it requires that the program must not alter any of the values of that array. This means that the return value of data() may or may not be a copy of the string's internal representation and even if it isn't a copy you still aren't allowed to write to it.
I am far away from the win32 api and their string nightmare, but there is something in the code that you can check. Standard C++ strings do not need to be null terminated and nulls can happen anywhere within the string. I won't comment on the fact that you are casting away constantness with your C-style cast, which is a problem on its own, but rather on the strange effect you are
When you initially create the string you allocate extra space for the null (and initialize all elements to '\0') and then you copy the elements. At that point your string is len+1 in size and the last element is a null. After that you append some other string, and what you get is a string that will still have a null character at position len. When you retrieve the data with either data() (does not guarantee null termination!) or c_str() the returned buffer will still have the null character at len position. If that is passed to a function that stops on null (takes a C style string), then even if the string is complete, the function will just process the first len characters and forget about the rest.
#include <string>
#include <cstdio>
#include <iostream>
int main()
{
const char hi[] = "Hello, ";
const char all[] = "world!";
std::string result;
result.resize( sizeof(hi), 0 );
// simulate GetDlgItemText call
std::copy( hi, hi+sizeof(hi), const_cast<char*>(result.data()) ); // this is what your C-style cast is probably doing
// append
result.append( all );
std::cout << "size: " << result.size() // 14
<< ", contents" << result // "Hello, \0world!" - dump to a file and edit with a binary editor
<< std::endl;
std::printf( "%s\n", result.c_str() ); // "Hello, "
}
As you can see, printf expects a C-style string and will stop when the first null character is found, so that it can seem as if the append operation never took place. On the other hand, c++ streams do work properly with std::string and will dump the whole content, checking that the strings were actually appended.
A patch to your append operation disappearing would be removing the '\0' from the initial string (reserve only len space in the string). But that is not really a good solution, you should never use const_cast (there are really few places where it can be required and this is not one of them), the fact that you don't see it is even worse: using C style casts is making your code look nicer than it is.
You have commented on another answer that you do not want to add std::vector (which would provide with a correct solution as &v[0] is a proper mutable pointer into the buffer), of course, not adding the extra space for the '\0'. Consider that this is part of an implementation file, and the fact that you use or not std::vector will not extend beyond this single compilation unit. Since you are already using some STL features, you are not adding any extra requirement to your system. So to me that would be the way to go. The solution provided by Charles Bailey should work provided that you remove the extra null character.
This is NOT an answer. I have added it here as an answer only so that I can use formatting in a long going discussion about const_cast.
This is an example where using const_cast can break a running application:
#include <iostream>
#include <map>
typedef std::map<int,int> map_type;
void dump( map_type const & m ); // implemented somewhere else for concision
int main() {
map_type m;
m[1] = 10;
m[2] = 20;
m[3] = 30;
map_type::iterator it = m.find(2);
const_cast<int&>(it->first) = 10;
// At this point the order invariant of the container is broken:
dump(); // (1,10),(10,20),(3,30) !!! unordered by key!!!!
// This happens with g++-4.0.1 in MacOSX 10.5
if ( m.find(3) == m.end() ) std::cout << "key 3 not found!!!" << std::endl;
}
That is the danger of using const_cast. You can get away in some situations, but in others it will bite back, and probably hard. Try to debug in thousands of lines where the element with key 3 was removed from the container. And good luck in your search, for it was never removed.