std::string& vs boost::string_ref - c++

Does it matter anymore if I use boost::string_ref over std::string& ? I mean, is it really more efficient to use boost::string_ref over the std version when you are processing strings ? I don't really get the explanation offered here: http://www.boost.org/doc/libs/1_61_0/libs/utility/doc/html/string_ref.html . What really confuses me is the fact that std::string is also a handle class that only points to the allocated memory, and since c++11, with move semantics the copy operations noted in the article above are not going to happen. So, which one is more efficient ?

The use case for string_ref (or string_view in recent Boost and C++17) is for substring references.
The case where
the source string happens to be std::string
and the full length of a source string is referenced
is an (atypical) special case, where it does indeed resemble std::string const&.
Note also that operations on string_ref (like sref.substr(...)) automatically return more string_ref objects, instead of allocating a new std::string.
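A minimal sketch of that last point, written against std::string_view from C++17 (the helper name `middle_part` is made up for illustration):

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns characters [pos, pos + len) of `text` as a view.
// No std::string is constructed, so no allocation or character copy happens.
std::string_view middle_part(std::string_view text, std::size_t pos, std::size_t len)
{
    // substr() on a view returns another view into the same buffer;
    // it throws std::out_of_range if pos > text.size().
    return text.substr(pos, len);
}
```

Calling `middle_part("ABCDEFG", 2, 3)` gives a view of "CDE" whose data() pointer points into the original literal.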

I have never used it, but it seems to me that its purpose is to provide an interface similar to std::string without having to allocate a string for manipulation. Take the example given, extract_part(): it is passed a hard-coded C array "ABCDEFG", but because the initial function takes a std::string, an allocation takes place (the std::string will have its own copy of "ABCDEFG"). With string_ref no allocation occurs; it refers to the initial "ABCDEFG". The constraint is that the string is read-only.

This answer uses the new name string_view to mean the same as string_ref.
What really confuses me is the fact that std::string is also a handle class that only points to the allocated memory
A string allocates, owns, and manages its own memory. A string_view is a handle to some memory that was already allocated. The memory is managed by some other mechanism, unrelated to the string_view.
If you already have some text data, for example in a char array, then the additional memory allocation involved in constructing a string might be redundant. A string_view could be more efficient because it would allow you to operate directly on the original data in the char array. However, it would not permit the data to be modified; string_view allows no non-const access, because it doesn't own the data it refers to.
and since c++11, with move semantics the copy operations noted in the article above are not going to happen.
You can only move from an object that is ready to be discarded. Copying still serves a purpose and is necessary in many cases.
The example in the article constructs two new strings (not copies) and also constructs two copies of existing strings. In C++98 the copies could already be elided by RVO without move semantics, so they're not a big deal. By using string_view it avoids constructing the two new strings. Move semantics are irrelevant here.
In the call to extract_part("ABCDEFG") a string_view is constructed which refers to the char array represented by the string literal. Constructing a string here would have involved a memory allocation and a copy of the char array.
In the call to bar.substr(2,3) a string_view is constructed which refers to parts of the data already referred to by the first string_view. Using a string here would have involved another memory allocation and copy of part of the data.
So, which one is more efficient?
This is a bit like asking if a hammer is more efficient than a screwdriver. They serve different purposes, so it depends what it is you're trying to accomplish.
You need to be careful when using string_view that the memory it refers to remains valid throughout its lifetime.

If you stick to std::string it does not matter, but boost::string_ref also supports const char*. That is, do you intend to call your string processing function foo with std::string only?
void foo(const std::string&);
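foo("compiles, but not for free"); // the const char* is first copied into a temporary std::string
Since boost::string_ref is constructible from const char* without copying the characters, it is more flexible: it works with both const char* and std::string, and neither case allocates.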
The proposal N3442 might be helpful.
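As a rough sketch of that flexibility, here is one function taking a view, shown with std::string_view (the C++17 counterpart of boost::string_ref). `count_spaces` is a hypothetical name chosen for the example:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>      // only needed by callers that pass a std::string
#include <string_view>

// Hypothetical example function: counts the spaces in any contiguous
// character sequence, whoever owns it.
std::size_t count_spaces(std::string_view text)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), ' '));
}
```

Both `count_spaces("a b c")` and `count_spaces(someStdString)` compile, and neither call copies the characters.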

In short: The main benefit of std::string_view over const std::string& is that you can pass both const char* and std::string objects without doing a copy. As others have said, it also allows you to pass substrings without copying, although (in my experience) this is somewhat less often important.
Consider the following (silly) function (yes I know you could just call s.at(2)):
char getThird(std::string s)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    return s[2];
}
This function works, but the string is passed by value. This means the whole length of the string is copied even though we don't look at all of it, and it also (often) incurs a dynamic memory allocation. Doing this in a tight loop can be very expensive. One solution to this is to pass the string by const reference instead:
char getThird(const std::string& s);
This works a lot better if you have a std::string variable and you pass it as a parameter to getThird. But now there's a problem: what if you have a null-terminated const char* string? When you call this function, a temporary std::string will get constructed, so you still get the copy and dynamic memory allocation.
Here's another attempt:
char getThird(const char* s)
{
    if (std::strlen(s) < 3) throw std::runtime_error("String too short");
    return s[2];
}
This will obviously now work fine for const char* variables. It will also work for std::string variables, but calling it is a little awkward: getThird(myStr.c_str()). What's more, std::string supports embedded null characters, and getThird will misinterpret the string as ended at the first of these. At worst this could cause a security vulnerability - imagine if the function were called checkStringForBadHacks!
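To make the embedded-null problem concrete, here is a small sketch (all three function names are invented for the illustration):

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Length as seen by a C-style API: stops at the first '\0'.
std::size_t c_length(const char* s) { return std::strlen(s); }

// Length as seen by a view: uses the stored size, embedded '\0' included.
std::size_t view_length(std::string_view s) { return s.size(); }

// Builds a 5-character string with a null in the middle: 'a','b','\0','c','d'.
std::string make_with_embedded_null() { return std::string("ab\0cd", 5); }
```

For the string built above, the C-style length covers only the part before the null, while the std::string and a view of it both report all five characters.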
Another problem is simply that it's annoying to write a function in terms of old null-terminated strings instead of std::string objects with their handy methods. Did you notice, for example, that this function looks at the whole length of the string even though only the first few characters are important? It's hidden in std::strlen, which iterates over all characters looking for the null terminator. We could replace that with a manual check that the first three characters aren't null, but you can see this is a lot less convenient than the other versions.
Step in std::string_view (or boost::string_view, previously known as boost::string_ref):
char getThird(std::string_view s)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    return s[2];
}
This gives you the nice methods you expect from a proper string class, like .size(), and it works in both the situations discussed above:
It works with std::string objects, which can be implicitly converted to std::string_view objects.
It works with const char* null-terminated strings, which can also be implicitly converted to std::string_view objects.
This does have the potential disadvantage that constructing the std::string_view from a const char* requires iterating over the whole string to find the length, even if the function that uses it never needs it (as is the case here). However, if a caller is passing a const char* to several functions (or to one function in a loop) that take std::string_view objects, it can always construct that object manually beforehand. This could even give a performance increase, because if those functions do need the length, it is computed once and reused.
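A sketch of that "construct it once" idea, with two hypothetical consumers that both need the length:

```cpp
#include <cstddef>
#include <string_view>

// Two made-up functions that take a view and use its size.
bool is_long(std::string_view s) { return s.size() > 8; }
char last_char(std::string_view s) { return s.empty() ? '\0' : s.back(); }

// The caller converts the C string once, so the length is computed a single
// time here and both calls below reuse the stored size.
bool long_and_ends_with(const char* raw, char expected)
{
    const std::string_view view(raw);
    return is_long(view) && last_char(view) == expected;
}
```

Passing `raw` directly to each function would also compile, but each call would then rebuild the view and rescan the string for its terminator.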
As other answers have mentioned, it also avoids a copy when you only want to pass a substring. For example, this is very useful in parsing. But std::string_view is justified even without this feature.
It's worth noting that there is a case where the original function signature, taking a std::string by value, may actually be better than a std::string_view. That's where you were going to make a copy of the string anyway, for example to store in some other variable or to return from the function. Imagine this function:
std::string changeThird(std::string s, char c)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    s[2] = c;
    return s;
}
// vs.
std::string changeThird(std::string_view s, char c)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    std::string result(s); // the conversion from std::string_view is explicit
    result[2] = c;
    return result;
}
Note that both of these involve exactly one copy: in the first case this is done implicitly when the parameter s is constructed from whatever is passed in (including if it is another std::string). In the second case we do it explicitly when we create result. But the return statement does not do a copy, because it uses move semantics (as if we had done std::move(result)), or more likely uses the return value optimisation.
The reason the first version can be better is that it is actually possible for it to perform zero copies, if the caller moves the argument:
std::string something = getMyString();
std::string other = changeThird(std::move(something), 'x');
In this case, the first changeThird does not involve any copy at all, whereas the second one does.
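Putting the by-value version and the moving call together as a sketch (getMyString was not shown in the answer, so the stand-in below is an assumption, as is the wrapper `demo`):

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// By-value version from the answer above.
std::string changeThird(std::string s, char c)
{
    if (s.size() < 3) throw std::runtime_error("String too short");
    s[2] = c;
    return s; // a by-value parameter is moved into the return value, not copied
}

// Stand-in for the getMyString() call; the real function was not shown.
std::string getMyString()
{
    return "a string long enough to live on the heap rather than in the SSO buffer";
}

// Hypothetical wrapper showing the zero-copy call.
std::string demo()
{
    std::string something = getMyString();
    // `something` is moved into the parameter, so its characters are not copied.
    // Afterwards it is valid but unspecified and should not be read.
    return changeThird(std::move(something), 'x');
}
```

With the std::string_view overload, the same call would still have to copy the characters into `result`, because a view cannot take over the caller's buffer.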

Related

Why is `std::string_view` not implemented differently?

Given the following code we can see that std::string_view is invalidated when string grows beyond capacity (here SSO is in effect initially then contents are put on the heap)
#include <iostream>
#include <string>
#include <string_view>
using std::cout;
using std::endl;
int main() {
    std::string s = "hi";
    std::string_view v = s;
    cout << v << endl;
    s = "this is a long long long string now";
    cout << v << endl; // undefined behaviour: v still points at the old buffer
}
output:
hi
#
So if I store a string_view to a string and then change the contents of the string, I can be in big trouble.
Would it be possible, given the existing std::string implementations, to make a smarter string_view that would not face such a drawback? We could store a pointer to the string object itself and then determine if the string is in SSO mode or not and work accordingly. (Not sure how this would work with literal strings though, so maybe that is why it was not done this way?)
I am aware that string_view is akin to storing the return value of string::c_str() but given we have this wrapper around std::string I do not think this gotcha would occur to a lot of people using this feature. Most disclaimers are to make sure the pointed to std::string is within scope but this is a different issue altogether.
string_view knows nothing about string. It is not a "wrapper" around a string. It has no idea that std::string even exists as a type; the conversion from string to string_view happens within std::string. string_view has no association with or reliance on std::string.
In fact, that is the entire purpose of string_view: to be able to have a non-modifiable sized string without knowing how it is allocated or managed. That it can reference any string type that stores its characters contiguously is the point of the thing. It allows you to create an interface that takes a string_view without knowing or caring whether the caller is using std::string, CString, or any other string type.
Since the owning string's behavior is not string_view's business, there is no possible mechanism for string_view to be told when the string it references is no longer valid.
We could store a pointer to the string object itself and then determine if the string is in SSO mode or not and work accordingly.
For the sake of argument, let us ignore that string_view is not supposed to know or care whether its characters come from std::string. Let's assume that string_view only works with std::string (even though that makes the type completely worthless).
Even then, this would not work. Or rather, it would only work if the type was functionally no different from a std::string const&.
If string_view stores a pointer to the first character and a size, then any modification to the std::string might change this. It could change the size even without breaking small-string optimization. It could change the size without causing reallocation. The only way to correct this is to have the string_view always ask the std::string it references what its character data and size are.
And that's no different from just using a std::string const& directly.
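One way to picture "always ask the std::string" is to keep offsets instead of a pointer and build the view on demand. This is only an illustrative sketch; `Span` and `resolve` are made-up names, not standard facilities:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical: a position and length that stay meaningful across
// reallocation, because they hold no pointer into the string's buffer.
struct Span {
    std::size_t pos;
    std::size_t len;
};

// Builds a fresh view from the string's current buffer each time it is called.
// The span is clamped to the current size, so a shrunken string yields a
// shorter (possibly empty) view.
std::string_view resolve(const std::string& owner, Span span)
{
    if (span.pos > owner.size()) return {};
    return std::string_view(owner).substr(span.pos, span.len);
}
```

Note that every use needs the std::string itself, which is the answer's point: at that stage the function is really working with a std::string const&, and the returned view is still only good until the next modification of `owner`.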

C++: What does a const reference to the return value of a function mean?

Here's a snippet from my C++ code:
std::queue<std::string> get_file_names(const std::string &indir)
{
    std::queue<std::string> file_names;
    fs::recursive_directory_iterator end;
    for (fs::recursive_directory_iterator it(indir); it != end; it++) {
        const std::string &extn = it->path().extension().string();
        if (extn == ".zip") {
            const std::string &file_name = it->path().string();
            file_names.push(file_name);
        }
    }
    return file_names;
}
Is it a good practice to make every string you won't modify a const reference? I have trouble understanding how such a reference can exist in this context at all, like the one bound to the return value of it->path().string() above. How can it be assigned to a reference that can be later used outside of the scope of the function when pushed back to a vector?
I feel like it has to do something with std::move.
Your code:
const std::string &file_name = it->path().string();
extends the lifetime of the temporary std::string returned by std::filesystem::path::string(). Since you've marked that as const, it can't be moved into file_names, it must be copied. Assuming you want a move, you would write:
auto&& file_name = // ...
file_names.push(std::move(file_name));
Notice that std::queue has a push() overload for r-value references.
Modern C++ provides a lot of opportunities for compilers to optimize, so avoiding questions/"confusion" about dangling references (the auto&& syntax is "new" in C++11) might be a better approach:
auto file_name = // ...
file_names.push(std::move(file_name));
Writing "natural" code that "looks and behaves like the ints" is often a good approach. In the unlikely situation you find that this is really a performance bottleneck, you can revisit; write your code for clarity first.
No, I wouldn't recommend ever using const std::string&. If you want a view of a string, use std::string_view instead. std::string_view is unsuitable only with old C-style APIs, where a function accepts const char* as input with no way to pass its size; hopefully these APIs die out eventually. Any decent API offers the option of passing a const char* together with its size.
If you want a std::string then just use std::string or if you want to explicitly state that you don't intend to change it then just make it const std::string.
In your case, as pointed out by other answers, the object returned by the path's .string() method is a std::string, so explicitly capturing it as const std::string& is just nonsensical.
If you stored it as std::string then at least you could've moved it into the output std::queue<std::string> file_names.
Edit: about why old C-style strings aren't good and why the length should be passed along, please check this article:
https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/
It turns out GTA's loading times were crazy slow (several minutes, 5-6 on average) because it was stuck computing strlen over and over again while reading a 10 MB JSON file.
Explanations
How can it be assigned to a reference that can be later used outside of the scope of the function when pushed back to a vector?
The variable file_names is of type std::queue<std::string> (and not std::queue<std::string&> - that is by the way not possible this way, but by using std::reference_wrapper). So it does not store "references to strings" but the actual "strings".
If you push a string reference, actually a copy of the referenced string will be pushed.
Is it a good practice to make every string you won't modify a const reference?
Regarding "best practice", consider using std::string_view instead of const string references (available since C++17).
I feel like it has to do something with std::move.
It does not. In this case it has something to do with the type argument of the std::queue.
Bonus
You can store references within a (STL) container by using a reference_wrapper as type argument.
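A small sketch contrasting the two cases; the function names `push_copy` and `refer_to` are invented for the example:

```cpp
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Pushing through a const reference copies the string into the queue.
std::queue<std::string> push_copy(const std::string& name)
{
    std::queue<std::string> names;
    names.push(name); // copy; the queue owns its own std::string
    return names;
}

// A container of reference_wrapper stores references, not strings.
// The referenced string must outlive the container.
std::vector<std::reference_wrapper<std::string>> refer_to(std::string& name)
{
    return { std::ref(name) };
}
```

If the original string is changed afterwards, the queue still holds the old text, while the reference_wrapper element sees the new one.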

Does using string_view lead to an unnecessary string copy in this scenario?

What I'm trying to do is have my class accept a string during construction. I read that string_view was a replacement for const string& so naturally I wrote a constructor like this. This allows me to accept c++ and c strings.
Url::Url(boost::string_view raw_url)
: url_(static_cast<std::string>(raw_url)) {
The problem here might be that when an rvalue is passed, there is an unnecessary copy instead of a move. Is the solution to add another constructor which takes string&&? What is the best practice here?
Note: I'll write the answer considering std::string_view, but boost::string_view should be similar.
I read that string_view was a replacement for const string&
It is useful to understand the reason why std::string_view was added in the first place.
What is the problem with const string&?
It is certainly efficient if you pass an std::string object and you don't gain anything with std::string_view. But suppose now that you are passing a char* containing a large string. In that case a temporary std::string object will be created (and the whole string referenced by that char* will be copied) just so the function receives an std::string. That is when std::string_view shines. If you pass a char* or a std::string (or anything that can be converted to std::string_view) to a function accepting std::string_view (by value, no need to accept a std::string_view by reference) then the new std::string_view object that is created is very cheap, since it is only "a view" and it does not copy the underlying string.
But your case is different. Since you are copying the string anyway, then your function should just accept a string by value and move the string inside the function. Something such as
Url::Url(std::string raw_url)
: url_(std::move(raw_url)) {
There is even a clang-tidy warning to tell you this.
The advantage is that if the user of your function passes an lvalue, you make a copy (you need it anyway) and a move, but if they pass an rvalue, there is no copy and only a move.
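Written out as a complete class, the pattern might look like the sketch below. Only the constructor comes from the question; the `str()` accessor and the rest of the class are assumptions added so the example is self-contained:

```cpp
#include <string>
#include <utility>

// Hypothetical minimal version of the class from the question.
class Url {
public:
    // Sink parameter: taken by value, then moved into the member.
    explicit Url(std::string raw_url) : url_(std::move(raw_url)) {}

    // Assumed accessor, not part of the original snippet.
    const std::string& str() const { return url_; }

private:
    std::string url_;
};
```

This single constructor accepts an lvalue std::string (copied, then moved), an rvalue std::string (moved twice), and a const char* (one std::string built, then moved), so no extra string&& overload is needed.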

Function that takes a char array as a parameter

There is a function I want to use that takes char str[] as a parameter. I want to call the function giving a string input.
void someFunction (char str[]) {
/* ... */
}
// Works.
someFunction("1010101");
// Does not work.
string someString;
someFunction(someString);
How can I get the second call to work?
EDIT: I cannot change the function's input parameters.
Depends on the nature of the string manipulations. If you read but don't write the string, change the prototype to const char str[] and use someString.c_str(), like others are suggesting.
If you change the characters but not the length of the string, use &*someString.begin().
If you extend/truncate the string, it's easier to pass a string& and work in terms of the string object. Less trouble, honestly.
You should be able to do:
someFunction(const_cast<char*>(someString.c_str()));
Although I'm not sure what will happen if str gets modified.
It's probably best if you just modify the original function to take a different parameter type.
What you want for std::string is void someFunction(std::string& str);
There's a reason for the issue -- a std::string's data is not guaranteed to be contiguous memory (at least, before C++11). Therefore, manipulating its buffer as a contiguous allocation (char[]) is a very bad idea.
Casting away the const of std::string::c_str() is also a bad idea. One immediate problem you may face is that a std::string implementation may share backing string allocations with other std::string instances (copy-on-write), and you will end up modifying the values of other std::strings. Of course, there are many other bad things that could go wrong in their own implementation-defined ways -- the standard left this very flexible for the implementors of standard libraries.
EDIT: I cannot change the function's input parameters.
Use a std::vector instead.
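A sketch of the std::vector approach. What someFunction does with its buffer was not shown, so the body below (overwriting the first character) is purely an assumption, and `call_with_copy` is a hypothetical wrapper name:

```cpp
#include <string>
#include <vector>

// Stand-in for the function that cannot be changed. Its real behaviour was
// not shown; here it is assumed to overwrite the first character.
void someFunction(char str[])
{
    if (str[0] != '\0') str[0] = '#';
}

// Copies the string into a writable, null-terminated buffer, calls the
// function, and returns whatever the function left in the buffer
// (up to the first null character).
std::string call_with_copy(const std::string& input)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');
    someFunction(buffer.data());
    return std::string(buffer.data());
}
```

The vector guarantees contiguous, writable storage, so the original std::string is never touched and no const_cast is needed.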
You could have your function take a std::string instead:
void someFunction (std::string &str) {

std::string vs string literal for functions

I was wondering, I normally use std::string for my code, but when you are passing a string in a parameter for a simply comparison, is it better to just use a literal?
Consider this function:
bool Message::hasTag(string tag)
{
    for (Uint tagIndex = 0; tagIndex < m_tags.size(); tagIndex++)
    {
        if (m_tags[tagIndex] == tag)
            return true; // a matching tag was found
    }
    return false;
}
Despite the fact that the property it is making a comparison with is a vector, and whatever uses this function will probably pass strings to it, would it still be better to use a const char* to avoid creating a new string that will be used like a string literal anyway?
If you want to use classes, the best approach here is a const reference:
bool Message::hasTag(const string& tag);
That way, redundant copying is minimized and it's made clear that the method doesn't intend to modify the argument. I think a clever compiler can emit pretty good code for the case when this is called with a string literal.
Passing a character pointer requires you to use strcmp() to compare, since if you start comparing pointers directly using ==, there will be ... trouble.
Short answer: it depends.
Long answer: std::string is highly useful because it provides a lot of utility functions for strings (searching for substrings, extracting substrings, concatenating strings etc.). It also manages the memory for you, so the ownership of the string cannot be confused.
In your case, you don't need either. You just need to know whether any of the objects in m_tags matches the given string. So for your case, writing the function using a const char *s is perfectly sufficient.
However, as a foot note: you almost always want to prefer std::string over (const) char * when talking about return values. That's because C strings have no ownership semantics at all, so a function returning a const char * needs to be documented very carefully, explaining who owns the pointed to memory (caller or callee) and, in case the callee gets it, how to free it (delete[], delete, free, something else).
I think it would be enough to pass a reference rather than the value of the string. I mean:
bool Message::hasTag(const string& tag)
That would copy only the reference to the original string value, which must be created somewhere anyway, but outside of the function. This function would not copy its parameter at all.
Since m_tags is a vector of strings anyway (I suppose), const string& parameter would be better idea.
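A sketch of the const-reference version as a free function, so it can stand alone; `has_tag` and the `tags` parameter (standing in for m_tags) are hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical free-function version of Message::hasTag.
// `tags` plays the role of the m_tags member.
bool has_tag(const std::vector<std::string>& tags, const std::string& tag)
{
    // std::find compares each element with operator==, like the manual loop.
    return std::find(tags.begin(), tags.end(), tag) != tags.end();
}
```

Called with a std::string, nothing is copied. Called with a string literal, one temporary std::string is built for the duration of the call.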