I am not 100% sure that the following code is semantically correct:
#include <iostream>
#include <string>
#include <experimental/string_view>
int main()
{
std::string str = "lvalue string";
std::experimental::string_view view_lvalue(str);
std::experimental::string_view view_rvalue(std::string{"rvalue string"});
std::cout << view_lvalue << '\n' << view_rvalue << '\n';
}
Question: Can I legally bind an rvalue to std::experimental::basic_string_view, or is it just UB? If it is legal, how does it work? As far as I know, binding an rvalue to a const reference constructor parameter (which I assume is how the view refers to the original string) does not keep the temporary alive beyond the full-expression, so I thought that at the end of the statement std::experimental::string_view view_rvalue(std::string{"rvalue string"}); the reference would be dangling. Does string_view use a more sophisticated approach?
I am asking this because I am trying to write a similar view for some matrix class, and don't yet know how to deal with rvalues (I can disable them of course, but I don't think it's the best approach).
If cppreference is correct, then this is UB. std::string_view has
A typical implementation holds only two members: a pointer to constant CharT and a size.
And the constructor has
Constructs a view of the first str.size() characters of the character array starting with the element pointed by str.data().
So if string_view just points to the underlying char array of the provided string then we will have a dangling pointer once the expression ends and the temporary is destroyed.
As pointed out in the comments, one reason this behavior may have been allowed is so that you can pass a string_view to a function and construct that string_view from a temporary string.
Related
Suppose I have the following code:
void some_function(std::string_view view) {
std::cout << view << '\n';
}
int main() {
some_function(std::string{"hello, world"}); // ???
}
Will view inside some_function be referring to a string which has been destroyed? I'm confused because, considering this code:
std::string_view view(std::string{"hello, world"});
Produces the warning (from clang++):
warning: object backing the pointer will be destroyed at the end of the full-expression [-Wdangling-gsl]
What's the difference?
(Strangely enough, using braces {} rather than brackets () to initialise the string_view above eliminates the warning. I've no idea why that is either.)
To be clear, I understand the above warning (the string_view outlives the string, so it holds a dangling pointer). What I'm asking is why passing a string into some_function doesn't produce the same warning.
std::string_view is nothing other than std::basic_string_view<char>, so let's see its documentation on cppreference:
The class template basic_string_view describes an object that can refer to a constant contiguous sequence of char-like objects with the first element of the sequence at position zero.
A typical implementation holds only two members: a pointer to constant CharT and a size.
The part I have highlighted tells us why clang is right about std::string_view view(std::string{"hello, world"});: as others have commented it's because after the declaration is done, std::string{"hello, world"} is destroyed and that underlying pointer that the std::string_view holds dangles.
Clearly that's just a typical implementation, but since we know it is correct, it tells us at least that the standard doesn't require any implementation to do something special to keep temporaries alive.
some_function(std::string{"hello, world"}); is completely safe, as long as the function doesn't preserve the string_view for later use.
The temporary std::string is destroyed at the end of this full-expression (roughly speaking, at this ;), so it's destroyed after the function returns.
std::string_view view(std::string{"hello, world"}); always produces a dangling string_view, regardless of whether you use () or {}. If the choice of brackets affects compiler warnings, it's a compiler defect.
Is it safe to pass an std::string temporary into an std::string_view parameter?
In general, it isn't necessarily safe. It depends on what the function does. If you don't know, then you shouldn't assume it to be safe.
Knowing the definition of the function as shown, it is safe to call the example function with a temporary string.
Will view inside some_function be referring to a string which has been destroyed?
Not in this case, because the temporary argument string - which the string view refers to - hasn't been destroyed.
What's the difference?
The parameter of the function has a shorter lifetime than the temporary passed as the argument. The string view variable, by contrast, has a longer lifetime than the temporary argument passed to its constructor.
Just as others have said, some_function(std::string{"hello, world"}); is totally safe, since the string_view is passed by value and the temporary string stays alive until the function returns. If safety is all you are concerned with, that will do. If performance could be an issue, I'd recommend using an rvalue reference here, like so:
void some_function(std::string_view&& view)
{
std::cout << "rval reference: " << view << '\n';
}
int main()
{
some_function(std::string{"hello, world"});
}
R-value references are great if you are going to use some_function() mainly for temporary values.
I made a mistake in a socket interface I wrote a while back and I just noticed the problem while looking through the code for a different issue. The socket receives a string of characters and passes it to jsoncpp to complete the json parsing. I can almost understand what is happening here but I can't get my head around it. I would like to grasp what is actually happening under the hood. Here is the minimum example:
#include <iostream>
#include <string>
#include <cstring>
void doSomethingWithAString(const std::string &val) {
std::cout << val.size() << std::endl;
std::cout << val << std::endl;
}
int main()
{
char responseBufferForSocket[10000];
memset(responseBufferForSocket, 0, 10000);
//Lets simulate a response from a socket connection
responseBufferForSocket[0] = 'H';
responseBufferForSocket[1] = 'i';
responseBufferForSocket[2] = '?';
// Now lets pass a .... the address of the first char in the array...
// wait a minute..that's not a const std::string& ... but hey, it's ok it *works*!
doSomethingWithAString(responseBufferForSocket);
return 0;
}
The code above is not causing any obvious issues but I would like to correct it if there is a problem lurking. Obviously the character array is being transformed to a string, but by what mechanism? I guess I have four questions:
Is this string converted on the stack and passed by reference or is it passed by value?
Is it using the operator= overload? A "from c-string" constructor? Some other mechanism?
Based on 2, is this less efficient than converting to a string explicitly using a constructor?
Is this dangerous? :)
compiled with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
std::string has a non explicit constructor (i.e. not marked with the explicit keyword) that takes a const char* parameter and copies characters until the first '\0' (the behaviour is undefined if no such character exists in the string). In other words, it performs a copy of the source data. It's overload #5 on this page.
const char[] implicitly decays to const char*, and you can pass a temporary to a function taking a const reference parameter. This only works if the reference is const, by the way; if you can't use const, pass it by value.
And so, when you pass a const char[] to that function, a temporary object of type std::string is constructed using that constructor, and bound to the parameter. The temporary will remain alive for the duration of the function call, and will be destroyed when it returns.
With all that in mind, let's address your questions:
It's passed by reference, but the reference is to a temporary object.
A constructor, since we're constructing an object. std::string also has an operator= taking a const char* parameter, but that's never used for implicit conversions: you'll need to be explicitly assigning something.
The performance is the same since the same code runs, but you do incur some overhead because the data is copied instead of referenced. If that is an issue, use std::string_view instead.
It's safe as long as you don't try to keep a reference or pointer to the parameter for longer than the function call, because the object might not be alive afterwards (but then you should always keep that in mind with reference parameters). You also need to make sure that the C string you're passing is properly null terminated.
Is this string converted on the stack
The language doesn't specify the storage of temporary objects, but in this case it is probably stored on the stack, yes.
or is it passed by value?
The argument is a reference. Therefore you are "passing by reference".
Is it using the operator= overload?
No. You aren't using operator= there, so why would it?
A "from c-string" constructor?
Yes.
Based on 2, is this less efficient than converting to a string explicitly using a constructor?
No. Whether object is created implicitly or explicitly is irrelevant to efficiency.
Creating a std::string is however potentially less efficient than not creating it which you could achieve by not accepting a reference to a string as the argument. You could use a string view instead.
Is this dangerous?
Not particularly. In some cases implicit conversions can cause problems when the programmer doesn't notice them, but typically they simplify the language by reducing verbosity.
Consider the following code:
#include <iostream>
#include <string>
using namespace std;
class Foo
{
private:
const string& _bar;
public:
Foo(const string& bar)
: _bar(bar) { }
const string& GetBar() { return _bar; }
};
int main()
{
Foo foo1("Hey");
cout << foo1.GetBar() << endl;
string barString = "You";
Foo foo2(barString);
cout << foo2.GetBar() << endl;
}
When I execute this code (in VS 2013), the foo1 instance has an empty string in its _bar member variable while foo2's corresponding member variable holds the reference to value "You". Why is that?
Update: I'm of course using the std::string class in this example.
For Foo foo1("Hey") the compiler has to perform a conversion from const char[4] to std::string. It creates a prvalue of type std::string. This line is equivalent to:
Foo foo1(std::string("Hey"));
A reference bind occurs from the prvalue to bar, and then another reference bind occurs from bar to Foo::_bar. The problem here is that std::string("Hey") is a temporary that is destroyed when the full expression in which it appears ends. That is, after the semicolon, std::string("Hey") will not exist.
This causes a dangling reference because you now have Foo::_bar referring to an instance that has already been destroyed. When you print the string you then incur undefined behavior for using a dangling reference.
The line Foo foo2(barString) is fine because barString exists after the initialization of foo2, so Foo::_bar still refers to a valid instance of std::string. A temporary is not created because the type of the initializer matches the type of the reference.
With foo1, you are taking a reference to an object that gets destroyed at the end of that line. With foo2, the barString object still exists, so the reference remains valid.
Yeah, this is the wonders of C++ and understanding:
The lifetime of objects
That string is a class and literal char arrays are not "strings".
What happens with implicit constructors.
In any case, string is a class, "Hey" is actually just an array of characters. So when you construct Foo with "Hey" which wants a reference to a string, it performs what is called an implicit conversion. This happens because string has an implicit constructor from arrays of characters.
Now for the object lifetime issue. Having constructed this string for you, where does it live and what is its lifetime? Well, actually, it lasts for the duration of that call, here the constructor of Foo, and anything it calls. So it can call all sorts of functions all over and that string is valid.
However once that call is over, the object expires. Unfortunately you have stored within your class a const reference to it, and you are allowed to. The compiler doesn't complain, because you may store a const reference to an object that is going to live longer.
Unfortunately this is a nasty trap. I recall once deliberately giving my constructor, which really wanted a const reference, a non-const reference to ensure that exactly this situation could not occur (a non-const reference cannot bind to a temporary). Possibly not the best workaround, but it worked at the time.
Your best option really most of the time is just to copy the string. It is less expensive than you think unless you really process lots and lots of these. In your case it probably won't actually copy anything, and the compiler will secretly move the copy it made anyway.
You can also take a non-const reference to a string and "swap" it in.
With C++11 there is a further option of using move semantics, which means the string passed in will become "acquired", itself invalidated. This is particularly useful when you do want to take in temporaries, which yours is an example of (although mostly temporaries are constructed through an explicit constructor or a return value).
The problem is that in this code:
Foo foo1("Hey");
From the string literal "Hey" (raw char array, more precisely const char [4], considering the three characters in Hey and the terminating \0) a temporary std::string instance is created, and it is passed to the Foo(const string&) constructor.
This constructor saves a reference to this temporary string into the const string& _bar data member:
Foo(const string& bar)
: _bar(bar) { }
Now, the problem is that you are saving a reference to a temporary string. So when the temporary string "evaporates" (after the constructor call statement), the reference becomes dangling, i.e. it references ("points to...") some garbage.
So, you incur undefined behavior (for example, compiling your code using MinGW on Windows with g++, I get a different result).
Instead, in this second case:
string barString = "You";
Foo foo2(barString);
your foo2::_bar reference is associated to ("points to") the barString, which is not temporary, but is a local variable in main(). So, after the constructor call, the barString is still there when you print the string using cout << foo2.GetBar().
Of course, to fix that, you should consider using a std::string data member, instead of a reference.
In this way, the string will be deep-copied into the data member, and it will persist even if the input source string used in the constructor is a temporary (and "evaporates" after the constructor call).
In what ways can you use the return values from things like boost::algorithm::join?
std::stringstream ss;
ss<<"quack";
std::cout << ss.str().c_str() << std::endl; // bad idea
This is a bad idea, explained in sbi's comment in https://stackoverflow.com/a/1430774/
std::vector<std::string> v;
v.push_back("foo");
v.push_back("bar");
std::cout << boost::algorithm::join(v,"-").c_str() << std::endl; // what about this?
That made me wonder if this has the same problem?
Could someone give an explanation of the scope of such return values?
Since you are not storing the char* pointer, there is no problem with either expression:
From the standard.. http://isocpp.org/std/the-standard
Temporary objects are destroyed as the last step in evaluating the
full-expression (1.9) that (lexically) contains the point where they
were created. [12.2/3]
So in both cases above you use the char* pointer in the expression. The boost::algorithm::join and stringstream.str() are available till the end of the expression and so is the c_str pointer.
sbi comment in the link you sent referred to taking c_str() from a temporary string in one expression storing it in a const char* and passing that to a C function in a second statement.
Also, I usually try to use c_str only when calling C-style functions or external library functions that require const char*.
In the case of an ostream<<, it already accepts std::string, and it takes two seconds to add operator<< functions to support new types.
std::string::c_str() returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object.
In C++98 it was required that "a program shall not alter any of the characters in this sequence". This was encouraged by returning a const char* .
In C++11, the "pointer returned points to the internal array currently used by the string object to store the characters that conform its value", and I believe the requirement not to modify its contents has been dropped. Is this true?
Is this code OK in C++11?
#include<iostream>
#include<string>
#include<vector>
using namespace std;
std::vector<char> buf;
void some_func(char* s)
{
s[0] = 'X'; //function modifies s[0]
cout<<s<<endl;
}
int main()
{
string myStr = "hello";
buf.assign(myStr.begin(),myStr.end());
buf.push_back('\0');
char* d = buf.data(); //C++11
//char* d = (&buf[0]); //Above line for C++98
some_func(d); //OK in C++98
some_func(const_cast<char*>(myStr.c_str())); //OK in C++11 ?
//some_func(myStr.c_str()); //Does not compile in C++98 or C++11
cout << myStr << endl; //myStr has been modified
return 0;
}
3 Requires: The program shall not alter any of the values stored in the character array.
That requirement is still present as of draft n3337 (The working draft most similar to the published C++11 standard is N3337)
In C++11, yes the restriction for c_str() is still in effect. (Note that the return type is const, so no particular restriction is actually required for this function. The const_cast in your program is a big red flag.)
But as for operator[], the restriction appears to be in effect only due to an editorial error. Due to a punctuation change slated for C++14, you may modify it. So the interpretation is sort of up to you. Of course, doing this is so common that no library implementation would dare break it.
C++11 phrasing:
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value
charT(); the referenced value shall not be modified.
C++14 phrasing:
Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
You can pass c_str() as a read-only reference to a function expecting a C string, exactly as its signature suggests. A function expecting a read-write reference generally expects a given buffer size, and to be able to resize the string by writing a NUL within that buffer, which std::string implementations don't in fact support. If you want to do that, you need to resize the string to include your own NUL terminator, then pass & s[0] which is a read-write reference, then resize it again to remove your NUL terminator and hand the responsibility of termination back to the library.
I'd say that if c_str() returns a const char *, then it's not ok, even if it can be argued to be a gray area by a language lawyer.
The way I see it is simple. The signature of the method states that the pointer it returns should not be used to modify anything.
In addition, as other commenters have pointed out, there are other ways to do the same thing that do not violate any contracts. So it's definitely not ok to do so.
That said, Borgleader has found that the language still says it isn't.
I have verified that this is in the published C++11 standard
Thank you
What's wrong with &myStr.front()?
string myStr = "hello";
char* p1 = const_cast<char*>(myStr.c_str());
char* p2 = &myStr.front();
p1[0] = 'Y';
p2[1] = 'Z';
It seems that pointers p1 and p2 are exactly the same. Since "The program shall not alter any of the values stored in the character array", it would seem that the last two lines above are both illegal, and possibly dangerous.
At this point, the way I would answer my own question is that it is safest to copy the original std::string into a vector and then pass a pointer to the new array to any function that might possibly change the characters.
I was hoping that this step might no longer be necessary in C++11, for the reasons I gave in my original post.