I'm teaching myself C++, and in the process I'm writing simple little programs to learn basic ideas. With respect to "pass-by-reference", I'm confused why the following piece of code works (some of the code is just there to practice overloading constructors):
#include <iostream>
#include <string>
using namespace std;
class Dude
{
public:
string x;
Dude(); // Constructor 1
Dude(const string &a); // Constructor 2
};
Dude::Dude() : x("hi") {}
Dude::Dude(const string &a) : x(a) {}
int main()
{
Dude d1;
Dude d2 = Dude("bye");
cout << d1.x << endl;
cout << d2.x << endl;
return 0;
}
In "main()", I create an object "d2" of type "Dude", and use Constructor 2 to set "x" to be the string "bye".
But in Constructor 2's declaration, I told it to accept an address of a string, not a string itself. So why can I pass it "bye" (which is a string)? Why don't I have to create a string variable, and then pass the address of that string to Constructor 2 of Dude?
This actually illustrates one of the coolest and most useful features of C++: Temporary variables. Since you specified that the string reference is const, the compiler allows you to pass a reference to a temporary value to that function. So, here's what's happening behind the scenes with Dude d2 = Dude("bye");:
1) The compiler determines that the best constructor to use is Dude::Dude(const string &). How this choice is made is a whole different topic.
2) However, in order to use that constructor you need a string value. Now, "bye" is a const char[4], but the compiler can trivially convert that to a const char *, and that can be turned into a string. So, an anonymous temporary variable (call it temp1) is created.
3) string::string(const char *) is invoked with "bye", and the result is stored in temp1.
4) Dude::Dude(const string&) is invoked with a reference to temp1. The result initializes d2. (Formally, before C++17 it initializes another temporary, and the copy constructor for Dude is invoked with a const reference to that temporary to initialize d2; compilers may elide that copy, and since C++17 no such temporary exists. Either way the result is the same.)
5) temp1 is discarded. This is where the string destructor string::~string() is run on temp1.
6) Control passes to the next statement.
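The sequence above can be made visible with a small tracing sketch. Str is a hypothetical stand-in for std::string that records when it is constructed, copied and destroyed, and events is a helper log; neither is part of the original program. The sketch assumes the compiler elides the copy in Dude d2 = Dude("bye"); (guaranteed since C++17, and typical before that).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log, introduced only for this illustration.
std::vector<std::string> events;

// Str is a stand-in for std::string that records its own lifetime.
struct Str {
    std::string value;
    Str(const char* s) : value(s) { events.push_back("Str from const char*"); }
    Str(const Str& other) : value(other.value) { events.push_back("Str copy"); }
    ~Str() { events.push_back("Str destroyed"); }
};

struct Dude {
    Str x;
    Dude(const Str& a) : x(a) { events.push_back("Dude constructed"); }
};

// Runs the statement from the question and returns the recorded events.
std::vector<std::string> trace_construction() {
    events.clear();
    {
        Dude d2 = Dude("bye");               // temp1 exists only during this statement
        events.push_back("next statement");
    }                                        // d2 and its member x are destroyed here
    assert(events.back() == "Str destroyed");
    return events;
}
```

Calling trace_construction() should show the temporary being created from the literal, copied into x, and destroyed before "next statement" is logged.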
I think you're misunderstanding what the & operator does in this context. Taking the address of a variable (&var) is different from signifying that a parameter is to be passed as a reference (as you have, in const string &a).
What your code is actually doing is implicitly creating a new string object that's initialized with the string "bye", and then that object is passed by reference to the Dude constructor. That is, your code is essentially:
Dude d2 = Dude(string("bye"));
and then the constructor receives that string object by reference and copies it into x via string's copy constructor.
In this case, string has a constructor which takes a const char* and is not declared explicit, so the compiler will create a temporary string (created with string("bye"), the aforementioned constructor) and then your const string& is set to refer to that temporary.
Two things:
1) There's no such thing as an "address" in your code. const string& means "constant reference to a string".
You're possibly confused by the fact that the symbol & is also used in an entirely different context as the "address-of" operator to create a pointer: T x; T * p = &x;. But that has nothing to do with references.
2) You're not actually necessarily using the constructor that you claim for d2; rather, you're creating a temporary object with your constructor #2, and then you construct d2 via the copy constructor from the temporary. The direct construction reads Dude d2("bye");.
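To illustrate the first point, here is a minimal sketch of the two unrelated meanings of &. The function name reference_is_alias is made up for the illustration.

```cpp
#include <cassert>
#include <string>

// Returns true when the reference and the pointer both denote the same object.
bool reference_is_alias() {
    std::string s = "bye";
    const std::string& ref = s;   // '&' in a declaration: ref is another name for s
    const std::string* ptr = &s;  // '&' in an expression: address-of, yields a pointer
    s += "!";                     // the change is visible through both ref and ptr
    assert(&ref == ptr);          // taking the address of a reference gives the object's address
    return ref == "bye!" && *ptr == "bye!";
}
```

A reference is used with ordinary value syntax (ref == "bye!"), while a pointer has to be dereferenced (*ptr).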
When you call the second constructor with a string literal, a temporary std::string holding a copy of the literal's characters is created, and a reference to that temporary is passed to the constructor.
Constructor 2 is not taking an address to a string, const string& a means a constant reference to an std::string object. The reason why you can pass the constructor a string literal is because the std::string class contains a non-explicit constructor that takes a const char *. So the compiler implicitly converts your string literal to an std::string first before calling Constructor 2.
So the following 2 lines are equivalent
Dude d2 = Dude("bye");
Dude d2 = Dude( std::string("bye") );
Also, when writing constructors, prefer initializing member variables in the initializer list instead of within the body of the constructor
Dude(const string &a) : x(a) {}
Temporaries can be bound to a const reference, probably for this reason.
When you call Dude("bye"), the compiler checks whether the argument (a const char[4]) is a perfect match for any constructor. Nope. Then it checks the standard conversions (to const char*): still nope. Then it checks user-defined conversions, and finds that std::string can be implicitly constructed from a const char*. So it creates a std::string from the const char* for you, and passes it by reference to Dude's constructor, which makes a copy. At the end of the statement Dude d2 = Dude("bye"); the temporary string is automatically destroyed. It would be irritating if we had to do the explicit casts ourselves for every single function parameter.
An argument passed to a reference parameter is not copied; the parameter refers to the caller's object (typically implemented by passing its address behind the scenes). This is nice, because it allows us to treat objects with value semantics. I don't have to think about passing it an instance of a string, I can pass it the value "bye".
Constructor #2 accepts a reference to a const string. That allows it to accept a reference to either a pre-existing object or a temporary object (without the const qualifier, a reference to a temporary would not be accepted).
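A small sketch of that rule, using two hypothetical overloads named which that report the kind of reference the argument was bound to:

```cpp
#include <cassert>
#include <string>

// Two overloads that differ only in the const qualifier of the reference.
std::string which(std::string&)       { return "non-const reference"; }
std::string which(const std::string&) { return "const reference"; }

void check_bindings() {
    std::string named = "bye";
    assert(which(named) == "non-const reference");           // a named object can bind to either
    assert(which(std::string("bye")) == "const reference");  // a temporary binds only to const&
    assert(which("bye") == "const reference");               // so does the temporary made from a literal
}
```

If the const overload were removed, the last two calls would not compile.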
std::string has a constructor that accepts a pointer to char. The compiler is using that to create a temporary std::string object, and then passing a reference to that temporary to your ctor.
Note that the compiler will only (implicitly) do one conversion like this for you. If you need more than one conversion to get from the source data to the target type, you'll need to specify all but one of those conversions explicitly.
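The one-conversion limit can be sketched as follows. Label is a hypothetical wrapper type, introduced only to create a second conversion step.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical wrapper type with a non-explicit converting constructor.
struct Label {
    std::string text;
    Label(const std::string& t) : text(t) {}
};

std::size_t label_length(const Label& l) { return l.text.size(); }

// One user-defined conversion (std::string -> Label) is applied implicitly.
static_assert(std::is_convertible<std::string, Label>::value,
              "a single conversion is implicit");
// const char* -> std::string -> Label would need two, so it is not implicit.
static_assert(!std::is_convertible<const char*, Label>::value,
              "two conversions are not chained");

std::size_t demo_label() {
    // label_length("bye");                              // would not compile: two conversions needed
    std::size_t n = label_length(std::string("bye"));    // first step explicit, second implicit
    assert(n == 3);
    return n;
}
```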
While "&" is the address-of operator in an expression, when it appears in a parameter declaration it means that the argument is passed by reference. The reference in this case is a, not d2: d2 is an ordinary object, neither a pointer nor a reference. In the constructor, "a" refers to the temporary string object with contents "bye". This is a typical example of pass by reference in C++.
Related
I made a mistake in a socket interface I wrote a while back and I just noticed the problem while looking through the code for a different issue. The socket receives a string of characters and passes it to jsoncpp to complete the json parsing. I can almost understand what is happening here but I can't get my head around it. I would like to grasp what is actually happening under the hood. Here is the minimum example:
#include <iostream>
#include <cstring>
#include <string>
void doSomethingWithAString(const std::string &val) {
std::cout << val.size() << std::endl;
std::cout << val << std::endl;
}
int main()
{
char responseBufferForSocket[10000];
memset(responseBufferForSocket, 0, 10000);
//Lets simulate a response from a socket connection
responseBufferForSocket[0] = 'H';
responseBufferForSocket[1] = 'i';
responseBufferForSocket[2] = '?';
// Now lets pass a .... the address of the first char in the array...
// wait a minute..that's not a const std::string& ... but hey, it's ok it *works*!
doSomethingWithAString(responseBufferForSocket);
return 0;
}
The code above is not causing any obvious issues but I would like to correct it if there is a problem lurking. Obviously the character array is being transformed to a string, but by what mechanism? I guess I have four questions:
1) Is this string converted on the stack and passed by reference or is it passed by value?
2) Is it using the operator= overload? A "from c-string" constructor? Some other mechanism?
3) Based on 2, is this less efficient than converting to a string explicitly using a constructor?
4) Is this dangerous? :)
compiled with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
std::string has a non explicit constructor (i.e. not marked with the explicit keyword) that takes a const char* parameter and copies characters until the first '\0' (the behaviour is undefined if no such character exists in the string). In other words, it performs a copy of the source data. It's overload #5 on this page.
const char[] implicitly decays to const char*, and you can pass a temporary to a function taking a const reference parameter. This only works if the reference is const, by the way; if you can't use const, pass it by value.
And so, when you pass a const char[] to that function, a temporary object of type std::string is constructed using that constructor, and bound to the parameter. The temporary will remain alive for the duration of the function call, and will be destroyed when it returns.
With all that in mind, let's address your questions:
1) It's passed by reference, but the reference is to a temporary object.
2) A constructor, since we're constructing an object. std::string also has an operator= taking a const char* parameter, but that's never used for implicit conversions: you'll need to be explicitly assigning something.
3) The performance is the same since the same code runs, but you do incur some overhead because the data is copied instead of referenced. If that is an issue, use std::string_view instead.
4) It's safe as long as you don't try to keep a reference or pointer to the parameter for longer than the function call, because the object might not be alive afterwards (but then you should always keep that in mind with reference parameters). You also need to make sure that the C string you're passing is properly null terminated.
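A minimal sketch of the std::string_view alternative mentioned above (this needs C++17). The function names view_size and view_shares_buffer are made up for the illustration; the point is that the view refers to the caller's buffer instead of copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical variant of doSomethingWithAString that takes a view, not a std::string.
std::size_t view_size(std::string_view val) { return val.size(); }

// True when the view refers to the caller's buffer rather than to a copy.
bool view_shares_buffer(const char* buffer) {
    std::string_view view(buffer);                // scans for '\0' to find the length, copies nothing
    assert(view.size() == std::strlen(buffer));   // still requires a null-terminated buffer
    return view.data() == buffer;
}
```

The same lifetime caveat applies even more strongly here: the view is only valid while the underlying buffer is.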
Is this string converted on the stack
The language doesn't specify the storage of temporary objects, but in this case it is probably stored on the stack, yes.
or is it passed by value?
The argument is a reference. Therefore you are "passing by reference".
Is it using the operator= overload?
No. You aren't using operator= there, so why would it?
A "from c-string" constructor?
Yes.
Based on 2 is this less efficient than converting to a string explicitly using a constructor?
No. Whether the object is created implicitly or explicitly is irrelevant to efficiency.
Creating a std::string is, however, potentially less efficient than not creating one at all, which you could achieve by not accepting a reference to a string as the argument. You could use a string view instead.
Is this dangerous.
Not particularly. In some cases implicit conversions can cause problems when the programmer doesn't notice them, but typically they simplify the language by reducing verbosity.
Let's consider the following functions:
void processString1(const string & str) { /** **/}
void processString2(string && str) { /** **/}
processString1("Hello");
processString2("Hello");
As I assume, processString1 will invoke copy constructor and processString2 a move constructor of string. What is more efficient?
Your understanding is misguided here.
First, neither of those functions take a value - they both take a reference. So, when an object is passed to either one, no constructor is called - whatever object is passed is simply bound to a reference.
However, your function calls pass a C-string - and there is an implicit conversion from a C-string to a std::string.
Thus, each one will construct a TEMPORARY std::string out of the C-string "Hello".
That temporary object will bind to a reference-to-const in the first case and to an rvalue-reference-to-non-const in the second case.
The language guarantees that the temporary lives until the end of the full-expression containing the call, so it outlives the function call itself.
Neither function call does any construction - the only construction happens when the C-string is implicitly converted to an instance of std::string.
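This can be checked with a counting sketch. Text is a hypothetical stand-in for std::string that counts which constructors run, and the two functions return the length only so that there is something to observe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical counters, introduced only for this illustration.
struct Counts { int from_cstring = 0; int copies = 0; int moves = 0; };
Counts counts;

// Text stands in for std::string and counts which constructor runs.
struct Text {
    std::string value;
    Text(const char* s) : value(s) { ++counts.from_cstring; }
    Text(const Text& o) : value(o.value) { ++counts.copies; }
    Text(Text&& o) noexcept : value(std::move(o.value)) { ++counts.moves; }
};

std::size_t processString1(const Text& str) { return str.value.size(); }
std::size_t processString2(Text&& str)      { return str.value.size(); }

Counts call_both() {
    counts = Counts();
    std::size_t a = processString1("Hello");   // temporary Text bound to const Text&
    std::size_t b = processString2("Hello");   // temporary Text bound to Text&&
    assert(a == 5 && b == 5);
    return counts;
}
```

Each call performs exactly one conversion from the C-string; neither the copy nor the move constructor runs, so the two signatures cost the same here.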
In another question a user made a comment that returning a const std::string loses move construction efficiency and is slower.
Is it really true that assigning a string of return of this method:
const std::string toJson(const std::string &someText);
const std::string jsonString = toJson(someText);
... is really slower than the non-const version:
std::string toJson(const std::string &str);
std::string jsonString = toJson(someText);
And what is the meaning of move-construction efficiency in this context?
I've never heard of that limitation before and do not remember having seen that in the profiler. But I'm curious.
Edit: There is a suggested question asking: What is move semantics?. While some of the explanations of course relate to efficiency, it explains what move semantics means, but does not address why returning a const value can have negative side effects regarding performance.
Consider the following functions:
std::string f();
std::string const g();
There is no difference between:
std::string s1 = f();
std::string s2 = g();
Since C++17 we have guaranteed copy elision: in both of these cases we're constructing directly into the resulting object. No copy, no move.
However, there is a big difference between:
std::string s3, s4;
s3 = f(); // this is move assignment
s4 = g(); // this is copy assignment
g() may be an rvalue, but it's a const rvalue. It cannot bind to the string&& argument that the move assignment operator takes, so we fall back to the copy assignment operator whose string const& parameter can happily accept an rvalue.
Copying is definitely slower than moving for types like string, where moving is constant time and copying is linear and may require allocation.
Don't return const values.
On top of that, for non-class types:
int f();
int const g();
These two are actually the same, both return int. It's an odd quirk of the language that you cannot return a const prvalue of non-class type but you can return a const prvalue of class type. Easier to just pretend you can't do the latter either, since you shouldn't.
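The move-versus-copy assignment difference described above can be observed with a counting sketch. Payload is a hypothetical class type whose assignment operators record which one was selected; f and g mirror the two declarations from the answer.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical counters, introduced only for this illustration.
struct Tally { int copy_assign = 0; int move_assign = 0; };
Tally tally;

struct Payload {
    std::string data;
    Payload() = default;
    explicit Payload(const char* s) : data(s) {}
    Payload(const Payload&) = default;
    Payload(Payload&&) = default;
    Payload& operator=(const Payload& o) { data = o.data; ++tally.copy_assign; return *this; }
    Payload& operator=(Payload&& o) noexcept { data = std::move(o.data); ++tally.move_assign; return *this; }
};

Payload       f() { return Payload("plain"); }
const Payload g() { return Payload("const"); }

Tally assign_from_both() {
    tally = Tally();
    Payload s3, s4;
    s3 = f();   // non-const rvalue: binds to Payload&&, move assignment
    s4 = g();   // const rvalue: cannot bind to Payload&&, falls back to copy assignment
    assert(s3.data == "plain" && s4.data == "const");
    return tally;
}
```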
Without reading the specification or anything else, if we just think about it logically...
For example, lets say you have
// Declare the function
std::string const my_function();
// Initialize a non-constant variable using the function
std::string my_string = my_function();
The value returned by the function could be copied to a temporary object, the value from inside the function is then destructed. The temporary object (which is constant) is then copied to the my_string object, and then the temporary object is destructed. Two copies and two destructions. Sounds a little excessive, don't you think? Especially considering that both the value inside the function and the temporary object will be destructed, so they don't really need to keep their contents.
Wouldn't it be better if the copying could be elided, perhaps both copies? Then what could happen is that the value from inside the function is constructed directly in the my_string object. The const status of anything doesn't matter, since the intermediate objects would be destructed next anyway.
The latter is what modern compilers do: they elide the copies even if the function is declared to return a const value, and typically even if the object inside the function is const as well.
Statements like this have certain meaning in terms of initialization,
std::string getString();
const std::string getConstantString();
std::string str1 = getString(); // 1
const std::string str2 = getConstantString(); // 2
Both initialization statements 1 and 2 are copy initialization. What happens next depends on the cv-qualification (const and volatile) of the return type. If the return type is cv-unqualified and the class has a move constructor, the object is move-initialized, as in statement 1; if the return type is cv-qualified, the object is copy-initialized, as in statement 2.
But there is an optimization called copy elision, which ignores cv-qualification: thanks to it, the objects are constructed directly into the storage where they would otherwise be copied or moved to.
There are two types of copy elision: NRVO, "named return value optimization", and RVO, "return value optimization". Since C++17, return value optimization is mandatory and is no longer considered copy elision.
Please see the reference documentation on copy elision for more details.
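A rough way to observe this is to count constructor calls during the two initializations. Widget is a hypothetical class used in place of std::string so that the calls can be counted; the sketch assumes C++17, or an older compiler with its usual elision enabled.

```cpp
#include <cassert>

// Hypothetical counters, introduced only for this illustration.
int copy_ctor_calls = 0;
int move_ctor_calls = 0;

struct Widget {
    Widget() {}
    Widget(const Widget&) { ++copy_ctor_calls; }
    Widget(Widget&&) noexcept { ++move_ctor_calls; }
};

Widget       getWidget()         { return Widget(); }
const Widget getConstantWidget() { return Widget(); }

// Returns how many copy or move constructor calls the two initializations needed.
int constructor_calls_during_init() {
    copy_ctor_calls = 0;
    move_ctor_calls = 0;
    Widget w1 = getWidget();                 // statement 1
    const Widget w2 = getConstantWidget();   // statement 2
    (void)w1; (void)w2;
    int total = copy_ctor_calls + move_ctor_calls;
    assert(total == 0);                      // holds when the copies are elided
    return total;
}
```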
Consider the following code:
class Foo
{
private:
const string& _bar;
public:
Foo(const string& bar)
: _bar(bar) { }
const string& GetBar() { return _bar; }
};
int main()
{
Foo foo1("Hey");
cout << foo1.GetBar() << endl;
string barString = "You";
Foo foo2(barString);
cout << foo2.GetBar() << endl;
}
When I execute this code (in VS 2013), the foo1 instance has an empty string in its _bar member variable while foo2's corresponding member variable holds the reference to value "You". Why is that?
Update: I'm of course using the std::string class in this example.
For Foo foo1("Hey") the compiler has to perform a conversion from const char[4] to std::string. It creates a prvalue of type std::string. This line is equivalent to:
Foo foo1(std::string("Hey"));
A reference bind occurs from the prvalue to bar, and then another reference bind occurs from bar to Foo::_bar. The problem here is that std::string("Hey") is a temporary that is destroyed when the full expression in which it appears ends. That is, after the semicolon, std::string("Hey") will not exist.
This causes a dangling reference because you now have Foo::_bar referring to an instance that has already been destroyed. When you print the string you then incur undefined behavior for using a dangling reference.
The line Foo foo2(barString) is fine because barString exists after the initialization of foo2, so Foo::_bar still refers to a valid instance of std::string. A temporary is not created because the type of the initializer matches the type of the reference.
You are taking a reference to an object that is destroyed at the end of the line with foo1. In foo2 the barString object still exists, so the reference remains valid.
Yeah, these are the wonders of C++ and understanding:
1) The lifetime of objects.
2) That string is a class and literal char arrays are not "strings".
3) What happens with implicit constructors.
In any case, string is a class, "Hey" is actually just an array of characters. So when you construct Foo with "Hey" which wants a reference to a string, it performs what is called an implicit conversion. This happens because string has an implicit constructor from arrays of characters.
Now for the lifetime-of-objects issue. Having constructed this string for you, where does it live and what is its lifetime? It lives for the duration of that call, here the constructor of Foo, and anything it calls. So the constructor can call all sorts of functions all over and that string remains valid.
However once that call is over, the object expires. Unfortunately you have stored within your class a const reference to it, and you are allowed to. The compiler doesn't complain, because you may store a const reference to an object that is going to live longer.
Unfortunately this is a nasty trap. And I recall once I purposely gave my constructor, that really wanted a const reference, a non-const reference on purpose to ensure exactly that this situation did not occur (nor would it receive a temporary). Possibly not the best workaround, but it worked at the time.
Your best option really most of the time is just to copy the string. It is less expensive than you think unless you really process lots and lots of these. In your case it probably won't actually copy anything, and the compiler will secretly move the copy it made anyway.
You can also take a non-const reference to a string and "swap" it in.
With C++11 there is a further option of using move semantics, which means the string passed in will become "acquired", itself invalidated. This is particularly useful when you do want to take in temporaries, which yours is an example of (although mostly temporaries are constructed through an explicit constructor or a return value).
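If the class really must store a reference, one possible guard (a different technique from the non-const-reference workaround described above, not something the original code does) is to delete the overload that would accept a temporary, so the mistake is caught at compile time:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

class Foo {
    const std::string& _bar;
public:
    Foo(const std::string& bar) : _bar(bar) {}
    Foo(std::string&&) = delete;   // temporaries select this overload and are rejected
    const std::string& GetBar() const { return _bar; }
};

static_assert(std::is_constructible<Foo, std::string&>::value,
              "named strings are accepted");
static_assert(!std::is_constructible<Foo, std::string>::value,
              "temporary strings are rejected");

std::string bar_of_named() {
    std::string barString = "You";
    Foo foo2(barString);                    // fine: barString outlives foo2
    // Foo foo1("Hey");                     // would not compile: selects the deleted overload
    assert(&foo2.GetBar() == &barString);   // the member refers to the caller's object
    return foo2.GetBar();
}
```

This does not make the stored reference safe in general, it only rules out the temporary case.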
The problem is that in this code:
Foo foo1("Hey");
From the string literal "Hey" (raw char array, more precisely const char [4], considering the three characters in Hey and the terminating \0) a temporary std::string instance is created, and it is passed to the Foo(const string&) constructor.
This constructor saves a reference to this temporary string into the const string& _bar data member:
Foo(const string& bar)
: _bar(bar) { }
Now, the problem is that you are saving a reference to a temporary string. So when the temporary string "evaporates" (after the constructor call statement), the reference becomes dangling, i.e. it references ("points to...") some garbage.
So, you incur undefined behavior (for example, compiling your code using MinGW on Windows with g++, I get a different result).
Instead, in this second case:
string barString = "You";
Foo foo2(barString);
your foo2::_bar reference is associated to ("points to") the barString, which is not temporary, but is a local variable in main(). So, after the constructor call, the barString is still there when you print the string using cout << foo2.GetBar().
Of course, to fix that, you should consider using a std::string data member, instead of a reference.
In this way, the string will be deep-copied into the data member, and it will persist even if the input source string used in the constructor is a temporary (and "evaporates" after the constructor call).
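A minimal sketch of that fix. Taking the parameter by value and moving it into the member is one common way to write it (an assumption here, the answer only asks for a std::string member); a const std::string& parameter copied into _bar would work as well.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Foo {
    std::string _bar;   // the object owns its own copy of the text
public:
    explicit Foo(std::string bar) : _bar(std::move(bar)) {}
    const std::string& GetBar() const { return _bar; }
};

std::string bar_from_literal() {
    Foo foo1("Hey");                  // the temporary's contents end up in _bar
    assert(foo1.GetBar() == "Hey");   // still valid after the temporary is gone
    return foo1.GetBar();
}

std::string bar_from_named() {
    std::string barString = "You";
    Foo foo2(barString);              // barString is copied, then the copy is moved into _bar
    return foo2.GetBar();
}
```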
I am learning C++ from the beginning and I don't get the whole strings topic.
What is the difference between the following three codes?
std::string s = std::string("foo");
std::string s = new std::string("foo");
std::string s = "foo";
std::string s = std::string("foo");
This creates a temporary std::string object containing "foo", then initializes s from it (this is initialization, not assignment). Note that compilers may elide the temporary; the elision in this case is explicitly allowed by the C++ standard, and since C++17 it is guaranteed.
std::string s = new std::string("foo");
This is a compiler error. The expression new std::string("foo") creates an std::string on the free store and returns a pointer to an std::string. It then attempts to assign the returned pointer of type std::string* to s of type std::string. The design of the std::string class prevents that from happening, so the compile fails.
C++ is not Java. This is not how objects are typically created, because if you forget to delete the returned std::string object you will leak memory. One of the main benefits of using std::string is that it manages the underlying string buffer for you automatically, so new-ing it kind of defeats that purpose.
std::string s = "foo";
This is essentially the same as #1. It technically initializes a new temporary string which will contain "foo", then initializes s from it. Again, compilers will typically elide the temporary (pretty much all compilers nowadays do), so in practice it simply constructs a new object called s in place.
Specifically it invokes a converting constructor in std::string that accepts a const char* argument. In the above code, the converting constructor is required to be non-explicit, otherwise it's a compiler error. The converting constructor is in fact non-explicit for std::strings, so the above does compile.
This is how std::strings are typically initialized. When s goes out of scope, the s object will be destroyed along with the underlying string buffer. Note that the following has the same effect (and is another typical way std::strings are initialized), in the sense that it also produces an object called s containing "foo".
std::string s("foo");
However, there are subtle differences between std::string s = "foo"; and std::string s("foo");, one of them being that with std::string s("foo"); the converting constructor can be either explicit or non-explicit.
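That difference can be sketched with two hypothetical types, Loose and Strict, which differ only in whether the converting constructor is marked explicit:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Loose  { std::string text; Loose(const char* s) : text(s) {} };
struct Strict { std::string text; explicit Strict(const char* s) : text(s) {} };

// Copy initialization (T x = "foo";) needs a non-explicit converting constructor.
static_assert(std::is_convertible<const char*, Loose>::value,
              "Loose l = \"foo\"; compiles");
static_assert(!std::is_convertible<const char*, Strict>::value,
              "Strict s = \"foo\"; does not compile");
// Direct initialization (T x("foo");) works either way.
static_assert(std::is_constructible<Strict, const char*>::value,
              "Strict s(\"foo\"); compiles");

std::string copy_init_text() {
    Loose l = "foo";     // copy initialization
    return l.text;
}

std::string direct_init_text() {
    Strict s("foo");     // direct initialization
    assert(s.text == "foo");
    return s.text;
}
```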
std::string s = std::string("foo");
This is called copy initialization. It is functionally the same as direct initialization
std::string s( "foo" );
but the former does require that the copy constructor is available and compilers may create a temporary object but most will elide the temporary and directly construct s to contain "foo".
std::string s = new std::string("foo");
This will not compile because new returns a pointer. To make it work you'd need the type of s to be a std::string *. Then the line dynamically allocates an std::string object and stores the pointer in s. You'll need to delete it once you're done using it.
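For completeness, a sketch of what the pointer version looks like, next to a std::unique_ptr variant that removes the manual delete. Both function names are made up for the illustration, and neither form is needed for an ordinary string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Manual ownership: the caller of new is responsible for delete.
std::size_t heap_string_length() {
    std::string* s = new std::string("foo");   // s is a pointer, not a string
    std::size_t n = s->size();
    delete s;                                  // forgetting this leaks the string
    assert(n == 3);
    return n;
}

// Same allocation, but the smart pointer deletes the string automatically.
std::size_t owned_string_length() {
    std::unique_ptr<std::string> s(new std::string("foo"));
    return s->size();
}
```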
std::string s = "foo";
This is almost the same as first. It is copy initialization but it has an added constraint. It requires that the std::string class contains a non-explicit constructor that takes a const char *. This allows the compiler to implicitly construct a temporary std::string object. After that the semantics are identical to case 1.
1) Creates a temporary string object and copies the value to s.
2) Does not compile: new std::string("foo") returns a pointer to some newly allocated memory. For this to work, you should declare s as a pointer to a string: std::string* s.
3) Constructs a string from a C-string.
You should use the third option in most - if not all - cases.
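1 will create a temporary variable (right hand side), then call the copy or move constructor (not the assignment operator, since this is an initialization) to initialize s from it; compilers usually elide the temporary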
2 will create an instance of std::string on the heap and return a pointer to it, and will fail in the assignment because you can't assign a pointer to a non-pointer type
3 will build a std::string and initialize it from a const char*
In number 1, you are creating a temporary string using the constructor and then initializing s from it.
Number 2 doesn't even compile.
In number 3, you are creating a new string initialized from the literal.