C++ string declaration - c++

I am learning C++ from the beginning and I don't get the whole strings topic.
What is the difference between the following three codes?
std::string s = std::string("foo");
std::string s = new std::string("foo");
std::string s = "foo";

std::string s = std::string("foo");
This creates a temporary std::string object containing "foo", then uses it to copy-initialize s. (Note that compilers may elide the temporary. The temporary elision in this case is explicitly allowed by the C++ standard, and since C++17 it is required.)
std::string s = new std::string("foo");
This is a compiler error. The expression new std::string("foo") creates an std::string on the free store and returns a pointer to an std::string. It then attempts to initialize s, of type std::string, from the returned pointer of type std::string*. std::string has no constructor that accepts a std::string*, so compilation fails.
C++ is not Java. This is not how objects are typically created, because if you forget to delete the returned std::string object you will leak memory. One of the main benefits of using std::string is that it manages the underlying string buffer for you automatically, so new-ing it kind of defeats that purpose.
std::string s = "foo";
This is essentially the same as #1. It technically initializes a new temporary string which will contain "foo", then initializes s from it. Again, compilers will typically elide the temporary (and in fact pretty much all non-stupid compilers nowadays do eliminate the temporary), so in practice it simply constructs a new object called s in place.
Specifically it invokes a converting constructor in std::string that accepts a const char* argument. In the above code, the converting constructor is required to be non-explicit, otherwise it's a compiler error. The converting constructor is in fact non-explicit for std::strings, so the above does compile.
This is how std::strings are typically initialized. When s goes out of scope, the s object will be destroyed along with the underlying string buffer. Note that the following has the same effect (and is another typical way std::strings are initialized), in the sense that it also produces an object called s containing "foo".
std::string s("foo");
However, there are subtle differences between std::string s = "foo"; and std::string s("foo");, one being that with direct initialization (std::string s("foo");) the converting constructor can be either explicit or non-explicit.

std::string s = std::string("foo");
This is called copy initialization. It is functionally the same as direct initialization
std::string s( "foo" );
but the former requires that an accessible copy (or move) constructor is available. Compilers may create a temporary object, but most will elide it and construct s directly to contain "foo".
std::string s = new std::string("foo");
This will not compile because new returns a pointer. To make it work you'd need the type of s to be a std::string *. Then the line dynamically allocates an std::string object and stores the pointer in s. You'll need to delete it once you're done using it.
std::string s = "foo";
This is almost the same as the first. It is copy initialization but it has an added constraint. It requires that the std::string class contains a non-explicit constructor that takes a const char *. This allows the compiler to implicitly construct a temporary std::string object. After that the semantics are identical to case 1.

1. Creates a temporary string object and copies the value to s.
2. Does not compile; new std::string("foo") returns a pointer to some newly allocated memory. For this to work, you should declare s as a pointer to a string: std::string* s.
3. Constructs a string from a C-string.
You should use the third option in most - if not all - cases.

1 will create a temporary (the right-hand side), then initialize s from it using the copy or move constructor, not the assignment operator; compilers usually elide the extra object
2 will create an instance of std::string on the heap and return a pointer to it, and will fail at the initialization because a pointer cannot be converted to a non-pointer type
3 will build a std::string and initialize it from a const char*

In number 1, you are creating a temporary string using the constructor and then initializing s from it.
Number 2 doesn't even compile.
In number 3, you are creating a new string that is initialized directly from the string literal.

Related

Passing a char array to a function that expects a const std::string reference

I made a mistake in a socket interface I wrote a while back and I just noticed the problem while looking through the code for a different issue. The socket receives a string of characters and passes it to jsoncpp to complete the json parsing. I can almost understand what is happening here but I can't get my head around it. I would like to grasp what is actually happening under the hood. Here is the minimum example:
#include <cstring>
#include <iostream>
#include <string>
void doSomethingWithAString(const std::string &val) {
    std::cout << val.size() << std::endl;
    std::cout << val << std::endl;
}
int main()
{
    char responseBufferForSocket[10000];
    memset(responseBufferForSocket, 0, 10000);
    // Let's simulate a response from a socket connection
    responseBufferForSocket[0] = 'H';
    responseBufferForSocket[1] = 'i';
    responseBufferForSocket[2] = '?';
    // Now let's pass a .... the address of the first char in the array...
    // wait a minute..that's not a const std::string& ... but hey, it's ok it *works*!
    doSomethingWithAString(responseBufferForSocket);
    return 0;
}
The code above is not causing any obvious issues but I would like to correct it if there is a problem lurking. Obviously the character array is being transformed to a string, but by what mechanism? I guess I have four questions:
1. Is this string converted on the stack and passed by reference, or is it passed by value?
2. Is it using the operator= overload? A "from c-string" constructor? Some other mechanism?
3. Based on 2, is this less efficient than converting to a string explicitly using a constructor?
4. Is this dangerous? :)
compiled with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
std::string has a non explicit constructor (i.e. not marked with the explicit keyword) that takes a const char* parameter and copies characters until the first '\0' (the behaviour is undefined if no such character exists in the string). In other words, it performs a copy of the source data. It's overload #5 on this page.
const char[] implicitly decays to const char*, and you can pass a temporary to a function taking a const reference parameter. This only works if the reference is const, by the way; if you can't use const, pass it by value.
And so, when you pass a const char[] to that function, a temporary object of type std::string is constructed using that constructor, and bound to the parameter. The temporary will remain alive for the duration of the function call, and will be destroyed when it returns.
With all that in mind, let's address your questions:
1. It's passed by reference, but the reference is to a temporary object.
2. A constructor, since we're constructing an object. std::string also has an operator= taking a const char* parameter, but that's never used for implicit conversions: you'll need to be explicitly assigning something.
3. The performance is the same since the same code runs, but you do incur some overhead because the data is copied instead of referenced. If that is an issue, use std::string_view (available since C++17) instead.
4. It's safe as long as you don't try to keep a reference or pointer to the parameter for longer than the function call, because the object might not be alive afterwards (but then you should always keep that in mind with reference parameters). You also need to make sure that the C string you're passing is properly null terminated.
Is this string converted on the stack
The language doesn't specify the storage of temporary objects, but in this case it is probably stored on the stack, yes.
or is it passed by value?
The argument is a reference. Therefore you are "passing by reference".
Is it using the operator= overload?
No. You aren't using operator= there, so why would it?
A "from c-string" constructor?
Yes.
Based on 2, is this less efficient than converting to a string explicitly using a constructor?
No. Whether the object is created implicitly or explicitly is irrelevant to efficiency.
Creating a std::string is, however, potentially less efficient than not creating it, which you could achieve by not accepting a reference to a string as the argument. You could use a string view instead.
Is this dangerous?
Not particularly. In some cases implicit conversions can cause problems when the programmer doesn't notice them, but typically they simplify the language by reducing verbosity.

Function call parameter, char * vs string default constructor

While calling a function/method in C++11 and above, which one is better (if any difference)?
Let's assume this function/method:
void func(std::string s) { ... }
Which one is best between the following?
func(std::string())
or
func("")
And more generally, is there any advantage to always call the constructor explicitly during initialization or parameter passing?
It's better to call the default constructor, because it's guaranteed to not do any unnecessary work.
When passing an empty string literal, it could be that the string implementation does some work processing that string (computing its length, for example). An empty string literal isn't a magic bullet that can be treated differently from non-empty string literals. Its type is const char[1], which decays into const char*, and that's it - the std::string constructor dealing with this literal will end up doing more work than necessary.
From cppreference for std::string::string():
Default constructor. Constructs empty string (zero size and unspecified capacity). If no allocator is supplied, allocator is obtained from a default-constructed instance.
... and for std::string::string(const char*):
Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by s. The length of the string is determined by the first null character. [...]
For further reading, see also this short article.
I would like to compare func(std::string()) with func(""):
func(std::string())
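You create a std::string object with the default constructor, which yields an empty string.
You then pass that object to func by value, so nominally a second std::string is created for the parameter and initialized from the temporary by the move (or copy) constructor. Compilers are allowed to elide that step, and since C++17 they must, so the parameter is usually constructed directly.
In this case, at most two std::string objects are created, and usually only one.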
func("")
You pass an empty string literal, so the compiler creates the std::string parameter using the std::string(const char*) constructor.
In this case, only one std::string object is created.
So I think that, for this specific case, func("") may be better when the extra object is not elided.

Why is my string reference member variable set to an empty string in C++?

Consider the following code:
#include <iostream>
#include <string>
using namespace std;
class Foo
{
private:
    const string& _bar;
public:
    Foo(const string& bar)
        : _bar(bar) { }
    const string& GetBar() { return _bar; }
};
int main()
{
    Foo foo1("Hey");
    cout << foo1.GetBar() << endl;
    string barString = "You";
    Foo foo2(barString);
    cout << foo2.GetBar() << endl;
}
When I execute this code (in VS 2013), the foo1 instance has an empty string in its _bar member variable while foo2's corresponding member variable holds the reference to value "You". Why is that?
Update: I'm of course using the std::string class in this example.
For Foo foo1("Hey") the compiler has to perform a conversion from const char[4] to std::string. It creates a prvalue of type std::string. This line is equivalent to:
Foo foo1(std::string("Hey"));
A reference bind occurs from the prvalue to bar, and then another reference bind occurs from bar to Foo::_bar. The problem here is that std::string("Hey") is a temporary that is destroyed when the full expression in which it appears ends. That is, after the semicolon, std::string("Hey") will not exist.
This causes a dangling reference because you now have Foo::_bar referring to an instance that has already been destroyed. When you print the string you then incur undefined behavior for using a dangling reference.
The line Foo foo2(barString) is fine because barString exists after the initialization of foo2, so Foo::_bar still refers to a valid instance of std::string. A temporary is not created because the type of the initializer matches the type of the reference.
You are taking a reference to an object that is destroyed at the end of the line that declares foo1. For foo2, the barString object still exists, so the reference remains valid.
Yeah, this is the wonders of C++ and understanding:
The lifetime of objects
That string is a class and literal char arrays are not "strings".
What happens with implicit constructors.
In any case, string is a class, "Hey" is actually just an array of characters. So when you construct Foo with "Hey" which wants a reference to a string, it performs what is called an implicit conversion. This happens because string has an implicit constructor from arrays of characters.
Now for the lifetime of object issue. Having constructed this string for you, where does it live and what is its lifetime? It lives for the duration of that call, here the constructor of Foo, and anything it calls. So the constructor can call all sorts of functions and the string remains valid throughout.
However once that call is over, the object expires. Unfortunately you have stored within your class a const reference to it, and you are allowed to. The compiler doesn't complain, because you may store a const reference to an object that is going to live longer.
Unfortunately this is a nasty trap. I recall that I once deliberately gave a constructor that really wanted a const reference a non-const reference parameter instead, to ensure exactly that this situation could not occur (it would not accept a temporary). Possibly not the best workaround, but it worked at the time.
Your best option really most of the time is just to copy the string. It is less expensive than you think unless you really process lots and lots of these. In your case it probably won't actually copy anything, and the compiler will secretly move the copy it made anyway.
You can also take a non-const reference to a string and "swap" it in.
With C++11 there is a further option of using move semantics, which means the string passed in will be "acquired" and itself left in a valid but unspecified state. This is particularly useful when you do want to take in temporaries, which yours is an example of (although mostly temporaries are constructed through an explicit constructor or a return value).
The problem is that in this code:
Foo foo1("Hey");
From the string literal "Hey" (raw char array, more precisely const char [4], considering the three characters in Hey and the terminating \0) a temporary std::string instance is created, and it is passed to the Foo(const string&) constructor.
This constructor saves a reference to this temporary string into the const string& _bar data member:
Foo(const string& bar)
: _bar(bar) { }
Now, the problem is that you are saving a reference to a temporary string. So when the temporary string "evaporates" (after the constructor call statement), the reference becomes dangling, i.e. it references ("points to...") some garbage.
So, you incur undefined behavior (for example, compiling your code using MinGW on Windows with g++, I get a different result).
Instead, in this second case:
string barString = "You";
Foo foo2(barString);
your foo2::_bar reference is associated to ("points to") the barString, which is not temporary, but is a local variable in main(). So, after the constructor call, the barString is still there when you print the string using cout << foo2.GetBar().
Of course, to fix that, you should consider using a std::string data member, instead of a reference.
In this way, the string will be deep-copied into the data member, and it will persist even if the input source string used in the constructor is a temporary (and "evaporates" after the constructor call).

cstring -> c++ string conversion

If I have a function
void x(std::string const& s)
{
...
}
And I am calling it as x("abc"), will string constructor allocate memory and copy data in it?
The std::string constructor will be called with a const char* argument
There is no telling whether memory would be allocated (dynamically), but the chances are that your standard library implementation has the SSO in place, which means it can store small strings without dynamic allocations.
SSO (small string optimization): see "Meaning of acronym SSO in the context of std::string".
The question is tagged with 'performance', so it's actually a good question IMO.
All compilers I know will allocate a copy of the string on the heap once it is too long for any small-string buffer. However, some implementation could make the std::string type intrinsic to the compiler and optimize away the heap allocation when an r-value std::string is constructed from a string literal.
E.g., this is not the case here, but MSVC is capable of replacing heap allocations with static objects when they are done as part of dynamic initialization of statics, at least in some circumstances.
Yes, the compiler will generate the necessary code to create a std::string and pass it as argument to the x function.
Constructors which take a single argument, unless marked with the explicit keyword, are used to implicitly convert from the argument type to an instance of the object.
In this example, std::string has a constructor which takes a const char* argument, so the compiler uses this to implicitly convert your string literal into a std::string object. A const reference to that newly created object is then passed to your function.
Here's more information: What does the explicit keyword mean in C++?

C++: using & operator for pass-by-reference

I'm teaching myself C++, and in the process I'm writing simple little programs to learn basic ideas. With respect to "pass-by-reference", I'm confused why the following piece of code works (some of the code is just there to practice overloading constructors):
#include <iostream>
#include <string>
using namespace std;
class Dude
{
public:
    string x;
    Dude(); // Constructor 1
    Dude(const string &a); // Constructor 2
};
Dude::Dude() : x("hi") {}
Dude::Dude(const string &a) : x(a) {}
int main()
{
    Dude d1;
    Dude d2 = Dude("bye");
    cout << d1.x << endl;
    cout << d2.x << endl;
    return 0;
}
In "main()", I create an object "d2" of type "Dude", and use Constructor 2 to set "x" to be the string "bye".
But in Constructor 2's declaration, I told it to accept an address of a string, not a string itself. So why can I pass it "bye" (which is a string)? Why don't I have to create a string variable, and then pass the address of that string to Constructor 2 of Dude?
This actually illustrates one of the coolest and most useful features of C++: Temporary variables. Since you specified that the string reference is const, the compiler allows you to pass a reference to a temporary value to that function. So, here's what's happening behind the scenes with Dude d2 = Dude("bye");:
1. The compiler determines that the best constructor to use is Dude::Dude(const string &). How this choice is made is a whole different topic.
2. However, in order to use that constructor you need a string value. Now, "bye" is a const char[4], but the compiler can trivially convert that to a const char *, and that can be turned into a string. So, an anonymous temporary variable (call it temp1) is created.
3. string::string(const char *) is invoked with "bye", and the result is stored in temp1.
4. Dude::Dude(const string&) is invoked with a reference to temp1. The result is used to initialize d2 (actually, it initializes another temporary variable, and the copy constructor for Dude is invoked with a const reference to it to initialize d2. But in this case the result is the same.)
5. temp1 is discarded. This is where the string destructor string::~string() is run on temp1.
6. Control passes to the next statement.
I think you're misunderstanding what the & operator does in this context. Taking the address of a variable (&var) is different from signifying that a parameter is to be passed as a reference (as you have, in const string &a).
What your code is actually doing is implicitly creating a new string object that's initialized with the string "bye", and then that object is passed by reference to the Dude constructor. That is, your code is essentially:
Dude d2 = Dude(string("bye"));
and then the constructor receives that string object by reference and uses it to initialize x via the copy constructor.
In this case, string has a constructor which takes a const char* and is not declared explicit, so the compiler will create a temporary string (created with string("bye"), the aforementioned constructor) and then your const string& is set to refer to that temporary.
Two things:
1) There's no such thing as an "address" in your code. const string& means "constant reference to a string".
You're possibly confused by the fact that the symbol & is also used in an entirely different context as the "address-of" operator to create a pointer: T x; T * p = &x;. But that has nothing to do with references.
2) You're not actually necessarily using the constructor that you claim for d2; rather, you're creating a temporary object with your constructor #2, and then you construct d2 via the copy constructor from the temporary. The direct construction reads Dude d2("bye");.
When you call the second constructor with a string literal, a temporary std::string holding a copy of the characters is created and passed to the constructor by reference.
Constructor 2 is not taking an address to a string, const string& a means a constant reference to an std::string object. The reason why you can pass the constructor a string literal is because the std::string class contains a non-explicit constructor that takes a const char *. So the compiler implicitly converts your string literal to an std::string first before calling Constructor 2.
So the following 2 lines are equivalent
Dude d2 = Dude("bye");
Dude d2 = Dude( std::string("bye") );
Also, when writing constructors, prefer initializing member variables in the initializer list instead of within the body of the constructor
Dude(const string &a) : x(a) {}
Temporaries can be bound to a const reference, probably for this reason.
When you call Dude("bye"), the compiler sees if that is a perfect match (const char[4]) for any constructors. Nope. Then it checks certain conversions (const char*): still nope. Then it checks user conversions, and finds that std::string can be implicitly constructed from a const char*. So it creates a std::string from the const char* for you, and passes it by reference to Dude's constructor, which makes a copy. At the end of the statement Dude d2 = Dude("bye"); the temporary string is automatically destroyed. It would be irritating if we had to do the explicit casts ourselves for every single function parameter.
Variables passed to a reference parameter will automatically pass their address instead. This is nice, because it allows us to treat objects with value semantics. I don't have to think about passing it an instance of a string, I can pass it the value "bye".
Constructor #2 accepts a reference to a const string. That allows it to accept a reference to either a pre-existing object or a temporary object (without the const qualifier, a reference to a temporary would not be accepted).
std::string has a constructor that accepts a pointer to char. The compiler is using that to create a temporary std::string object, and then passing a reference to that temporary to your ctor.
Note that the compiler will only (implicitly) do one conversion like this for you. If you need more than one conversion to get from the source data to the target type, you'll need to specify all but one of those conversions explicitly.
While "&" is the address-of operator in an expression, when it appears in a parameter declaration it means that the argument is passed to the method by reference. The reference in this case is a. Note that a is not a pointer, it is a reference. In the constructor, a refers to the temporary string object with contents "bye". This is a typical example of pass by reference in C++.