cstring -> c++ string conversion - c++

If I have a function
void x(std::string const& s)
{
...
}
And I am calling it as x("abc"), will string constructor allocate memory and copy data in it?

The std::string constructor will be called with a const char* argument
There is no telling whether memory would be allocated (dynamically), but the chances are that your standard library implementation has the SSO in place, which means it can store small strings without dynamic allocations.
SSO: Meaning of acronym SSO in the context of std::string

The question is tagged with 'performance', so it's actually a good question IMO.
All compilers I know will allocate a copy of the string on the heap. However, some implementation could make the std::string type intrinsic into the compiler and optimize the heap allocation when an r-value std::string is constructed from a string literal.
E.g., this is not the case here, but MSVC is capable of replacing heap allocations with static objects when they are done as part of dynamic initialization of statics, at least in some circumstances.

Yes, the compiler will generate the necessary code to create a std::string and pass it as argument to the x function.

Constructors which take a single argument, unless marked with the explicit keyword, are used to implicitly convert from the argument type to an instance of the object.
In this example, std::string has a constructor which takes a const char* argument, so the compiler uses this to implicitly convert your string literal into a std::string object. The const reference of that newly created object is then passed to your function.
Here's more information: What does the explicit keyword mean in C++?

Related

Passing a char array to a function that expects a const std::string reference

I made a mistake in a socket interface I wrote a while back and I just noticed the problem while looking through the code for a different issue. The socket receives a string of characters and passes it to jsoncpp to complete the json parsing. I can almost understand what is happening here but I can't get my head around it. I would like to grasp what is actually happening under the hood. Here is the minimum example:
#include <iostream>
#include <cstring>
void doSomethingWithAString(const std::string &val) {
std::cout << val.size() << std::endl;
std::cout << val << std::endl;
}
int main()
{
char responseBufferForSocket[10000];
memset(responseBufferForSocket, 0, 10000);
//Lets simulate a response from a socket connection
responseBufferForSocket[0] = 'H';
responseBufferForSocket[1] = 'i';
responseBufferForSocket[2] = '?';
// Now lets pass a .... the address of the first char in the array...
// wait a minute..that's not a const std::string& ... but hey, it's ok it *works*!
doSomethingWithAString(responseBufferForSocket);
return 0;
}
The code above is not causing any obvious issues but I would like to correct it if there is a problem lurking. Obviously the character array is being transformed to a string, but by what mechanism? I guess I have four questions:
Is this string converted on the stack and passed by reference or is it passed by value?
Is it using the operator= overload? A "from c-string" constructor? Some other mechanism?
Based on 2 is this less efficient in than converting to a string explicitly using a constructor?
Is this dangerous. :)
compiled with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
std::string has a non explicit constructor (i.e. not marked with the explicit keyword) that takes a const char* parameter and copies characters until the first '\0' (the behaviour is undefined if no such character exists in the string). In other words, it performs a copy of the source data. It's overload #5 on this page.
const char[] implicitly decays to const char*, and you can pass a temporary to a function taking a const reference parameter. This only works if the reference is const, by the way; if you can't use const, pass it by value.
And so, when you pass a const char[] to that function, a temporary object of type std::string is constructed using that constructor, and bound to the parameter. The temporary will remain alive for the duration of the function call, and will be destroyed when it returns.
With all that in mind, let's address your questions:
It's passed by reference, but the reference is to a temporary object.
A constructor, since we're constructing an object. std::string also has an operator= taking a const char* parameter, but that's never used for implicit conversions: you'll need to be explicitly assigning something.
The performance is the same since the same code runs, but you do incur some overhead because the data is copied instead of referenced. If that is an issue, use std::string_view instead.
It's safe as long as you don't try to keep a reference or pointer to the parameter for longer than the function call, because the object might not be alive afterwards (but then you should always keep that in mind with reference parameters). You also need to make sure that the C string you're passing is properly null terminated.
Is this string converted on the stack
The language doesn't specify the storage of temporary objects, but in this case it is probably stored on the stack, yes.
or is it passed by value?
The argument is a reference. Therefore you are "passing by reference".
Is it using the operator= overload?
No. You aren't using operator= there, so why would it?
A "from c-string" constructor?
Yes.
Based on 2 is this less efficient in than converting to a string explicitly using a constructor?
No. Whether object is created implicitly or explicitly is irrelevant to efficiency.
Creating a std::string is however potentially less efficient than not creating it which you could achieve by not accepting a reference to a string as the argument. You could use a string view instead.
Is this dangerous.
Not particularly. In some cases implicit conversions can cause a bit of problems when the programmers doesn't notice them, but typically they simplify the language by reducing verbosity.

Function call parameter, char * vs string default constructor

While calling a function/method in C++11 and above, which one is better (if any difference)?
Lets assume this function/method:
void func(std::string s) { ... }
Which one is best between the following?
func(std::string())
or
func("")
And more generally, is there any advantage to always call the constructor explicitly during initialization or parameter passing?
It's better to call the default constructor, because it's guaranteed to not do any unnecessary work.
When passing an empty string literal, it could be that the string implementation does some work processing that string (compute its length for example). An empty string literal isn't a magic bullet that can be treated differently from non-empty string literals. It's type is const char[1], which decays into const char*, and that's it - the std::string constructor dealing with this literal will end up doing more work than necessary.
From cppreference for std::string::string():
Default constructor. Constructs empty string (zero size and unspecified capacity). If no allocator is supplied, allocator is obtained from a default-constructed instance.
... and for std::string::string(const char*):
Constructs the string with the contents initialized with a copy of the null-terminated character string pointed to by s. The length of the string is determined by the first null character. [...]
For further reading, see also this short article.
I would like to compare func(std::string()) with func(""):
func(std::string())
You create an std::string object with default parameter is empty string
Then pass std::string object to func function. You pass it by value, and a new std::string object will be allocated in stack memory, and call a copy constructor to initialized it.
In this case, there are two std::string object is allocated.
func("")
You pass an empty string, so compiler will allocate a std::string object in stack memory, and use std::string(const char*) constructor.
In this case, there is only 1 std::string object allocated.
So, I think for this specific case, func("") maybe better.

std::string& vs boost::string_ref

Does it matter anymore if I use boost::string_ref over std::string& ? I mean, is it really more efficient to use boost::string_ref over the std version when you are processing strings ? I don't really get the explanation offered here: http://www.boost.org/doc/libs/1_61_0/libs/utility/doc/html/string_ref.html . What really confuses me is the fact that std::string is also a handle class that only points to the allocated memory, and since c++11, with move semantics the copy operations noted in the article above are not going to happen. So, which one is more efficient ?
The use case for string_ref (or string_view in recent Boost and C++17) is for substring references.
The case where
the source string happens to be std::string
and the full length of a source string is referenced
is a (a-typical) special case, where it does indeed resemble std::string const&.
Note also that operations on string_ref (like sref.substring(...)) automatically return more string_ref objects, instead of allocating a new std::string.
I have never used it be it seems to me that its purpose is to provide an interface similar to std::string but without having to allocate a string for manipulation. Take the example given extract_part(): it is given a hard-coded C array "ABCDEFG", but because the initial function takes a std::string an allocation takes place (std::string will have its own version of "ABCDEFG"). Using string_ref, no allocation occurs, it uses the reference to the initial "ABCDEFG". The constraint is that the string is read-only.
This answer uses the new name string_view to mean the same as string_ref.
What really confuses me is the fact that std::string is also a handle class that only points to the allocated memory
A string allocates, owns, and manages its own memory. A string_view is a handle to some memory that was already allocated. The memory is managed by some other mechanism, unrelated to the string_view.
If you already have some text data, for example in a char array, then the additional memory allocation involved in constructing a string might be redundant. A string_view could be more efficient because it would allow you to operate directly on the original data in the char array. However, it would not permit the data to be modified; string_view allows no non-const access, because it doesn't own the data it refers to.
and since c++11, with move semantics the copy operations noted in the article above are not going to happen.
You can only move from an object that is ready to be discarded. Copying still serves a purpose and is necessary in many cases.
The example in the article constructs two new strings (not copies) and also constructs two copies of existing strings. In C++98 the copies could already be elided by RVO without move semantics, so they're not a big deal. By using string_view it avoids constructing the two new strings. Move semantics are irrelevant here.
In the call to extract_part("ABCDEFG") a string_view is constructed which refers to the char array represented by the string literal. Constructing a string here would have involved a memory allocation and a copy of the char array.
In the call to bar.substr(2,3) a string_view is constructed which refers to parts of the data already referred to by the first string_view. Using a string here would have involved another memory allocation and copy of part of the data.
So, which one is more efficient?
This is a bit like asking if a hammer is more efficient than a screwdriver. They serve different purposes, so it depends what it is you're trying to accomplish.
You need to be careful when using string_view that the memory it refers to remains valid throughout its lifetime.
If you stick to std::string it does not matter, but boost::string_ref also supports const char*. That is, do you intend to call your string processing function foo with std::string only?
void foo(const std::string&);
foo("won't work"); // no support for `const char*`
Since boost::string_ref is constructable from const char*, it is more flexible since it works with both const char* and std::string.
The proposal N3442 might be helpful.
In short: The main benefit of std::string_view over const std::string& is that you can pass both const char* and std::string objects without doing a copy. As others have said, it also allows you to pass substrings without copying, although (in my experience) this is somewhat less often important.
Consider the following (silly) function (yes I know you could just call s.at(2)):
char getThird(std::string s)
{
if (s.size() < 3) throw std::runtime_error("String too short");
return s[2];
}
This function works, but the string is passed by value. This means the whole length of the string is copied even though we don't look at all of it, and it also (often) incurs a dynamic memory allocation. Doing this in a tight loop can be very expensive. One solution to this is to pass the string by const reference instead:
char getThird(const std::string& s);
This works a lot better if you have a std::string variable and you pass it as a parameter to getThird. But now there's a problem: what if you have a null-terminated const char* string? When you call this function, a temporary std::string will get constructed, so you still get still get the copy and dynamic memory allocation.
Here's another attempt:
char getThird(const char* s)
{
if (std::strlen(s) < 3) throw std::runtime_error("String too short");
return s[2];
}
This will obviously now work fine for const char* variables. It will also work for std::string variables, but calling it is a little awkward: getThird(myStr.c_str()). What's more, std::string supports embedded null characters, and getThird will misinterpret the string as ended at the first of these. At worst this could cause a security vulnerability - imagine if the function were called checkStringForBadHacks!
Another problem is simply that it's annoying to write a function in terms of old null-terminated strings instead of std::string objects with their handy methods. Did you notice, for example, that this function looks at the whole length of the string even though only the first few characters are important? It's hidden in std::strlen, which iterates over all characters looking for the null terminator. We could replace that with a manual check that the first three characters aren't null, but you can see this is a lot less convenient than the other versions.
Step in std::string_view (or boost::string_view, previously known as boost::string_ref):
char getThird(std::string_view s)
{
if (s.size() < 3) throw std::runtime_error("String too short");
return s[2];
}
This gives you the nice methods you expect from a proper string class, like .size(), and it works in both the situations discussed above, plus another:
It works with std::string objects, which can be implicitly be converted to std::string_view objects.
It works with const char* null-terminated strings, which can also be implicitly be converted to std::string_view objects.
This does have the potential disadvantage that constructing the std::string_view requires iterating over the whole string to find the length, even if the function that uses it never needs it (as is the case here). However, if a caller is using a const char* as a parameter to several functions (or one function in a loop) that take std::string_view objects it could always manually construct that object beforehand. This could even give a performance increase, because if that function(s) do need the length then it is precomputed once and reused.
As other answers have mentioned, it also avoids a copy when you only want to pass a substring. For example, this is very useful in parsing. But std::string_view is justified even without this feature.
It's worth noting that there is a case where the original function signature, taking a std::string by value, may actually be better than a std::string_view. That's where you were going to make a copy of the string anyway, for example to store in some other variable or to return from the function. Imagine this function:
std::string changeThird(std::string s, char c)
{
if (s.size() < 3) throw std::runtime_error("String too short");
s[2] = c;
return s;
}
// vs.
std::string changeThird(std::string_view s, char c)
{
if (s.size() < 3) throw std::runtime_error("String too short");
std::string result = s;
result[2] = c;
return result;
}
Note that both of these involve exactly one copy: In the first case this is done implicitly when the parameter s is constructed from whatever is passed in (including if it is another std::string). In the second case we do it explicitly when we create result. But the return statement does not do a copy, because uses move semantics (as if we had done std::move(result)), or more likely uses the return value optimisation.
The reason the first version can be better is that it is actually possible for it to perform zero copies, if the caller moves the argument:
std::string something = getMyString();
std::string other = changeThird(std::move(something), "x");
In this case, the first changeThird does not involve any copy at all, whereas the second one does.

Passing a string to a function in C++(object vs string)

What is the difference between the following two in C++?
fun(L"text1")
VS
std::wstring var = "text1"
fun(var)
In the first case, it is being passed as an object while the second case passes it as a wstring.
How should fun() be defined to handle both?
EDIT:
I have two function definitions
fun(void*)
fun(std::wstring)
std::wstring t = "bla";
fun(t);
fun(L"msg");
When fun(t) is called it goes to the definition of fun(std::wstring)
But when fun(L"msg") is called it goes to fun(Void*). Instead I want it to goto fun(std::wstring)
The first is passed as a wide-string literal. In the second case, you pass by value (and hence copy) or by reference an std::wstring object.
To handle both, you have to define two overloads of your function:
void fun(const wchar_t* s);
void fun(const std::wstring& s);
or you can just define the wstring version, because the literal will implicitly convert to a wstring.
To handle both you should define the fun method as:
void fun(const std::wstring& str);
then in both cases you would be passing a const reference to a wstring because the compiler is allowed to implicitly cast one type to another if the type being cast to has a constructor that takes one argument of the type being cast from unless that constructor is marked as explicit.
Example:
class wstring
{
public:
// constructor not marked as explicit and takes one argument of type whar_t*
wstring(const wchar_t* str);
};
wstring myString = L"hello world"; // implicit cast from wchar_t* to wstring
The only difference between the two examples you've given is that in the first you are passing an rvalue (which you can only bind to a const reference) and in the second you are passing an lvalue (which you can bind to both const and non-const reference).
In given examples there is no much difference, because compiler will generate constant data containing your literal and use it in both cases.
In the first case, raw literal will be used from string table of your module. This is as fast as possible code without heap allocation (actually no allocations).
In the second case, compiler will allocate string buffer in the heap, which results into malloc() call and strcpy(). This will increase time of your code and cause more memory fragmentation.
You should use std string classes only when you really need to use their useful methods, otherwise, TCHAR buffers are just excellent choice.

C++ string declaration

I am learning C++ from the beginning and I don't get the whole strings topic.
What is the difference between the following three codes?
std::string s = std::string("foo");
std::string s = new std::string("foo");
std::string s = "foo";
std::string s = std::string("foo");
This creates a temporary std::string object containing "foo", then assigns it to s. (Note that compilers may elide the temporary. The temporary elison in this case is explicitly allowed by the C++ standard.)
std::string s = new std::string("foo");
This is a compiler error. The expression new std::string("foo") creates an std::string on the free store and returns a pointer to an std::string. It then attempts to assign the returned pointer of type std::string* to s of type std::string. The design of the std::string class prevents that from happening, so the compile fails.
C++ is not Java. This is not how objects are typically created, because if you forget to delete the returned std::string object you will leak memory. One of the main benefits of using std::string is that it manages the underlying string buffer for you automatically, so new-ing it kind of defeats that purpose.
std::string s = "foo";
This is essentially the same as #1. It technically initializes a new temporary string which will contain "foo", then assigns it to s. Again, compilers will typically elide the temporary (and in fact pretty much all non-stupid compilers nowadays do in fact eliminate the temporary), so in practice it simply constructs a new object called s in place.
Specifically it invokes a converting constructor in std::string that accepts a const char* argument. In the above code, the converting constructor is required to be non-explicit, otherwise it's a compiler error. The converting constructor is in fact non-explicit for std::strings, so the above does compile.
This is how std::strings are typically initialized. When s goes out of scope, the s object will be destroyed along with the underlying string buffer. Note that the following has the same effect (and is another typical way std::strings are initialized), in the sense that it also produces an object called s containing "foo".
std::string s("foo");
However, there's a subtle difference between std::string s = "foo"; and std::string s("foo");, one of them being that the converting constructor can be either explicit or non-explicit in the above case.
std::string s = std::string("foo");
This is called copy initialization. It is functionally the same as direct initialization
std::string s( "foo" );
but the former does require that the copy constructor is available and compilers may create a temporary object but most will elide the temporary and directly construct s to contain "foo".
std::string s = new std::string("foo");
This will not compile because new returns a pointer. To make it work you'd need the type of s to be a std::string *. Then the line dynamically allocates an std::string object and stores the pointer in s. You'll need to delete it once you're done using it.
std::string s = "foo";
This is almost the same as first. It is copy initialization but it has an added constraint. It requires that the std::string class contains a non-explicit constructor that takes a const char *. This allows the compiler to implicitly construct a temporary std::string object. After that the semantics are identical to case 1.
Creates a temporary string object and copies the value to s
Does not compile, new std::string("foo") returns a pointer to some newly allocated memory.
For this to work, you should declare s as a pointer to a string std::string* s.
Constructs a string from a C-string.
You should use the third option in most - if not all - cases.
1 will create a temporary variable (right hand side), then call the assignment operator to assign the value to s
2 will create an instance of std::string on the heap and return a pointer to it, and will fail in the assignment because you can't assign a pointer to a non-pointer type
3 will build a std::string and initialize it from a const char*
On the number 1, you are creating a temporary string using the constructor and then assigning it to s.
Number 2 doesn't even compile.
On number 3, you are creating a new string and then assign a value to it.