Is it safe to assign a nullptr to a std::string? - c++

I was working on a little project and came to a situation where the following happened:
std::string myString;
// GetValue() returns a char*
myString = myObject.GetValue();
My question is: if GetValue() returns NULL, does myString become an empty string? Is it undefined? Or will it segfault?

Interesting little question. According to the C++11 standard, sect. 21.4.2.9,
basic_string(const charT* s, const Allocator& a = Allocator());
Requires: s shall not be a null pointer.
Since the standard does not require the library to throw an exception when this particular requirement is not met, it would appear that passing a null pointer provokes undefined behavior.

It is a runtime error (strictly speaking, undefined behavior).
You should do this:
myString = ValueOrEmpty(myObject.GetValue());
where ValueOrEmpty is defined as:
std::string ValueOrEmpty(const char* s)
{
    return s == nullptr ? std::string() : s;
}
Or you could return const char* (it makes better sense):
const char* ValueOrEmpty(const char* s)
{
    return s == nullptr ? "" : s;
}
If you return const char*, then at the call-site, it will convert into std::string.
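A minimal sketch of the call site, assuming a hypothetical MyObject type standing in for the question's myObject (its value member is invented for illustration). The const char* version of ValueOrEmpty is repeated so the block compiles on its own:

```cpp
#include <string>

// Hypothetical stand-in for the question's myObject: GetValue() may return NULL.
struct MyObject {
    char* value = nullptr;  // invented member, for illustration only
    char* GetValue() const { return value; }
};

// Returns "" instead of a null pointer, so the std::string assignment at the
// call site never receives nullptr.
const char* ValueOrEmpty(const char* s)
{
    return s == nullptr ? "" : s;
}
```

At the call site, myString = ValueOrEmpty(myObject.GetValue()); then leaves myString empty when GetValue() returns NULL, and copies the text otherwise.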

My question is: if GetValue() returns NULL, does myString become an empty string? Is it undefined? Or will it segfault?
It's undefined behavior. The compiler and runtime can do whatever they want and still be compliant.

Update:
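Since C++23 adopted P2166, constructing a std::string from nullptr is forbidden: std::string s = nullptr; and std::string s = 0; are now ill-formed, and so is assigning nullptr to a std::string. Note that this only catches a null pointer constant at compile time; a char* that happens to be null at run time, such as the value GetValue() might return, still compiles and is still undefined behavior.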
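P2166 works by adding deleted std::nullptr_t overloads to std::string. The same trick can be applied to your own helper with any C++11 compiler. The make_string function below is hypothetical, not a standard API; it is a sketch that rejects a literal nullptr at compile time and turns a null char* that only shows up at run time into an empty string:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a standard API): a null pointer that only shows up
// at run time becomes an empty string instead of undefined behavior.
inline std::string make_string(const char* s)
{
    return s != nullptr ? std::string(s) : std::string();
}

// Deleted overload in the spirit of P2166: make_string(nullptr) is an exact
// match for this overload, so the call is rejected at compile time.
std::string make_string(std::nullptr_t) = delete;
```

With this, make_string(nullptr) fails to compile, while make_string(myObject.GetValue()) still compiles and yields an empty string if GetValue() returns NULL.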

Related

C++ 2440 error - compiler thinks string is const char?

So I have this little snippet where it thinks "abc" isn't a string but rather a const char [4], and so I can't assign it to my object. I've searched but haven't found any working solutions. Thanks in advance.
Tekst t = "abc";
Tekst Tekst::operator=(std::string& _text){
    return Tekst(_text);
}
Edit: since this pattern is a staple of almost every exercise in my object-oriented programming class, and for whatever reason we can't change anything in int main(), changing Tekst t = "abc"; is a no-go.
Edit 2: the constructor is Tekst(std::string _text) : text(_text) {}
The compiler doesn't think "abc" is a const char [4]; it is a const char [4], and you think it should be a std::string, which is not correct. std::string can be implicitly constructed from const char *, but the two are nowhere near the same.
Your problem is actually that you're trying to bind a temporary to a non-const reference, which is impossible in C++. You should change the definition of your operator to
Tekst Tekst::operator=(const std::string& _text){ // const added to the parameter
    return Tekst(_text);
}
This will make your operator technically valid (as in, it compiles and there is no undefined behavior). However, it does something very unintuitive. Consider the following:
Tekst t;
t = "abc";
In this example, t will not have any "abc" inside. The newly returned object is discarded and t is unchanged.
Most likely, your operator should look like this:
Tekst& Tekst::operator=(const std::string& _text){
    this->text = _text; //or however you want to change your object
    return *this;
}
Refer to the basic rules and idioms for operator overloading for more information about what is and what isn't expected in each operator.
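A minimal compilable sketch of the corrected operator=, assuming a Tekst class with a std::string member named text, as in the question; the default constructor and the get() accessor are hypothetical additions so the result can be checked:

```cpp
#include <string>
#include <utility>

// Sketch of the question's Tekst class. The default constructor and get()
// are hypothetical additions so the effect of the assignment can be checked.
class Tekst {
public:
    Tekst() = default;
    Tekst(std::string _text) : text(std::move(_text)) {}

    // Changes *this and returns a reference to it, like built-in assignment.
    Tekst& operator=(const std::string& _text)
    {
        this->text = _text;
        return *this;
    }

    const std::string& get() const { return text; }

private:
    std::string text;
};
```

With this version, Tekst t; t = "abc"; leaves "abc" in t, and the result of the assignment refers to t itself rather than to a discarded copy.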
On a semi-related note, in C++14 and later you can create a std::string directly from a literal with the s suffix:
#include <string>
using namespace std::string_literals;
int main() {
    auto myString = "abc"s;
    //myString is of type std::string, not const char [4]
}
However, this wouldn't help in your case, because the main problem was binding a temporary to a non-const reference.
Thanks to @Ted Lyngmo ("This should help: godbolt.org/z/j_RTHu"), it turns out all I had to do was add a separate constructor taking a const char*:
Tekst(const char* cstr) : Tekst(std::string(cstr)) {}
Tekst t = "abc"; is not an assignment at all: it is copy-initialization, so your operator= is never considered; the class's constructors are used instead.
In order for Tekst t = "abc"; to compile, you need a constructor that accepts a const char* directly. The constructor you showed takes a std::string, so using it would require two user-defined conversions in a row ("abc" to std::string, then std::string to Tekst), and copy-initialization allows only one. (Direct-initialization such as Tekst t("abc"); would work with that constructor, but main() can't be changed.) So you need to add a new constructor for that:
Tekst(const char *_text) : text(_text) {}
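A small sketch of the difference between the two forms of initialization; OnlyString is a hypothetical class with only a std::string constructor, and the Tekst struct here is reduced to its text member:

```cpp
#include <string>
#include <utility>

// Hypothetical class with only a std::string constructor.
//   OnlyString a("abc");    // OK: direct-initialization, one conversion ("abc" -> std::string)
//   OnlyString b = "abc";   // error: copy-initialization would need two user-defined
//                           // conversions ("abc" -> std::string -> OnlyString)
struct OnlyString {
    std::string text;
    OnlyString(std::string t) : text(std::move(t)) {}
};

// Reduced Tekst: the extra const char* constructor makes Tekst t = "abc"; compile.
struct Tekst {
    std::string text;
    Tekst(std::string t) : text(std::move(t)) {}
    Tekst(const char* t) : text(t) {}
};
```

Copy-initialization from a std::string, as in Tekst u = std::string("xyz");, still uses the std::string constructor.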

Why does string (const char* s, size_t pos, size_t len = npos) work?

It's not listed explicitly in the std::string constructor documentation (EDIT: folks here say I should cite the actual cppreference, not cplusplus.com), but apparently it works. That means it's roughly the equivalent of strncpy, isn't it?
Does it work because it implicitly first initializes another std::string object that's a copy of the const char* string passed in? Does that mean it does the extra work of copying the entire string, even if it eventually extracts only a substring of a certain length?
Also, such a construction seems similar to string(const char* s+pos, size_t len), except that the reference says that if len is greater than the string length, it causes undefined behavior; yet with string(const char* s, size_t pos, size_t len = npos), a len that reaches past the null terminator is just fine. Presumably that's because the latter works at the level of the std::string object, while the former works directly with pointers.
And why isn't that behavior listed in the C++ reference documentation?
My guess is that it's a weird combination of internally copying to a std::string object and then applying string(const string& str, size_t pos, size_t len = npos) to it, so it's not considered "standard". That said, I find this super useful when I have to take input as char*: I don't much care about copying the entire string once, I get away without any malloc and strncpy, and I don't want to write branching code to make sure len doesn't go out of bounds.
This works because of the presence of this constructor:
std::basic_string( const basic_string& other,
size_type pos,
size_type count = std::basic_string::npos,
const Allocator& alloc = Allocator() );
const char * is implicitly convertible to std::basic_string, so the above constructor is called after said conversion when you write (for example) std::string s {"abc", 1, 2};
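To address your question of efficiency, the implicit conversion from char * to std::basic_string involves construction of a temporary, so yes, the string is copied. (This applies to C++11 and C++14. Since C++17, std::basic_string also has a constructor template taking any argument convertible to std::basic_string_view plus pos and count; it is a better match for a const char*, so no temporary std::string is created.)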
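A short sketch of the overloads in play. The helper function names are made up for illustration; the interesting part is the argument count. With two arguments, the const char* overload that takes a count is chosen, and it copies the first count characters instead of starting at pos:

```cpp
#include <string>

// Two arguments: selects basic_string(const char* s, size_type count),
// which copies the FIRST count characters; this is not a substring from pos.
std::string first_two() { return std::string("abcdef", 2); }

// Three arguments: the characters starting at pos, at most len of them.
std::string middle() { return std::string("abcdef", 2, 3); }

// A len past the end is clamped, just like std::string::substr.
std::string clamped() { return std::string("abcdef", 2, 100); }

// To get "everything from pos" with two arguments, start from a std::string.
std::string tail() { return std::string(std::string("abcdef"), 2); }
```

These return "ab", "cde", "cdef" and "cdef" respectively, so relying on the default len = npos with a raw const char* silently does something different from what the signature in the question suggests.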

Why isn't it a compile-time error to return a nullptr as a std::string?

Due to a bug, I just found out that this code compiles fine with Visual Studio 17, and probably with other compilers as well. Now I'm curious why.
#include <iostream>
#include <string>
std::string foo(){
    return nullptr;
}
int main(){
    auto s = foo();
    std::cout << s << std::endl;
}
I imagine it is because the std::basic_string constructor can be invoked with a char*, and when returning, an implicit conversion from the pointer to std::string occurs (with NULL as the argument, and then it goes poof). Am I on the right track?
Yes, your assumption is right. Checking the std::basic_string constructors, #5 will be called:
basic_string( const CharT* s,
const Allocator& alloc = Allocator() );
Note that passing nullptr invokes undefined behavior, as stated in the standard and in the notes:
The behavior is undefined if [s, s + Traits::length(s)) is not a valid range (for example, if s is a null pointer).
Why shouldn't it compile? std::string has the following constructor:
string(const CharT* s, const Allocator& alloc = Allocator());
that constructs the string with contents initialized to a copy of the null-terminated character string pointed to by s. The constructor is not explicit, so the implicit conversion from nullptr to std::string is indeed possible (until C++23, which adds a deleted std::nullptr_t constructor, so return nullptr; no longer compiles there).

What is happening when I initialize a std::string with 0?

For instance, let's say I have
std::string str = 0;
Is the 0 being converted to a const char*?
Is it being converted to a char* and then to a const char* when it's passed to the constructor?
I understand initializing a char* to 0 is undefined behavior, and as far as I know the same goes for const char*, but I don't understand the process of what's going on when I pass 0 to the std::string constructor.
Edit: I was wrong.
Your guess is correct.
If we look at, e.g., this std::string constructor reference, we can see that the only suitable constructor is number 5. Therefore your definition is equivalent to
std::string str = std::string(0);
And as noted in the reference:
The behavior is undefined if s does not point at an array of at least Traits::length(s)+1 elements of CharT, including the case when s is a null pointer.
So yes, it constructs a std::string from a null pointer, which is indeed UB.
I understand initializing a char* to 0 is undefined behavior
You understand wrong. A literal 0 is a null pointer constant, which can be converted to a null pointer of any pointer type. There's nothing undefined there. The issues come when overloading is involved, and the 0 can be converted not just to a pointer but also to another integral type. But that conversion is not problematic on its own.
Which brings us to what std::string str = 0; does. It initializes str, which has class type, from 0. So we need to examine the constructors; the only applicable one for 0 is this one:
basic_string( const CharT* s,
const Allocator& alloc = Allocator() );
So it indeed initializes str from a null pointer. And that is what's undefined.
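A sketch of both points above: a literal 0 initializes a pointer to null without any undefined behavior, and overloading is where 0 gets surprising. The describe overloads and zero_is_null are hypothetical helpers used only for illustration:

```cpp
#include <string>

// A literal 0 is a null pointer constant: this is well-defined, p is null.
bool zero_is_null()
{
    const char* p = 0;
    return p == nullptr;
}

// Hypothetical overloads showing where 0 gets surprising.
std::string describe(int)         { return "int"; }
std::string describe(const char*) { return "pointer"; }
```

Here describe(0) picks the int overload, because an exact match beats the pointer conversion, while describe(nullptr) can only go to the pointer overload.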

returning a reference to mapped char* from operator[]

I have a data structure:
#include <map>
struct array{
    map<const char*, char*> data;
    //constructor
    array(const char* key, char* value = ""){
        data.insert(pair<const char*, char*>(key, value));
    }
    //overloaded operator[] seems to be my problem
    char* operator[](const char* key) { return (char*)data[key]; }
};
Now, without overloading the assignment operator=, I test-drove it like this:
array var("first", "second");
var["third"] = "fourth"; //and my compiler (gcc) is angry about this
Now, my compiler returned the following error:
functions.cpp:13:18: error: lvalue required as left operand of assignment
Question: is there anything I am failing to understand? How can I return the address of map::data["key"] from operator[], so that var["third"] = "fourth"; works properly? Mind you, I don't want to do this with C++'s string type; strictly char*.
is there anything I am failing to understand?
You are returning the pointer by value. That means the caller receives a copy of the pointer, and changing that copy cannot change the pointer stored in the map.
How can I return the address of map::data["key"] from operator[], so that var["third"] = "fourth"; works properly?
Return a reference to it:
char*& operator[](const char* key) { /* ... */ }
In order for that to work, you need to get rid of the redundant cast, because the cast yields a temporary pointer that cannot bind to a char*&:
return data[key];
Another problem in your program is that you store non-const char* in your map but initialize those pointers with string literals, which are const. Such a conversion is illegal since C++11, which means that your program is ill-formed. Even before C++11, the conversion had been deprecated ever since C++ was first standardized.
The danger of doing this is that you may accidentally modify const string through the non-const pointer, which would result in undefined behaviour.
Solution: Use const char* pointers in the map, if the modification of the string contents is not needed. If modification is needed, then instead point to separately allocated char arrays that are copied from the string literals. The simplest way to do the latter is to use std::string as the value type, but if you don't want to do that, then you can manage the arrays yourself.
A third problem in your program is that you appear to assume that var["third"] is guaranteed to find a key that was initialized with "third". That assumption is wrong: std::map<const char*, char*> compares keys by pointer value, and separate but identical string literals are not guaranteed to have the same address.
Solution: Use std::string as the key, or use a custom comparison functor that compares the strings by content. Hint: Use std::strcmp to implement the functor.
P.S. you don't appear to have any overloads for operator[](...), so it's not "overloaded".
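A sketch of the strcmp-based comparison functor suggested above, assuming you want to keep const char* keys; CStrLess and CStrMap are hypothetical names. Keep in mind that the map stores only the pointers, so the strings they point to must outlive the map:

```cpp
#include <cstring>
#include <map>

// Orders C strings by their characters instead of by their addresses.
struct CStrLess {
    bool operator()(const char* a, const char* b) const
    {
        return std::strcmp(a, b) < 0;
    }
};

// const char* values, so string literals can be stored without dropping const.
using CStrMap = std::map<const char*, const char*, CStrLess>;
```

With this comparator, two different buffers that both contain "third" find the same map entry.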
is there anything I am failing to understand?
Yes, I think you're confused about your code and its intention. Strictly speaking, the answer is that you're trying to use an rvalue in a place where only lvalues are allowed, i.e., on the left-hand side of an assignment. To fix that immediate problem, you would need to change your operator to:
char*& operator[](const char* key) { return data[key]; }
(i.e. returning a reference to the pointer contained in the map).
This would compile, but I don't think this structure would do what you want it to. For example, modifying the contents of the map's strings like this:
var["third"][0] = 'a';
would be undefined behaviour if you used string literals to populate it, as you do in your example.
Better to take the advice of the commenters and switch to using std::strings.
To make it compile you would have to change your operator[] signature to:
char*& operator[](const char* key) { return data[key]; }
but then you will get warnings (strictly, the code is ill-formed since C++11) and possible UB, because a string literal is a const array, while you want to assign it to a non-const char*. To silence those warnings you would have to make additional changes, turning every char* into const char*:
#include <map>
#include <utility>
struct array{
    std::map<const char*, const char*> data;
    //constructor
    array(const char* key, const char* value = ""){
        data.insert(std::pair<const char*, const char*>(key, value));
    }
    //overloaded operator[] seems to be my problem
    const char*& operator[](const char* key) { return data[key]; }
};
But maybe this is not what you want.
The best solution is to switch your std::map<const char*, char*> to std::map<std::string, std::string>
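A minimal sketch of that std::string version, keeping the question's struct name and interface; only the types change:

```cpp
#include <map>
#include <string>

// The question's structure with std::string keys and values: keys compare by
// content, and every entry owns a modifiable copy of its text.
struct array {
    std::map<std::string, std::string> data;

    array(const std::string& key, const std::string& value = "")
    {
        data.emplace(key, value);
    }

    // Returns a reference into the map, so var["third"] = "fourth"; assigns in place.
    std::string& operator[](const std::string& key) { return data[key]; }
};
```

Here var["third"] = "fourth"; assigns into the map entry, lookups compare keys by content, and var["third"][0] = 'F'; modifies a copy owned by the map instead of a string literal.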