Is there a reason why in c++ std::string is not implicitly converted to bool? For example
std::string s = ""
if (s) { /* s in not empty */ }
as in other languages (e.g. python). I think it is tedious to use the empty method.
This probably could be added now that C++11 has added the concepts of explicit conversions and contextual conversion.
When std::string was designed, neither of these was present though. That made classes that supported conversion to bool fairly difficult to keep safe. In particular, that conversion could (and would) happen in lots of cases you almost never wanted it to. For example, if we assume std::string converts to false if empty and otherwise to true, then you could use a string essentially anywhere an integer or pointer was intended.
Rather than telling you about the type mismatch, the compiler would convert the string to bool, and then the bool to an integer (false -> 0, true -> 1).
Things like this happened often enough with many early attempts at string types (and there were many) that the committee apparently decided it was better to keep implicit conversions to an absolute minimum (so about the only implicit conversion supported by string is to create a string object from a C-style string).
There were a number of methods devised for handling conversion to bool more safely. One was converting to void * instead, which prevented some problems, but not others (this was used by iostreams). There was also a "safe bool" idiom (actually, more like a "safe bool" theme, of which there were several variations). While these certainly improved control over what conversions would and wouldn't be allowed, most of them involved a fair amount of overhead (a typical safe bool required a base class of ~50 lines of code, plus derivation from that base class, etc.)
As to how explicit conversion and contextual conversion would help, the basic idea is pretty simple. You can (starting with C++11) mark a conversion function as explicit, which allows it to be used only where an explicit cast to the target type is used:
struct X {
explicit operator bool() { return true; }
};
int main() {
X x;
bool b1 = static_cast<bool>(x); // compiles
bool b2 = x; // won't compile
}
Contextual conversion adds a little to let the conversion to bool happen implicitly, but only in something like an if statement, so using a class with the conversion function above, you'd get:
X x;
if (x) // allowed
int y = x; // would require explicit cast to compile
I'd add that complaints about "orthogonality" seem quite inapplicable here. Although convenient, converting a string to a Boolean doesn't really make a lot of sense. If anything, we should complain about how strange it is for string("0") to convert to 1 (in languages where that happens).
This article mentions some reasons why operator bool() can lead to surprising results.
Note that std::string is just a typedef for std::basic_string<char>. There is also std::wstring for multi-byte characters. An implicit conversion would let you write:
std::string foo = "foo";
std::wstring bar = "bar";
if (foo == bar) {
std::cout << "This will be printed, because both are true!\n";
}
std::string still has to coexist with C-style strings.
A C-style string is by definition "a contiguous sequence of characters terminated by and including the first null character", and is generally accessed via a pointer to its first character. An expression such as "hello, world" is, in most contexts, implicitly converted to a pointer to the first character. Such a pointer may then be implicitly converted to bool, yielding true if the pointer is non-null, false if it's null. (In C, it's not converted to bool, but it can still be used directly as a condition, so the effect is nearly the same.)
So, due to C++'s C heritage, if you write:
if ("") { ... }
the empty string is already treated as true, and that couldn't easily be changed without breaking C compatibility.
I suggest that having a C-style empty string evaluate as true and a C++ empty std::string evaluate as false would be too confusing.
And writing if (!s.empty()) isn't that difficult (and IMHO it's more legible).
The closest thing to what you (and I) want, that I've been able to find, is the following.
You can define the ! operator on std::string's like so:
bool operator!(const std::string& s)
{
return s.empty();
}
This allows you to do:
std::string s;
if (!s) // if |s| is empty
And using a simple negation, you can do:
if (!!s) // if |s| is not empty
That's a little awkward, but the question is, how badly do you want to avoid extra characters? !strcmp(...) is awkward, too, but we still functioned and got used to it, and many of us preferred it because it was faster than typing strcmp(...) == 0.
If anyone discovers a way to do if (s), in C++, please let us know.
Related
In this proposal:
N3830 Scoped Resource - Generic RAII Wrapper for the Standard Library
a scoped_resource RAII wrapper is presented.
On page 4, there is some code like this:
auto hFile = std::make_scoped_resource(
...
);
...
// cast operator makes it seamless to use with other APIs needing a HANDLE
ReadFile(hFile, ...);
The Win32 API ReadFile() takes a raw HANDLE parameter, instead hFile is an instance of scoped_resource, so to make the above code work, there is an implicit cast operator implemented by scoped_resource.
But, wasn't the "modern" recommendation to avoid these kind of implicit conversions?
For example, ATL/MFC CString has an implicit conversion (cast operator) to LPCTSTR (const char/wchar_t*, i.e. a raw C-string pointer), instead STL strings have an explicit c_str() method.
Similarly, smart pointers like unique_ptr have an explicit get() method to access the underlying wrapped pointer; and that recommendation against implicit conversion seems also present in this blog post:
Reader Q&A: Why don’t modern smart pointers implicitly convert to *?
So, are these implicit conversions (like ATL/MFC CString and the newly proposed scoped_resource) good or not for modern C++?
From a coding perspective, I'd say that being able to simply directly pass a RAII wrapper - be it CString or scoped_resource - to a C API expecting a "raw" parameter (like a raw C string pointer, or raw handle), relying on implicit conversions, and without calling some .GetString()/.get() method, seems very convenient.
Below is quote from C++ Primer 5th edition.
Caution: Avoid Overuse of Conversion Functions
As with using overloaded operators, judicious use of conversion operators can
greatly simplify the job of a class designer and make using a class easier.
However, some conversions can be misleading. Conversion operators are
misleading when there is no obvious single mapping between the class type
and the conversion type.
For example, consider a class that represents a Date. We might think it
would be a good idea to provide a conversion from Date to int. However,
what value should the conversion function return? The function might return
a decimal representation of the year, month, and day. For example, July 30, 1989 might be represented as the int value 19800730. Alternatively, the conversion operator might return an int representing the number of days that have elapsed since some epoch point, such as January 1, 1970. Both these conversions have the desirable property that later dates correspond to larger integers, and so either might be useful.
The problem is that there is no single one-to-one mapping between an
object of type Date and a value of type int. In such cases, it is better not
to define the conversion operator. Instead, the class ought to define one or
more ordinary members to extract the information in these various forms.
So, what I can say is that in practice, classes should rarely provide conversion operators. Too often users are more likely to be surprised if a conversion happens automatically than to be helped by the
existence of the conversion. However, there is one important exception to this rule of
thumb: It is not uncommon for classes to define conversions to bool.
Under earlier versions of the standard, classes that wanted to define a conversion to
bool faced a problem: Because bool is an arithmetic type, a class-type object that is
converted to bool can be used in any context where an arithmetic type is expected.
Such conversions can happen in surprising ways. In particular, if istream had a
conversion to bool, the following code would compile:
int i = 42;
cin << i; // this code would be legal if the conversion to bool were not explicit!
You should use explicit conversion operators. Below is a little example:
class small_int
{
private:
int val;
public:
// constructors and other members
explicit operator int() const { return this->val; }
}
... and in the program:
int main()
{
SmallInt si = 3; // ok: the SmallInt constructor is not explicit
si + 3; // error: implicit is conversion required, but operator int is explicit
static_cast<int>(si) + 3; // ok: explicitly request the conversion
return 0;
}
I think implicit conversions are good/convenient for the application programmer, but bad for library developers as they introduce complexity and other issues: A good example is the interplay between implicit conversion and template deduction.
From a library perspective, it is easier to constrain language features to a minimal set. For that, you could even say that library is easier to maintain if you remove some of the overloading and default parameters. It's less convenient for application programmers, but there is less ambiguity and complexity.
So, it's really a choice between convenience and simplicity.
It highly depends on exact conversion type, but, in general, there are some areas where implicit conversions may introduce problems, whereas explicit rather may not.
Generally, doing anything explicitly in programming is a more safe way.
There is a good source of information on C++ in general and on implicit and explicit conversions in particular.
In C++, the use of operator cast can lead to confusion to readers of your code due to it not being obvious that a function call is being invoked. That being said, I've seen its use being discouraged.
However, under what circumstances would using operator cast be appropriate and have value which exceeds any possible confusion it might lead to?
When the conversion is natural and has no side effects it can be useful. Nobody is going to argue that an automatic conversion from int to double is inappropriate for example, even if you can come up with a corner case that makes it confusing (and I'm not sure anybody can).
I've found the conversion from Microsoft's CString to const char * to be incredibly handy, even though I know others disagree. I wouldn't mind seeing a similar capability in std::string.
Operator casts are very useful in a C++ idiom of wrapper objects. For example, suppose you have some copy-on-write implementation of a string class. You want your users to be able to index it naturally, like
const String s = "abc";
assert(s[0] == 'a');
// given
char String::operator[](int) const
So far, you'd think this would work. Yet what happens when someone wants to modify your string? Perhaps this will work, then?
String s = "abc";
s[0] = 'z';
assert(s[0] == 'z');
// given
char & String::operator[](int)
But this implementation gives a reference to a non-const character. So someone can always use that reference to modify the string. So, before it hands out the reference, it has to perform a copy of the string internally, so that other strings won't be modified. Thus it's not possible to use operator[] on non-const strings without forcing a copy. What to do?
Instead of returning a character reference, you can return a wrapper object with following interface:
class CharRef {
public:
operator char() const;
CharRef & operator=(char);
};
The char() conversion operator simply returns a copy of the character stored in the string. When you assign to the wrapper, though, the operator=(char) will force the string to perform an internal copy if the reference count is >1, and modify that copy instead.
The wrapper's implementation may, for example, hold the char and a pointer to the string (probably some subpart of the string's implementation).
I understand that the keyword explicit can be used to prevent implicit conversion.
For example
Foo {
public:
explicit Foo(int i) {}
}
My question is, under what condition, implicit conversion should be prohibited? Why implicit conversion is harmful?
Use explicit when you would prefer a compiling error.
explicit is only applicable when there is one parameter in your constructor (or many where the first is the only one without a default value).
You would want to use the explicit keyword anytime that the programmer may construct an object by mistake, thinking it may do something it is not actually doing.
Here's an example:
class MyString
{
public:
MyString(int size)
: size(size)
{
}
//... other stuff
int size;
};
With the following code you are allowed to do this:
int age = 29;
//...
//Lots of code
//...
//Pretend at this point the programmer forgot the type of x and thought string
str s = x;
But the caller probably meant to store "3" inside the MyString variable and not 3. It is better to get a compiling error so the user can call itoa or some other conversion function on the x variable first.
The new code that will produce a compiling error for the above code:
class MyString
{
public:
explicit MyString(int size)
: size(size)
{
}
//... other stuff
int size;
};
Compiling errors are always better than bugs because they are immediately visible for you to correct.
It introduces unexpected temporaries:
struct Bar
{
Bar(); // default constructor
Bar( int ); // value constructor with implicit conversion
};
void func( const Bar& );
Bar b;
b = 1; // expands to b.operator=( Bar( 1 ));
func( 10 ); // expands to func( Bar( 10 ));
A real world example:
class VersionNumber
{
public:
VersionNumber(int major, int minor, int patch = 0, char letter = '\0') : mMajor(major), mMinor(minor), mPatch(patch), mLetter(letter) {}
explicit VersionNumber(uint32 encoded_version) { memcpy(&mLetter, &encoded_version, 4); }
uint32 Encode() const { int ret; memcpy(&ret, &mLetter, 4); return ret; }
protected:
char mLetter;
uint8 mPatch;
uint8 mMinor;
uint8 mMajor;
};
VersionNumber v = 10; would almost certainly be an error, so the explicit keyword requires the programmer to type VersionNumber v(10); and - if he or she is using a decent IDE - they will notice through the IntelliSense popup that it wants an encoded_version.
Mostly implicit conversion is a problem when it allows code to compile (and probably do something strange) in a situation where you did something you didn't intend, and would rather the code didn't compile, but instead some conversion allows the code to compile and do something strange.
For example, iostreams have a conversion to void *. If you're bit tired and type in something like: std::cout << std::cout; it will actually compile -- and produce some worthless result -- typically something like an 8 or 16 digit hexadecimal number (8 digits on a 32-bit system, 16 digits on a 64-bit system).
At the same time, I feel obliged to point out that a lot of people seem to have gotten an almost reflexive aversion to implicit conversions of any kind. There are classes for which implicit conversions make sense. A proxy class, for example, allows conversion to one other specific type. Conversion to that type is never unexpected for a proxy, because it's just a proxy -- i.e. it's something you can (and should) think of as completely equivalent to the type for which it's a proxy -- except of course that to do any good, it has to implement some special behavior for some sort of specific situation.
For example, years ago I wrote a bounded<T> class that represents an (integer) type that always remains within a specified range. Other that refusing to be assigned a value outside the specified range, it acts exactly like the underlying intger type. It does that (largely) by providing an implicit conversion to int. Just about anything you do with it, it'll act like an int. Essentially the only exception is when you assign a value to it -- then it'll throw an exception if the value is out of range.
It's not harmful for the experienced. May be harmful for beginner or a fresher debugging other's code.
"Harmful" is a strong statement. "Not something to be used without thought" is a good one. Much of C++ is that way (though some could argue some parts of C++ are harmful...)
Anyway, the worst part of implicit conversion is that not only can it happen when you don't expect it, but unless I'm mistaken, it can chain... as long as an implicit conversion path exists between type Foo and type Bar, the compiler will find it, and convert along that path - which may have many side effects that you didn't expect.
If the only thing it gains you is not having to type a few characters, it's just not worth it. Being explicit means you know what is actually happening and won't get bit.
To expand Brian's answer, consider you have this:
class MyString
{
public:
MyString(int size)
: size(size)
{
}
// ...
};
This actually allows this code to compile:
MyString mystr;
// ...
if (mystr == 5)
// ... do something
The compiler doesn't have an operator== to compare MyString to an int, but it knows how to make a MyString out of an int, so it looks at the if statement like this:
if (mystr == MyString(5))
That's very misleading since it looks like it's comparing the string to a number. In fact this type of comparison is probably never useful, assuming the MyString(int) constructor creates an empty string. If you mark the constructor as explicit, this type of conversion is disabled. So be careful with implicit conversions - be aware of all the types of statements that it will allow.
I use explicit as my default choice for converting (single parameter or equivalent) constructors. I'd rather have the compiler tell me immediately when I'm converting between one class and another and make the decision at that point if the conversion is appropriate or instead change my design or implementation to remove the need for the conversion completely.
Harmful is a slightly strong word for implicit conversions. It's harmful not so much for the initial implementation, but for maintenance of applications. Implicit conversions allow the compiler to silently change types, especially in parameters to yet another function call - for example automatically converting an int into some other object type. If you accidentally pass an int into that parameter the compiler will "helpfully" silently create the temporary for you, leaving you perplexed when things don't work right. Sure we can all say "oh, I'll never make that mistake", but it only takes one time debugging for hours before one starts thinking maybe having the compiler tell you about those conversions is a good idea.
In C++ is it possible to define conversion operators which are not class members? I know how to do that for regular operators (such as +), but not for conversion operators.
Here is my use case: I work with a C Library which hands me out a PA_Unichar *, where the library defines PA_Unichar to be a 16-bit int. It is actually a string coded in UTF-16. I want to convert it to a std::string coded in UTF-8. I have all the conversion code ready and working, and I am only missing the syntactic sugar that would allow me to write:
PA_Unichar *libOutput = theLibraryFunction();
std::string myString = libOutput;
(usually in one line without the temp variable).
Also worth noting:
I know that std::string doesn't define implicit conversion from char* and I know why. The same reason might apply here, but that's beside the point.
I do have a ustring, subclass of std::string that defines the right conversion operator from PA_Unichar*. It works but this means using ustring variables instead of std::string and that then requires conversion to std::string when I use those strings with other libraries. So that doesn't help very much.
Using an assignment operator doesn't work as those must be class members.
So is it possible to define implicit conversion operators between two types you don't control (in my case PA_Unichar* and std::string), which may or may not be class types?
If not what could be workarounds?
What's wrong with a free function?
std::string convert(PA_Unichar *libOutput);
std::string myString = convert(theLibraryFunction());
Edit answering to the comment:
As DrPizza says: Everybody else is trying to plug the holes opened up by implicit conversions through replacing them with those explicit conversion which you call "visual clutter".
As to the temporary string: Just wait for the next compiler version. It's likely to come with rvalue references and its std::string implementation will implement move semantics on top of that, which eliminates the copy. I have yet to see a cheaper way to speedup your code than than by simply upgrading to a new compiler version.
I don't think you can define "global" conversion operators. The standards say that conversion functions are special member functions. I would propse the following if I could consider the following syntax sugar:
struct mystring : public string
{
mystring(PA_Unichar * utf16_string)
{
// All string functionality is present in your mystring.
// All what you have to do is to write the conversion process.
string::operator=("Hello World!");
string::push_back('!');
// any string operation...
}
};
Be aware that polymorphic behavior of this class is broken. As long as you don't create an object of it through a pointer of type string* though, you are in the safe-side! So, this code is perfect:
mystring str(....);
As said before, the following code is broken!
string* str = new mystring(....);
....
delete str; // only deleting "string", but not "mystring" part of the object
// std::string doesn't define virtual destructor
Implicit conversions are the devil, anyway. Make it explicit with a converting function call.
No, you can't. What you could do as an alternative is to create a conversion constructor in the target class (not your case, as you want to convert to std::string - unless you derive it). But I agree to the other answers, I think an implicit conversion is not recommended in this case - especially because you're not converting from an object but from a pointer. Better to have a free function, your code will be easier to understand and the next programmer to inherit the code will for sure thank you.
std::string provides const char* c_str ( ) const which:
Get C string equivalent
Generates a null-terminated sequence
of characters (c-string) with the same
content as the string object and
returns it as a pointer to an array of
characters.
A terminating null character is
automatically appended.
The returned array points to an
internal location with the required
storage space for this sequence of
characters plus its terminating
null-character, but the values in this
array should not be modified in the
program and are only granted to remain
unchanged until the next call to a
non-constant member function of the
string object.
Why don't they just define operator const char*() const {return c_str();}?
From the C++ Programming Language 20.3.7 (emphasis mine):
Conversion to a C-style string could have been provided by an operator const char*() rather than c_str(). This would have provided the convenience of an implicit conversion at the cost of surprises in cases in which such a conversion was unexpected.
I see at least two problems with the implicit conversion:
Even the explicit conversion that c_str() provides is dangerous enough as is. I've seen a lot of cases where the pointer was stored to be used after the lifetime of the original string object had ended (or the object was modified thus invalidating the pointer). With the explicit call to c_str() you hopefully are aware of these issues. But with the implicit conversion it would be very easy to cause undefined behavior as in:
const char *filename = string("/tmp/") + name;
ofstream tmpfile(filename); // UB
The conversion would also happen in some cases where you wouldn't expect it and the semantics are surprising to say the least:
string name;
if (name) // always true
;
name-2; // pointer arithmetic + UB
These could be avoided by some means but why get into this trouble in the first place?
Because implicit conversions almost never behave as you expect. They can give surprising results in overload resolution, so it's usually better to provide an explicit conversion as std::string does.
The Josuttis book says the following:
This is for safety reasons to prevent unintended type conversions that result in strange behavior (type char * often has strange behavior) and ambiguities (for example, in an expression that combines a string and a C-string it would be possible to convert string into char * and vice versa).
I addition to the rationale provided in the specification (unexpected surprises), if you're mixing C API calls with std::string, you really need to get into the habit of using the ::c_str() method. If you ever call a varargs function (eg: printf, or equivalent) which requires a const char*, and you pass a std::string directly (without calling the extraction method), you won't get a compile error (no type checking for varargs functions), but you will get a runtime error (class layout is not binary identical to a const char*).
Incidentally, CString (in MFC) takes the opposite approach: it has an implicit cast, and the class layout is binary-compatible with const char* (or const w_char*, if compiling for wide character strings, ie: "Unicode").
That's probably because this conversion would have surprising and peculiar semantics. Particularly, the fourth paragraph you quote.
Another reason is that there is an implicit conversion const char* -> string, and this would be just the converse, which would mean strange behavior wrt overload resolution (you shouldn't make both implicit conversions A->B and B->A).
Because C-style strings are a source of bugs and many security problems it's way better to make the conversion explicitly.