Why doesn't std::string provide implicit conversion to char*? - c++

std::string provides const char* c_str ( ) const which:
Get C string equivalent
Generates a null-terminated sequence
of characters (c-string) with the same
content as the string object and
returns it as a pointer to an array of
characters.
A terminating null character is
automatically appended.
The returned array points to an
internal location with the required
storage space for this sequence of
characters plus its terminating
null-character, but the values in this
array should not be modified in the
program and are only granted to remain
unchanged until the next call to a
non-constant member function of the
string object.
Why don't they just define operator const char*() const {return c_str();}?

From the C++ Programming Language 20.3.7 (emphasis mine):
Conversion to a C-style string could have been provided by an operator const char*() rather than c_str(). This would have provided the convenience of an implicit conversion at the cost of surprises in cases in which such a conversion was unexpected.

I see at least two problems with the implicit conversion:
Even the explicit conversion that c_str() provides is dangerous enough as is. I've seen a lot of cases where the pointer was stored to be used after the lifetime of the original string object had ended (or the object was modified thus invalidating the pointer). With the explicit call to c_str() you hopefully are aware of these issues. But with the implicit conversion it would be very easy to cause undefined behavior as in:
const char *filename = string("/tmp/") + name;
ofstream tmpfile(filename); // UB
The conversion would also happen in some cases where you wouldn't expect it and the semantics are surprising to say the least:
string name;
if (name) // always true
;
name-2; // pointer arithmetic + UB
These could be avoided by some means but why get into this trouble in the first place?

Because implicit conversions almost never behave as you expect. They can give surprising results in overload resolution, so it's usually better to provide an explicit conversion as std::string does.

The Josuttis book says the following:
This is for safety reasons to prevent unintended type conversions that result in strange behavior (type char * often has strange behavior) and ambiguities (for example, in an expression that combines a string and a C-string it would be possible to convert string into char * and vice versa).

I addition to the rationale provided in the specification (unexpected surprises), if you're mixing C API calls with std::string, you really need to get into the habit of using the ::c_str() method. If you ever call a varargs function (eg: printf, or equivalent) which requires a const char*, and you pass a std::string directly (without calling the extraction method), you won't get a compile error (no type checking for varargs functions), but you will get a runtime error (class layout is not binary identical to a const char*).
Incidentally, CString (in MFC) takes the opposite approach: it has an implicit cast, and the class layout is binary-compatible with const char* (or const w_char*, if compiling for wide character strings, ie: "Unicode").

That's probably because this conversion would have surprising and peculiar semantics. Particularly, the fourth paragraph you quote.
Another reason is that there is an implicit conversion const char* -> string, and this would be just the converse, which would mean strange behavior wrt overload resolution (you shouldn't make both implicit conversions A->B and B->A).

Because C-style strings are a source of bugs and many security problems it's way better to make the conversion explicitly.

Related

What is the difference between two 'char*' castings in c++

I do have a "C" function
find_register(char *name)
which is called from a "C++" routine. The first tip to call it was
find_register("%ebx")
It results in a warning
deprecated conversion from string constant to 'char*'
OK, I corrected it to
find_register(string("%ebx").c_str())
which results in the error message
"invalid conversion from 'const char*' to 'char*'
Finally,
find_register((char*)string("%ebx").c_str())
is accepted without error message and warnings.
My questions:
1./ The probable reason of the error message is that the possibility of changing a 'const char*' is left open. It is OK, but in the first case the less sophisticated version allows the same, and converting a string constant to 'char *' is a legal, but deprecated conversion. Is there any deeper reason behind?
2./ Is there any simple method to do what I want? (the perfect (for the compiler) code is hardly readable for the programmer)
The reasoning lies in the original roots of C++ language, back in the time when certain amount of backward compatibility with C was considered important.
In C language string literals, despite being non-modifiable, have type char [N]. For this reason they are implicitly convertible to type char * in C. In C++ string literals have type const char[N]. Formally, const char[N] is not implicitly convertible to char *. But for aforementioned C compatibility reasons the original C++ specification (C++98) allowed implicit conversion of string literals to char *. This exception was made for immediate string literals only, as a form of special treatment provided to string literals.
But eventually the matter of C compatibility became unimportant and this special treatment was deprecated in C++03. So, this is what the compiler is telling you. In your find_register("%ebx") call it agrees to convert the immediate string literal to char *, but warns you that this conversion is deprecated. In C++11 this implicit conversion is outlawed entirely.
In all other contexts (not an immediate string literal), implicit conversion of const char [N] or const char * to char * is prohibited and has always been prohibited. This is why your
find_register(string("%ebx").c_str())`
variant has no chance of compiling.
As for working around this restriction... If you are sure that find_register(char *name) does not attempt to change the data pointed by name (and you cannot just change the find_register's parameter type to const char *name), then
find_register(const_cast<char *>("%ebx"))
is a fairly acceptable solution (with an appropriate accompanying comment). Of course, if find_register is a modifying function, then you'll have to do something like
char reg[] = "%ebx";
find_register(reg);
The prototype for the function find_register(char*) indicates that it may change the parameter since it is just a pointer that is passed. You do not mention if you have the source code of that function so I assume you don't otherwise it would be better to change that function to accept a char const * provided it doesn't change the contents.
Otherwise just pass a modifiable array:
char arg[] = "%ebx";
find_register(arg);
to write:
find_register((char*)string("%ebx").c_str())
is dangerous, what if the function does indeed modify the string e.g. strtok? But if you insist at least try to use the C++ style of casting instead of C casting (in this case find_register(const_cast<char*>("%ebx"))).
Side note: personally I find it easier to read to avoid using the C-way of putting const when possible. So instead of writing const char * write char const * and char * const when the pointer is constant i.e. const always goes to the left - it is more consistent.
If you are passing string literals to find_register(char *name), then the function prototype should be find_register(const char *name), as string literals are not modifiable.
With this conversion, C++ will happily let you shoot yourself in the foot and try to modify name, which gives me a segmentation fault, as in the following code:
void modifyIllegally(char* name) {
name[0]='d';
}
int main() {
modifyIllegally("abc");
}
Edit: If you don't control the API of the C program and are sure it won't change the char*, you can disable this warning with compiler flags. For gcc the flag is -Wno-write-strings.

Disable warning "deprecated conversion from string constant to 'char*' [-Wwrite-strings]"

I have these two lines in my code:
RFM2G_STATUS result;
result = RFM2gOpen( "\\\\.\\rfm2g1", &rH );
I get the error message:
"warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
result = RFM2gOpen( "\\\\.\\rfm2g1", &rH );"
Actually I can not modify it to
const RFM2G_STATUS result;
because RFM2G_STATUS is pre-defined in another file and does not accept const before. Is there another way to disable this warning message?
Like the message says, conversion from const char* to char* (which C++ inherited from ancient C language which didn't have const) has been deprecated.
To avoid this, you can store the parameter in a non-const string, and pass that to the function:
char parameter[] = "\\\\.\\rfm2g1";
RFM2G_STATUS result;
result = RFM2gOpen( parameter, &rH );
That way you avoid the ugly casts.
You seem to be absolutely sure that RFM2gOpen does not modify the input string, otherwise you would have undefined behavior in your code as it stands now.
If you are sure that the input data will not be written to, you can const_cast the constness away safely:
result = RFM2gOpen(const_cast<char*>("\\\\.\\rfm2g1"), &rH );
Again, this is only safe if the routine does not write to the input string, ever, otherwise this is undefined behavior!
If you are not completely sure that this method will never write to the character array, copy the string to an std::vector<char> and pass the .data() pointer to the function (or use a simple char array as Bo Persson suggests, that would most likely be more efficient/appropriate than the vector).
One possible fix is:
RFM2gOpen(const_cast<char*>("\\.\rfm2g1"), &rH);
This may cause a runtime fault if RFM2gOpen tries to modify the string.
The following is less likely to cause a memory fault, but it still undefined behavior:
std::string s("\\.\rfm2g1");
RFM2gOpen(const_cast<char*>(s.c_str()), &rH);
To be fully conformant you need to copy the "\.\rfm2g1" to a mutable buffer. Something like:
char *s = alloca(strlen("\\.\rfm2g1")+1);
strcpy(s, "\\.\rfm2g1");
RFM2gOpen(s, &rH);
The real fix, of course, is for RFM2gOpen to be updated to take a const char*.
It would seem that the function RFM2gOpen() expects a non-const char* as first parameter (see here), as it can sometimes happen with legacy API's (or API's written by lazy coders), and string litterals are of type const char*
so a deprecated implicit conversion is happening (getting rid of the const qualifier).
If you're *100% sure that the function won't modify the pointed-to memory, then and only then can you just put an explicit conversion, e.g. const_cast<char*>("\\\\.\\rfm2g1") or (C-style) (const char*)"\\\\.\\rfm2g1"

Why is conversion from string constant to 'char*' valid in C but invalid in C++

The C++11 Standard (ISO/IEC 14882:2011) says in ยง C.1.1:
char* p = "abc"; // valid in C, invalid in C++
For the C++ it's OK as a pointer to a String Literal is harmful since any attempt to modify it leads to a crash. But why is it valid in C?
The C++11 says also:
char* p = (char*)"abc"; // OK: cast added
Which means that if a cast is added to the first statement it becomes valid.
Why does the casting makes the second statement valid in C++ and how is it different from the first one? Isn't it still harmful? If it's the case, why did the standard said that it's OK?
Up through C++03, your first example was valid, but used a deprecated implicit conversion--a string literal should be treated as being of type char const *, since you can't modify its contents (without causing undefined behavior).
As of C++11, the implicit conversion that had been deprecated was officially removed, so code that depends on it (like your first example) should no longer compile.
You've noted one way to allow the code to compile: although the implicit conversion has been removed, an explicit conversion still works, so you can add a cast. I would not, however, consider this "fixing" the code.
Truly fixing the code requires changing the type of the pointer to the correct type:
char const *p = "abc"; // valid and safe in either C or C++.
As to why it was allowed in C++ (and still is in C): simply because there's a lot of existing code that depends on that implicit conversion, and breaking that code (at least without some official warning) apparently seemed to the standard committees like a bad idea.
It's valid in C for historical reasons. C traditionally specified that the type of a string literal was char * rather than const char *, although it qualified it by saying that you're not actually allowed to modify it.
When you use a cast, you're essentially telling the compiler that you know better than the default type matching rules, and it makes the assignment OK.
You can also use strdup:
char* p = strdup("abc");
or
char p[] = "abc";
as pointed here
You can declare like one of the below options:
char data[] = "Testing String";
or
const char* data = "Testing String";
or
char* data = (char*) "Testing String";

no conversion operator for string class

I am reading about STL string class. It is mentioned as below
STL string class chooses not to define conversion operators, but rather use the c_str() and data() methods for directly accessing the memory. The STL purposely does not include implicit conversion operators to prevent misuse of raw string pointers.
My question is
c_str() returns const char* pointer and still user can modify string value. Am I right?
What does the author mean by "to prevent misuse of raw string pointers"? Please explain, preferably with an example.
Thanks!
No, you cannot use the return value of std::string::c_str() to
modify the string. Trying to do so is undefined behavior. And
the problem was (and still is) the lifetime of the pointer
returned by std::string::c_str(). It becomes invalid if the
string is destructed, or if any non-const function is called on
the string. The issues are things like:
char const* s = string1 + string2;
// s is invalid here.
vs.
char const* s = (string1 + string2).c_str();
// s is invalid here.
In the first case, it's easy to make the mistake, without
realizing it, so the committee decided to not have implicit
conversion, so that this would be illegal. In the second case,
you have to really want to.

In what situations is using operator cast more useful than confusing?

In C++, the use of operator cast can lead to confusion to readers of your code due to it not being obvious that a function call is being invoked. That being said, I've seen its use being discouraged.
However, under what circumstances would using operator cast be appropriate and have value which exceeds any possible confusion it might lead to?
When the conversion is natural and has no side effects it can be useful. Nobody is going to argue that an automatic conversion from int to double is inappropriate for example, even if you can come up with a corner case that makes it confusing (and I'm not sure anybody can).
I've found the conversion from Microsoft's CString to const char * to be incredibly handy, even though I know others disagree. I wouldn't mind seeing a similar capability in std::string.
Operator casts are very useful in a C++ idiom of wrapper objects. For example, suppose you have some copy-on-write implementation of a string class. You want your users to be able to index it naturally, like
const String s = "abc";
assert(s[0] == 'a');
// given
char String::operator[](int) const
So far, you'd think this would work. Yet what happens when someone wants to modify your string? Perhaps this will work, then?
String s = "abc";
s[0] = 'z';
assert(s[0] == 'z');
// given
char & String::operator[](int)
But this implementation gives a reference to a non-const character. So someone can always use that reference to modify the string. So, before it hands out the reference, it has to perform a copy of the string internally, so that other strings won't be modified. Thus it's not possible to use operator[] on non-const strings without forcing a copy. What to do?
Instead of returning a character reference, you can return a wrapper object with following interface:
class CharRef {
public:
operator char() const;
CharRef & operator=(char);
};
The char() conversion operator simply returns a copy of the character stored in the string. When you assign to the wrapper, though, the operator=(char) will force the string to perform an internal copy if the reference count is >1, and modify that copy instead.
The wrapper's implementation may, for example, hold the char and a pointer to the string (probably some subpart of the string's implementation).