I am reading about the STL string class. It is described as follows:
STL string class chooses not to define conversion operators, but rather use the c_str() and data() methods for directly accessing the memory. The STL purposely does not include implicit conversion operators to prevent misuse of raw string pointers.
My question is
c_str() returns a const char* pointer, yet the user can still modify the string value. Am I right?
What does the author mean by "to prevent misuse of raw string pointers"? Please explain, preferably with an example.
Thanks!
No, you cannot use the return value of std::string::c_str() to
modify the string. Trying to do so is undefined behavior. And
the problem was (and still is) the lifetime of the pointer
returned by std::string::c_str(). It becomes invalid if the
string is destructed, or if any non-const function is called on
the string. The issues are things like:
char const* s = string1 + string2;
// s is invalid here.
vs.
char const* s = (string1 + string2).c_str();
// s is invalid here.
In the first case, it's easy to make the mistake, without
realizing it, so the committee decided to not have implicit
conversion, so that this would be illegal. In the second case,
you have to really want to.
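A minimal sketch of the safe pattern, assuming a hypothetical helper named c_length (it is not part of the original answer): store the result of the concatenation in a named std::string first, so that the pointer returned by c_str() stays valid for as long as it is used.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: measures the concatenation the way a C API would.
std::size_t c_length(const std::string& string1, const std::string& string2) {
    // The named object owns the buffer until the end of this function.
    std::string joined = string1 + string2;

    // Valid as long as `joined` is alive and no non-const member is called on it.
    const char* s = joined.c_str();
    return std::strlen(s);
}
```

The pointer never outlives the string it points into, which is the property the two snippets above violate.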
I have these two lines in my code:
RFM2G_STATUS result;
result = RFM2gOpen( "\\\\.\\rfm2g1", &rH );
I get the error message:
"warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
result = RFM2gOpen( "\\\\.\\rfm2g1", &rH );"
Actually I can not modify it to
const RFM2G_STATUS result;
because RFM2G_STATUS is pre-defined in another file and does not accept const before. Is there another way to disable this warning message?
Like the message says, the conversion from a string literal to char* (which C++ inherited from early C, which had no const) has been deprecated.
To avoid this, you can store the parameter in a non-const string, and pass that to the function:
char parameter[] = "\\\\.\\rfm2g1";
RFM2G_STATUS result;
result = RFM2gOpen( parameter, &rH );
That way you avoid the ugly casts.
You seem to be absolutely sure that RFM2gOpen does not modify the input string, otherwise you would have undefined behavior in your code as it stands now.
If you are sure that the input data will not be written to, you can const_cast the constness away safely:
result = RFM2gOpen(const_cast<char*>("\\\\.\\rfm2g1"), &rH );
Again, this is only safe if the routine does not write to the input string, ever, otherwise this is undefined behavior!
If you are not completely sure that this method will never write to the character array, copy the string to an std::vector<char> and pass the .data() pointer to the function (or use a simple char array as Bo Persson suggests, that would most likely be more efficient/appropriate than the vector).
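A minimal sketch of the copy-to-a-mutable-buffer approach. The function legacy_open is a hypothetical stand-in for an API such as RFM2gOpen that takes a non-const char*; its signature and behavior are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical legacy function: takes char* although it only reads the name.
int legacy_open(char* device) {
    return static_cast<int>(std::strlen(device));
}

// Copies the name into a writable, null-terminated buffer before the call,
// so nothing bad happens even if the callee writes to it.
int open_with_copy(const std::string& name) {
    std::vector<char> buffer(name.begin(), name.end());
    buffer.push_back('\0');             // the vector copy is not terminated by default
    return legacy_open(buffer.data());  // no cast needed: data() is char*
}
```

The buffer only has to live for the duration of the call, so a local vector (or a local char array) is enough.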
One possible fix is:
RFM2gOpen(const_cast<char*>("\\\\.\\rfm2g1"), &rH);
This may cause a runtime fault if RFM2gOpen tries to modify the string.
The following is less likely to cause a memory fault, but it is still undefined behavior if RFM2gOpen writes to the buffer:
std::string s("\\\\.\\rfm2g1");
RFM2gOpen(const_cast<char*>(s.c_str()), &rH);
To be fully conformant you need to copy "\\\\.\\rfm2g1" to a mutable buffer. Something like:
char *s = static_cast<char*>(alloca(strlen("\\\\.\\rfm2g1") + 1)); // alloca is non-standard
strcpy(s, "\\\\.\\rfm2g1");
RFM2gOpen(s, &rH);
The real fix, of course, is for RFM2gOpen to be updated to take a const char*.
It would seem that the function RFM2gOpen() expects a non-const char* as its first parameter, as can sometimes happen with legacy APIs (or APIs written by lazy coders), and string literals are arrays of const char,
so a deprecated implicit conversion is happening (getting rid of the const qualifier).
If you're 100% sure that the function won't modify the pointed-to memory, then and only then can you just put in an explicit conversion, e.g. const_cast<char*>("\\\\.\\rfm2g1") or (C-style) (char*)"\\\\.\\rfm2g1"
This is not my program so don't start berating me :-). Some random program I got. A globally declared buffer is being returned by MyFunc(). I use VS2008 and it does not complain
static char buffer[1024];
std::string MyFunc() {
....
....
return buffer;
}
However when I add this line of code
char * ret;
ret = MyFunc();
It complains: "error: no suitable conversion function from "std::string" to "char *" exists"
My question is: why is the compiler complaining now? Why this inconsistency in syntax checking? Again, I don't have the freedom to change MyFunc(). In my program I can write
std::string ret;
ret = MyFunc();
and get rid of the error, but I would really like to understand this strange behavior.
string() has a constructor that accepts a char*, so you get an automatic conversion. There is no automatic conversion from a string to a char*. You have to call string::c_str() to get the char*.
Edit
Although you asked only for an explanation of the behavior, others in this forum seem to think I have short-changed you by not mentioning that string::c_str returns a const char*, not a simple char*. But the explanation remains: there is no implicit/automatic conversion from string to char* or const char*. Feel free to read about c_str here if it's important to you.
It is not the syntax, it is the structure of the std::string that makes the compiler behave differently.
When you are returning a char* from a function returning std::string, the compiler notices that there is a constructor of std::string that takes char*, calls that constructor, and quietly returns the result.
When you are trying to return a std::string from a char* - returning function, the compiler tries to see if there is a conversion operator to make char* from a std::string, finds that there is no such operator, and reports an error.
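A small sketch of the two directions, reusing the names buffer and MyFunc from the question; the helper length_via_c_str and the contents of the buffer are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

static char buffer[1024] = "hello";

// char* -> std::string: compiles because std::string has a
// converting constructor that accepts a const char*.
std::string MyFunc() {
    return buffer;
}

// std::string -> const char*: there is no conversion operator,
// so the pointer has to be requested explicitly with c_str().
std::size_t length_via_c_str() {
    std::string ret = MyFunc();   // keep the string alive in a named object
    const char* p = ret.c_str();  // explicit, and const
    return std::strlen(p);
}
```

Writing `char* ret = MyFunc();` instead fails to compile, which is exactly the error reported in the question.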
If you want to convert a string to char*, you need to make a copy of the string's buffer, like this:
char* ret_ch = new char[ret.size()+1];
memcpy(ret_ch, ret.c_str(), ret.size()+1);
return ret_ch;
You could think that it is OK to return c_str() by itself, but it is not a good idea: the buffer that "backs up" this C string belongs to std::string object, so once the string gets deallocated, accessing the buffer starts producing undefined behavior. That is why you need to make an explicit copy when you access the buffer of a string. Of course you are also responsible for calling delete[] on the copied result.
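As a variation on the manual copy above (not what the original answer uses), the same copy can be held in a std::unique_ptr<char[]> so that delete[] is called automatically. This is a sketch assuming C++11 or later; copy_c_string is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Returns an independent, writable, null-terminated copy of the string's
// characters. The unique_ptr releases the array with delete[] when destroyed.
std::unique_ptr<char[]> copy_c_string(const std::string& s) {
    std::unique_ptr<char[]> out(new char[s.size() + 1]);
    std::memcpy(out.get(), s.c_str(), s.size() + 1);  // +1 copies the terminator
    return out;
}
```

Pass `get()` to any function that expects a char*; the copy stays valid until the unique_ptr goes out of scope, regardless of what happens to the original string.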
std::string is designed as implicitly constructable from char const* because this supports using string literals and typical C style code strings as initializer values.
If this was not supported then one would just have to use some intermediate function, which would add nothing but verbosity and inefficiency.
In the other direction, however, std::string is intentionally designed to not convert implicitly to char const*. Part of the rationale is probably that with std::string being logically mutable, the returned raw pointer is only valid as long as no operations are performed that might cause a buffer replacement or string destruction. For example,
char const* s = foo().c_str();
where foo produces a std::string, makes s point to a buffer that no longer exists, a dangling pointer that is invalid.
The c_str() member function call makes the conversion stand out.
Consider how much more common that problem could be if one could write just
char const* s = foo();
and have that compile.
Regarding that strike-through (deleted) text, I realized that it's completely irrelevant whether the string is logically mutable or immutable. Sorry. Need more coffee!
In C++, the use of operator cast (a user-defined conversion operator) can confuse readers of your code, because it is not obvious that a function call is being invoked. For that reason, I've seen its use discouraged.
However, under what circumstances would using operator cast be appropriate and have value which exceeds any possible confusion it might lead to?
When the conversion is natural and has no side effects it can be useful. Nobody is going to argue that an automatic conversion from int to double is inappropriate for example, even if you can come up with a corner case that makes it confusing (and I'm not sure anybody can).
I've found the conversion from Microsoft's CString to const char * to be incredibly handy, even though I know others disagree. I wouldn't mind seeing a similar capability in std::string.
Operator casts are very useful in a C++ idiom of wrapper objects. For example, suppose you have some copy-on-write implementation of a string class. You want your users to be able to index it naturally, like
const String s = "abc";
assert(s[0] == 'a');
// given
char String::operator[](int) const
So far, you'd think this would work. Yet what happens when someone wants to modify your string? Perhaps this will work, then?
String s = "abc";
s[0] = 'z';
assert(s[0] == 'z');
// given
char & String::operator[](int)
But this implementation gives a reference to a non-const character. So someone can always use that reference to modify the string. So, before it hands out the reference, it has to perform a copy of the string internally, so that other strings won't be modified. Thus it's not possible to use operator[] on non-const strings without forcing a copy. What to do?
Instead of returning a character reference, you can return a wrapper object with following interface:
class CharRef {
public:
operator char() const;
CharRef & operator=(char);
};
The char() conversion operator simply returns a copy of the character stored in the string. When you assign to the wrapper, though, the operator=(char) will force the string to perform an internal copy if the reference count is >1, and modify that copy instead.
The wrapper's implementation may, for example, hold the char and a pointer to the string (probably some subpart of the string's implementation).
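A minimal sketch of the proxy idea, under simplifying assumptions: the class is called CowString here, the shared buffer is a std::shared_ptr<std::string> rather than a hand-written reference count, and shares_buffer_with is a hypothetical helper that exists only to make the sharing observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

class CowString {
public:
    // Proxy returned by the non-const operator[].
    class CharRef {
    public:
        CharRef(CowString& owner, std::size_t index)
            : owner_(owner), index_(index) {}

        // Reading: just return a copy of the character, no buffer copy.
        operator char() const { return (*owner_.data_)[index_]; }

        // Writing: unshare the buffer first, then modify the private copy.
        CharRef& operator=(char c) {
            owner_.detach();
            (*owner_.data_)[index_] = c;
            return *this;
        }

    private:
        CowString& owner_;
        std::size_t index_;
    };

    CowString(const char* text) : data_(std::make_shared<std::string>(text)) {}

    char operator[](std::size_t i) const { return (*data_)[i]; }
    CharRef operator[](std::size_t i) { return CharRef(*this, i); }

    // Hypothetical helper, only for demonstrating when a copy happens.
    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    // Copy the buffer if any other CowString still refers to it.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};
```

Indexing a non-const CowString for reading goes through the conversion operator and leaves the buffer shared; only an assignment through the proxy triggers the copy.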
This may seem like a silly question, but I really need to clarify this:
Will this bring any danger to my program?
Is the const_cast even needed?
If I change the values through the input pointer in place, will that work safely with std::string, or will it create undefined behaviour?
So far my only concern is that modifying through the input pointer could affect the string "some_text" and make it unusable.
std::string some_text = "Text with some input";
char * input = const_cast<char*>(some_text.c_str());
Thanks for giving me some hints; I would like to avoid shooting myself in the foot.
As an example of evil behavior: the interaction with gcc's Copy On Write implementation.
#include <string>
#include <iostream>
int main() {
std::string const original = "Hello, World!";
std::string copy = original;
char* c = const_cast<char*>(copy.c_str());
c[0] = 'J';
std::cout << original << "\n";
}
In action at ideone.
Jello, World!
The issue? As the name implies, gcc's implementation of std::string (libstdc++ before the ABI change in GCC 5) uses a ref-counted shared buffer under the covers. When a string is modified, the implementation will neatly check if the buffer is shared at the moment, and if it is, copy it before modifying it, ensuring that other strings sharing this buffer are not affected by the new write (thus the name, copy on write).
Now, with your evil program, you access the shared buffer via a const-method (promising not to modify anything), but you do modify it!
Note that with MSVC's implementation, which does not use Copy On Write, the behavior would be different ("Hello, World!" would be correctly printed).
This is exactly the essence of Undefined Behavior.
Modifying an inherently const object by casting away its constness with const_cast is undefined behavior.
string::c_str() returns a const char*, i.e. a pointer to a constant C-style string. Technically, modifying the characters through it results in undefined behavior.
Note that const_cast is meant for the case where you have a pointer-to-const that refers to non-const data and you wish to modify that data.
Simply casting does not cause undefined behavior. Modifying the data pointed at, however, does. (Also see ISO 14882:98 5.2.11-7.)
If you want a pointer to modifiable data, you can copy the characters into a vector (note that this copy is not null-terminated):
std::vector<char> wtf(str.begin(), str.end());
char* lol= &wtf[0];
std::string manages its own memory internally, and c_str() returns a pointer directly into that memory. The pointer is to const so that your compiler will complain if you try to modify the characters.
Using const_cast in that way literally casts away such safety and is only an arguably acceptable practice if you are absolutely sure that memory will not be modified.
If you can't guarantee this then you must copy the string and use the copy; it's certainly a lot safer to do this in any event (you can use strcpy).
See the C++ reference website:
const char* c_str ( ) const;
"Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.
A terminating null character is automatically appended.
The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object."
Yes, it will bring danger, because
input points to whatever c_str happens to be right now, but if some_text ever changes or goes away, you'll be left with a pointer that points to garbage. The value of c_str is guaranteed to be valid only as long as the string doesn't change. And even, formally, only if you don't call c_str() on other strings too.
Why do you need to cast away the const? You're not planning on writing to *input, are you? That is a no-no!
This is a very bad thing to do. Check out what std::string::c_str() does and agree with me.
Second, consider why you want a non-const access to the internals of the std::string. Apparently you want to modify the contents, because otherwise you would use a const char pointer. Also you are concerned that you don't want to change the original string. Why not write
std::string input( some_text );
Then you have a std::string that you can mess with without affecting the original, and you have std::string functionality instead of having to work with a raw C++ pointer...
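A short sketch of that suggestion, assuming C++11 or later (where std::string storage is guaranteed to be contiguous). The function name uppercase_first is hypothetical; it stands for any code that needs to write through a char*.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Works on a private copy, so the caller's string is never touched.
std::string uppercase_first(const std::string& some_text) {
    std::string input(some_text);  // independent, writable copy
    if (!input.empty()) {
        char* p = &input[0];       // mutable pointer, no const_cast needed
        p[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(p[0])));
    }
    return input;
}
```

The pointer obtained from `&input[0]` may be written to within the string's size, which is what the const_cast on c_str() was trying, and failing, to achieve legally.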
Another spin on this is that it makes code extremely difficult to maintain. Case in point: a few years ago I had to refactor some code containing long functions. The author had written the function signatures to accept const parameters but then was const_casting them within the function to remove the constness. This broke the implied guarantee given by the function and made it very difficult to know whether the parameter has changed or not within the rest of the body of the code.
In short, if you have control over the string and you think you'll need to change it, make it non-const in the first place. If you don't then you'll have to take a copy and work with that.
It is UB.
For example, you can do something like this:
size_t const size = (sizeof(int) == 4 ? 1024 : 2048);
int arr[size];
without any cast and the compiler will not report an error. But this code is illegal.
The moral is that you need to consider each such action carefully.
Hello, I have a pump class that requires a member variable that is a pointer to a wchar_t array containing the port address, i.e. "com9".
The problem is that when I initialise this variable in the constructor, my compiler flags a deprecated conversion warning.
pump::pump(){
this->portNumber = L"com9";}
This works fine, but the warning every time I compile is annoying and makes me feel like I'm doing something wrong.
I tried creating an array and then setting the member variable like this:
pump::pump(){
wchar_t port[] = L"com9";
this->portNumber = port;}
But for some reason this makes my portNumber point at 'F'.
Clearly another conceptual problem on my part.
Thanks for help with my noobish questions.
EDIT:
As requested, the definition of portNumber was:
class pump
{
private:
wchar_t* portNumber;
};
Thanks to answers it has now been changed to:
class pump
{
private:
const wchar_t* portNumber;
};
If portNumber is a wchar_t*, it should be a const wchar_t*.
String literals are immutable, so the elements are const. There exists a deprecated conversion from string literal to non-const pointer, but that's dangerous. Make the change so you're keeping type safety and not using the unsafe conversion.
The second one fails because you point to the contents of a local variable. When the constructor finishes, the variable goes away and you're pointing at an invalid location. Using it results in undefined behavior.
Lastly, use an initialization list:
pump::pump() :
portNumber(L"com9")
{}
The initialization list is to initialize, the constructor is to finish construction. (Also, this-> is ugly to almost all C++ people; it's not nice and redundant.)
Use const wchar_t* to point at a literal.
The reason the conversion exists is because it has been valid from early versions of C to assign a string literal to a non-const pointer[*]. The reason it's deprecated is that it's invalid to modify a literal, and it's risky to use a non-const pointer to refer to something that must not be modified.
[*] C didn't originally have const. When const was added, clearly it should apply to string literals, but there was already code out there, written before const existed, that would break if suddenly you had to sprinkle const everywhere. We're still paying today for that breaking change to the language. Since it's C++ you're using, it wasn't even a breaking change to this language.
Apparently, portNumber is a wchar_t * (non-const), correct? If so:
the first one is wrong, because string literals are read-only (they are arrays of const char, or of const wchar_t for wide literals, usually stored in the string table of the executable, which is mapped in memory somewhere, often in a read-only page).
The ugly, implicit conversion to non-const char/wchar_t pointers was approved, IIRC, to achieve compatibility with old code written when const didn't even exist; sadly, it let a lot of morons who do not know what const correctness means get away with writing code that asks for non-const pointers even when const pointers would be the right choice.
The second one is wrong because you're making portNumber point to a variable allocated on the stack, which is destroyed when the constructor returns. After the constructor returns, the pointer stored in portNumber is dangling and points to random garbage.
The correct approach is to declare portNumber as const wchar_t * if it doesn't need to be modified. If, instead, it does need to be modified during the lifetime of the class, usually the best approach is to avoid C-style strings at all and just throw in a std::wstring, that will take care of all the bookkeeping associated with the string.
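A minimal sketch of the std::wstring variant, reusing the class and member names from the question. The accessor port() is a hypothetical addition so that the stored value can be read back.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

class pump {
public:
    // The initialization list copies the literal into the wstring,
    // so no pointer to a literal or to a local array is kept.
    pump() : portNumber(L"com9") {}

    // Hypothetical accessor, added only for this example.
    const std::wstring& port() const { return portNumber; }

private:
    std::wstring portNumber;  // owns its characters and can be modified later
};
```

When a C-style API needs a const wchar_t*, `port().c_str()` provides one that stays valid while the pump object is alive and portNumber is not modified.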