C++ const cast, unsure if this is secure - c++

It maybe seems to be a silly question but i really need to clarify this:
Will this bring any danger to my program?
Is the const_cast even needed?
If i change the input pointers values in place will it work safely with std::string or will it create undefined behaviour?
So far the only concern is that this could affect the string "some_text" whenever I modify the input pointer and makes it unusable.
std::string some_text = "Text with some input";
char * input = const_cast<char*>(some_text.c_str());
Thanks for giving me some hints, i would like to avoid the shoot in my own foot

As an example of evil behavior: the interaction with gcc's Copy On Write implementation.
#include <string>
#include <iostream>
int main() {
std::string const original = "Hello, World!";
std::string copy = original;
char* c = const_cast<char*>(copy.c_str());
c[0] = 'J';
std::cout << original << "\n";
}
In action at ideone.
Jello, World!
The issue ? As the name implies, gcc's implementation of std::string uses a ref-counted shared buffer under the cover. When a string is modified, the implementation will neatly check if the buffer is shared at the moment, and if it is, copy it before modifying it, ensuring that other strings sharing this buffer are not affected by the new write (thus the name, copy on write).
Now, with your evil program, you access the shared buffer via a const-method (promising not to modify anything), but you do modify it!
Note that with MSVC's implementation, which does not use Copy On Write, the behavior would be different ("Hello, World!" would be correctly printed).
This is exactly the essence of Undefined Behavior.

To modify an inherently const object by casting away its constness using const_cast is an Undefined Behavior.
string::c_str() returns a const char *, i.e: a pointer to a constant c-style string. Technically, modifying this will result in Undefined Behavior.
Note, that the use of const_cast is when you have a const pointer to a non const data and you wish to modify the non-constant data.

Simply casting will not bring forth an undefined behavior. Modifying the data pointed at, however, will. (Also see ISO 14882:98 5.2.7-7).
If you want a pointer to modifiable data, you can have a
std::vector<char> wtf(str.begin(), str.end());
char* lol= &wtf[0];

The std::string manages it's own memory internally, which is why it returns a pointer to that memory directly as it does with the c_str() function. It makes sure it's constant so that your compiler will warn you if you try to do modifiy it.
Using const_cast in that way literally casts away such safety and is only an arguably acceptable practice if you are absolutely sure that memory will not be modified.
If you can't guarantee this then you must copy the string and use the copy.; it's certainly a lot safer to do this in any event (you can use strcpy).

See the C++ reference website:
const char* c_str ( ) const;
"Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.
A terminating null character is automatically appended.
The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object."

Yes, it will bring danger, because
input points to whatever c_str happens to be right now, but if some_text ever changes or goes away, you'll be left with a pointer that points to garbage. The value of c_str is guaranteed to be valid only as long as the string doesn't change. And even, formally, only if you don't call c_str() on other strings too.
Why do you need to cast away the const? You're not planning on writing to *input, are you? That is a no-no!

This is a very bad thing to do. Check out what std::string::c_str() does and agree with me.
Second, consider why you want a non-const access to the internals of the std::string. Apparently you want to modify the contents, because otherwise you would use a const char pointer. Also you are concerned that you don't want to change the original string. Why not write
std::string input( some_text );
Then you have a std::string that you can mess with without affecting the original, and you have std::string functionality instead of having to work with a raw C++ pointer...

Another spin on this is that it makes code extremely difficult to maintain. Case in point: a few years ago I had to refactor some code containing long functions. The author had written the function signatures to accept const parameters but then was const_casting them within the function to remove the constness. This broke the implied guarantee given by the function and made it very difficult to know whether the parameter has changed or not within the rest of the body of the code.
In short, if you have control over the string and you think you'll need to change it, make it non-const in the first place. If you don't then you'll have to take a copy and work with that.

it is UB.
For example, you can do something like this this:
size_t const size = (sizeof(int) == 4 ? 1024 : 2048);
int arr[size];
without any cast and the comiler will not report an error. But this code is illegal.
The morale is that you need consider action each time.

Related

c_str() vs. data() when it comes to return type

After C++11, I thought of c_str() and data() equivalently.
C++17 introduces an overload for the latter, that returns a non-constant pointer (reference, which I am not sure if it's updated completely w.r.t. C++17):
const CharT* data() const; (1)
CharT* data(); (2) (since C++17)
c_str() does only return a constant pointer:
const CharT* c_str() const;
Why the differentiation of these two methods in C++17, especially when C++11 was the one that made them homogeneous? In other words, why only the one method got an overload, while the other didn't?
The new overload was added by P0272R1 for C++17. Neither the paper itself nor the links therein discuss why only data was given new overloads but c_str was not. We can only speculate at this point (unless people involved in the discussion chime in), but I'd like to offer the following points for consideration:
Even just adding the overload to data broke some code; keeping this change conservative was a way to minimize negative impact.
The c_str function had so far been entirely identical to data and is effectively a "legacy" facility for interfacing code that takes "C string", i.e. an immutable, null-terminated char array. Since you can always replace c_str by data, there's no particular reason to add to this legacy interface.
I realize that the very motivation for P0292R1 was that there do exist legacy APIs that erroneously or for C reasons take only mutable pointers even though they don't mutate. All the same, I suppose we don't want to add more to string's already massive API that absolutely necessary.
One more point: as of C++17 you are now allowed to write to the null terminator, as long as you write the value zero. (Previously, it used to be UB to write anything to the null terminator.) A mutable c_str would create yet another entry point into this particular subtlety, and the fewer subtleties we have, the better.
The reason why the data() member got an overload is explained in this paper at open-std.org.
TL;DR of the paper: The non-const .data() member function for std::string was added to improve uniformity in the standard library and to help C++ developers write correct code. It is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters.
Some relevant passages from the paper:
Abstract
Is std::string's lack of a non-const .data() member function an oversight or an intentional design based on pre-C++11 std::string semantics? In either case, this lack of functionality tempts developers to use unsafe alternatives in several legitimate scenarios. This paper argues for the addition of a non-const .data() member function for std::string to improve uniformity in the standard library and to help C++ developers write correct code.
Use Cases
C libraries occasionally include routines that have char * parameters. One example is the lpCommandLine parameter of the CreateProcess function in the Windows API. Because the data() member of std::string is const, it cannot be used to make std::string objects work with the lpCommandLine parameter. Developers are tempted to use .front() instead, as in the following example.
std::string programName;
// ...
if( CreateProcess( NULL, &programName.front(), /* etc. */ ) ) {
// etc.
} else {
// handle error
}
Note that when programName is empty, the programName.front() expression causes undefined behavior. A temporary empty C-string fixes the bug.
std::string programName;
// ...
if( !programName.empty() ) {
char emptyString[] = {'\0'};
if( CreateProcess( NULL, programName.empty() ? emptyString : &programName.front(), /* etc. */ ) ) {
// etc.
} else {
// handle error
}
}
If there were a non-const .data() member, as there is with std::vector, the correct code would be straightforward.
std::string programName;
// ...
if( !programName.empty() ) {
char emptyString[] = {'\0'};
if( CreateProcess( NULL, programName.data(), /* etc. */ ) ) {
// etc.
} else {
// handle error
}
}
A non-const .data() std::string member function is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters. This is common in older codes and those that need to be portable with older C compilers.
It just depends on the semantics of "what you want to do with it". Generally speaking, std::string is sometimes used as a buffer vector, i.e., as a replacement to std::vector<char>. This can be seen in boost::asio often. In other words, it's an array of characters.
c_str(): strictly means that you're looking for a null-terminated string. In that sense, you should never modify the data and you should never need the string as a non-const.
data(): you may need the information inside the string as buffer data, and even as non-const. You may or may not need to modify the data, which you can do, as long as it doesn't involve changing the length of the string.
The two member functions c_str and data of std::string exist due to the history of the std::string class.
Until C++11, a std::string could have been implemented as copy-on-write. The internal representation did not need any null termination of the stored string. The member function c_str made sure the returned string was null terminated. The member function data simlpy returned a pointer to the stored string, that was not necessarily null terminated. - To be sure that changes to the string were noticed to enable copy-on-write, both functions needed to return a pointer to const data.
This all changed with C++11 when copy-on-write was no longer allowed for std::string. Since c_str was still required to deliver a null terminated string, the null is always appended to the actual stored string. Otherwise a call to c_str may need to change the stored data to make the string null terminated which would make c_str a non-const function. Since data delivers a pointer to the stored string, it usually has the same implementation as c_str. Both functions still exists due to backward compatibility.

"Use & to create a pointer to a member" when using c_str

I'm trying to add a char* to a vector of char*'s by casting it over from a string. Here's the code I'm using:
vector<char*> actionLog;
// lots of code
int value = ...
// lots of code
string str = "string";
cout << str << value << endl;
str += std::to_string(player->scrap);
actionLog.push_back(str.c_str());
The problem is that I get the specified "Use & to create a pointer to a member" error for the push_back line. str.c_str should return a char*, which is the type that actionLog uses. I'm either incorrect about how c_str works, or doing something else wrong. Pushing to actionLog with
actionLog.push_back("something");
Works fine, but not what I mentioned. What am I doing wrong?
EDIT: I was actually using c_str() as a function, I just copied it incorrectly
There are actually several problems with what you are trying to do.
Firstly, c_str is a member function. You would have to call it, using (): str.c_str().
c_str() returns a const char*, so you won't be able to store it in a vector<char*>. This is so you can't break the std::string by changing it's internals in ways it doesn't expect.
You really should not store the result of c_str(). It only remains valid until you do some non-const operation on the std::string it came from. I.e. if you make a change to the content of the std::string, then try to use the corresponding element in the vector, you have Undefined Behaviour! And from the way you have laid out your example, it looks like the lifetime of the string will be much shorter than the lifetime of the vector, so the vector would point to something that doesn't even exist any more.
Maybe it's better to just use a std::vector<std::string>. If you don't need the original string again after this, you could even std::move it into the vector, and avoid extra copying.
As an aside, please reconsider your use of what are often considered bad practices: using namespace std; and endl (those are links to explanations). The latter is a bit contentious, but at least understand why and make an informed decision.
std::basic_string::c_str() is a member function, not a data member - you need to invoke it by using ().
The correct code is:
actionLog.push_back(str.c_str());
Note that std::basic_string::c_str() returns a pointer to const char - your actionLog vectors should be of type std::vector<const char*>.
Vittorio's answer tells you what you did wrong in the details. But I would argue that what you're doing wrong is really using a vector<char*> instead of a vector<string> in the first place.
With the vector of pointers, you have to worry about lifetime of the underlying strings, about them getting invalidated, changed from under you, and so on. The name actionLog suggests that the thing is long-lived, and the code you use to add to it suggests that str is a local helper variable used to build the log string, and nothing else. The moment str goes out of scope, the vector contains a dangling pointer.
Change the vector to a vector<string>, do a actionLog.push_back(str), and don't worry about lifetimes or invalidation.
You forgot (), c_str is a method, not a data member. Just write actionLog.push_back(str.c_str());

returning const char* to char* and then changing the data

I am confused about the following code:
string _str = "SDFDFSD";
char* pStr = (char*)_str.data();
for (int i = 0; i < iSize; i++)
pStr[i] = ::tolower(pStr[i]);
here _str.data() returns const char*. But we are assigning it to a char*. My questions is,
_str.data()is returning pointer to a constant data. How is it possible to store it in a pointer to data? The data was constant right? If we assign it to char pointer than we can change it like we are doing inside the for statement which should not be possible for a constant data.
Don't do that. It may be fine in this case, but as the documentation for data() says:
The pointer returned may be invalidated by further calls to other
member functions that modify the object.
A program shall not alter any of the characters in this sequence.
So you could very accidentally write to invalid memory if you keep that pointer around. Or, in fact, ruin the implementation of std::string. I would almost go as far as to say that this function shouldn't be exposed.
std::string offers a non-const operator[] for that purpose.
string _str = "SDFDFSD";
for (int i = 0; i < iSize; i++)
_str[i] = ::tolower(_str[i]);
What you are doing is not valid at the standard library level (you're violating std::string contract) but valid at the C++ core language level.
The char * returned from data should not be written to because for example it could be in theory(*) shared between different strings with the same value.
If you want to modify a string just use std::string::operator[] that will inform the object of the intention and will take care of creating a private buffer for the specific instance in case the string was originally shared instead.
Technically you are allowed to cast-away const-ness from a pointer or a reference, but if it's a valid operation or not depends on the semantic of the specific case. The reason for which the operation is allowed is that the main philosophy of C++ is that programmers make no mistakes and know what they are doing. For example is technically legal from a C++ language point of view to do memcpy(&x, "hello", 5) where x is a class instance, but the results are most probably "undefined behavior".
If you think that your code "works" it's because you've the wrong understanding of what "works" really should mean (hint: "works" doesn't mean that someone once observed the code doing what seemed reasonable, but that will work in all cases). A valid C++ implementation is free to do anything it wants if you run that program: that you observed something you think is fine doesn't really mean anything, may be you didn't look close enough, or may be you were just lucky (unfortunate, actually) that no crash happened right away.
(*) In modern times the COW (copy-on-write) implementations of std::string are low in popularity because they pose a lot of problems (e.g. with multithreading) and memory is a lot cheaper now. Still std::string contract says you're not allowed to change the memory pointed by the return value of data(); if you do anything may happen.
You never must change the data returned from std::string::data() or std::string::c_str() directly.
To create a copy of a std::string:
std::string str1 = "test";
std::string str2 = str1; // copy.
Change characters in a string:
std::string str1 = "test"
str1[0] = 'T';
The "correct" way would be to use std::transform instead:
std::transform(_str.begin(), _str.end(), _str.begin(), ::tolower);
The simple answer to your question is that in C++ you can cast away the 'const' of a variable.
You probably shouldn't though.
See this for const correctness in C++
String always allocates memory on heap, so this is not actually const data, it is just marked so (in method data() signature) to prevent modification.
But nothing is impossible in C++, so with a simple cast, though unsafe, you can now treat the same memory space as modifiable.
All constants in a C/C++ program (like "SDFDFSD" below)will be stored in a separate section .rodata. This section is mapped as read-only when the binary is loaded into memory during execution.
int main()
{
char* ptr = "SDFDFSD";
ptr[0]='x'; //segmentation fault!!
return 0;
}
Hence any attempt to modify the data at that location will result in a run-time error i.e. a segmentation fault.
Coming to the above question, when creating a string and assigning a string to it, a new copy in memory now exists (memory used to hold the properties of the string object _str). This is on the heap and NOT mapped to a read-only section. The member function _str.data() points to the location in memory which is mapped read/write.
The const qualifier to the return type ensure that this function is NOT accidentally passed to string manipulation functions which expect a non-const char* pointer.
In your current iteration there was no limitation on the memory location itself that was holding the string object's data; i.e. it was mapped with both read/write permissions. Hence modifying the location using another non-const pointer worked i.e. pStr[i] on the left hand side of an assignment did NOT result in a run-time error as there were no inherent restrictions on the memory location itself.
Again this is NOT guaranteed to work and just a implementation specific behaviour that you have observed (i.e. it simply happens to work for you) and cannot always depend on this.

no conversion operator for string class

I am reading about STL string class. It is mentioned as below
STL string class chooses not to define conversion operators, but rather use the c_str() and data() methods for directly accessing the memory. The STL purposely does not include implicit conversion operators to prevent misuse of raw string pointers.
My question is
c_str() returns const char* pointer and still user can modify string value. Am I right?
What does the author mean by "to prevent misuse of raw string pointers"? Please explain, preferably with an example.
Thanks!
No, you cannot use the return value of std::string::c_str() to
modify the string. Trying to do so is undefined behavior. And
the problem was (and still is) the lifetime of the pointer
returned by std::string::c_str(). It becomes invalid if the
string is destructed, or if any non-const function is called on
the string. The issues are things like:
char const* s = string1 + string2;
// s is invalid here.
vs.
char const* s = (string1 + string2).c_str();
// s is invalid here.
In the first case, it's easy to make the mistake, without
realizing it, so the committee decided to not have implicit
conversion, so that this would be illegal. In the second case,
you have to really want to.

Deprecated conversion from string const. to wchar_t*

Hello I have a pump class that requires using a member variable that is a pointer to a wchar_t array containing the port address ie: "com9".
The problem is that when I initialise this variable in the constructor my compiler flags up a depreciated conversion warning.
pump::pump(){
this->portNumber = L"com9";}
This works fine but the warning every time I compile is anoying and makes me feel like I'm doing something wrong.
I tried creating an array and then setting the member variable like this:
pump::pump(){
wchar_t port[] = L"com9";
this->portNumber = port;}
But for some reason this makes my portNumber point at 'F'.
Clearly another conceptual problem on my part.
Thanks for help with my noobish questions.
EDIT:
As request the definition of portNumber was:
class pump
{
private:
wchar_t* portNumber;
}
Thanks to answers it has now been changed to:
class pump
{
private:
const wchar_t* portNumber;
}
If portNumber is a wchar_t*, it should be a const wchar_t*.
String literals are immutable, so the elements are const. There exists a deprecated conversion from string literal to non-const pointer, but that's dangerous. Make the change so you're keeping type safety and not using the unsafe conversion.
The second one fails because you point to the contents of a local variable. When the constructor finishes, the variable goes away and you're pointing at an invalid location. Using it results in undefined behavior.
Lastly, use an initialization list:
pump::pump() :
portNumber(L"com9")
{}
The initialization list is to initialize, the constructor is to finish construction. (Also, this-> is ugly to almost all C++ people; it's not nice and redundant.)
Use const wchar_t* to point at a literal.
The reason the conversion exists is because it has been valid from early versions of C to assign a string literal to a non-const pointer[*]. The reason it's deprecated is that it's invalid to modify a literal, and it's risky to use a non-const pointer to refer to something that must not be modified.
[*] C didn't originally have const. When const was added, clearly it should apply to string literals, but there was already code out there, written before const existed, that would break if suddenly you had to sprinkle const everywhere. We're still paying today for that breaking change to the language. Since it's C++ you're using, it wasn't even a breaking change to this language.
Apparently, portNumber is a wchar_t * (non-const), correct? If so:
the first one is wrong, because string literals are read-only (they are const pointers to an array of char usually stored in the string table of the executable, which is mapped in memory somewhere, often in a readonly page).
The ugly, implicit conversion to non-const chars/wchar_ts was approved, IIRC, to achieve compatibility with old code written when const didn't even existed; sadly, it let a lot of morons which do not know what const correctness means get away with writing code that asks non-const pointers even when const pointers would be the right choice.
The second one is wrong because you're making portNumber point to a variable allocated on the stack, which is deleted when the constructor returns. After the constructor returns, the pointer stored in portNumber points to random garbage.
The correct approach is to declare portNumber as const wchar_t * if it doesn't need to be modified. If, instead, it does need to be modified during the lifetime of the class, usually the best approach is to avoid C-style strings at all and just throw in a std::wstring, that will take care of all the bookkeeping associated with the string.