How to make std::string compatible with char*? - c++

I often come across functions that take a char* parameter, but I have heard that std::string is the recommended type in C++. How can I use a std::string object with functions that take char* parameters? So far I know about c_str(), but it doesn't work when the contents of the string need to be modified.

For that purpose you can use std::string::data(), which returns a pointer to the internal data (a pointer to const before C++17). Be careful not to free this memory or anything like that, as it is managed by the string object.

You can use the address of the first element after C++11 like this:
void some_c_function(char* s, int n);
// ...
std::string s = "some text";
some_c_function(&s[0], s.size());
Before C++11 there was no guarantee that the internal string was stored in a contiguous buffer or that it would be null terminated. In those cases making a copy of the string was the only safe option.
Since C++17 (the current standard at the time of writing) you can use this:
some_c_function(s.data(), s.size());
In C++17 a non-const return value of std::string::data() was added in addition to the const version.

Since C++17, std::string::data() returns a pointer to the underlying char array that allows modifying the contents of the string. However, as usual with strings, you may not write beyond its end. From cppreference:
Modifying the past-the-end null terminator stored at data()+size() to any value other than CharT() has undefined behavior.
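As a minimal sketch of the C++17 approach described above: to_upper_c and upper_copy are hypothetical names, standing in for any C-style function that modifies a buffer in place. The callee only touches the range [0, size()), so the terminator at data()+size() is left alone.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical C-style function: upper-cases exactly n characters in place.
void to_upper_c(char* s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
}

// Hypothetical wrapper: passes the string's own buffer to the C-style function.
std::string upper_copy(std::string s) {
    // Non-const data() requires C++17; with C++11/14 use &s[0] on a non-empty string.
    to_upper_c(s.data(), s.size());
    return s;
}
```

Calling upper_copy("some text") should give back the same text in upper case; the length of the string is never changed by the callee.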

Related

Why does string::c_str() return a const char* when strings are allocated dynamically?

Why does it return a constant char pointer? The C++11 standard says:
The pointer returned points to the internal array currently used by the string object to store the characters that conform its value.
How can something that is dynamically allocated be constant(const char*)?
In C and C++, const translates more or less to "read only".
So, when something returns a char const *, that doesn't necessarily mean the data it's pointing at is actually const--it just means that the pointer you're receiving only supports reading, not writing, the data it points at.
The string object itself may be able to modify that data--but (at least via the pointer you're receiving) you're not allowed to modify the data directly.
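A small sketch of that "read-only view" idea, using a plain array instead of a std::string so nothing depends on library internals (read_through_const_view is a hypothetical name):

```cpp
#include <string>

// The data itself is not const; only the pointer `view` is a read-only handle to it.
std::string read_through_const_view() {
    char buffer[] = "hello";
    const char* view = buffer;  // view[0] = 'J'; would not compile
    buffer[0] = 'J';            // the owner of the data may still modify it
    return std::string(view);   // the read-only view observes the change
}
```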
The pointer returned by c_str is declared to point to a const char to prevent modifying the internal string buffer via that pointer.
The string buffer is indeed dynamically allocated, and the pointer returned by c_str is only valid while the string itself does not change. Quoting from cppreference.com:
The pointer obtained from c_str() may be invalidated by:
Passing a non-const reference to the string to any standard library function, or
Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and rend().
I'm going to interpret the question as why can't you cast it back to a char * and write to it and expect it to work.
The standard library reserves the option to itself to lazy-copy strings in the copy constructor; thus if you wrote to it via the result of c_str() you would potentially write to other strings. As most uses of c_str() would not need to write to the string, de-duplicating the buffer on every call to c_str() would impose too large a penalty.

c_str() vs. data() when it comes to return type

After C++11, I thought of c_str() and data() equivalently.
C++17 introduces an overload for the latter, that returns a non-constant pointer (reference, which I am not sure if it's updated completely w.r.t. C++17):
const CharT* data() const; (1)
CharT* data(); (2) (since C++17)
c_str() does only return a constant pointer:
const CharT* c_str() const;
Why the differentiation of these two methods in C++17, especially when C++11 was the one that made them homogeneous? In other words, why only the one method got an overload, while the other didn't?
The new overload was added by P0272R1 for C++17. Neither the paper itself nor the links therein discuss why only data was given new overloads but c_str was not. We can only speculate at this point (unless people involved in the discussion chime in), but I'd like to offer the following points for consideration:
Even just adding the overload to data broke some code; keeping this change conservative was a way to minimize negative impact.
The c_str function had so far been entirely identical to data and is effectively a "legacy" facility for interfacing code that takes "C string", i.e. an immutable, null-terminated char array. Since you can always replace c_str by data, there's no particular reason to add to this legacy interface.
I realize that the very motivation for P0272R1 was that there do exist legacy APIs that erroneously or for C reasons take only mutable pointers even though they don't mutate. All the same, I suppose we don't want to add more to string's already massive API than absolutely necessary.
One more point: as of C++17 you are now allowed to write to the null terminator, as long as you write the value zero. (Previously, it used to be UB to write anything to the null terminator.) A mutable c_str would create yet another entry point into this particular subtlety, and the fewer subtleties we have, the better.
The reason why the data() member got an overload is explained in this paper at open-std.org.
TL;DR of the paper: The non-const .data() member function for std::string was added to improve uniformity in the standard library and to help C++ developers write correct code. It is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters.
Some relevant passages from the paper:
Abstract
Is std::string's lack of a non-const .data() member function an oversight or an intentional design based on pre-C++11 std::string semantics? In either case, this lack of functionality tempts developers to use unsafe alternatives in several legitimate scenarios. This paper argues for the addition of a non-const .data() member function for std::string to improve uniformity in the standard library and to help C++ developers write correct code.
Use Cases
C libraries occasionally include routines that have char * parameters. One example is the lpCommandLine parameter of the CreateProcess function in the Windows API. Because the data() member of std::string is const, it cannot be used to make std::string objects work with the lpCommandLine parameter. Developers are tempted to use .front() instead, as in the following example.
std::string programName;
// ...
if( CreateProcess( NULL, &programName.front(), /* etc. */ ) ) {
    // etc.
} else {
    // handle error
}
Note that when programName is empty, the programName.front() expression causes undefined behavior. A temporary empty C-string fixes the bug.
std::string programName;
// ...
if( !programName.empty() ) {
    char emptyString[] = {'\0'};
    if( CreateProcess( NULL, programName.empty() ? emptyString : &programName.front(), /* etc. */ ) ) {
        // etc.
    } else {
        // handle error
    }
}
If there were a non-const .data() member, as there is with std::vector, the correct code would be straightforward.
std::string programName;
// ...
if( !programName.empty() ) {
    char emptyString[] = {'\0'};
    if( CreateProcess( NULL, programName.data(), /* etc. */ ) ) {
        // etc.
    } else {
        // handle error
    }
}
A non-const .data() std::string member function is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters. This is common in older codes and those that need to be portable with older C compilers.
It just depends on the semantics of "what you want to do with it". Generally speaking, std::string is sometimes used as a buffer vector, i.e., as a replacement to std::vector<char>. This can be seen in boost::asio often. In other words, it's an array of characters.
c_str(): strictly means that you're looking for a null-terminated string. In that sense, you should never modify the data and you should never need the string as a non-const.
data(): you may need the information inside the string as buffer data, and even as non-const. You may or may not need to modify the data, which you can do, as long as it doesn't involve changing the length of the string.
The two member functions c_str and data of std::string exist due to the history of the std::string class.
Until C++11, a std::string could be implemented as copy-on-write. The internal representation did not need any null termination of the stored string. The member function c_str made sure the returned string was null terminated. The member function data simply returned a pointer to the stored string, which was not necessarily null terminated. To ensure that changes to the string were noticed, as copy-on-write requires, both functions needed to return a pointer to const data.
This all changed with C++11, when copy-on-write was no longer allowed for std::string. Since c_str was still required to deliver a null-terminated string, the null is always appended to the actual stored string. Otherwise a call to c_str might need to change the stored data to make the string null terminated, which would make c_str a non-const function. Since data delivers a pointer to the stored string, it usually has the same implementation as c_str. Both functions still exist for backward compatibility.
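The post-C++11 situation described above can be checked with a short sketch (shares_terminated_buffer is a hypothetical helper name; it assumes a C++11 or later library):

```cpp
#include <string>

// Since C++11, c_str() and data() are specified to return the same pointer,
// and the character at index size() is the null terminator.
bool shares_terminated_buffer(const std::string& s) {
    return s.c_str() == s.data() && s.data()[s.size()] == '\0';
}
```

On a conforming C++11 implementation this should hold for every string, including the empty one.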

In C++11, can the characters in the array pointed to by string::c_str() be altered?

std::string::c_str() returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object.
In C++98 it was required that "a program shall not alter any of the characters in this sequence". This was encouraged by returning a const char* .
In C++11, the "pointer returned points to the internal array currently used by the string object to store the characters that conform its value", and I believe the requirement not to modify its contents has been dropped. Is this true?
Is this code OK in C++11?
#include <iostream>
#include <string>
#include <vector>
using namespace std;
std::vector<char> buf;
void some_func(char* s)
{
    s[0] = 'X'; // function modifies s[0]
    cout << s << endl;
}
int main()
{
    string myStr = "hello";
    buf.assign(myStr.begin(), myStr.end());
    buf.push_back('\0');
    char* d = buf.data(); // C++11
    // char* d = (&buf[0]); // replacement for the line above in C++98
    some_func(d); // OK in C++98
    some_func(const_cast<char*>(myStr.c_str())); // OK in C++11 ?
    // some_func(myStr.c_str()); // does not compile in C++98 or C++11
    cout << myStr << endl; // myStr has been modified
    return 0;
}
3 Requires: The program shall not alter any of the values stored in the character array.
That requirement is still present as of draft n3337 (The working draft most similar to the published C++11 standard is N3337)
In C++11, yes the restriction for c_str() is still in effect. (Note that the return type is const, so no particular restriction is actually required for this function. The const_cast in your program is a big red flag.)
But as for operator[], the restriction appears to be in effect only due to an editorial error. Due to a punctuation change slated for C++14, you may modify it. So the interpretation is sort of up to you. Of course, doing this is so common that no library implementation would dare break it.
C++11 phrasing:
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
C++14 phrasing:
Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
You can pass c_str() as a read-only reference to a function expecting a C string, exactly as its signature suggests. A function expecting a read-write reference generally expects a given buffer size, and to be able to resize the string by writing a NUL within that buffer, which std::string implementations don't in fact support. If you want to do that, you need to resize the string to include your own NUL terminator, then pass &s[0], which is a read-write reference, then resize it again to remove your NUL terminator and hand the responsibility of termination back to the library.
I'd say that if c_str() returns a const char * then it's not OK, even if it can be argued to be a gray area by a language lawyer.
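The resize, pass &s[0], resize-again sequence could look roughly like the following sketch. c_style_fill is a hypothetical stand-in for a C API that writes a NUL-terminated string into a caller-supplied buffer, and call_with_buffer is an illustrative wrapper name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API: writes a NUL-terminated string into buf (capacity n >= 1).
void c_style_fill(char* buf, std::size_t n) {
    std::strncpy(buf, "hi there", n - 1);
    buf[n - 1] = '\0';
}

std::string call_with_buffer(std::size_t capacity) {
    if (capacity == 0) return std::string();
    std::string s(capacity, '\0');       // the whole buffer, including room for NUL, is inside size()
    c_style_fill(&s[0], s.size());       // read-write pointer into contiguous storage (C++11)
    s.resize(std::strlen(s.c_str()));    // trim at the NUL the callee wrote
    return s;
}
```

With a capacity of 32 the result should be "hi there"; with a capacity of 3 the callee only has room for "hi". After the final resize, termination is the library's job again.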
The way I see it is simple. The signature of the method states that the pointer it returns should not be used to modify anything.
In addition, as other commenters have pointed out, there are other ways to do the same thing that do not violate any contracts. So it's definitely not ok to do so.
That said, Borgleader has found that the language still says it isn't.
I have verified that this is in the published C++11 standard
what's wrong with &myStr.front()?
string myStr = "hello";
char* p1 = const_cast<char*>(myStr.c_str());
char* p2 = &myStr.front();
p1[0] = 'Y';
p2[1] = 'Z';
It seems that pointers p1 and p2 are exactly the same. Since "The program shall not alter any of the values stored in the character array", it would seem that the last two lines above are both illegal, and possibly dangerous.
At this point, the way I would answer my own question is that it is safest to copy the original std::string into a vector and then pass a pointer to the new array to any function that might possibly change the characters.
I was hoping that this step might no longer be necessary in C++11, for the reasons I gave in my original post.
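A minimal sketch of the copy-into-a-vector approach mentioned above. some_func here is a simplified, hypothetical version of the function from the question (it only modifies the first character), and call_on_copy is an illustrative name.

```cpp
#include <string>
#include <vector>

// Hypothetical C-style function that modifies its argument in place.
void some_func(char* s) {
    if (s[0] != '\0') s[0] = 'X';
}

// Runs some_func on a NUL-terminated copy, leaving the original string untouched.
std::string call_on_copy(const std::string& original) {
    std::vector<char> buf(original.begin(), original.end());
    buf.push_back('\0');              // the vector does not add a terminator by itself
    some_func(buf.data());            // &buf[0] in C++98
    return std::string(buf.data());   // read back up to the terminator
}
```

The original std::string is never written through a pointer obtained from c_str(), so the "shall not alter" requirement is not at issue.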

C++ const cast, unsure if this is secure

It may seem like a silly question, but I really need to clarify this:
Will this bring any danger to my program?
Is the const_cast even needed?
If I change the values the input pointer refers to in place, will it work safely with std::string, or will it cause undefined behaviour?
So far my only concern is that modifying data through the input pointer could affect the string some_text and make it unusable.
std::string some_text = "Text with some input";
char * input = const_cast<char*>(some_text.c_str());
Thanks for any hints; I would like to avoid shooting myself in the foot.
As an example of evil behavior: the interaction with gcc's Copy On Write implementation.
#include <string>
#include <iostream>
int main() {
    std::string const original = "Hello, World!";
    std::string copy = original;
    char* c = const_cast<char*>(copy.c_str());
    c[0] = 'J';
    std::cout << original << "\n";
}
In action at ideone.
Jello, World!
The issue? As the name implies, gcc's implementation of std::string uses a ref-counted shared buffer under the covers. When a string is modified, the implementation will neatly check if the buffer is shared at the moment, and if it is, copy it before modifying it, ensuring that other strings sharing this buffer are not affected by the new write (thus the name, copy on write).
Now, with your evil program, you access the shared buffer via a const-method (promising not to modify anything), but you do modify it!
Note that with MSVC's implementation, which does not use Copy On Write, the behavior would be different ("Hello, World!" would be correctly printed).
This is exactly the essence of Undefined Behavior.
To modify an inherently const object by casting away its constness using const_cast is an Undefined Behavior.
string::c_str() returns a const char *, i.e. a pointer to a constant C-style string. Technically, modifying this will result in Undefined Behavior.
Note, that the use of const_cast is when you have a const pointer to a non const data and you wish to modify the non-constant data.
Simply casting will not bring forth undefined behavior. Modifying the data pointed at, however, will. (Also see ISO 14882:98 5.2.11-7.)
If you want a pointer to modifiable data, you can have a
std::vector<char> wtf(str.begin(), str.end());
char* lol= &wtf[0];
The std::string manages its own memory internally, which is why it returns a pointer to that memory directly, as it does with the c_str() function. It makes sure the pointer is to const data so that your compiler will complain if you try to modify it.
Using const_cast in that way literally casts away such safety and is only an arguably acceptable practice if you are absolutely sure that memory will not be modified.
If you can't guarantee this then you must copy the string and use the copy; it's certainly a lot safer to do this in any event (you can use strcpy).
See the C++ reference website:
const char* c_str ( ) const;
"Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.
A terminating null character is automatically appended.
The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object."
Yes, it will bring danger, because
input points to whatever c_str happens to be right now, but if some_text ever changes or goes away, you'll be left with a pointer that points to garbage. The value of c_str is guaranteed to be valid only as long as the string doesn't change. And even, formally, only if you don't call c_str() on other strings too.
Why do you need to cast away the const? You're not planning on writing to *input, are you? That is a no-no!
This is a very bad thing to do. Check out what std::string::c_str() does and agree with me.
Second, consider why you want a non-const access to the internals of the std::string. Apparently you want to modify the contents, because otherwise you would use a const char pointer. Also you are concerned that you don't want to change the original string. Why not write
std::string input( some_text );
Then you have a std::string that you can mess with without affecting the original, and you have std::string functionality instead of having to work with a raw C++ pointer...
Another spin on this is that it makes code extremely difficult to maintain. Case in point: a few years ago I had to refactor some code containing long functions. The author had written the function signatures to accept const parameters but then was const_casting them within the function to remove the constness. This broke the implied guarantee given by the function and made it very difficult to know whether the parameter has changed or not within the rest of the body of the code.
In short, if you have control over the string and you think you'll need to change it, make it non-const in the first place. If you don't then you'll have to take a copy and work with that.
it is UB.
For example, you can do something like this:
size_t const size = (sizeof(int) == 4 ? 1024 : 2048);
int arr[size];
without any cast and the compiler will not report an error. But this code is illegal.
The moral is that you need to consider each action every time.

std::string vs. char*

does std::string store data differently than a char* on either stack or heap or is it just derived from char* into a class?
char*
Is the size of one pointer for your CPU architecture.
May be a value returned from malloc or calloc or new or new[].
If so, must be passed to free or delete or delete[] when you're done.
If so, the characters are stored on the heap.
May result from "decomposition" of a char[ N ] (constant N) array or string literal.
Generically, no way to tell if a char* argument points to stack, heap, or global space.
Is not a class type. It participates in expressions but has no member functions.
Nevertheless implements the RandomAccessIterator interface for use with <algorithm> and such.
std::string
Is the size of several pointers, often three.
Constructs itself when created: no need for new or delete.
Owns a copy of the string, if the string may be altered.
Can copy this string from a char*.
By default, internally uses new[] much as you would to obtain a char*.
Provides for implicit conversion which makes transparent the construction from a char* or literal.
Is a class type. Defines other operators for expressions such as catenation.
Defines c_str() which returns a const char* for temporary use.
Implements std::string::iterator type with begin() and end().
string::iterator is flexible: an implementation may make it a range-checked super-safe debugging helper or simply a super-efficient char* at the flip of a switch.
If you mean, does it store contiguously, then the answer is that before C++11 it was not required, but all known (to me, anyway) implementations do so; since C++11 contiguous storage is guaranteed. This is most likely to support the c_str() and data() member requirements, which is to return a contiguous string (null-terminated in the case of c_str()).
As far as where the memory is stored, it's usually on the heap. But some implementations employ the "Short String Optimization", whereby short string contents are stored within a small internal buffer. So, in the case that the string object is on the stack, it's possible that the stored contents are also on the stack. But this should make no difference to how you use it, since once the object is destroyed, the memory storing the string data is invalidated in either case.
(btw, here's an article on a similar technique applied generally, which explains the optimization.)
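To see whether a particular implementation keeps a string's characters inside the object itself, one can compare addresses, as in the sketch below. stored_inline is a hypothetical helper; what it reports for short strings is implementation-specific, and only the long-string case is certain, because the characters cannot fit inside the object.

```cpp
#include <functional>
#include <string>

// Returns true if the character buffer lies inside the std::string object itself.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    std::less<const char*> before;  // gives a total order even for unrelated pointers
    return !before(s.data(), obj_begin) && before(s.data(), obj_end);
}
```

A string of 1000 characters can never be stored inline, since sizeof(std::string) is far smaller than that; for something like "hi" the answer depends on whether the library implements the optimization.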
These solve different problems. char* (or char const*) points to a C style string which isn't necessarily owned by the one storing the char* pointer. In C, because of the lack of a string type, necessarily you often use char* as "the string type".
std::string owns the string data it points to. So if you need to store a string somewhere in your class, chances are good you want to use std::string or your library's string class instead of char*.
On contiguity of the storage of std::string, other people already answered.