This question already has answers here:
Why does this work: returning C string literal from std::string function and calling c_str()
(9 answers)
Closed 9 years ago.
For Code:
stringstream ss("012345678901234567890123456789012345678901234567890123456789");
some articles said it is wrong for followed usage due to ss.str return temp object and will destructered before call .c_str();
const char* cstr2 = ss.str().c_str();
but I run the example, there is no problem? how to understand?
But I run the example, there is no problem?
In fact, there's nothing wrong in the expression:
const char* cstr2 = ss.str().c_str();
The temporary (copy) object returned by ss.str() will live long enough to let you get the underlying c-string with c_str().
Of course by the end of the expression you'll have a const char pointer to an object that is probably deallocated (this depends heavily on the std::basic_string implementation).
Therefore this is likely not a good idea. What you should do instead is:
auto x = ss.str();
const char* cstr2 = x.c_str();
The above code won't get you any trouble, since the returned value of str() is now being copied/is not a temporary anymore, and the access to x.c_str() will give you a valid pointer.
Yet another wonderful C++ landmine.
Basically, your pointer references a block of memory (C string) that is referencing a temporary copy (string) of whatever was in the stream at the time you did the double affectation.
str() returns a temporary object.
Life expectancy of temporary objects is rather short. Either someone takes a reference to them immediately (e.g. string& s = ss.str()) or they die at the end of the statement they were born in.
However, the compiler allows to get a reference to the same block of memory through c_str(), but indirectly, so it flies under the compiler's radar. What a pity.
So your pointer will indeed be valid at the time c_str gets it, but just for about as long as if you did something like this in plain old C:
const char * cstr2;
{
char ss_str[100]; // str() return value is dynamically created
char * c_str = &ss_str_[10]; // c_str() takes a reference to some part of it
cstr2 = c_str; // cstr2 takes an indirect reference to str() ret. val.
} // compiler pulls the plug on str() ret. val.
So, right after the end of this very statement, c_str is already referencing the cadaver of whatever string str() returned.
Now since temporary objects are allocated on the stack, you may well never notice the problem. Object deallocation in itself will likely not modify the defunct string value, but as soon as the compiler will reuse this bit of stack, the proverbial guru will have something to meditate over.
First of all, it's important to understand that attempting to make use of a dangling pointer is not guaranteed to fail in any obvious way. It can appear to "work" just as often as it can crash spectacularly.
Secondly, yes, that code is invalid and you shouldn't do it. The stringstream's lifetime is not important because std::stringstream::str() returns by value (i.e. a copy of the internal buffer), but then you're still subject to that string going out of scope before you can make use of its C-string pointer.
Related
I made a mistake in a socket interface I wrote a while back and I just noticed the problem while looking through the code for a different issue. The socket receives a string of characters and passes it to jsoncpp to complete the json parsing. I can almost understand what is happening here but I can't get my head around it. I would like to grasp what is actually happening under the hood. Here is the minimum example:
#include <iostream>
#include <cstring>
void doSomethingWithAString(const std::string &val) {
std::cout << val.size() << std::endl;
std::cout << val << std::endl;
}
int main()
{
char responseBufferForSocket[10000];
memset(responseBufferForSocket, 0, 10000);
//Lets simulate a response from a socket connection
responseBufferForSocket[0] = 'H';
responseBufferForSocket[1] = 'i';
responseBufferForSocket[2] = '?';
// Now lets pass a .... the address of the first char in the array...
// wait a minute..that's not a const std::string& ... but hey, it's ok it *works*!
doSomethingWithAString(responseBufferForSocket);
return 0;
}
The code above is not causing any obvious issues but I would like to correct it if there is a problem lurking. Obviously the character array is being transformed to a string, but by what mechanism? I guess I have four questions:
Is this string converted on the stack and passed by reference or is it passed by value?
Is it using the operator= overload? A "from c-string" constructor? Some other mechanism?
Based on 2 is this less efficient in than converting to a string explicitly using a constructor?
Is this dangerous. :)
compiled with g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
std::string has a non explicit constructor (i.e. not marked with the explicit keyword) that takes a const char* parameter and copies characters until the first '\0' (the behaviour is undefined if no such character exists in the string). In other words, it performs a copy of the source data. It's overload #5 on this page.
const char[] implicitly decays to const char*, and you can pass a temporary to a function taking a const reference parameter. This only works if the reference is const, by the way; if you can't use const, pass it by value.
And so, when you pass a const char[] to that function, a temporary object of type std::string is constructed using that constructor, and bound to the parameter. The temporary will remain alive for the duration of the function call, and will be destroyed when it returns.
With all that in mind, let's address your questions:
It's passed by reference, but the reference is to a temporary object.
A constructor, since we're constructing an object. std::string also has an operator= taking a const char* parameter, but that's never used for implicit conversions: you'll need to be explicitly assigning something.
The performance is the same since the same code runs, but you do incur some overhead because the data is copied instead of referenced. If that is an issue, use std::string_view instead.
It's safe as long as you don't try to keep a reference or pointer to the parameter for longer than the function call, because the object might not be alive afterwards (but then you should always keep that in mind with reference parameters). You also need to make sure that the C string you're passing is properly null terminated.
Is this string converted on the stack
The language doesn't specify the storage of temporary objects, but in this case it is probably stored on the stack, yes.
or is it passed by value?
The argument is a reference. Therefore you are "passing by reference".
Is it using the operator= overload?
No. You aren't using operator= there, so why would it?
A "from c-string" constructor?
Yes.
Based on 2 is this less efficient in than converting to a string explicitly using a constructor?
No. Whether object is created implicitly or explicitly is irrelevant to efficiency.
Creating a std::string is however potentially less efficient than not creating it which you could achieve by not accepting a reference to a string as the argument. You could use a string view instead.
Is this dangerous.
Not particularly. In some cases implicit conversions can cause a bit of problems when the programmers doesn't notice them, but typically they simplify the language by reducing verbosity.
This is not my program so don't start berating me :-). Some random program I got. A globally declared buffer is being returned by MyFunc(). I use VS2008 and it does not complain
static char buffer[1024];
std::string MyFunc() {
....
....
return buffer;
}
However when I add this line of code
char * ret;
ret = MyFunc()
It complains: "error: no suitable conversion function from "std::string" to "char *" exists"
My question is why is the compiler complaining now? Why this inconstancy in syntax checking? Again I dont have the freedom to change MyFunc(). In my program if I can make
std::string ret;
ret = MyFunc();
and get rid of the syntax error but would really like to understand this strange behavior.
string() has a constructor that accepts a char*, so you get an automatic conversion. There is no automatic conversion from a string to a char*. You have to call string::c_str() to get the char*.
Edit
Although you asked only for an explanation of the behavior, others in this forum seem to think I have short-changed you by not mentioning that string::c_str returns a const char*, not a simple char*. But the explanation remains: there is no implicit/automatic conversion from string to char* or const char*. Feel free to read about c_str here if it's important to you.
It is not the syntax, it is the structure of the std::string that makes the compiler behave differently.
When you are returning a char* from a function returning std::string, the compiler notices that there is a constructor of std::string that takes char*, calls that constructor, and quietly returns the result.
When you are trying to return a std::string from a char* - returning function, the compiler tries to see if there is a conversion operator to make char* from a std::string, finds that there is no such operator, and reports an error.
If you want to convert a string to char*, you need to make a copy of the string's buffer, like this:
char* ret_ch = new char[ret.size()+1];
memcpy(ret_ch, ret.c_str(), ret.size()+1);
return ret_ch;
You could think that it is OK to return c_str() by itself, but it is not a good idea: the buffer that "backs up" this C string belongs to std::string object, so once the string gets deallocated, accessing the buffer starts producing undefined behavior. That is why you need to make an explicit copy when you access the buffer of a string. Of course you are also responsible for calling delete[] on the copied result.
std::string is designed as implicitly constructable from char const* because this supports using string literals and typical C style code strings as initializer values.
If this was not supported then one would just have to use some intermediate function, which would add nothing but verbosity and inefficiency.
In the other direction, however, std::string is intentionally designed to not convert implicitly to char const*. Part of the rationale is probably that with std::string being logically mutable, the returned raw pointer is only valid as long as no operations are performed that might cause a buffer replacement or string destruction. For example,
char const* s = foo().c_str();
where foo produces a std::string, makes s point to a buffer that no longer exists, a dangling pointer that is invalid.
The c_str() member function call makes the conversion stand out.
Consider how more common that problem could be if one could write just
char const* s = foo();
and have that compile.
Regarding that strike-through (deleted) text, I realized that it's completely irrelevant whether the string is logically mutable or immutable. Sorry. Need more coffee!
I am confused about the following code:
string _str = "SDFDFSD";
char* pStr = (char*)_str.data();
for (int i = 0; i < iSize; i++)
pStr[i] = ::tolower(pStr[i]);
here _str.data() returns const char*. But we are assigning it to a char*. My questions is,
_str.data()is returning pointer to a constant data. How is it possible to store it in a pointer to data? The data was constant right? If we assign it to char pointer than we can change it like we are doing inside the for statement which should not be possible for a constant data.
Don't do that. It may be fine in this case, but as the documentation for data() says:
The pointer returned may be invalidated by further calls to other
member functions that modify the object.
A program shall not alter any of the characters in this sequence.
So you could very accidentally write to invalid memory if you keep that pointer around. Or, in fact, ruin the implementation of std::string. I would almost go as far as to say that this function shouldn't be exposed.
std::string offers a non-const operator[] for that purpose.
string _str = "SDFDFSD";
for (int i = 0; i < iSize; i++)
_str[i] = ::tolower(_str[i]);
What you are doing is not valid at the standard library level (you're violating std::string contract) but valid at the C++ core language level.
The char * returned from data should not be written to because for example it could be in theory(*) shared between different strings with the same value.
If you want to modify a string just use std::string::operator[] that will inform the object of the intention and will take care of creating a private buffer for the specific instance in case the string was originally shared instead.
Technically you are allowed to cast-away const-ness from a pointer or a reference, but if it's a valid operation or not depends on the semantic of the specific case. The reason for which the operation is allowed is that the main philosophy of C++ is that programmers make no mistakes and know what they are doing. For example is technically legal from a C++ language point of view to do memcpy(&x, "hello", 5) where x is a class instance, but the results are most probably "undefined behavior".
If you think that your code "works" it's because you've the wrong understanding of what "works" really should mean (hint: "works" doesn't mean that someone once observed the code doing what seemed reasonable, but that will work in all cases). A valid C++ implementation is free to do anything it wants if you run that program: that you observed something you think is fine doesn't really mean anything, may be you didn't look close enough, or may be you were just lucky (unfortunate, actually) that no crash happened right away.
(*) In modern times the COW (copy-on-write) implementations of std::string are low in popularity because they pose a lot of problems (e.g. with multithreading) and memory is a lot cheaper now. Still std::string contract says you're not allowed to change the memory pointed by the return value of data(); if you do anything may happen.
You never must change the data returned from std::string::data() or std::string::c_str() directly.
To create a copy of a std::string:
std::string str1 = "test";
std::string str2 = str1; // copy.
Change characters in a string:
std::string str1 = "test"
str1[0] = 'T';
The "correct" way would be to use std::transform instead:
std::transform(_str.begin(), _str.end(), _str.begin(), ::tolower);
The simple answer to your question is that in C++ you can cast away the 'const' of a variable.
You probably shouldn't though.
See this for const correctness in C++
String always allocates memory on heap, so this is not actually const data, it is just marked so (in method data() signature) to prevent modification.
But nothing is impossible in C++, so with a simple cast, though unsafe, you can now treat the same memory space as modifiable.
All constants in a C/C++ program (like "SDFDFSD" below)will be stored in a separate section .rodata. This section is mapped as read-only when the binary is loaded into memory during execution.
int main()
{
char* ptr = "SDFDFSD";
ptr[0]='x'; //segmentation fault!!
return 0;
}
Hence any attempt to modify the data at that location will result in a run-time error i.e. a segmentation fault.
Coming to the above question, when creating a string and assigning a string to it, a new copy in memory now exists (memory used to hold the properties of the string object _str). This is on the heap and NOT mapped to a read-only section. The member function _str.data() points to the location in memory which is mapped read/write.
The const qualifier to the return type ensure that this function is NOT accidentally passed to string manipulation functions which expect a non-const char* pointer.
In your current iteration there was no limitation on the memory location itself that was holding the string object's data; i.e. it was mapped with both read/write permissions. Hence modifying the location using another non-const pointer worked i.e. pStr[i] on the left hand side of an assignment did NOT result in a run-time error as there were no inherent restrictions on the memory location itself.
Again this is NOT guaranteed to work and just a implementation specific behaviour that you have observed (i.e. it simply happens to work for you) and cannot always depend on this.
It maybe seems to be a silly question but i really need to clarify this:
Will this bring any danger to my program?
Is the const_cast even needed?
If i change the input pointers values in place will it work safely with std::string or will it create undefined behaviour?
So far the only concern is that this could affect the string "some_text" whenever I modify the input pointer and makes it unusable.
std::string some_text = "Text with some input";
char * input = const_cast<char*>(some_text.c_str());
Thanks for giving me some hints, i would like to avoid the shoot in my own foot
As an example of evil behavior: the interaction with gcc's Copy On Write implementation.
#include <string>
#include <iostream>
int main() {
std::string const original = "Hello, World!";
std::string copy = original;
char* c = const_cast<char*>(copy.c_str());
c[0] = 'J';
std::cout << original << "\n";
}
In action at ideone.
Jello, World!
The issue ? As the name implies, gcc's implementation of std::string uses a ref-counted shared buffer under the cover. When a string is modified, the implementation will neatly check if the buffer is shared at the moment, and if it is, copy it before modifying it, ensuring that other strings sharing this buffer are not affected by the new write (thus the name, copy on write).
Now, with your evil program, you access the shared buffer via a const-method (promising not to modify anything), but you do modify it!
Note that with MSVC's implementation, which does not use Copy On Write, the behavior would be different ("Hello, World!" would be correctly printed).
This is exactly the essence of Undefined Behavior.
To modify an inherently const object by casting away its constness using const_cast is an Undefined Behavior.
string::c_str() returns a const char *, i.e: a pointer to a constant c-style string. Technically, modifying this will result in Undefined Behavior.
Note, that the use of const_cast is when you have a const pointer to a non const data and you wish to modify the non-constant data.
Simply casting will not bring forth an undefined behavior. Modifying the data pointed at, however, will. (Also see ISO 14882:98 5.2.7-7).
If you want a pointer to modifiable data, you can have a
std::vector<char> wtf(str.begin(), str.end());
char* lol= &wtf[0];
The std::string manages it's own memory internally, which is why it returns a pointer to that memory directly as it does with the c_str() function. It makes sure it's constant so that your compiler will warn you if you try to do modifiy it.
Using const_cast in that way literally casts away such safety and is only an arguably acceptable practice if you are absolutely sure that memory will not be modified.
If you can't guarantee this then you must copy the string and use the copy.; it's certainly a lot safer to do this in any event (you can use strcpy).
See the C++ reference website:
const char* c_str ( ) const;
"Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.
A terminating null character is automatically appended.
The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object."
Yes, it will bring danger, because
input points to whatever c_str happens to be right now, but if some_text ever changes or goes away, you'll be left with a pointer that points to garbage. The value of c_str is guaranteed to be valid only as long as the string doesn't change. And even, formally, only if you don't call c_str() on other strings too.
Why do you need to cast away the const? You're not planning on writing to *input, are you? That is a no-no!
This is a very bad thing to do. Check out what std::string::c_str() does and agree with me.
Second, consider why you want a non-const access to the internals of the std::string. Apparently you want to modify the contents, because otherwise you would use a const char pointer. Also you are concerned that you don't want to change the original string. Why not write
std::string input( some_text );
Then you have a std::string that you can mess with without affecting the original, and you have std::string functionality instead of having to work with a raw C++ pointer...
Another spin on this is that it makes code extremely difficult to maintain. Case in point: a few years ago I had to refactor some code containing long functions. The author had written the function signatures to accept const parameters but then was const_casting them within the function to remove the constness. This broke the implied guarantee given by the function and made it very difficult to know whether the parameter has changed or not within the rest of the body of the code.
In short, if you have control over the string and you think you'll need to change it, make it non-const in the first place. If you don't then you'll have to take a copy and work with that.
it is UB.
For example, you can do something like this this:
size_t const size = (sizeof(int) == 4 ? 1024 : 2048);
int arr[size];
without any cast and the comiler will not report an error. But this code is illegal.
The morale is that you need consider action each time.
String data
What I am particularly confused about is this statement
"Its contents are guaranteed to remain unchanged only until the next call to a non-constant member function of the string object."
Can someone clarify what does this mean? When to use this and when to avoid using this?
They mean that you could store the pointer and use it later. If some non-const method is called between two accesses the contents of the buffer your stored pointer is set to may change and your will face unexpected behaviour.
const char* data() const;
This is saying that the const char * returned by calling str.data() will not change unless someone modifies the string that it came from. Once someone calls a non-constant member function, the returned pointer could be invalid, or could point to different data from what it pointed to immediately after the str.data() function returned.
It means you can pass the returned data to C functions, for example. It means you should not do something like:
const char *old = str.data();
size_t len = str.length();
...call a function that modifies str...
// cout << old << endl;
// Since old is not guaranteed to be null terminated (thanks MSalter),
// do something else with the old data instead of writing to cout.
// Inventiveness not at a high this morning; this isn't a particularly
// good example of what to do - a sort of string copy.
char buffer[256];
memcpy(buffer, old, MIN(sizeof(buffer)-1, len));
buffer[len] = '\0';
By the time the I/O memory copying is done, old may not be valid any more, and len may also be incorrect.
Sometimes, you need to have access to the string formatted as an array of characters - usually because you need to pass the string to some function which expects the string like this (for example strcmp). You can do this by using the data or c_str members, but you have to respect the rules for calling the function which are spelled out plainly in the link you provided:
The returned array points to an
internal location which should not be
modified directly in the program. Its
contents are guaranteed to remain
unchanged only until the next call to
a non-constant member function of the
string object.
You cannot modify the array of characters - the string object assumes that you do not, and if you do this will lead to undefined behaviour.