String data
What I am particularly confused about is this statement
"Its contents are guaranteed to remain unchanged only until the next call to a non-constant member function of the string object."
Can someone clarify what does this mean? When to use this and when to avoid using this?
They mean that you could store the pointer and use it later. If some non-const method is called between two accesses the contents of the buffer your stored pointer is set to may change and your will face unexpected behaviour.
const char* data() const;
This is saying that the const char * returned by calling str.data() will not change unless someone modifies the string that it came from. Once someone calls a non-constant member function, the returned pointer could be invalid, or could point to different data from what it pointed to immediately after the str.data() function returned.
It means you can pass the returned data to C functions, for example. It means you should not do something like:
const char *old = str.data();
size_t len = str.length();
...call a function that modifies str...
// cout << old << endl;
// Since old is not guaranteed to be null terminated (thanks MSalter),
// do something else with the old data instead of writing to cout.
// Inventiveness not at a high this morning; this isn't a particularly
// good example of what to do - a sort of string copy.
char buffer[256];
memcpy(buffer, old, MIN(sizeof(buffer)-1, len));
buffer[len] = '\0';
By the time the I/O memory copying is done, old may not be valid any more, and len may also be incorrect.
Sometimes, you need to have access to the string formatted as an array of characters - usually because you need to pass the string to some function which expects the string like this (for example strcmp). You can do this by using the data or c_str members, but you have to respect the rules for calling the function which are spelled out plainly in the link you provided:
The returned array points to an
internal location which should not be
modified directly in the program. Its
contents are guaranteed to remain
unchanged only until the next call to
a non-constant member function of the
string object.
You cannot modify the array of characters - the string object assumes that you do not, and if you do this will lead to undefined behaviour.
Related
I'm trying to set a pointer array to a char array in class Tran, however it only applies the first letter of the string. I've tried many other ways but can't get the whole string to go into name.
edit: name is a private variable
char name[MAX_NAME + 1];
Trying to output it using cout << name << endl;
the input is:
setTran("Birth Tran", 1);
help would be appreciated, thank youu
namee[0] == NULL
name[0] = NULL;
These are bugs. NULL is for pointers. name[0] as well as namee[0] is a char. It may work (by work, I mean it will assign the first character to be the null terminator character) on some systems because 0 is both a null pointer constant and an integer literal and thus convertible to char, and NULL may be defined as 0. But NULL may also be defined as nullptr in which case the program will be ill-formed.
Use name[0] = '\0' instead.
name[0] = *namee;
however it only applies the first letter of the string.
Well, you assign only the first character, so this is to be expected.
If you would like to copy the entire string, you need to assign all of the characters. That can be implemented with a loop. There are standard functions for copying a string though; You can use std::strncpy.
That said, constant length arrays are usually problematic because it is rarely possible to correctly predict the maximum required size. std::string is a more robust alternative.
The underlying issue you are trying to assign a const char* to an char* const. When declaring
char name[MAX_NAME + 1];
You are declaring a constant memory address containing mutable char data (char* const). When you are passing a const char* to your function, you are passing a mutable pointer containing constant data. This will not compile. You should be doing a deep copy of the char array by using:
strcpy_s(dst, buffer_size, src);
This copy function will make sure that your array does not overflow, and that it is null terminated.
In order to be able to assign a pointer to a char array, it would need to be allocated on the heap with
char* name = new char[MAX_NAME + 1];
This would allow assigning a char* or char* const to it afterwards. You however need to manage the memory dynamically at this point, and I would advise against this in your case, as passing "Birth Tran" would lead to undefined behaviours as soon as char* const namee goes out of scope.
I am confused about the following code:
string _str = "SDFDFSD";
char* pStr = (char*)_str.data();
for (int i = 0; i < iSize; i++)
pStr[i] = ::tolower(pStr[i]);
here _str.data() returns const char*. But we are assigning it to a char*. My questions is,
_str.data()is returning pointer to a constant data. How is it possible to store it in a pointer to data? The data was constant right? If we assign it to char pointer than we can change it like we are doing inside the for statement which should not be possible for a constant data.
Don't do that. It may be fine in this case, but as the documentation for data() says:
The pointer returned may be invalidated by further calls to other
member functions that modify the object.
A program shall not alter any of the characters in this sequence.
So you could very accidentally write to invalid memory if you keep that pointer around. Or, in fact, ruin the implementation of std::string. I would almost go as far as to say that this function shouldn't be exposed.
std::string offers a non-const operator[] for that purpose.
string _str = "SDFDFSD";
for (int i = 0; i < iSize; i++)
_str[i] = ::tolower(_str[i]);
What you are doing is not valid at the standard library level (you're violating std::string contract) but valid at the C++ core language level.
The char * returned from data should not be written to because for example it could be in theory(*) shared between different strings with the same value.
If you want to modify a string just use std::string::operator[] that will inform the object of the intention and will take care of creating a private buffer for the specific instance in case the string was originally shared instead.
Technically you are allowed to cast-away const-ness from a pointer or a reference, but if it's a valid operation or not depends on the semantic of the specific case. The reason for which the operation is allowed is that the main philosophy of C++ is that programmers make no mistakes and know what they are doing. For example is technically legal from a C++ language point of view to do memcpy(&x, "hello", 5) where x is a class instance, but the results are most probably "undefined behavior".
If you think that your code "works" it's because you've the wrong understanding of what "works" really should mean (hint: "works" doesn't mean that someone once observed the code doing what seemed reasonable, but that will work in all cases). A valid C++ implementation is free to do anything it wants if you run that program: that you observed something you think is fine doesn't really mean anything, may be you didn't look close enough, or may be you were just lucky (unfortunate, actually) that no crash happened right away.
(*) In modern times the COW (copy-on-write) implementations of std::string are low in popularity because they pose a lot of problems (e.g. with multithreading) and memory is a lot cheaper now. Still std::string contract says you're not allowed to change the memory pointed by the return value of data(); if you do anything may happen.
You never must change the data returned from std::string::data() or std::string::c_str() directly.
To create a copy of a std::string:
std::string str1 = "test";
std::string str2 = str1; // copy.
Change characters in a string:
std::string str1 = "test"
str1[0] = 'T';
The "correct" way would be to use std::transform instead:
std::transform(_str.begin(), _str.end(), _str.begin(), ::tolower);
The simple answer to your question is that in C++ you can cast away the 'const' of a variable.
You probably shouldn't though.
See this for const correctness in C++
String always allocates memory on heap, so this is not actually const data, it is just marked so (in method data() signature) to prevent modification.
But nothing is impossible in C++, so with a simple cast, though unsafe, you can now treat the same memory space as modifiable.
All constants in a C/C++ program (like "SDFDFSD" below)will be stored in a separate section .rodata. This section is mapped as read-only when the binary is loaded into memory during execution.
int main()
{
char* ptr = "SDFDFSD";
ptr[0]='x'; //segmentation fault!!
return 0;
}
Hence any attempt to modify the data at that location will result in a run-time error i.e. a segmentation fault.
Coming to the above question, when creating a string and assigning a string to it, a new copy in memory now exists (memory used to hold the properties of the string object _str). This is on the heap and NOT mapped to a read-only section. The member function _str.data() points to the location in memory which is mapped read/write.
The const qualifier to the return type ensure that this function is NOT accidentally passed to string manipulation functions which expect a non-const char* pointer.
In your current iteration there was no limitation on the memory location itself that was holding the string object's data; i.e. it was mapped with both read/write permissions. Hence modifying the location using another non-const pointer worked i.e. pStr[i] on the left hand side of an assignment did NOT result in a run-time error as there were no inherent restrictions on the memory location itself.
Again this is NOT guaranteed to work and just a implementation specific behaviour that you have observed (i.e. it simply happens to work for you) and cannot always depend on this.
This question already has answers here:
Why does this work: returning C string literal from std::string function and calling c_str()
(9 answers)
Closed 9 years ago.
For Code:
stringstream ss("012345678901234567890123456789012345678901234567890123456789");
some articles said it is wrong for followed usage due to ss.str return temp object and will destructered before call .c_str();
const char* cstr2 = ss.str().c_str();
but I run the example, there is no problem? how to understand?
But I run the example, there is no problem?
In fact, there's nothing wrong in the expression:
const char* cstr2 = ss.str().c_str();
The temporary (copy) object returned by ss.str() will live long enough to let you get the underlying c-string with c_str().
Of course by the end of the expression you'll have a const char pointer to an object that is probably deallocated (this depends heavily on the std::basic_string implementation).
Therefore this is likely not a good idea. What you should do instead is:
auto x = ss.str();
const char* cstr2 = x.c_str();
The above code won't get you any trouble, since the returned value of str() is now being copied/is not a temporary anymore, and the access to x.c_str() will give you a valid pointer.
Yet another wonderful C++ landmine.
Basically, your pointer references a block of memory (C string) that is referencing a temporary copy (string) of whatever was in the stream at the time you did the double affectation.
str() returns a temporary object.
Life expectancy of temporary objects is rather short. Either someone takes a reference to them immediately (e.g. string& s = ss.str()) or they die at the end of the statement they were born in.
However, the compiler allows to get a reference to the same block of memory through c_str(), but indirectly, so it flies under the compiler's radar. What a pity.
So your pointer will indeed be valid at the time c_str gets it, but just for about as long as if you did something like this in plain old C:
const char * cstr2;
{
char ss_str[100]; // str() return value is dynamically created
char * c_str = &ss_str_[10]; // c_str() takes a reference to some part of it
cstr2 = c_str; // cstr2 takes an indirect reference to str() ret. val.
} // compiler pulls the plug on str() ret. val.
So, right after the end of this very statement, c_str is already referencing the cadaver of whatever string str() returned.
Now since temporary objects are allocated on the stack, you may well never notice the problem. Object deallocation in itself will likely not modify the defunct string value, but as soon as the compiler will reuse this bit of stack, the proverbial guru will have something to meditate over.
First of all, it's important to understand that attempting to make use of a dangling pointer is not guaranteed to fail in any obvious way. It can appear to "work" just as often as it can crash spectacularly.
Secondly, yes, that code is invalid and you shouldn't do it. The stringstream's lifetime is not important because std::stringstream::str() returns by value (i.e. a copy of the internal buffer), but then you're still subject to that string going out of scope before you can make use of its C-string pointer.
Pointers have always made me blank about the logic I intend to use in code, If someone can help me understand a few concepts that would be really helpful. Here's a code snippet from my program,
vector <char> st;
char *formatForHtml(string str, string htmlTag)
{
string strBegin;
strBegin = "<";
strBegin.append(htmlTag);
strBegin.append(">");
strBegin.append(str);
string strEnd = "</";
strEnd.append(htmlTag);
strEnd.append(">");
strBegin.append(strEnd);
st.resize(strBegin.size());
for (int i =0;i <strBegin.size();i++) {
st[i] = strBegin.at(i);
}
return &st[0];
}
In the code above if I have to return address of st[0], I have to write the function of type char *. I need to know the reason to do so, also if address is the integer value why can I not define function as an int type?
P.S. It's a beginner level doubt.
You don't tell us what st is, so we can't tell whether the code is
totally incorrect, or just bad design. If st is a typo for str
(just guessing, since str isn't used anywhere), then you have
undefined behavior, and the program is incorrect, since you're returning
a pointer into an object which has been destructed. At any rate, a
function like formatForHtml should return an std::string, as a
value, and not a pointer (and certainly not a pointer to a non-const).
I might add that you don't use a loop to copy string values character by
character. std::string acts as a normal value type, so you can just
assign: st = strBegin;.
EDIT:
Just for the record, I've since reexamined your code. The natural way
of writing it would be:
std::string
formatForHtml( std::string const& cdata, std::string const& tag )
{
return '<' + tag + '>' + cdata + "</" + tag + '>';
}
No pointers (at least not visible---in fact, "+ operator), and full use of
std::strings facilities.
I have to write the function of type 'char *' I need to know the reason to do so
There's no reason to do so. Just return a std::string.
From the looks of your code, st is a std::vector<char>. Since std::vector has continuous memory, &st[0] is the address of the first char in the vector, and thus points to the not null-terminated string that the vector represents, which isn't very helpful outside the function because you can't find the length.
That's awkward code, you're expecting 'str' as an argument, but then you return its memory location? Why not return the string itself?
Your code there has a few typos. (Did you mean 'str' when you typed 'st'? If not, then where is 'st' defined?)
As for why you can't define the function as "int" type, that is because an "int" is a different type to "pointer to char".
Personally, I would return 'string', and just have a 'return str' at the end there.
Pointer is pointer, not integer value. Code is right if and only if st is global variable or declared as static in this function. But return string is preferably, than return pointer.
Why don't you use string as return type for your function?
First: a pointer is an adress of something lying in memory, so you can't replace it with an integer (only with some dirty tricks).
Second: &st[0] is the adress of the internal buffer the string uses to store its content. When st goes out of scope (or is reused for the next call), it will be overwritten or given back to the heap manager, so the one calling this function will end with some garbage.
Btw, most of your function does unnessesary work which the string class can do by itself (for example copiing strBegin to st).
Simple.
string str
is something that contains a bunch characters.
str[0]
return the first character.
Therefore,
&str[0]
gives you the address of the first character, therefore it's a "char *" --- "char" being whatever the pointer points to, "*" being the "address of" part of the pointer. The type of the pointer is about the object being pointed to.
Also, addresses may be just "integers" on the hardware level (everything is an integer... except for integers themselves, they are a bunch of bools), but that doesn't mean anything for a higher level language. For a higher level language, a pointer is a pointer, not an integer.
Treating one as another is almost always an error, and C++11 even added the nullptr keyword so we don't have to rely on the one valid "pointer is an integer" case.
And btw, your code won't work. "str" only exists for the duration of the function, so when the function returns "str" ceases to exist, which means the pointer you have returned points into a big black nowhere.
It maybe seems to be a silly question but i really need to clarify this:
Will this bring any danger to my program?
Is the const_cast even needed?
If i change the input pointers values in place will it work safely with std::string or will it create undefined behaviour?
So far the only concern is that this could affect the string "some_text" whenever I modify the input pointer and makes it unusable.
std::string some_text = "Text with some input";
char * input = const_cast<char*>(some_text.c_str());
Thanks for giving me some hints, i would like to avoid the shoot in my own foot
As an example of evil behavior: the interaction with gcc's Copy On Write implementation.
#include <string>
#include <iostream>
int main() {
std::string const original = "Hello, World!";
std::string copy = original;
char* c = const_cast<char*>(copy.c_str());
c[0] = 'J';
std::cout << original << "\n";
}
In action at ideone.
Jello, World!
The issue ? As the name implies, gcc's implementation of std::string uses a ref-counted shared buffer under the cover. When a string is modified, the implementation will neatly check if the buffer is shared at the moment, and if it is, copy it before modifying it, ensuring that other strings sharing this buffer are not affected by the new write (thus the name, copy on write).
Now, with your evil program, you access the shared buffer via a const-method (promising not to modify anything), but you do modify it!
Note that with MSVC's implementation, which does not use Copy On Write, the behavior would be different ("Hello, World!" would be correctly printed).
This is exactly the essence of Undefined Behavior.
To modify an inherently const object by casting away its constness using const_cast is an Undefined Behavior.
string::c_str() returns a const char *, i.e: a pointer to a constant c-style string. Technically, modifying this will result in Undefined Behavior.
Note, that the use of const_cast is when you have a const pointer to a non const data and you wish to modify the non-constant data.
Simply casting will not bring forth an undefined behavior. Modifying the data pointed at, however, will. (Also see ISO 14882:98 5.2.7-7).
If you want a pointer to modifiable data, you can have a
std::vector<char> wtf(str.begin(), str.end());
char* lol= &wtf[0];
The std::string manages it's own memory internally, which is why it returns a pointer to that memory directly as it does with the c_str() function. It makes sure it's constant so that your compiler will warn you if you try to do modifiy it.
Using const_cast in that way literally casts away such safety and is only an arguably acceptable practice if you are absolutely sure that memory will not be modified.
If you can't guarantee this then you must copy the string and use the copy.; it's certainly a lot safer to do this in any event (you can use strcpy).
See the C++ reference website:
const char* c_str ( ) const;
"Generates a null-terminated sequence of characters (c-string) with the same content as the string object and returns it as a pointer to an array of characters.
A terminating null character is automatically appended.
The returned array points to an internal location with the required storage space for this sequence of characters plus its terminating null-character, but the values in this array should not be modified in the program and are only guaranteed to remain unchanged until the next call to a non-constant member function of the string object."
Yes, it will bring danger, because
input points to whatever c_str happens to be right now, but if some_text ever changes or goes away, you'll be left with a pointer that points to garbage. The value of c_str is guaranteed to be valid only as long as the string doesn't change. And even, formally, only if you don't call c_str() on other strings too.
Why do you need to cast away the const? You're not planning on writing to *input, are you? That is a no-no!
This is a very bad thing to do. Check out what std::string::c_str() does and agree with me.
Second, consider why you want a non-const access to the internals of the std::string. Apparently you want to modify the contents, because otherwise you would use a const char pointer. Also you are concerned that you don't want to change the original string. Why not write
std::string input( some_text );
Then you have a std::string that you can mess with without affecting the original, and you have std::string functionality instead of having to work with a raw C++ pointer...
Another spin on this is that it makes code extremely difficult to maintain. Case in point: a few years ago I had to refactor some code containing long functions. The author had written the function signatures to accept const parameters but then was const_casting them within the function to remove the constness. This broke the implied guarantee given by the function and made it very difficult to know whether the parameter has changed or not within the rest of the body of the code.
In short, if you have control over the string and you think you'll need to change it, make it non-const in the first place. If you don't then you'll have to take a copy and work with that.
it is UB.
For example, you can do something like this this:
size_t const size = (sizeof(int) == 4 ? 1024 : 2048);
int arr[size];
without any cast and the comiler will not report an error. But this code is illegal.
The morale is that you need consider action each time.