Casting c_str() only works for short strings - c++

I'm using a C library in C++ and wrote a wrapper. At one point I need to convert an std::string to a c-style string. There is a class with a function, which returns a string. Casting the returned string works if the string is short, otherwise not. Here is a simple and reduced example illustrating the issue:
#include <iostream>
#include <string>
class StringBox {
public:
    std::string getString() const { return text_; }
    StringBox(std::string text) : text_(text) {}
private:
    std::string text_;
};
int main(int argc, char **argv) {
    const unsigned char *castString = NULL;
    std::string someString = "I am a loooooooooooooooooong string"; // Won't work
    // std::string someString = "hello"; // This one works
    StringBox box(someString);
    castString = (const unsigned char *)box.getString().c_str();
    std::cout << "castString: " << castString << std::endl;
    return 0;
}
Executing the file above prints this to the console:
castString:
whereas if I swap the commenting on someString, it correctly prints
castString: hello
How is this possible?

You are invoking c_str() on a temporary string object returned by the getString() member function. The pointer returned by c_str() is only valid as long as the original string object exists, so at the end of the line where you assign castString it ends up being a dangling pointer. Officially, this leads to undefined behavior.
So why does this work for short strings? I suspect that you're seeing the effects of the Short String Optimization, an optimization where for strings less than a certain length the character data is stored inside the bytes of the string object itself rather than in the heap. It's possible that the temporary string that was returned was stored on the stack, so when it was cleaned up no deallocations occurred and the pointer to the expired string object still holds your old string bytes. This seems consistent with what you're seeing, but it still doesn't mean what you're doing is a good idea. :-)

box.getString() is an anonymous temporary. The pointer returned by c_str() is only valid for the lifetime of the string object it came from.
So in your case, c_str() is invalidated by the time you get to the std::cout. The behaviour of reading the pointer contents is undefined.
(Interestingly the behaviour of your short string is possibly different due to std::string storing short strings in a different way.)

As you return by value
box.getString() is a temporary and so
box.getString().c_str() is valid only during the expression, then it is a dangling pointer.
You may fix that with
const std::string& getString() const { return text_; }

box.getString() produces a temporary. Calling c_str() on that gives you a pointer to a temporary. After the temporary ceases to exist, which is immediately, the pointer is invalid, a dangling pointer.
Using a dangling pointer is Undefined Behavior.

First of all, your code has UB independent of the length of the string: At the end of
castString = (const unsigned char *)box.getString().c_str();
the string returned by getString is destroyed and castString is a dangling pointer to the internal buffer of the destroyed string object.
The reason your code "works" for small strings is probably Small String Optimization: short strings are (commonly) stored in the string object itself instead of in a dynamically allocated array, and apparently that memory is still accessible and unmodified in your case.
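If you are curious whether your standard library stores a given string inline, a small probe like the one below can tell you. This is only a sketch: storedInline is a name invented here, the address comparison relies on implementation-defined integer conversions, and the inline capacity (if any) differs between library implementations.

```cpp
#include <cstdint>
#include <string>

// Returns true if the character buffer of s lies inside the bytes of the
// std::string object itself (i.e. the string is stored inline).
bool storedInline(const std::string& s) {
    const auto objBegin = reinterpret_cast<std::uintptr_t>(&s);
    const auto objEnd = objBegin + sizeof(s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= objBegin && data < objEnd;
}
```

Trying it on "hello" and on the long string from the question should show which side of the threshold each one falls on with your library. Either way, the dangling pointer is still undefined behavior.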

Related

C++ sending forming and sending JSON structures and posting with CurlLib [duplicate]

I have a function that is returning a string. However, when I call it and do c_str() on it to convert it into a const char*, it only works when I store it into another string first. If I directly call c_str() off of the function, it stores garbage value in the const char*.
Why is this happening? Feel like I'm missing something very fundamental here...
string str = SomeFunction();
const char* strConverted = str.c_str(); // strConverted stores the value of the string properly
const char* charArray= SomeFunction().c_str(); // charArray stores garbage value
static string SomeFunction()
{
    string str;
    // does some string stuff
    return str;
}
SomeFunction().c_str() gives you a pointer into a temporary (the std::string object returned by SomeFunction(), which holds the value of the automatic variable str). Unlike with references, the lifetime of temporaries isn't extended in this case, and you end up with charArray being a dangling pointer, which explains the garbage value you see later on when you try to use charArray.
On the other hand, when you do
string str_copy = SomeFunction();
str_copy is a copy of the return value of SomeFunction(). Calling c_str() on it now gives you a pointer to valid data.
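For comparison, here is a minimal sketch of the reference case mentioned above: binding the returned temporary to a const reference extends its lifetime to that of the reference, so c_str() stays usable for as long as the reference is in scope. The function names are placeholders.

```cpp
#include <cstring>
#include <string>

std::string someFunction() { return "some fairly long string value"; }

std::size_t lengthThroughReference() {
    // The temporary returned by someFunction() now lives as long as `kept`.
    const std::string& kept = someFunction();
    const char* p = kept.c_str();  // valid until `kept` goes out of scope
    return std::strlen(p);
}
```

A pointer is not a reference, which is why storing the result of c_str() directly does not get this treatment.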
The value object returned by a function is a temporary. The results of c_str() are valid only through the lifetime of the temporary. The lifetime of the temporary in most cases is to the end of the full expression, which is often the semicolon.
const char *p = SomeFunction().c_str();
printf("%s\n", p); // p points to invalid memory here.
The workaround is to make sure that you use the result of c_str() before the end of the full expression.
#include <cstring>
char *strdup(const char *src_str) noexcept {
    char *new_str = new char[std::strlen(src_str) + 1];
    std::strcpy(new_str, src_str);
    return new_str;
}
const char *p = strdup(SomeFunction().c_str());
Note that strdup is a POSIX function, so if you are on a platform that supports POSIX, it's already there. The POSIX version allocates with malloc, so its result must be released with free(), whereas the version above needs delete[].
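If no copy is needed at all, the simplest form of "use it before the end of the full expression" is to pass c_str() straight into the call that consumes it. A small sketch, with std::strlen standing in for whatever C function you actually call:

```cpp
#include <cstring>
#include <string>

std::string someFunction() { return "a string long enough to live on the heap"; }

std::size_t lengthViaCStr() {
    // The temporary returned by someFunction() is destroyed only at the end
    // of this full expression, so strlen reads valid memory.
    return std::strlen(someFunction().c_str());
}
```

This is only safe if the callee does not keep the pointer around after it returns.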
The "string str" in SomeFunction() is a local variable and only lives inside the scope of SomeFunction().
Since the return type of SomeFunction() is string, not a reference to string, "return str;" hands the value of str back to the caller as a temporary object. That temporary is destroyed at the end of the full expression that contains the call.
"string str = SomeFunction();" initializes str from the returned temporary, so str owns its own copy of the characters and outlives the temporary. After the ";" the temporary is gone, but the value is still stored in str. That is why "const char* strConverted = str.c_str();" gets the right value: c_str() returns a pointer to the first character of str, not of the returned temporary.
"const char* charArray= SomeFunction().c_str();" is different. Here c_str() returns a pointer to the first character of the returned temporary. Once the statement ends, the temporary is destroyed and its memory may be reused, so charArray still holds that address but no longer the value you expected.
Use strcpy to copy the string to a locally defined array and your code will work fine.
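A bounded variant of that idea might look like the sketch below. It uses std::string::copy instead of strcpy so that an oversized string is truncated rather than overflowing the array; copyToBuffer is a made-up helper name.

```cpp
#include <cstddef>
#include <string>

// Copies at most destSize - 1 characters of src into dest and always
// null-terminates. Returns the number of characters copied.
std::size_t copyToBuffer(const std::string& src, char* dest, std::size_t destSize) {
    if (destSize == 0) return 0;
    const std::size_t n = src.copy(dest, destSize - 1);
    dest[n] = '\0';
    return n;
}
```

Because src is taken by const reference, calling copyToBuffer(SomeFunction(), buf, sizeof buf) is fine: the temporary lives until the call has finished, and buf keeps its own copy afterwards.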

Is there a dangling pointer problem in this code?

string str;
char *a=str.c_str();
This code is working fine for me but every place else I see this code instead
string str;
char *a=new char[str.length()];
strcpy(a,str.c_str());
I wonder which one is correct and why?
Assuming that the type of str is std::string, neither of the code snippets is correct.
char *a=str.c_str();
is invalid because c_str() returns const char*, and removing const without a cast (usually const_cast) is not allowed.
char *a=new char[str.length()];
strcpy(a,str.c_str());
is invalid because str.length() doesn't count the terminating null character, while strcpy() requires room for it in the destination.
There is no dangling pointer problem in the code posted here, because no pointers are invalidated.
The two code segments do different things.
The first assigns the pointer to str's internal buffer to your new C-style string, and implicitly converts from const char* (the return type of c_str()) to char*, which is wrong. If you were to change your new string you would face an error. Even if c_str() returned char*, altering the new string would also make changes in str.
The second, on the other hand, creates a new C-style string from the original string, copying it byte by byte into the memory allocated for your new string.
However, the line of code you wrote is incorrect, as it does not leave room for the terminating null character \0 of a C-style string. To fix that, allocate 1 extra byte for it:
char *a=new char[str.length()+1];
After copying the data from the first string to your new one, making alterations to it will not result in changes in the original str.
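If you want such an independent, modifiable copy without managing new[] yourself, one option is to let a std::vector<char> own the bytes. This is a sketch of an alternative, not what the question's code does, and mutableCopy is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Returns a modifiable, null-terminated copy of s. c_str() guarantees
// s.size() + 1 readable characters, the last one being '\0'.
std::vector<char> mutableCopy(const std::string& s) {
    return std::vector<char>(s.c_str(), s.c_str() + s.size() + 1);
}
```

The result's data() can be handed to a C function expecting char*, and the memory is released automatically when the vector goes out of scope.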
Possibly.
Consider this.
char const* get_string() {
    string str{"Hello"};
    return str.c_str();
}
That function returns a pointer to the internal value of str, which goes out of scope when the function returns. You have a dangling pointer. Undefined behaviour. Watch out for time-travelling nasal monkeys.
Now consider this.
char const* get_string() {
    string str{"Hello"};
    char* a = new char[str.length()+1]; // non-const so strcpy can write to it
    strcpy(a, str.c_str());
    return a;
}
That function returns a valid pointer to a valid null-terminated C-style string. No dangling pointer. If you forget to delete[] it you will have a memory leak, but that's not what you asked about.
The difference is one of object lifetime. Be aware of scope.
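To make the ownership of such a copy explicit and avoid the forgotten delete[], the same function could return a std::unique_ptr<char[]>. This is a sketch of one possible variation (getStringCopy is a made-up name), not something the question asked for.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Returns an owned, null-terminated copy; delete[] runs automatically
// when the returned unique_ptr is destroyed.
std::unique_ptr<char[]> getStringCopy() {
    std::string str{"Hello"};
    std::unique_ptr<char[]> copy(new char[str.length() + 1]);
    std::strcpy(copy.get(), str.c_str());
    return copy;
}
```

Callers use get() where a char const* is needed, and the pointer stays valid for as long as they hold on to the unique_ptr.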

Undefined behavior of std::string when c_str() used

The behavior of the following code example is undefined.
const char * getName()
{
    std::string name("ABCXYZ");
    return name.c_str(); // pointer into name, which is destroyed on return
}
This is because name goes out of scope. But I wanted to understand how this differs from returning a std::string. Why doesn't that produce undefined behavior?
When you return a value, the value is safely returned to the caller. That's what the return statement does.
In the case where you call c_str, the value you're returning is a pointer into the string. Once the string is destroyed, that pointer now points to nothing in particular. The value is safely returned, it's just that there's nothing you can do with it safely.
The value of a string is the contents of the string. So in that case, it is the contents of the string that gets passed to the caller. One could say that the primary purpose of the std::string class is to provide an object whose value is the contents of a string.
Simply because the return statement copies or moves (since C++11) the object returned.
With this code:
std::string getName() {
    std::string name("ABCXYZ");
    return name;
}
the string name will be copied (or moved) and returned to the caller.
With your code, return will make a copy of a pointer (because your function returns a pointer), not of the pointed-to object. Using that pointer afterwards is UB.

string and const char* and .c_str()?

I'm getting a weird problem and I want to know why it behaves like that. I have a class in which there is a member function that returns std::string. My goal to convert this string to const char*, so I did the following
const char* c;
c = robot.pose_Str().c_str(); // is this safe??????
udp_slave.sendData(c);
The problem is I'm getting a weird character in Master side. However, if I do the following
const char* c;
std::string data(robot.pose_Str());
c = data.c_str();
udp_slave.sendData(c);
I'm getting what I'm expecting. My question is what is the difference between the two aforementioned methods?
It's a matter of pointing to a temporary.
If you return by value but don't store the string, it is destroyed at the end of the full expression (in practice, the semicolon).
If you store it in a variable, then the pointer is pointing to something that actually exists for the duration of your udp send.
Consider the following:
int f() { return 2; }
int*p = &f();
Now that seems silly on its face, doesn't it? You are pointing at a value that is being copied back from f. You have no idea how long it's going to live.
Your string is the same way.
.c_str() returns a char const* by value, which means you get a copy of the pointer. But after that, the actual character array that it points to is destroyed. That is why you get garbage. In the latter case you are creating a new string from that character array by copying the characters from their actual location. In this case, although the original character array is destroyed, the copy remains in the string object.
You can't use the data pointed to by c_str() past the lifetime of the std::string object from whence it came. Sometimes it's not clear what the lifetime is, such as the code below. The solution is also shown:
#include <string>
#include <cstddef>
#include <cstring>
std::string foo() { return "hello"; }
char *
make_copy(const char *s) {
    std::size_t sz = std::strlen(s);
    char *p = new char[sz + 1]; // +1 for the terminating '\0'
    std::strcpy(p, s);
    return p;
}
int
main() {
    const char *p1 = foo().c_str(); // Whoops, can't use p1 after this statement.
    const char *p2 = make_copy(foo().c_str()); // Okay, but you have to delete [] when done.
}
From c_str():
The pointer obtained from c_str() may be invalidated by:
Passing a non-const reference to the string to any standard library function, or
Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and rend().
Which means that, if the string returned by robot.pose_Str() is destroyed or changed by any non-const function, the pointer to the string will be invalidated. Since you may be returning a temporary copy from robot.pose_Str(), the result of c_str() on it will be invalid right after that statement.
Yet, if you return a reference to the inner string you may be holding, instead of a temporary copy, you can either:
be sure it is going to work, in case your function udp_send is synchronous;
or rely on an invalid pointer, and thus experience undefined behavior if udp_send may finish after some possible modification on the inner contents of the original string.
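If the send may complete later, one way out is for the sending side to take its own copy of the bytes before returning. The class below is purely a hypothetical stand-in (DeferredSender and its members are invented here, not part of any real UDP library) to show the idea:

```cpp
#include <queue>
#include <string>
#include <utility>

// Hypothetical sender that transmits later: it copies the bytes when
// sendData is called, so the caller's pointer may dangle afterwards.
class DeferredSender {
public:
    void sendData(const char* data) { pending_.push(std::string(data)); }
    bool hasPending() const { return !pending_.empty(); }
    // Hands out the oldest queued payload; call only if hasPending().
    std::string popNext() {
        std::string next = std::move(pending_.front());
        pending_.pop();
        return next;
    }
private:
    std::queue<std::string> pending_;
};
```

With a design like this, the pointer passed to sendData only needs to be valid for the duration of the call itself.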
Q
const char* c;
c = robot.pose_Str().c_str(); // is this safe??????
udp_slave.sendData(c);
A
This is potentially unsafe. It depends on what robot.pose_Str() returns. If the life of the returned std::string is longer than the life of c, then it is safe. Otherwise, it is not.
You are storing an address in c that is going to be invalid right after the statement is finished executing.
std::string s = robot.pose_Str();
const char* c = s.c_str(); // This is safe
udp_slave.sendData(c);
Here, you are storing an address in c that will be valid until you leave the scope in which s and c are defined.