Is there a dangling pointer problem in this code? - c++

string str;
char *a=str.c_str();
This code works fine for me, but everywhere else I see this code instead:
string str;
char *a=new char[str.length()];
strcpy(a,str.c_str());
I wonder which one is correct and why?

Assuming that the type of str is std::string, neither snippet is correct.
char *a=str.c_str();
is invalid because c_str() returns const char*, and dropping const without a cast (usually const_cast) is not allowed.
char *a=new char[str.length()];
strcpy(a,str.c_str());
is invalid because str.length() does not count the terminating null character, while strcpy() requires the destination to have room for it.
There is no dangling pointer problem in the code posted here, because no pointers are invalidated.
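A minimal sketch of both snippets with the two problems above fixed, assuming str is a std::string. The helper names view_of and copy_of are hypothetical and only serve the illustration.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: borrow the string's internal buffer.
// The pointer stays const and is only usable while `str` is alive
// and has not been modified.
const char* view_of(const std::string& str) {
    return str.c_str();
}

// Hypothetical helper: make an independent copy.
// length() excludes the terminating '\0', so allocate one extra char.
// The caller owns the result and must release it with delete[].
char* copy_of(const std::string& str) {
    char* a = new char[str.length() + 1];
    std::strcpy(a, str.c_str());
    return a;
}
```

The first form costs nothing but ties the pointer to the lifetime of str. The second is independent of str but has to be freed by hand.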

The two code segments do different things.
The first assigns the address of str's internal character buffer to your new C-type string, implicitly converting from const char* (the return type of c_str()) to char*, which is wrong. If you were to change your new string, you would face an error. Even if c_str() returned char*, altering the new string would also change str.
The second, on the other hand, creates a new C-type string from the original string, copying it byte by byte into the memory allocated for your new string.
However, the line of code you wrote is incorrect, because it does not leave room for the terminating null character \0 of a C-type string. To fix that, allocate 1 extra byte for it:
char *a=new char[str.length()+1];
After copying the data from the first string to your new one, making alterations to it will not result in changes in the original str.
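As a rough illustration of that independence, here is a sketch that swaps the raw new[] for a std::vector<char>, so the buffer frees itself. That swap is my own choice, not something the question uses, and writable_copy is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy the characters of `str`, including the
// terminating '\0', into a buffer that manages its own memory.
std::vector<char> writable_copy(const std::string& str) {
    return std::vector<char>(str.c_str(), str.c_str() + str.length() + 1);
}
```

Writing to the returned buffer, for example buf[0] = 'j', leaves the original std::string untouched, and buf.data() can be handed to code that expects a char*.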

Possibly.
Consider this.
char const* get_string() {
string str{"Hello"};
return str.c_str();
}
That function returns a pointer to the internal value of str, which goes out of scope when the function returns. You have a dangling pointer. Undefined behaviour. Watch out for time-travelling nasal monkeys.
Now consider this.
char const* get_string() {
string str{"Hello"};
char* a = new char[str.length()+1]; // non-const, so strcpy can write into it
strcpy(a, str.c_str());
return a;
}
That function returns a valid pointer to a valid null-terminated C-style string. No dangling pointer. If you forget to delete[] it you will have a memory leak, but that's not what you asked about.
The difference is one of object lifetime. Be aware of scope.
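If the caller does not strictly need a raw pointer, one way to sidestep both the dangling pointer and the leak is to return the std::string itself. This is only a sketch of that alternative, not a claim about what the asker's code requires.

```cpp
#include <string>

// Returning the string object by value hands the characters to the
// caller, so nothing dangles and nothing has to be delete[]d.
std::string get_string() {
    std::string str{"Hello"};
    return str;
}
```

The caller can store the result in a named variable and call c_str() on that variable wherever a C-style string is needed.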

Related

Casting c_str() only works for short strings

I'm using a C library in C++ and wrote a wrapper. At one point I need to convert an std::string to a c-style string. There is a class with a function, which returns a string. Casting the returned string works if the string is short, otherwise not. Here is a simple and reduced example illustrating the issue:
#include <iostream>
#include <string>
class StringBox {
public:
std::string getString() const { return text_; }
StringBox(std::string text) : text_(text){};
private:
std::string text_;
};
int main(int argc, char **argv) {
const unsigned char *castString = NULL;
std::string someString = "I am a loooooooooooooooooong string"; // Won't work
// std::string someString = "hello"; // This one works
StringBox box(someString);
castString = (const unsigned char *)box.getString().c_str();
std::cout << "castString: " << castString << std::endl;
return 0;
}
Executing the file above prints this to the console:
castString:
whereas if I swap the commenting on someString, it correctly prints
castString: hello
How is this possible?
You are invoking c_str on a temporary string object returned by the getString() member function. The pointer returned by c_str() is only valid as long as the original string object exists, so at the end of the line where you assign castString it ends up being a dangling pointer. Officially, this leads to undefined behavior.
So why does this work for short strings? I suspect that you're seeing the effects of the Short String Optimization, an optimization where for strings less than a certain length the character data is stored inside the bytes of the string object itself rather than in the heap. It's possible that the temporary string that was returned was stored on the stack, so when it was cleaned up no deallocations occurred and the pointer to the expired string object still holds your old string bytes. This seems consistent with what you're seeing, but it still doesn't mean what you're doing is a good idea. :-)
box.getString() is an anonymous temporary. The pointer from c_str() is only valid for the lifetime of the string object it was called on.
So in your case, c_str() is invalidated by the time you get to the std::cout. The behaviour of reading the pointer contents is undefined.
(Interestingly the behaviour of your short string is possibly different due to std::string storing short strings in a different way.)
As you return by value
box.getString() is a temporary and so
box.getString().c_str() is valid only during the expression, then it is a dangling pointer.
You may fix that with
const std::string& getString() const { return text_; }
box.getString() produces a temporary. Calling c_str() on that gives you a pointer to a temporary. After the temporary ceases to exist, which is immediately, the pointer is invalid, a dangling pointer.
Using a dangling pointer is Undefined Behavior.
First of all, your code has UB independent of the length of the string: At the end of
castString = (const unsigned char *)box.getString().c_str();
the string returned by getString is destroyed and castString is a dangling pointer to the internal buffer of the destroyed string object.
The reason your code "works" for small strings is probably Small String Optimization: short strings are (commonly) stored in the string object itself instead of in a dynamically allocated array, and apparently that memory is still accessible and unmodified in your case.
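A sketch of the reference-returning fix suggested above, applied to the StringBox class from the question. The free function bytes_of is hypothetical and only there to show where the pointer comes from.

```cpp
#include <string>
#include <utility>

class StringBox {
public:
    explicit StringBox(std::string text) : text_(std::move(text)) {}
    // Returning a const reference avoids creating a temporary copy.
    const std::string& getString() const { return text_; }
private:
    std::string text_;
};

// Hypothetical helper: the pointer refers to the member inside `box`,
// so it stays valid while `box` is alive and its string is unmodified.
const unsigned char* bytes_of(const StringBox& box) {
    return reinterpret_cast<const unsigned char*>(box.getString().c_str());
}
```

If getString() has to keep returning by value, storing the result in a named std::string first and calling c_str() on that variable has the same effect.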

in c++ is this a good practice to initialize char array with string literal?

in c++ is this a good practice to initialize char array with string?
such as:
char* abc = (char *) ("abcabc");
I see a lot of these in my co-worker's code. Should I change it to the right practice?
such as
std::string abc_str = "abcabc";
const char* abc= abc_str .c_str();
This statement
char* abc = (char *) ("abcabc");
is simply bad. String literals in C++ have types of constant character arrays. So a valid declaration will look like
const char *abc = "abcabc";
Note: In C you indeed may write
char *abc = "abcabc";
Nevertheless string literals are immutable. Any attempt to modify a string literal results in undefined behaviour.
By the way, there is no character array here that is initialized by a string literal. :) Maybe you mean the following
char abc[] = "abcabc";
Using standard class std::string does not exclude using character arrays and moreover pointers to string literals.
Take into account that these declarations
const char *abc = "abcabc";
and
std::string abc_str = "abcabc";
const char* abc= abc_str .c_str();
are not equivalent. In the first declaration, the string literal has static storage duration and its address does not change during program execution.
In the second declaration, the pointer abc points to memory managed by abc_str (typically dynamically allocated), which can be reallocated if abc_str is changed. In that case the pointer becomes invalid.
Also, the first declaration implies that the array (string literal) pointed to by the pointer will not be changed. The second implies that the std::string object will be changed; otherwise there is little sense in declaring an object of type std::string instead of the pointer.
Thus the meanings of the declarations are simply different.
char* abc = (char *) ("abcabc");
That is bad. Don't do it.
You are treating a string literal that is not supposed to be modified like it can be modified.
After that,
abc[0] = 'd';
will be accepted by the compiler but is not OK at run time (it is undefined behaviour). What you need to use is:
char abc[] = "abcabc";
This will create an array that is modifiable.
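A small sketch of the array form in use. The function name edit_array_copy is hypothetical, and it returns a std::string only so the result is easy to inspect.

```cpp
#include <string>

// Hypothetical helper: builds a modifiable array from a literal,
// edits it, and returns the result for inspection.
std::string edit_array_copy() {
    char abc[] = "abcabc";   // array of 7 chars: 6 letters plus '\0'
    abc[0] = 'd';            // fine: abc is the function's own writable copy
    return std::string(abc);
}
```

The literal itself is never written to; the array is initialized with a copy of its characters.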
Both of those are bad.
char* abc = (char*) ("abcabc");
A string literal is a constant and, as such, may be stored in write protected memory. Therefore writing to it can crash your program, it is undefined behaviour.
Rather than cast away the constness you should keep it const and make a copy if you want to edit its contents.
const char* abc = "abcabc";
The other one should be avoided too:
std::string abc_str = "abcabc";
const char* abc = abc_str.c_str();
Keeping it const is good but if the string is changed it could be reallocated to another place in memory leaving your pointer dangling.
Also in pre C++11 code the pointer stops being valid the second it is assigned because there is no guarantee it is not a temporary.
Better to call abc_str.c_str() each time.
The chances are that because c_str() is such a trivial operation it will be optimized away by the compiler making it just as efficient as using the raw pointer.
Instead of both of those what you should be doing is using std::string all the way. If you absolutely need a const char* (for old legacy code) you can obtain it using c_str().
std::string abc_str = "abcabc"; // this is perfect why do more?
old_horrible_function(abc_str.c_str()); // only when needed
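To make the "call c_str() only when needed" advice concrete, here is a sketch in which old_horrible_function is a stand-in I made up (it just measures the string), and length_after_growth is likewise hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a legacy C-style API (hypothetical, for illustration only).
std::size_t old_horrible_function(const char* s) { return std::strlen(s); }

std::size_t length_after_growth() {
    std::string abc_str = "abcabc";
    // Growing the string may move its buffer; a pointer taken before
    // this line could be dangling afterwards.
    abc_str += " and some more text that may trigger a reallocation";
    // Ask for the pointer only now, at the point of use.
    return old_horrible_function(abc_str.c_str());
}
```

Because the pointer is requested after the last modification, it does not matter whether the buffer moved.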

string and const char* and .c_str()?

I'm getting a weird problem and I want to know why it behaves like that. I have a class in which there is a member function that returns std::string. My goal to convert this string to const char*, so I did the following
const char* c;
c = robot.pose_Str().c_str(); // is this safe??????
udp_slave.sendData(c);
The problem is I'm getting a weird character in Master side. However, if I do the following
const char* c;
std::string data(robot.pose_Str());
c = data.c_str();
udp_slave.sendData(c);
I'm getting what I'm expecting. My question is what is the difference between the two aforementioned methods?
It's a matter of pointing to a temporary.
If you return by value but don't store the string, it is destroyed at the end of the full expression (in practice, at the semicolon).
If you store it in a variable, then the pointer is pointing to something that actually exists for the duration of your udp send.
Consider the following:
int f() { return 2; }
int*p = &f();
Now that seems silly on its face, doesn't it? You are pointing at a value that is being copied back from f. You have no idea how long it's going to live.
Your string is the same way.
.c_str() returns a char const* by value, which means you get a copy of the pointer. But after that, the actual character array that it points to is destroyed along with the temporary string. That is why you get garbage. In the latter case you create a new string by copying the characters from the returned string. Although the temporary's character array is destroyed, the copy remains in your string object.
You can't use the data pointed to by c_str() past the lifetime of the std::string object from whence it came. Sometimes it's not clear what the lifetime is, such as the code below. The solution is also shown:
#include <string>
#include <cstddef>
#include <cstring>
std::string foo() { return "hello"; }
char *
make_copy(const char *s) {
std::size_t sz = std::strlen(s) + 1; // +1 for the terminating '\0'
char *p = new char[sz];
std::strcpy(p, s);
return p;
}
int
main() {
const char *p1 = foo().c_str(); // Whoops, can't use p1 after this statement.
const char *p2 = make_copy(foo().c_str()); // Okay, but you have to delete [] when done.
}
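A possible variant of make_copy that returns a std::unique_ptr<char[]>, so the delete [] happens automatically. The name make_owned_copy is hypothetical, and the smart pointer is my substitution rather than part of the answer above.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

std::string foo() { return "hello"; }

// Hypothetical variant of make_copy: the returned smart pointer
// releases the array with delete[] when it goes out of scope.
std::unique_ptr<char[]> make_owned_copy(const char* s) {
    std::size_t sz = std::strlen(s) + 1;      // +1 for the terminating '\0'
    std::unique_ptr<char[]> p(new char[sz]);
    std::memcpy(p.get(), s, sz);              // copies the '\0' as well
    return p;
}
```

Calling make_owned_copy(foo().c_str()) is fine, because the temporary returned by foo() lives until the end of that full expression, which is after the copy has been made.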
From c_str():
The pointer obtained from c_str() may be invalidated by:
Passing a non-const reference to the string to any standard library function, or
Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and
rend().
Which means that, if the string returned by robot.pose_Str() is destroyed or changed by any non-const function, the pointer to the string will be invalidated. Since you may be returning a temporary copy from robot.pose_Str(), the pointer returned by c_str() on it is invalid right after that statement.
Yet, if you return a reference to the inner string you may be holding, instead of a temporary copy, you can either:
be sure it is going to work, in case your function udp_send is synchronous;
or rely on an invalid pointer, and thus experience undefined behavior if udp_send may finish after some possible modification on the inner contents of the original string.
Q
const char* c;
c = robot.pose_Str().c_str(); // is this safe??????
udp_slave.sendData(c);
A
This is potentially unsafe. It depends on what robot.pose_Str() returns. If the life of the returned std::string is longer than the life of c, then it is safe. Otherwise, it is not.
You are storing an address in c that is going to be invalid right after the statement is finished executing.
std::string s = robot.pose_Str();
const char* c = s.c_str(); // This is safe
udp_slave.sendData(c);
Here, you are storing an address in c that will be valid until you get out of the scope in which s and c are defined.
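Another option, sketched here under the assumption that sendData copies or transmits the bytes before it returns, is to pass the pointer directly in the call. Robot and UdpSlave below are made-up stand-ins for the question's classes.

```cpp
#include <string>

// Hypothetical stand-ins for the question's robot and udp_slave objects.
struct Robot {
    std::string pose_Str() const { return "x=1.0 y=2.0 theta=0.5"; }
};

struct UdpSlave {
    std::string last_sent;
    // Assumed synchronous: copies the bytes before returning.
    void sendData(const char* c) { last_sent = c; }
};

void send_pose(const Robot& robot, UdpSlave& udp_slave) {
    // The temporary std::string lives until the end of this full
    // expression, so the pointer is valid for the whole call.
    udp_slave.sendData(robot.pose_Str().c_str());
}
```

This only holds if sendData does not keep the pointer for later use. If it does, the named-variable version with a sufficiently long-lived string is the safer choice.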

Initializing char pointer

I have a function
ValArgument(char* ptr){
char str[] = "hello world";
ptr = &str[0];
}
In this function, I want to initialize a char array and assign its address to the char pointer ptr. I call the function like this:
char* ptr= NULL;
ValArgument(ptr);
The pointer returned still has the value NULL. Why? I expected that the pointer will point onto the char array str[].
The pointer returned still has the value NULL. Why?
Because you passed the pointer by value. That means that the function is given a separate copy of the pointer, and any changes it makes to the pointer will not affect the caller's copy.
You can either pass by reference:
void ValArgument(char *& ptr)
// ^
or return a value:
char * ValArgument();
I expected that the pointer will point onto the char array str[].
No; once you've fixed that problem, it will point to the undead husk of the local variable that was destroyed when the function returned. Any attempt to use the pointer will cause undefined behaviour.
Depending on what you need to do with the string, you might want:
a pointer to a string literal, char const * str = "hello world";. Note that this should be const, since string literals can't be modified.
a pointer to a static array, static char str[] = "hello world";. This means that there is only one string shared by everyone, so any modification will affect everyone.
a pointer to a dynamically allocated array. Don't go there.
a string object, std::string str = "hello world";. This is the least error-prone, since it can be passed around like a simple value.
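Combining the pass-by-reference fix with the string-literal option from the list above could look roughly like this. It is a sketch, and it changes the parameter type to const char*&, which the original function does not have.

```cpp
// Pass the pointer by reference so the caller's copy is updated.
void ValArgument(const char*& ptr) {
    // A string literal has static storage duration, so it outlives
    // the call, unlike a local char array.
    ptr = "hello world";
}
```

After const char* ptr = nullptr; ValArgument(ptr); the caller's ptr points at the literal, which must not be modified.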

Pointers to char

char* pStr = new String("Hello");
char* s = "Hello";
Is the first one correct? Is there any difference between these two? My guess is that the first one is allocated on the heap and the other one on the stack. Am I correct, or are there any other differences?
The first one is just incorrect and won't compile because there is no such thing as String in either C or C++. The second one will compile, and is fine in C (as far as I know). In C++, however, the conversion from a string literal to char* was deprecated in C++03 and is ill-formed since C++11, although many compilers still accept it with a warning. You could unintentionally write s[0] = 'X'; later, which is undefined behavior.
The correct way of doing it is using const (in C++)
const char * s = "Hello";
or, better, use string
std::string s("Hello");
pStr and s are pointers, so it is important to distinguish between the pointers themselves and the data that they point to.
On the first line, pStr is a pointer to an instance of the String class allocated on the heap. The string data inside this instance is a copy of a literal string "Hello" that is stored in the program's data segment. The copying is done by the String constructor. (You've referred to a String class, but I assume you mean std::string).
On the second line, s is a pointer to data stored in the program's data segment. Data in the data segment is immutable, so s should really be const char *.
There isn't enough information in your example to tell whether pStr and s are stored on the heap or the stack. If they are variables inside a function then they are on the stack. If they are members of a class then they are on the heap if the class was instantiated on the heap (using new) or on the stack if it is instantiated as a value.
The line
char* pStr = new std::string("Hello");
will cause a compile error, because the LHS has a type of char* and the RHS has a type of std::string* (new returns a pointer to the object).
The line
char* s = "Hello";
will typically compile, but may give a warning, because the LHS has a type of char* and the string literal on the RHS has a type of const char[6], which converts to const char*.