Why does move() of a string change the underlying data position in memory? - c++

I'm trying to save a string via string_view in a second data container, but I've run into some difficulties.
It turns out that a string changes its underlying data storage after it is move()'d.
My question is: why does this happen?
Example:
#include <iostream>
#include <string>
#include <string_view>
using namespace std;
int main() {
string a_str = "abc";
cout << "a_str data pointer: " << (void *) a_str.data() << endl;
string_view a_sv = a_str;
string b_str = move(a_str);
cout << "b_str data pointer: " << (void *) b_str.data() << endl;
cout << "a_sv: " << a_sv << endl;
}
Output:
a_str data pointer: 0x63fdf0
b_str data pointer: 0x63fdc0
a_sv: bc
Thanks for your replies!

What you are seeing is a consequence of the short string optimization. In the most basic sense, there is an array inside the string object that saves a call to new for small strings. Since the array is a member of the class, it has its own address in each object, so when you move a string whose contents live in that array, the characters are copied.

The string "abc" is short enough for short string optimization. See What are the mechanics of short string optimization in libc++?
If you change it to a longer string you will see the same address.
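A minimal sketch of how to observe this, assuming a standard library that implements the short string optimization (the standard does not require it, and the size threshold differs between implementations). The helper names data_is_inline and move_keeps_address are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper: true if s.data() points into the std::string object
// itself (the inline SSO buffer) rather than at separately allocated memory.
bool data_is_inline(const std::string& s) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end = obj_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}

// Hypothetical helper: moves a string and reports whether the destination
// ended up with the same buffer address that the source had.
bool move_keeps_address(std::string source) {
    const std::string expected = source;      // copy, to check the contents survive
    const char* before = source.data();
    std::string destination = std::move(source);
    assert(destination == expected);          // the contents always transfer
    return destination.data() == before;      // the address may or may not
}
```

On common implementations, move_keeps_address("abc") is expected to be false and move_keeps_address(std::string(64, 'x')) to be true. Either way, a string_view taken before the move should be re-created from the destination string afterwards.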

Related

How do I print const char?

#include <iostream>
using namespace std;
int main() {
int age = 20;
const char* pDept = "electronics";
cout << age << " " << pDept;
}
The above code works as expected.
Why shouldn't I use cout << *pDept instead of cout << pDept above?
Both of them are legal in C++. Which one to use depends on what you want to print.
In your case, pDept is a pointer that points to a char in memory. It also can be used as a char[] terminated with \0. So std::cout << pDept; prints the string the pointer is pointing to.
*pDept is the content that pDept points to, which is the first character of the string. So std::cout << *pDept; prints the first character only.
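The difference comes down to which operator<< overload is selected. A small sketch, with hypothetical helper names print_pointer and print_pointee, that captures both results in a string so they can be compared:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns what `stream << text` would print.
std::string print_pointer(const char* text) {
    assert(text != nullptr);  // streaming a null const char* is undefined
    std::ostringstream out;
    out << text;              // const char* overload: prints up to the '\0'
    return out.str();
}

// Hypothetical helper: returns what `stream << *text` would print.
std::string print_pointee(const char* text) {
    assert(text != nullptr);
    std::ostringstream out;
    out << *text;             // char overload: prints a single character
    return out.str();
}
```

With pDept pointing at "electronics", print_pointer(pDept) yields the whole word and print_pointee(pDept) yields only "e".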

Passing string 'by value' change in local value reflect in original value

Why is the change of my local variable's value getting reflected into original variable? I am passing it by value in C++.
#include <string>
#include <iostream>
void test(std::string a)
{
char *buff = (char *)a.c_str();
buff[2] = 'x';
std::cout << "In function: " << a;
}
int main()
{
std::string s = "Hello World";
std::cout << "Before : "<< s << "\n" ;
test(s);
std::cout << "\n" << "After : " << s << std::endl;
return 0;
}
Output:
Before : Hello World
In function: Hexlo World
After : Hexlo World
As soon as you wrote
buff[2] = 'x';
and compiled your code all bets were off. Per [string.accessors]
const charT* c_str() const noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
Complexity: constant time.
Requires: The program shall not alter any of the values stored in the character array.
emphasis mine
Since you are not allowed to modify the characters that the pointer points to but you do, you have undefined behavior. The compiler at this point is allowed to do pretty much whatever it wants. Trying to figure out why it did what it did is meaningless as any other compiler might not do this.
The moral of the story is: do not cast const away unless you are really sure that you know what you are doing, and if you do need to, document the code to show that you know what you are doing.
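If the goal is only to edit the function's own copy through a char*, a writable pointer is available without any cast. A minimal sketch, assuming C++11 or later (where the characters are guaranteed to be contiguous); the function name replace_third_char is made up for this example:

```cpp
#include <cassert>
#include <string>

// Takes the string by value, so `copy` is a separate object from the caller's.
std::string replace_third_char(std::string copy) {
    assert(copy.size() > 2);   // precondition for this sketch
    char* buffer = &copy[0];   // writable pointer obtained through the non-const API
    buffer[2] = 'x';           // modifies only the local copy
    return copy;
}
```

Since C++17 the non-const overload of data() can be used in place of &copy[0].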
Your std::string implementation uses reference counting and makes a deep copy only if you modify the string via its operator[] (or some other method). Casting the const char* return value of c_str() to char* will lead to undefined behavior.
I believe since C++11 std::string must not do reference counting anymore, so switching to C++11 might be enough to make your code work (Edit: I did not actually check that before, and it seems my assumption was wrong).
To be on the safe side, consider looking for a string implementation that guarantees deep copying (or implement one yourself).
#include <cstring>
#include <string>
#include <iostream>
void test(std::string a)
{
// modification through the valid std::string API
a[2] = 'x';
const char *buff = a.c_str(); // only const char* is available from API
std::cout << "In function: " << a << " | Trough pointer: " << buff;
// extraction to a writable char[] buffer
char writeableBuff[100];
// unsafe: possible buffer overflow attack, don't use in real code
strcpy(writeableBuff, a.c_str());
writeableBuff[3] = 'y';
std::cout << "\n" << "In writeable buffer: " << writeableBuff;
}
int main()
{
std::string s = "Hello World";
std::cout << "Before : "<< s << "\n" ;
test(s);
std::cout << "\n" << "After : " << s << std::endl;
return 0;
}
Output:
Before : Hello World
In function: Hexlo World | Trough pointer: Hexlo World
In writeable buffer: Hexyo World
After : Hello World

With a string longer than 10 letters, char* changes without any apparent reason. Why?

In step 2 I change only the value of name_C. Why does name_B also change?
Here is the code:
#include <cstdlib>
#include <dirent.h>
#include <iostream>
#include <fstream>
#include <direct.h>
using namespace std;
int main(int argc, char *argv[])
{
// step 1
char *name_A;
char *name_B;
char *name_C;
string str_L = "hello";
string str_M = "stringVar_A"; ;
name_A = (char *) str_M.c_str();
name_B = (char *) (str_L + "-car-" + str_M).c_str();
name_C = (char *) str_L.c_str();
cout << " name_A= " << name_A << endl;
cout << " name_B= " << name_B << endl;
cout << " name_C= " << name_C << endl << endl << endl;
// step 2
string str_N = "myStringMyString"; // (in my real code, i can't put this line in step 1)
string str_R = "ABCDEFGHI" + str_N; // (in my real code, i can't put this line in step 1)
name_C = (char *)str_R.c_str(); // change only name_C
cout << " name_A= " << name_A << endl;
cout << " name_B= " << name_B << endl; // changed, why?
cout << " name_C= " << name_C << endl; // changed, ok.
system("PAUSE");
return EXIT_SUCCESS;
};
Here the output:
(step 1:)
name_A= stringVar_A
name_B= hello-car-stringVar_A
name_C= hello
(step 2:)
name_A= stringVar_A
name_B= ABCDEFGHImyStringMyString
name_C= ABCDEFGHImyStringMyString
With:
string str_N = "myString"; // in step 2...
name_B does not change.
Why does name_B change if str_N is longer than 10 letters?
Can someone help me understand this behavior?
The pointer returned by a call to c_str is only valid for as long as the corresponding std::string stays in scope and is unmodified.
The behaviour in accessing it beyond that is undefined.
For example, (str_L + "-car-" + str_M).c_str(); is returning you the c_str of an anonymous temporary. It will be immediately invalid after the assignment. In your case, name_B is invalidated.
Also, don't cast away the const char* return of c_str(). It's const for a very good reason: you should not attempt to modify the string contents through that pointer.
You are causing undefined behaviour.
name_B = (char *) (str_L + "-car-" + str_M).c_str();
Here you create a temporary string from the result of std::string::operator+ and extract the C character array from it, but nothing keeps the temporary string alive.
A temporary that is not bound to a const reference is destroyed at the end of the full expression. The string destructor deallocates the internal character array, which invalidates name_B.
So this is undefined behaviour, since you try to work with a memory address that is no longer valid.
std::string::c_str() returns a pointer to an internal buffer of a std::string with the guarantee that:
it points to a NUL-terminated string;
the range [c_str(); c_str() + size()] is valid.
In your case, name_B points to the internal buffer of a temporary object, name_B = (str_L + "-car-" + str_M).c_str();, which leads to undefined behaviour when you try to use it.
When you later create more strings (you define two new std::strings), you probably overwrite the memory that name_B points to, since the memory reserved by your temporary has already been freed and can be reused.
If you really have to get old style C-strings from your std::strings, make sure:
The pointer retrieved from std::string::c_str() is no longer used when the std::string is modified or destroyed,
hence you don't call std::string::c_str() from a temporary.
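A minimal sketch of the safe pattern: give the concatenated string a name so that it outlives the pointer taken from it. The function name joined_length is hypothetical, and std::strlen stands in for whatever C-style API needs the pointer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: builds "<left>-car-<right>" and hands it to a C API.
std::size_t joined_length(const std::string& left, const std::string& right) {
    // Named object: it lives until the end of this function.
    const std::string joined = left + "-car-" + right;
    const char* name_B = joined.c_str();  // valid while `joined` lives and is unmodified
    const std::size_t length = std::strlen(name_B);
    assert(length == joined.size());      // the pointer still refers to live data

    // Also fine: the temporary lives until the end of this full expression,
    // so the pointer is used before the temporary is destroyed.
    assert(std::strlen((left + "-car-" + right).c_str()) == length);
    return length;
}
```

What must be avoided is storing the pointer from the temporary in a variable and using it in a later statement.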

C++ Odd behaviour with ostringstream

Is there any explanation in the standard for the below behavior?
The following code:
#include <sstream>
#include <iostream>
using namespace std;
int main()
{
ostringstream os1;
ostringstream os2;
os1 << 1 << " " << 2;
os2 << 1 << " " << 2 << " " << 3;
const char *p = os1.str().c_str();
cout << os2.str() << endl;
cout << p << endl;
return 0;
}
displays the output:
1 2 3
1 2 3
However, I would expect it to display:
1 2 3
1 2
It looks like os1 object is somehow influenced by os2, if I remove the os2.str() call, example behaves correctly.
I have tried the example in Solaris Studio 12.2 and G++ 4.8.1 and both behave in the same way.
Thanks for your help!
const char *p = os1.str().c_str();
Here is the problem, in the above line.
os1.str() returns a temporary string object by copying the internal string buffer. And you're taking .c_str() of the temporary object, which gets destroyed at the end of the full expression. The result is, p points to the destroyed object when you take it to print using std::cout.
That is, your program invokes undefined behavior (UB).
Try this:
auto s = os1.str();
const char *p = s.c_str(); //s is not a temporary object anymore!
Now it gives the correct output (and this is your code; luckily, even coliru gives the same output as you observed on your machine. Note that this output is not guaranteed, though, precisely because the code invokes UB).
I think the issue has to do with retaining the pointer returned by c_str().
ostringstream::str() returns a temporary string object, and you are saving a pointer to its internal char array. But once that line has executed, the returned string object is destroyed, so your pointer becomes invalid.
If you want to keep a c_str pointer around for some reason, you would need to also keep a copy of the string:
string s = os1.str();
const char *p = s.c_str();
cout << os2.str() << endl;
cout << p << endl;
The answer to this question is here:
stringstream, string, and char* conversion confusion
stringstream.str() returns a temporary string object that's destroyed at the end of the full expression. If you get a pointer to a C string from that (stringstream.str().c_str()), it will point to a string which is deleted where the statement ends.

Using Struct Stat()

I'm trying to figure out how exactly to use stat() to capture information about a file. What I need is to be able to print several fields of information about a file. So..
#include <iostream>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
using namespace std;
int main() {
struct stat buf;
stat("file",&buf);
...
cout << st_dev << endl;
cout << st_ino << endl;
cout << st_mode << endl;
cout << st_nlink << endl;
cout << st_uid << endl;
cout << st_gid << endl;
cout << st_rdev << endl;
cout << st_size << endl;
cout << st_blksize << endl;
cout << st_blocks << endl;
cout << st_atime << endl;
cout << st_mtime << endl;
cout << st_ctime << endl;
...
}
I'm thoroughly confused about how to do this. Why is &buf a parameter to stat? I don't care about storing this information in memory, I just need the outputted fields within my c++ program. How do I access the information contained in the struct? Is buf actually supposed to contain the returned information from stat()?
Yes, buf is being used here as an out-parameter. The results are stored in buf and the return value of stat is an error code indicating if the stat operation succeeded or failed.
It is done this way because stat is a POSIX function, designed for C, which does not support out-of-band error reporting mechanisms like exceptions. If stat returned a struct, then it would have no way to indicate errors. Using this out-parameter method also allows the caller to choose where they want to store the results, but that's a secondary feature. It's perfectly fine to pass the address of a normal local variable, just like you have done here.
You access the fields of a struct like you would any other object. I presume you are at least familiar with object notation? E.g. the st_dev field within the stat struct called buf is accessed by buf.st_dev. So:
cout << buf.st_dev << endl;
etc.
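Putting the two points together (check the return value, then read the fields through buf), a minimal sketch for a POSIX system might look like this. The function name describe_file and the choice of fields are assumptions made for illustration; it reports success through its return value and fills an out-parameter, mirroring stat itself.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <sys/stat.h>

// Hypothetical helper: fills `description` with a few stat fields for `path`.
// Returns false if stat() fails; errno then holds the reason and
// `description` is left untouched.
bool describe_file(const char* path, std::string& description) {
    assert(path != nullptr);
    struct stat buf;                  // out-parameter filled in by stat()
    if (stat(path, &buf) != 0) {
        return false;
    }
    std::ostringstream out;
    out << "st_size:  " << buf.st_size << '\n'
        << "st_mode:  " << std::oct << buf.st_mode << std::dec << '\n'
        << "st_nlink: " << buf.st_nlink << '\n'
        << "st_uid:   " << buf.st_uid << '\n'
        << "st_mtime: " << buf.st_mtime << '\n';
    description = out.str();
    return true;
}
```

A caller would declare a std::string, pass it to describe_file("file", text), and print text only when the call returns true.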
For another project, I've whipped up a little function that does something similar to what you need. Take a look at sprintstatf.
Here's an example of usage:
#include <sys/stat.h>
#include <stdlib.h>
#include <stdio.h>
#include "sprintstatf.h"
int
main(int argc, char *argv[])
{
char *outbuf = (char *)malloc(2048 * sizeof(char));
struct stat stbuf;
char *fmt = \
"st_atime (decimal) = \"%a\"\n"
"st_atime (string) = \"%A\"\n"
"st_ctime (decimal) = \"%c\"\n"
"st_ctime (string) = \"%C\"\n"
"st_gid (decimal) = \"%g\"\n"
"st_gid (string) = \"%G\"\n"
"st_ino = \"%i\"\n"
"st_mtime (decimal) = \"%m\"\n"
"st_mtime (string) = \"%M\"\n"
"st_nlink = \"%n\"\n"
"st_mode (octal) = \"%p\"\n"
"st_mode (string) = \"%P\"\n"
"st_size = \"%s\"\n"
"st_uid = \"%u\"\n"
"st_uid = \"%U\"\n";
lstat(argv[1], &stbuf);
sprintstatf(outbuf, fmt, &stbuf);
printf("%s", outbuf);
free(outbuf);
exit(EXIT_SUCCESS);
}
/* EOF */
This question may be way too old to comment on, but I am posting this as a reference.
For a good understanding of the stat() function, the reason for passing the stat reference and, more importantly, error handling are explained well in the link below:
stat - get file status
You have several errors in your code:
You need &buf, with a single 'f'.
You need to say e.g. buf.st_dev when printing, since st_dev is a field in the struct variable.
Since buf is a local variable on the stack, you're not "saving the values to memory" permanently, it's just as long as that variable is in-scope.
This is how you return multiple values, typically, in C and C++. You pass a pointer to a structure, and the function being called fills in the structure with the values it has computed for you.
buf is the structure that stat loads with the information about the file you pass in the first parameter. You pass &buf here because you have buf allocated on the stack as a local variable and you must pass a pointer to the stat function to enable it to load the data.
All variables of st_* are part of the struct stat object and thus must be accessed via your local buf variable as buf.st_uid, etc.
The ctime library works in a similar way.
First you create an empty struct.
You have access to the struct object, but none of its fields hold meaningful values yet.
Then you call the function with the address of that object, function(&name_of_created_object), so the function can reach an object that lives outside of it.
The function is designed to store all the information in the struct object at the given address, and then you have an object with its data ready to use.
Otherwise, if you don't want to use a pointer, you must use
object = function(NULL);
With a pointer:
function(&object);
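The two call styles can be sketched with std::time from <ctime>, on the assumption that this is the kind of function the answer has in mind. The wrapper names now_by_return and now_by_out_param are made up for this example.

```cpp
#include <cassert>
#include <ctime>

// Style 1: no object is passed in; the result is only the return value.
std::time_t now_by_return() {
    return std::time(nullptr);
}

// Style 2: the caller supplies the object and the function fills it in.
std::time_t now_by_out_param() {
    std::time_t t = 0;
    const std::time_t returned = std::time(&t);
    assert(returned == t);  // std::time stores the same value it returns
    return t;
}
```

Both wrappers give the current calendar time; they differ only in how the value travels back to the caller.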