Is an object of std::string really movable? - c++

As we know, a movable object is one that is not deep-copied when it is assigned to another object of the same type. This can save a lot of time.
But today I found a phenomenon that seems strange to me. Please look at the following code.
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <utility>   // std::move
int main() {
    std::string s1 = "s1";
    std::string s2 = "s2";
    // Print the address of each string's character buffer, then its contents.
    std::cout << " s1[" << ( void* ) &s1[0] << "]:" + s1
              << ", s2[" << ( void* ) &s2[0] << "]:" + s2
              << std::endl;
    s1.swap( s2 );
    std::cout << " s1[" << ( void* ) &s1[0] << "]:" + s1
              << ", s2[" << ( void* ) &s2[0] << "]:" + s2
              << std::endl;
    s2 = std::move(s1);
    std::cout << " s1[" << ( void* ) &s1[0] << "]:" + s1
              << ", s2[" << ( void* ) &s2[0] << "]:" + s2
              << std::endl;
    return EXIT_SUCCESS;
}
After the move, the contents of the strings have changed, but the addresses where the string data is actually stored have not.
If the memory addresses do not change, is that a reason to conclude that a deep copy is in fact performed, rather than a pointer simply being assigned to the target's member?
Thanks!
Leon

"a movable object is one that is not deep-copied when it is assigned to another object of the same type"
Only if it makes sense. In the following snippet
int i0 = 11;
int i1 = std::move(i0);
there will be no "stealing", simply because there is nothing to steal. So the premise of the question is flawed: a move operation "steals" the content of the moved-from object only if it makes sense to do so.
Also note that in the C++ world, unlike Java and C#, an object is everything that occupies memory - integers, pointers, characters - all of them are objects.
std::string uses an optimization technique called "short string optimization" (also known as small string optimization, SSO). If the string is short enough (and what counts as "short enough" depends on the implementation), no buffer is dynamically allocated, and hence there is nothing to "steal". When such a short string is moved, its content is simply copied into the moved-to string, without touching any dynamically allocated buffer.
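The effect described above can be observed with a small sketch. The helper names `stored_inline` and `move_reuses_buffer` are hypothetical, and the sketch uses move construction rather than the move assignment from the question. It assumes an implementation that keeps short strings inside the string object, which is common but not required by the standard.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper: true when the character buffer lives inside the
// std::string object itself (the small-string case), false when it points
// to separately allocated storage.
bool stored_inline(const std::string& s) {
    const auto obj = reinterpret_cast<std::uintptr_t>(&s);
    const auto buf = reinterpret_cast<std::uintptr_t>(s.data());
    return buf >= obj && buf < obj + sizeof(std::string);
}

// Hypothetical helper: move-constructs a new string from `src` and reports
// whether the new string ended up owning the very same buffer address.
bool move_reuses_buffer(std::string src) {
    const void* before = src.data();
    std::string dst(std::move(src));
    return dst.data() == before;
}
```

For a long string such as `std::string(1000, 'x')`, `stored_inline` should report false and `move_reuses_buffer` should report true, because the heap buffer is handed over. For a short string like "s1", implementations with SSO will typically report the opposite, which matches the unchanged addresses seen in the question.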

Related

Cap'n Proto - De-/Serialize struct to/from std::string for storing in LevelDB

I want to store some Cap'n Proto structs in LevelDB, so I have to serialize them to a std::string and deserialize them again later. Currently, I am playing around with the following (adapted from here: https://groups.google.com/forum/#!msg/capnproto/viZXnQ5iN50/B-hSgZ1yLWUJ):
capnp::MallocMessageBuilder message;
WortData::Builder twort = message.initRoot<WortData>();
twort.setWid(1234);
twort.setW("Blabliblub");
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
std::string data(bytes.begin(), bytes.end());
std::cout << data << std::endl;
// data.data() + data.size() avoids dereferencing the end iterator
const kj::ArrayPtr<const capnp::word> view(
    reinterpret_cast<const capnp::word*>(data.data()),
    reinterpret_cast<const capnp::word*>(data.data() + data.size()));
capnp::FlatArrayMessageReader message2(view);
WortData::Reader wortRestore = message2.getRoot<WortData>();
std::cout << wortRestore.getWid() << " " << std::string(wortRestore.getW()) << std::endl;
And it basically works, but the people in the link above were unsure if this approach will cause errors later and since the discussion is pretty old, I wanted to ask if there's a better way.
Someone in the end said something like "use memcpy!", but I'm not sure if that's useful and how to do this with the array types needed for FlatArrayMessageReader.
Thanks in advance!
dvs23
Update:
I tried to implement the suggestion related to the word-aligning:
capnp::MallocMessageBuilder message;
WortData::Builder twort = message.initRoot<WortData>();
twort.setWid(1234);
twort.setW("Blabliblub");
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
std::string data(bytes.begin(), bytes.end());
std::cout << data << std::endl;
if(reinterpret_cast<uintptr_t>(data.data()) % sizeof(void*) == 0) {
    // Buffer is aligned: read the message in place.
    const kj::ArrayPtr<const capnp::word> view(
        reinterpret_cast<const capnp::word*>(data.data()),
        reinterpret_cast<const capnp::word*>(data.data() + data.size()));
    capnp::FlatArrayMessageReader message2(view);
    WortData::Reader wortRestore = message2.getRoot<WortData>();
    std::cout << wortRestore.getWid() << " " << std::string(wortRestore.getW()) << std::endl;
}
else {
    // Buffer is not aligned: copy the bytes into word-aligned storage first.
    size_t numWords = data.size() / sizeof(capnp::word);
    if(data.size() % sizeof(capnp::word) != 0) {
        numWords++;
        std::cout << "Something wrong here..." << std::endl;
    }
    std::cout << sizeof(capnp::word) << " " << numWords << " " << data.size() << std::endl;
    // A variable-length array (capnp::word dataWords[numWords]) is not standard C++,
    // so allocate the words on the heap instead.
    kj::Array<capnp::word> dataWords = kj::heapArray<capnp::word>(numWords);
    std::memcpy(dataWords.asBytes().begin(), data.data(), data.size());
    kj::ArrayPtr<capnp::word> dataWordsPtr = dataWords.asPtr();
    capnp::FlatArrayMessageReader message2(dataWordsPtr);
    WortData::Reader wortRestore = message2.getRoot<WortData>();
    std::cout << wortRestore.getWid() << " " << std::string(wortRestore.getW()) << std::endl;
}
The linked conversation is still accurate to the best of my knowledge. (Most of the messages on that thread are me, and I'm the author of Cap'n Proto...)
It's very likely that the buffer backing any std::string will be word-aligned in practice -- but it is not guaranteed. When reading from a std::string, you should probably check that the pointer is aligned (e.g. by reinterpret_cast<uintptr_t>(str.data()) % sizeof(void*) == 0). If aligned, you can reinterpret_cast the pointer to capnp::word*. If not aligned, you'll need to make a copy. In practice the code will probably never make a copy because std::string's backing buffer is probably always aligned.
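The check-then-copy logic described above can be sketched without the Cap'n Proto headers. Here `Word` is only a stand-in for capnp::word (assumed to be an 8-byte unit), the function names are hypothetical, and the check uses `alignof(Word)` rather than the `sizeof(void*)` from the answer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for capnp::word; an assumption made for this sketch only.
using Word = std::uint64_t;

// True when the string's buffer could be viewed as an array of Word in place.
bool is_word_aligned(const std::string& s) {
    return reinterpret_cast<std::uintptr_t>(s.data()) % alignof(Word) == 0;
}

// Fallback for the unaligned case: copies the bytes into word-aligned
// storage, rounding up to whole words and zero-filling any padding.
std::vector<Word> copy_to_words(const std::string& s) {
    std::vector<Word> words((s.size() + sizeof(Word) - 1) / sizeof(Word), 0);
    if (!s.empty()) {
        std::memcpy(words.data(), s.data(), s.size());
    }
    return words;
}
```

A caller would test `is_word_aligned(str)` first and only fall back to `copy_to_words(str)` when it returns false, so the copy is paid only in the rare unaligned case.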
On the writing end, avoiding copies is trickier. Your code as you've written it actually makes two copies.
One here:
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
And one here:
std::string data(bytes.begin(), bytes.end());
It looks like LevelDB supports a type called Slice, which you can use instead of std::string when writing, to avoid the second copy:
leveldb::Slice data(reinterpret_cast<const char*>(bytes.begin()), bytes.size());
This will reference the underlying bytes rather than make a copy, and should be usable in all the LevelDB write functions.
Unfortunately, one copy is unavoidable here, because LevelDB wants the value to be one contiguous byte array, whereas a Cap'n Proto message may be broken into multiple segments. The only way to avoid this would be for LevelDB to add support for "gather writes".

Passing a string 'by value': a change to the local value is reflected in the original value

Why is the change of my local variable's value getting reflected into original variable? I am passing it by value in C++.
#include <string>
#include <iostream>
void test(std::string a)
{
    char *buff = (char *)a.c_str();
    buff[2] = 'x';
    std::cout << "In function: " << a;
}
int main()
{
    std::string s = "Hello World";
    std::cout << "Before : "<< s << "\n" ;
    test(s);
    std::cout << "\n" << "After : " << s << std::endl;
    return 0;
}
Output:
Before : Hello World
In function: Hexlo World
After : Hexlo World
As soon as you wrote
buff[2] = 'x';
and compiled your code all bets were off. Per [string.accessors]
const charT* c_str() const noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
Complexity: constant time.
Requires: The program shall not alter any of the values stored in the character array.
emphasis mine
Since you are not allowed to modify the characters that the pointer points to but you do, you have undefined behavior. The compiler at this point is allowed to do pretty much whatever it wants. Trying to figure out why it did what it did is meaningless as any other compiler might not do this.
The moral of the story: do not cast const away unless you are really sure that you know what you are doing, and if you do need to, document the code to show that you know what you are doing.
Your std::string implementation uses reference counting and makes a deep copy only if you modify the string via its operator[] (or some other method). Casting the const char* return value of c_str() to char* will lead to undefined behavior.
I believe since C++11 std::string must not do reference counting anymore, so switching to C++11 might be enough to make your code work (Edit: I did not actually check that before, and it seems my assumption was wrong).
To be on the safe side, consider looking for a string implementation that guarantees deep copying (or implement one yourself).
#include <cstring>
#include <string>
#include <iostream>
void test(std::string a)
{
    // modification through the valid std::string API
    a[2] = 'x';
    const char *buff = a.c_str(); // only const char* is available from the API
    std::cout << "In function: " << a << " | Through pointer: " << buff;
    // extraction to a writeable char[] buffer
    char writeableBuff[100];
    // unsafe, possible attack through buffer overflow, don't use in real code
    strcpy(writeableBuff, a.c_str());
    writeableBuff[3] = 'y';
    std::cout << "\n" << "In writeable buffer: " << writeableBuff;
}
int main()
{
    std::string s = "Hello World";
    std::cout << "Before : "<< s << "\n" ;
    test(s);
    std::cout << "\n" << "After : " << s << std::endl;
    return 0;
}
Output:
Before : Hello World
In function: Hexlo World | Through pointer: Hexlo World
In writeable buffer: Hexyo World
After : Hello World

C++11 type-conversion heisenbug when using std::string::c_str()

This is very strange.
OSX 10.10
LLVM 6.0
XCode 6.1
test_assert("Wierd", String{"ABC"}, "ABC" ); // claims not equal
String is my custom class (wrapping a Python String primitive), and it should pass this test.
Here's test_assert, with added debug output:
template <typename B, typename V>
static void test_assert( std::string description, B benchmark, V value )
{
std::ostringstream full_description;
full_description << description
<< " : { " << "benchmark" << ", " << "value" << " }"
<< " = { " << typeid(B).name() << ", " << typeid(V).name() << " }"
<< " , { " << benchmark << ", " << value << " }";
// N2Py6StringE, PKc i.e. Py::String and const char* (Pointer to Konst Char)
std::cout << typeid(B).name() << ", " << typeid(V).name() << std::endl;
V b_as_v{static_cast<V>(benchmark)};
// wtf? b_as_v: \352\277_\377 -- should be "ABC"
std::cout << "b_as_v: " << b_as_v << std::endl; // Y
if( b_as_v == value )
std::cout << " PASSED: " << full_description.str() << std::endl;
else
throw TestError( full_description.str() );
}
It is this b_as_v{static_cast<V>(benchmark)}; that is throwing me, because if I single step into it, it correctly takes me to String's 'convert to const char*' operator, which performs its duty correctly:
class String : Object {
explicit operator const char*() const
{
std::string s{ as_std_string() };
const char* c{ s.c_str() };
// c before return: ABC
std::cout << "c before return: " << c << std::endl; // X
return c;
}
:
Now this is the weird thing: if line X is in place, line Y reports nothing: 'b_as_v: '
Removing it, line Y reports the original: 'b_as_v: \352\277_\377'
In fact, just printing std::cout << std::endl; // X' for X is sufficient to clear output from Y (however, moving X' to immediately in front of Y restores the original behaviour).
So it seems that the act of observation modifies the return value.
A heisenbug >:|
And neither behaviour is the desired one.
Another weirdness is that there is an extra Unicode character that copies to my clipboard at the end of '\352\277_\377' if I copy paste from Xcode's console to the SO text edit window.
Even if I only select the last 7 it still copies across, even though it doesn't take up a whitespace in Xcode's console.
(This extra character doesn't show up on the SO question; in fact it is no longer there when I reopen the question for editing. It isn't a newline character -- I've tested copy-pasting the last character of a particular line.)
I have tried to create a testcase, but it performs sadly as I would expect: http://ideone.com/gbyU6Y
A fairly complicated setup, but the cause is rather straightforward:
explicit operator const char*() const
{
std::string s{ as_std_string() };
const char* c{ s.c_str() };
// c before return: ABC
std::cout << "c before return: " << c << std::endl; // X
return c;
}
The pointer returned by std::string::c_str() points into the std::string's internal storage, and so can be invalidated for a number of reasons - the destruction of the std::string object being one of them. Here, c is invalidated as soon as your conversion function returns and s is destroyed, meaning that a dangling pointer is returned.
Also, libc++ uses the small-string optimization, meaning that a string as short as "ABC" is stored inside the std::string object itself (in this case, on the stack), rather than in dynamically allocated storage. This makes it much more likely that the space that used to be occupied by the string is reused before your code attempts to print it.
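One way to avoid the dangling pointer is sketched below. `Text` is a hypothetical stand-in for the question's String class: the real class wraps a Python string, which this sketch replaces with a plain std::string member. The idea is either to return an owning std::string by value, or to hand out a pointer only into storage that outlives the call.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for the question's String class.
class Text {
public:
    explicit Text(std::string value) : value_(std::move(value)) {}

    // Returns an owning copy, so the result stays valid after the call.
    explicit operator std::string() const { return value_; }

    // Points into a member rather than a local variable, so the pointer
    // stays valid as long as this object is alive and not modified.
    const char* c_str() const { return value_.c_str(); }

private:
    std::string value_;
};
```

With this shape, a test helper would convert the benchmark to std::string rather than const char* before comparing, which also avoids comparing two pointers by address.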

Fast approach to wrapping data in a struct/class

EDIT: The main intent is to allow manipulating underlying data as part of an encapsulated struct as opposed to direct data manipulation.
Which of the following approaches is recommended when it comes to wrapping some data inside a struct:
1. Keep a pointer to the data within the struct:
new s(buf), which stores buf in a local field (s->buf = buf)
2. reinterpret_cast-ing a memory address to a struct:
reinterpret_cast<s*>(buf)
3. Use the new operator against the memory address where the data is located:
new(buf) s;
Here is a sample program for these approaches:
#include <iostream>
#include <new>  // placement new
using namespace std;
struct s {
    int* i;
    s(int* buf) : i(buf) {}
    int getValue() { return *i * 2; }
};
struct s2 {
    int i;
    int getValue() { return i * 2; }
};
int main() {
    int buf = 10;
    // keep a pointer to the data
    s a(&buf);
    cout << "value: " << a.getValue() << ", size: " << sizeof(a) << ", address: " << &a << ", buf-address: " << &buf << endl;
    // placement new at the address of the data
    s2* a2 = new(&buf) s2;
    cout << "value: " << a2->getValue() << ", size: " << sizeof(*a2) << ", address: " << a2 << ", buf-address: " << &buf << endl;
    // reinterpret_cast of the address of the data
    s2* a3 = reinterpret_cast<s2*>(&buf);
    cout << "value: " << a3->getValue() << ", size: " << sizeof(*a3) << ", address: " << a3 << ", buf-address: " << &buf << endl;
}
And the output:
value: 20, size: 4, address: 0027F958, buf-address: 0027F964
value: 20, size: 4, address: 0027F964, buf-address: 0027F964
value: 20, size: 4, address: 0027F964, buf-address: 0027F964
Both size & time are important. Also, maintainability is important, e.g. someone might add by mistake a virtual function to s2 (which will mess up the data alignment).
Thanks!
None of those are even remotely good ideas, although the first one is passable with some modifications. reinterpret_cast doesn't work the way you think it does, and I'm not sure what exactly you're trying to achieve with placement new. Store a smart pointer of some sort in the first one to avoid the obvious issues with lifetime, and the first option isn't bad.
There is a fourth option: just store the data in a struct, and provide whatever encapsulated access you desire.
struct data {
data(int i_) : i(i_) { }
int i;
};
struct s {
s(int i_) : i(i_) { }
data i;
};
Rereading your question, it appears as though maybe your intent is for this struct to be an internal detail of some larger object. In that case, the lifetime issues with the first solution are likely taken care of, so storing a raw pointer is less of a bad idea. Absent additional details, though, I still recommend the fourth option.
Placement new will still call the constructor, wiping out anything that's in the buffer already if such a constructor exists (or is created unknowingly in the future) so I don't think that's a safe option. reinterpret_cast is undefined behavior even though it may appear to work for you. Storing a local pointer seems to be the best option although you've only given a very tiny inkling of what you're trying to do.
If you're attempting serialization here, remember important issues like sizeof(int) and endianness.
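As a rough illustration of the serialization concern, the sketch below reads and writes a 32-bit value in an explicitly little-endian byte layout, so the result does not depend on sizeof(int) or on the host's byte order. The function names are hypothetical, and little-endian is just the layout chosen for this example.

```cpp
#include <cstdint>

// Hypothetical helper: decodes a 32-bit unsigned value stored little-endian
// in the four bytes starting at `bytes`.
std::uint32_t read_u32_le(const unsigned char* bytes) {
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Hypothetical helper: encodes `value` little-endian into four bytes.
void write_u32_le(std::uint32_t value, unsigned char* bytes) {
    for (int i = 0; i < 4; ++i) {
        bytes[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
    }
}
```

Because the byte order is spelled out in the shifts, the same buffer decodes to the same value on any host.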
Accessing the data through a reinterpret_cast to anything other than char* is undefined behavior. So 2/ is obviously out.
1 and 3 are OK but 1/ is the most straightforward.
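If the goal is only to read a value out of a raw buffer, a commonly used alternative to reinterpret_cast is to copy the bytes with std::memcpy. This is not one of the three options from the question, just a sketch of a different technique. `Sample` is a hypothetical struct mirroring s2, and the static_assert is meant to catch the maintainability concern from the question (someone adding a virtual function later).

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical struct mirroring s2 from the question.
struct Sample {
    int i;
    int doubled() const { return i * 2; }
};

// Fails to compile if Sample stops being trivially copyable, for example
// when a virtual function is added by mistake.
static_assert(std::is_trivially_copyable<Sample>::value,
              "Sample must stay trivially copyable to be filled via memcpy");

// Copies sizeof(Sample) bytes from `buf` into a fresh Sample object.
// The caller must make sure `buf` points to at least that many bytes.
Sample load_sample(const void* buf) {
    Sample out;
    std::memcpy(&out, buf, sizeof out);
    return out;
}
```

The trade-off is that the result is a copy, so changes to it are not written back to the buffer unless they are copied back explicitly.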
If you prefer to encapsulate the data, you may want to use a void* pointer to a struct or class. This allows type encapsulation and lets you extend the data in your struct in later versions of your code:
struct HiddenCode{
    int Field1;
    char Field2;
};
void transferData(void* anyptr)
{
    // this function "knows" that "anyptr" is a "HiddenCode*"
    HiddenCode* MyHiddenCode = static_cast<HiddenCode*>(anyptr);
    // do something else
    (void)MyHiddenCode;
}
int main()
{
    HiddenCode* MyHiddenCode = new HiddenCode();
    MyHiddenCode->Field1 = 5;
    MyHiddenCode->Field2 = '1';
    void* anyptr = MyHiddenCode; // conversion to void* is implicit
    transferData(anyptr);
    delete MyHiddenCode;
}
Cheers.

TOUGH: Dealing with deeply nested pointers in C++

I define this structure:
struct s_molecule
{
std::string res_name;
std::vector<t_particle> my_particles;
std::vector<t_bond> my_bonds;
std::vector<t_angle> my_angles;
std::vector<t_dihedral> my_dihedrals;
s_molecule& operator=(const s_molecule &to_assign)
{
res_name = to_assign.res_name;
my_particles = to_assign.my_particles;
my_bonds = to_assign.my_bonds;
my_angles = to_assign.my_angles;
my_dihedrals = to_assign.my_dihedrals;
return *this;
}
};
and these structures:
typedef struct s_particle
{
t_coordinates position;
double charge;
double mass;
std::string name;
std::vector<t_lj_param>::iterator my_particle_kind_iter;
s_particle& operator=(const s_particle &to_assign)
{
position = to_assign.position;
charge = to_assign.charge;
mass = to_assign.mass;
name = to_assign.name;
my_particle_kind_iter = to_assign.my_particle_kind_iter;
return *this;
}
} t_particle;
struct s_bond
{
t_particle * particle_1;
t_particle * particle_2;
std::vector<t_bond_param>::iterator my_bond_kind_iter;
s_bond& operator=(const s_bond &to_assign)
{
particle_1 = to_assign.particle_1;
particle_2 = to_assign.particle_2;
my_bond_kind_iter = to_assign.my_bond_kind_iter;
return *this;
}
};
and then in my code I return a pointer to an s_molecule (typedef'd to t_molecule, but still).
Using this pointer I can get this code to work:
for (unsigned int i = 0;
i < current_molecule->my_particles.size();
i++)
{
std::cout << "Particle "
<< current_molecule->my_particles[i].name << std::endl
<< "Charge: "
<< current_molecule->my_particles[i].charge << std::endl
<< "Mass: "
<< current_molecule->my_particles[i].mass << std::endl
<< "Particle Kind Name: "
<< (*current_molecule->my_particles[i].my_particle_kind_iter).atom_kind_name
<< std::endl
<< "x: " << current_molecule->my_particles[i].position.x
<< " y: " << current_molecule->my_particles[i].position.y
#ifdef USE_3D_GEOM
<< "z: " << current_molecule->my_particles[i].position.z
#endif
<< std::endl;
}
If I replace it with:
for (std::vector<t_particle>::iterator it = current_molecule->my_particles.begin();
it !=current_molecule->my_particles.end();
it++)
{
std::cout << "Particle "
<< (*it).name << std::endl
<< "Charge: "
<< (*it).charge << std::endl
<< "Mass: "
<< (*it).mass << std::endl
<< "Particle Kind Name: "
<< (*(*it).my_particle_kind_iter).atom_kind_name
<< std::endl
<< "x: " << (*it).position.x
<< " y: " << (*it).position.y
#ifdef USE_3D_GEOM
<< "z: " << (*it).position.z
#endif
<< std::endl;
}
I now get nasty segfaults...
Not to put too much here, but I'm also getting segfaults when I tried to do this:
std::cout << "Bond ATOMS : "
<< (*current_molecule).my_bonds[0].particle_1->name
<< std::endl
Again, current_molecule is a pointer to an s_molecule structure, which contains vectors of structures, which in turn either hold values directly or hold pointers. I can't get these multiple layers of indirection to work. Any suggestions on fixing these segfaults?
FYI I'm compiling on Linux Centos 5.4 with g++ and using a custom makefile system.
@sbi Thanks for the good advice! I believe you are right -- the overloaded assignment operator is unnecessary and should be scrapped.
I've followed the approach of commenting out stuff and am very confused. Basically in the function that passes the pointer to my particular molecule to the main function to print, I can see all the data in that molecule (bonds, particles, name, etc) perfectly, printing with cout's.
Once I pass it to main as a pointer, I get a segfault if I use that pointer with an iterator. Also, for some reason the bond data (which I can freely print in the function that returns the pointer) segfaults too if I try to print it, even if I use [] to index the vector of bonds (which works for the particle vector).
That's the best info I can give for now.
A wild guess: Are you using shared libraries? I remember having difficulties passing STL containers back and forth across shared library boundaries.
Jason (OP) was asked in a comment by David Rodríguez:
Are you returning a pointer to a local variable?
Jason answered:
No its a ptr to a class variable. The class is very much in existence (it contains the function that returns the molecule).
Unless you're talking of a true class variable (qualified as static), the fact that the class exists doesn't have much to do with it. Instances of a class exist, and they might have ceased to exist even if you just called a function on them.
As such, the question is:
Does the instance of the class that returned the pointer current_molecule still exist?
Or is current_molecule qualified as static, i.e. being a true class variable?
If the answer to both questions is "no", you're in Undefined County.
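The lifetime question can be made concrete with a small sketch. `Molecule` and `Topology` are hypothetical, heavily simplified stand-ins for the types in the question, and shared ownership is just one possible way to keep the molecule alive after the instance that handed it out is gone.

```cpp
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for s_molecule.
struct Molecule {
    std::string res_name;
};

// Hypothetical owner class that hands out its current molecule.
class Topology {
public:
    Topology() : current_(std::make_shared<Molecule>()) {
        current_->res_name = "HOH";
    }

    // Raw pointer: only guaranteed valid while something still owns the
    // molecule, which here usually means while this Topology is alive.
    const Molecule* borrowed() const { return current_.get(); }

    // Shared ownership: the molecule outlives this Topology if needed.
    std::shared_ptr<const Molecule> shared() const { return current_; }

private:
    std::shared_ptr<Molecule> current_;
};

// The Topology instance is destroyed when this function returns, but the
// returned shared_ptr keeps the molecule alive.
std::shared_ptr<const Molecule> load_molecule() {
    Topology local;
    return local.shared();
}
```

Returning `local.borrowed()` from `load_molecule` instead would hand the caller a pointer to an object that no longer exists, which is the situation the two questions above are probing for.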
At this point, it becomes very important that you post source code that can be used by us here to actually reproduce the problem; it might well be located in source you aren't showing us.
Again, this issue was answered here:
Weird Pointer issue in C++
by DeadMG. Sorry for the double post.