I found unexpected (at least for me) behavior.
class A
{
char _text[100];
char* _beg;
char* _end;
public:
explicit A(const char* text, size_t tsize) : _beg(&_text[0]), _end(&_text[std::min<size_t>(tsize, 99)])
{
memcpy(_text, text, std::min<size_t>(tsize, 99));
*_end = '\0';
}
inline std::string get_text()
{
return std::move(std::string(_beg, _end));
}
};
Later, somewhere in the code, I do this:
A* add_A(A&& a)
{
list_a.push_back(std::move(a));
return &(list_a.back());
}
std::list<A> list_a;
{
add_A(A("my_text", 7));
list_a.back().get_text(); //returns "my_text"
}
list_a.back().get_text(); //returns trash
As soon as I move this object (using std::move) and call get_text() on the moved-to object, I get garbage, as if the address of _text changed after the move, so that _beg and _end point nowhere.
Can the address of member variables really change after std::move? (I thought a move doesn't really relocate the object; wasn't that the whole point?)
If it can change, what is the usual pattern for handling it (updating the pointers accordingly)?
If it can't change, does this happen because I move the object into a std::list, which somehow copies it, changing the address of the members and leaving the pointers pointing at the wrong place?
Moving in C++ is just a specialized form of copy, where you modify the data in the object being moved from. That's how unique_ptr works; you copy the pointer from one unique_ptr object to the other, then set the original value to NULL.
When you move an object, you are creating a new object, one that gets its data from another object. The addresses of the members don't "change"; it's simply not the same object.
Because you didn't write a copy/move constructor, the compiler writes them for you, and all they do is copy each member. So the newly moved-to object will have pointers that point into the old object.
An object that is about to be destroyed.
It's like moving into a house that happens to look identical to your old one. No matter how much it looks like your old house, it isn't. You still have to change your address, since it's a new house. So too must the addresses of _beg and _end be updated.
Now, you could create a move constructor/assignment operator (along with a copy constructor/assignment operator) to update your pointers. But quite frankly, that's just wallpapering over bad design. It's not a good idea to have pointers to subobjects within the same object if you can help it. Instead of begin/end pointers, just have an actual size:
class A
{
char _text[100];
size_t _size;
public:
explicit A(const char* text, size_t tsize) : _size(std::min<size_t>(tsize, 99))
{
memcpy(_text, text, _size); // copy at most 99 characters
_text[_size] = '\0';
}
inline std::string get_text()
{
return std::string(_text, _size); //Explicit `move` call is unnecessary
}
};
This way, there is no need to store begin/end pointers. Those can be synthesized as needed.
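To illustrate why the size-based layout survives a move, here is a minimal sketch. The `begin()`/`end()` accessors and the `text_after_move()` helper are my own additions for the demonstration, not part of the question's code; the point is only that nothing stored in the object depends on the object's address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <list>
#include <string>
#include <utility>

// Size-based variant of A: no pointers into the object itself,
// so the compiler-generated copy/move operations are correct.
class A {
    char _text[100];
    std::size_t _size;
public:
    A(const char* text, std::size_t tsize)
        : _size(std::min<std::size_t>(tsize, sizeof(_text) - 1)) {
        std::memcpy(_text, text, _size);
        _text[_size] = '\0';
    }
    // Begin/end are synthesized on demand from the object's current address.
    const char* begin() const { return _text; }
    const char* end() const { return _text + _size; }
    std::string get_text() const { return std::string(begin(), end()); }
};

// Hypothetical helper: moves an A into a list and reads it back
// after the source object has been destroyed.
std::string text_after_move() {
    std::list<A> list_a;
    {
        A tmp("my_text", 7);
        list_a.push_back(std::move(tmp));
    }  // tmp is destroyed here
    assert(list_a.size() == 1);
    return list_a.back().get_text();
}
```

Calling `text_after_move()` should give back "my_text" even though the source object is gone, because the list element computes its range from its own `_text`.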
std::move has no moving parts; it simply casts its argument to an rvalue reference. Remember that inside the body of foo(T&& t) { ... }, using t by name yields an lvalue (a named reference that happens to be bound to an rvalue).
inline std::string get_text()
{
return std::move(std::string(_beg, _end));
}
Breaking this down:
std::string(_beg, _end);
creates an anonymous, temporary std::string object constructed from _beg to _end. This is an rvalue.
std::move(...);
forcibly casts this temporary to an rvalue reference, which prevents the compiler from performing return-value optimization. What you want is
return std::string(_beg, _end);
See assembly code comparison
You probably also want to use
list_a.emplace_back(std::move(a));
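A rough way to see the effect of `std::move` on a returned temporary is to count move constructions. This is only a sketch: `Counted` and the helper functions are hypothetical names, and the counts in the comments assume C++17 rules (earlier standards merely permit the elision, although mainstream compilers perform it). Some compilers will also warn about the pessimizing move, which is rather the point.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how often its move constructor runs.
struct Counted {
    static int moves;
    Counted() = default;
    Counted(const Counted&) = default;
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::moves = 0;

// Returns a prvalue directly: the result is constructed in place.
Counted make_plain() { return Counted(); }

// Wrapping the temporary in std::move turns it into an xvalue,
// so the return object has to be move-constructed from it.
Counted make_moved() { return std::move(Counted()); }

int moves_for_plain() {
    Counted::moves = 0;
    Counted c = make_plain();
    (void)c;
    return Counted::moves;  // 0 under C++17 rules
}

int moves_for_moved() {
    Counted::moves = 0;
    Counted c = make_moved();
    (void)c;
    return Counted::moves;  // 1: the move forced inside make_moved()
}
```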
Unfortunately, there are two flaws in this approach.
The simpler one is that the term "moving" can be a bit misleading: it sounds very one-way. But in practice it is often a two-way swap: the two objects exchange properties so that when the temporary object goes out of scope it performs cleanup of whatever the other object previously owned:
struct S {
char* s_;
S(const char* s) : s_(strdup(s)) {}
~S() { release(); }
void release() { if (s_) free(s_); }
S(const S& s) : s_(strdup(s.s_)) {}
S(S&& s) : s_(s.s_) { s.s_ = nullptr; }
S& operator=(const S& s) { if (this != &s) { release(); s_ = strdup(s.s_); } return *this; }
S& operator=(S&& s) { std::swap(s_, s.s_); return *this; }
};
Note this line:
S& operator=(S&& s) { std::swap(s_, s.s_); return *this; }
When we write:
S s1("hello");
s1 = S("world");
the second line invokes the move-assignment operator. The pointer to the copy of "hello" is swapped into the temporary; the temporary then goes out of scope and is destroyed, and the copy of "hello" is freed.
Doing this swap with your array of characters is significantly less efficient than a one-way copy would be:
struct S {
char s_[100];
S(const S& s) {
std::copy(std::begin(s.s_), std::end(s.s_), std::begin(s_));
}
S(S&& s) {
char t_[100];
std::copy(std::begin(s.s_), std::end(s.s_), std::begin(t_));
std::copy(std::begin(s_), std::end(s_), std::begin(s.s_));
std::copy(std::begin(t_), std::end(t_), std::begin(s_));
}
};
You don't have to do this: the rvalue parameter only needs to be left in a safe-to-destroy state. Note that the default move operations do not swap; they move member by member, which for a plain char array amounts to an element-wise copy.
The disastrous part of your code is that the default move operator is naive.
struct S {
char text_[100];
char *beg_, *end_;
S() : beg_(text_), end_(text_ + 100) {}
};
Consider the following copy-construction:
S a;
S s(a);
What does s.beg_ point to?
Answer: it points to a.text_, not s.text_. You would need to write a copy constructor that copied the contents of text_ and then pointed its own beg_ and end_ to its own text_ rather than copying the source values.
The same problem occurs with the move operator: it will move the contents of text_ but it will also move the pointers, and have no clue that they are relative.
You'll either need to write copy/move constructors and assignment operators, or you could consider replacing beg_ and end_ with a single size_t size value.
But in either case, move is not your friend here: you're not transferring ownership or performing a shallow copy, all of your data is inside your object.
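If you do want to keep the pointers, the copy operations have to rebase them. Below is a minimal sketch of that approach, assuming the pointers always refer somewhere inside `text_`; the helper `copy_points_into_itself()` is a hypothetical check I added. Because the copy operations are user-declared, no move operations are implicitly generated, so "moving" an `S` falls back to this copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

struct S {
    char text_[100];
    char* beg_;
    char* end_;

    S() : text_{}, beg_(text_), end_(text_ + sizeof(text_)) {}

    // Copy the bytes, then rebase the pointers onto *this* object's buffer,
    // keeping the same offsets they had in the source object.
    S(const S& other)
        : beg_(text_ + (other.beg_ - other.text_)),
          end_(text_ + (other.end_ - other.text_)) {
        std::memcpy(text_, other.text_, sizeof(text_));
    }

    S& operator=(const S& other) {
        if (this != &other) {
            std::memcpy(text_, other.text_, sizeof(text_));
            beg_ = text_ + (other.beg_ - other.text_);
            end_ = text_ + (other.end_ - other.text_);
        }
        return *this;
    }
};

// Hypothetical check: both a copy and a "move" end up pointing into
// their own buffer rather than into the source object's buffer.
bool copy_points_into_itself() {
    S a;
    S b(a);
    S c(std::move(a));  // no move constructor exists, so this copies
    return b.beg_ == b.text_ && b.end_ == b.text_ + sizeof(b.text_)
        && c.beg_ == c.text_ && b.beg_ != a.beg_;
}
```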
My previous question:
How's the Copy & Swap idiom really supposed to work, seriously! My code fails
In the code below, I need the variable auto ptr to remain valid and the assertion to pass.
auto ptr = a.data();
Looks like this:
+--------------+
| a.local_data | --\
+--------------+ \ +-------------+
>--> | "Some data" |
+-----+ / +-------------+
| ptr | -----------/
+-----+
#include <algorithm>
#include <cassert>
#include <iostream>
#include <utility>
using namespace std;
class Data
{
private:
char* local_data;
int _size = 0;
inline int length(const char* str)
{
int n = 0;
while(str[n] != '\0') ++n; // count characters up to the terminator
return n;
}
public:
Data() {
local_data = new char[_size];
}
Data(const char* cdata) : _size { length(cdata) }{
local_data = new char[_size];
std::copy(cdata, cdata + _size, local_data);
}
int size() const { return _size; }
const char* data() const { return local_data; }
void swap(Data& rhs) noexcept
{
std::swap(_size, rhs._size);
std::swap(local_data, rhs.local_data);
}
Data& operator=(const Data& data)
{
Data tmp(data);
swap(tmp);
return *this;
}
};
int main()
{
Data a("Some data");
auto ptr = a.data(); // Obtains a pointer to the original location
a = Data("New data");
assert(ptr == a.data()); // Fails
return 0;
}
EDIT: To give some perspective, the following runs perfectly well with the standard C++ string class.
#include <iostream>
#include <string>
#include <cassert>
int main()
{
std::string str("Hello");
auto ptr = str.data();
str = std::string("Bye!");
assert(ptr == str.data());
std::cin.get();
return 0;
}
And, I am trying to achieve the same functionality.
In terms of correctness, contrary to what some comments indicate, your assignment operator looks correct for copy/swap:
Data& operator=(const Data& data)
{
// Locally this code is fine
Data tmp(data);
swap(tmp);
return *this;
}
It makes a copy of the data into tmp, swaps with it. Thus, the current object's new state is a copy of the data, and the object's old state is inside tmp and should be cleaned up in its destructor. This is exception safe.
However, it depends on two key things that you failed to do (as the comments did point out in part):
a non-throwing destructor that cleans up the old state. You omitted this, and it is crucial for proper management of the resources this object owns.
~Data()
{
delete [] local_data;
}
Note: you don't need to set it to nullptr, and don't need to check for nullptr, because deleting a null pointer is a no-op; and once the destructor begins running, the object's lifetime is over, so it should never be read again or your program has undefined behavior.
You did not write a copy constructor.
When you don't write a proper copy constructor, the compiler generates one for you that does an element-wise copy. That means you end up with a copy of the pointer, not a copy of the data to which it points! That is an aliasing bug because both objects will point to (and logically "own") the same memory. Whichever is destroyed first will delete the memory and corrupt the memory to which the other still points. Fortunately, a copy constructor is easy to make for your class:
Data(const Data& other) :
local_data{new char[other._size]},
_size{other._size}
{
std::copy(other.local_data, other.local_data + _size, local_data);
}
Things to observe about this copy constructor:
if new[] throws, nothing is leaked. copy() cannot throw. This is exception safe.
the order of initialization is not the order listed in the constructor, but the order the data members are declared in the class. Thus, local_data will be initialized before _size, and so it's important to use other._size for the new expression.
The copy/swap idiom is clean, and concise, and can lead to exception safe code. However, it does have some overhead, as it makes an extra object off to the side, and does the work to swap with it. The benefit of this idiom is when multiple operations can throw exceptions, and you want an "all or nothing" assignment. In your particular class, the only thing that can throw is the allocation of local_data in operator=, and so it is not really necessary to use this idiom in this class.
I think your code should be OK after adding these functions. In this case you would also benefit from a move constructor and move assignment, since copying from an rvalue can be optimized: because we know the temporary is about to be destroyed when the assignment completes, we can "steal" its allocation and not have to create one of our own. This is fast, and also exception safe:
Data(Data&& other) noexcept :
local_data{other.local_data},
_size{other._size}
{
// important! This prevents other's destructor from
// deleting the allocation we just pilfered from it.
// Note, other's size and pointer are inconsistent, but it's
// about to be destroyed, so it doesn't matter. If it did,
// then swap both members, but that's needless extra work
// in this case.
other.local_data = nullptr;
}
Data& operator=(Data&& other) noexcept {
_size = other._size;
// std::swap, not the member swap(Data&): other's destructor frees our old buffer
std::swap(local_data, other.local_data);
return *this;
}
As for your main() function, the assertion does not look reasonable.
int main()
{
Data a("Some data");
auto ptr = a.data(); // Obtains a pointer to the original location
a = Data("New data");
assert(ptr == a.data()); // ????
return 0;
}
After you assign to a, the pointer should be different, and you should be asserting that the pointers are NOT the same. But in this case, ptr will be pointing to the old address that a held, which has been deleted by the time you get to the assertion. Storing pointers to object internals while modifying those objects is one of the basic recipes for errors.
One last thing: if you write an operator=, or a custom copy constructor, you almost always need a custom destructor. Always think of these three together as a special relationship. This was called the "Rule of Three": if you write any of them, you almost certainly must write all of them. The rule was expanded to the "Rule of Five" (after C++11) to include the move constructor and move assignment. You should read up on these rules, and always think of these special member functions together. Another one to consider (not for this class, but in class design in general) is the best one, the Rule of Zero.
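Putting the pieces from this answer together, a Rule-of-Five version of the class might look roughly like the sketch below. It is not the asker's exact class: I use `std::strlen` instead of the hand-written `length()`, `std::size_t` for the size, and a hypothetical `assignment_replaces_buffer()` helper to exercise both assignment operators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

class Data {
    char* local_data;
    std::size_t _size;
public:
    Data() : local_data(nullptr), _size(0) {}
    explicit Data(const char* cdata)
        : local_data(new char[std::strlen(cdata)]), _size(std::strlen(cdata)) {
        std::copy(cdata, cdata + _size, local_data);
    }
    ~Data() { delete[] local_data; }  // deleting nullptr is a no-op

    Data(const Data& other)
        : local_data(new char[other._size]), _size(other._size) {
        std::copy(other.local_data, other.local_data + _size, local_data);
    }
    Data(Data&& other) noexcept
        : local_data(other.local_data), _size(other._size) {
        other.local_data = nullptr;  // the source no longer owns the buffer
        other._size = 0;
    }

    void swap(Data& rhs) noexcept {
        std::swap(local_data, rhs.local_data);
        std::swap(_size, rhs._size);
    }
    Data& operator=(const Data& other) {  // copy-and-swap
        Data tmp(other);
        swap(tmp);
        return *this;
    }
    Data& operator=(Data&& other) noexcept {  // other's destructor frees our old buffer
        swap(other);
        return *this;
    }

    std::size_t size() const { return _size; }
    const char* data() const { return local_data; }
};

// Hypothetical check exercising move assignment and copy assignment.
bool assignment_replaces_buffer() {
    Data a("Some data");
    a = Data("New data");  // move assignment from a temporary
    Data b("x");
    b = a;                 // copy assignment: b gets its own buffer
    return a.size() == 8 && b.size() == 8
        && std::memcmp(a.data(), b.data(), a.size()) == 0
        && a.data() != b.data();
}
```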
The problem here is to understand whether the copy or the move constructor is called when initializing a vector from the return value of a function.
Checking the mallocs with a profiler shows similar memcopies in both cases. Why?
We have a class of type "message". The class provides a function "data_copy" which returns the contents of the "message" as vector.
There are 2 options that I tried.
One is to use directly the copy constructor to initialize a new vector.
std::vector<uint8_t> vector1 ( message.data_copy() );
The second option was to try to avoid the extra copy and do
std::vector<uint8_t> vector1 ( std::move( message.data_copy() ) );
For reference I attach what data_copy() does.
std::vector<uint8_t> message::data_copy(void) const
{
std::vector<uint8_t> d(this->size());
copy_data_to_buffer(d.data());
return d;
}
void message::copy_data_to_buffer(uint8_t* buffer) const
{
DEBUG_LOG("copy_data_to_buffer");
for(const fragment* p = &head; p != nullptr; p = p->next)
{
memcpy(buffer, p->data[0], p->size[0]);
buffer += p->size[0];
if(p->size[1])
{
memcpy(buffer, p->data[1], p->size[1]);
buffer += p->size[1];
}
}
}
Finally, using a profiler, I compared the number of malloc calls. While one would expect the move constructor to avoid the extra memcpy, in reality they are the same in both cases.
You are using the move constructor both times. The result of data_copy is a temporary. Then you construct the vector with this temporary argument. So it's a move.
The second time, you are basically casting the thing that's already an rvalue reference to an rvalue reference, and so the move constructor is used again. There should be absolutely no difference between the behavior of both of them.
I think, maybe, you are misunderstanding what a temporary is and what an rvalue reference is. It's not often you have to use ::std::move and if you are using it, you should look carefully at the code you're using it in to make sure you're doing the right thing.
This code on Godbolt is proof that the copy constructor is never called. Here is the code being linked to:
#include <utility>
struct ICantBeCopied {
ICantBeCopied(ICantBeCopied const &) = delete;
ICantBeCopied(ICantBeCopied &&) {}
ICantBeCopied() {}
};
ICantBeCopied make_a_copy()
{
ICantBeCopied c;
return c;
}
void a_function()
{
ICantBeCopied a{make_a_copy()};
ICantBeCopied b{::std::move(make_a_copy())};
}
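Another way to convince yourself, without a profiler, is to check whether the vector you end up with still uses the buffer that was allocated inside the function. This is only a sketch: `make_bytes()` is a hypothetical stand-in for `message::data_copy()`, and the global pointer exists purely for the comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Address of the buffer allocated inside make_bytes(), recorded for comparison.
static const std::uint8_t* g_last_buffer = nullptr;

// Hypothetical stand-in for message::data_copy().
std::vector<std::uint8_t> make_bytes() {
    std::vector<std::uint8_t> d(16, 0xAB);
    g_last_buffer = d.data();
    return d;  // elided or moved; the heap buffer is not reallocated either way
}

// Plain initialization from the returned temporary.
bool plain_init_reuses_buffer() {
    std::vector<std::uint8_t> v(make_bytes());
    return v.data() == g_last_buffer;
}

// Same thing with a redundant std::move around the call.
bool moved_init_reuses_buffer() {
    std::vector<std::uint8_t> v(std::move(make_bytes()));
    return v.data() == g_last_buffer;
}
```

Both helpers should report that the buffer was reused, which matches the identical malloc counts seen in the profiler: the only allocation is the one inside the function.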
I have a rapidjson wrapper that does the following:
class ADocument
{
public:
void setJson(const char *data) { m_D.Parse(data); }
AData operator[](const char *key) const
{
const rapidjson::Value *value = rapidjson::Pointer(key).Get(m_D);
if(value)
return AData(value);
else
return AData(&m_D);
}
private:
rapidjson::Document m_D;
};
and the AData class like this:
class AData
{
public:
AData(const rapidjson::Value *val) : m_Value(val) {}
operator QString() const { return m_Value->IsString() ? m_Value->GetString() : QString(); }
private:
const rapidjson::Value *m_Value;
};
And the whole thing is called like this:
ADocument doc;
doc.setJson("{\"Hello\":{\"Hello\":\"test\"}}");
QString str = doc["/Hello/Hello"];
after which str becomes "test".
Now by debugging this code I found out that the AData object somehow shifts - the operator QString() gets called from an object in different memory location than the original AData object is constructed in the operator[] of ADocument. Regular constructor is called once. But it might be that copy-elision simply moved the same object around in memory.
However when I define one of rule-of-three/five methods such as a destructor in AData without changing anything else (and the destructor does nothing itself) then the operator QString() is called on the SAME OBJECT (same memory location) which got constructed in the operator[] of ADocument.
Even when I implemented all thinkable constructors and operators (move, copy, assign...) NONE of them is ever called but the result is the same - only one object is ever created.
What is going on here? I would like to understand it.
And further, how to change this implementation so that it is as performance and memory efficient as possible (=minimum copies etc.)? Or maybe I am really worrying about nothing here and what I am seeing is only some compiler optimization?
What is going on here? I would like to understand it.
You're experiencing copy elision.
When an unnamed temporary would be copied or moved into a variable of the same type, the compiler is allowed to construct the object directly in the variable and skip the copy or move operation.
Another situation where you may experience this is when you return a variable with automatic storage duration from a function.
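A small sketch of how to observe this: record `this` in the constructor and compare it with the address of the final variable. `Tracked` and the helper are hypothetical names. The user-provided destructor matters here. As far as I understand, implementations are permitted to create an extra temporary when returning objects whose copy/move constructors and destructor are all trivial (so they can travel in registers), which would be consistent with the address only staying the same once a destructor was added.

```cpp
#include <cassert>

// Hypothetical type that remembers the address it was constructed at.
struct Tracked {
    const void* born_at;
    Tracked() : born_at(this) {}
    ~Tracked() {}  // user-provided, so the type is not trivially destructible
};

Tracked make_tracked() { return Tracked(); }

// True when the object was built directly in the caller's variable.
// A real copy would carry the old address along and make this false.
bool constructed_in_place() {
    Tracked t = make_tracked();
    return t.born_at == &t;
}
```

Under C++17 rules the elision in this sketch is mandatory; older standards only allow it, although mainstream compilers have long performed it.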
Given this function:
Object test(){
Object str("123");
return str;
}
I have two ways to call it:
code 1:
const Object &object=test();
code 2:
Object object=test();
Which one is better? Without optimization, is the copy constructor called twice in code 2?
What other differences are there?
For code 2, I suppose it is equivalent to:
Object tmp=test();
Object object=tmp;
For code 1, I suppose it is equivalent to:
Object tmp=test();
Object &object=tmp;
But tmp will be destroyed after the function returns, so must I add const?
Is code 1 correct, without any issues?
Let's analyse your function:
Object test()
{
Object temp("123");
return temp;
}
Here you're constructing a local variable named temp and returning it from the function. The return type of test() is Object, meaning you're returning by value. Returning local variables by value is a good thing because it allows a special optimization technique called Return Value Optimization (RVO) to take place. What happens is that, instead of calling the copy or move constructor, the compiler elides that call and constructs the object directly in the storage the caller provides for the result. In this case, because temp has a name (it is an lvalue), we call it N(amed)RVO.
Assuming optimizations take place, no copy or move has been performed yet. This is how you would call the function from main:
int main()
{
Object obj = test();
}
That first line in main seems to be of particular concern to you because you believe that the temporary will be destroyed by the end of the full expression. I'm assuming it is a cause for concern because you believe obj will not be assigned to a valid object and that initializing it with a reference to const is a way to keep it alive.
You are right about two things:
The temporary will be destroyed at the end of the full expression
Binding it to a reference to const will extend its lifetime
But the fact that the temporary will be destroyed is not a cause for concern. Because the initializer is an rvalue, its contents can be moved from.
Object obj = test(); // move is allowed here
Factoring in copy-elision, the compiler will elide the call to the copy or move constructor. Therefore, obj will be initialized "as if" the copy or move constructor was called. So because of these compiler optimizations, we have very little reason to fear multiple copies.
But what if we entertain your other examples? What if instead we had qualified obj as:
Object const& obj = test();
test() returns a prvalue of type Object. This prvalue would normally be destructed at the end of the full expression in which it is contained, but because it is bound to a reference to const, its lifetime is extended to that of the reference.
What are the differences between this example and the previous one?
You cannot modify the state of obj
It inhibits move semantics
The first bullet point is obvious, but the second may not be if you are unfamiliar with move semantics. Because obj is a reference to const, it cannot be moved from and the compiler cannot take advantage of useful optimizations. Binding a reference to const to an rvalue is only helpful in a narrow set of circumstances (as DaBrain has pointed out). It is instead preferable that you exercise value semantics and create value-typed objects when it makes sense.
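The lifetime extension described above can be observed with a small sketch that logs construction and destruction. `Noisy`, `g_events` and the helper are hypothetical names introduced just for this demonstration.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> g_events;

// Hypothetical type that logs its construction and destruction.
struct Noisy {
    Noisy()  { g_events.push_back("ctor"); }
    ~Noisy() { g_events.push_back("dtor"); }
};

Noisy make_noisy() { return Noisy(); }

// Binds the returned temporary to a reference to const and records
// what happens before the reference goes out of scope.
std::vector<std::string> events_with_const_ref() {
    g_events.clear();
    {
        const Noisy& ref = make_noisy();
        (void)ref;
        g_events.push_back("still alive");  // the temporary has not been destroyed yet
    }  // ref goes out of scope here, and the temporary with it
    return g_events;
}
```

The destructor entry should appear only after "still alive", i.e. when the reference itself leaves scope.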
Moreover, you don't even need the function test(), you can simply create the object:
Object obj("123");
but if you do need test(), you can take advantage of type deduction and use auto:
auto obj = test();
Your last example deals with an lvalue-reference:
[..] but the tmp will be destructed after the method. So must we add const?
Object &object = tmp;
The destructor of tmp is not called after the method. Taking in to account what I said above, the temporary to which tmp is being initialized will be moved into tmp (or it will be elided). tmp itself doesn't destruct until it goes out of scope. So no, there is no need to use const.
But a reference is good if you want to refer to tmp through some other variable. Otherwise, if you know you will not need tmp afterwards, you can move from it:
Object object = std::move(tmp);
Both your examples are valid. In 1, the const reference refers to a temporary object, but the lifetime of this object is extended until the reference goes out of scope (see http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/). The second example is obviously valid, and most modern compilers will optimize away the additional copying (even better if you use C++11 move semantics), so for practical purposes the examples are equivalent (though in 2 you can additionally modify the value).
In C++11, std::string has a move constructor / move assignment operator, hence the code:
string str = test();
will (at worst) have one constructor call and one move constructor call.
Even without move semantics, this will (likely) be optimised away by NRVO (named return value optimisation).
Don't be afraid of returning by value, basically.
Edit: Just to make it 100% clear what is going on:
#include <iostream>
#include <string>
class object
{
std::string s;
public:
object(const char* c)
: s(c)
{
std::cout << "Constructor\n";
}
~object()
{
std::cout << "Destructor\n";
}
object(const object& rhs)
: s(rhs.s)
{
std::cout << "Copy Constructor\n";
}
object& operator=(const object& rhs)
{
std::cout << "Copy Assignment\n";
s = rhs.s;
return *this;
}
object& operator=(object&& rhs)
{
std::cout << "Move Assignment\n";
s = std::move(rhs.s);
return *this;
}
object(object&& rhs)
: s(std::move(rhs.s))
{
std::cout << "Move Constructor\n";
}
};
object test()
{
object o("123");
return o;
}
int main()
{
object o = test();
//const object& o = test();
}
You can see that there is 1 constructor call and 1 destructor call for each - NRVO kicks in here (as expected) eliding the copy/move.
Code 1 is correct. As I said, the C++ Standard guarantees that a temporary bound to a const reference remains valid for the lifetime of the reference. Its main usage is polymorphic behavior with references:
#include <iostream>
class Base { public: virtual void Do() const { std::cout << "Base"; } };
class Derived : public Base { public: virtual void Do() const { std::cout << "Derived"; } };
Derived Factory() { return Derived(); }
int main(int argc, char **argv)
{
const Base &ref = Factory();
ref.Do();
return 0;
}
This will print "Derived". A famous example was Andrei Alexandrescu's ScopeGuard, but with C++11 it's even simpler.
Does C++ provide a guarantee for the lifetime of a temporary variable that is created within a function call but not used as a parameter? Here's an example class:
class StringBuffer
{
public:
StringBuffer(std::string & str) : m_str(str)
{
m_buffer.push_back(0);
}
~StringBuffer()
{
m_str = &m_buffer[0];
}
char * Size(int maxlength)
{
m_buffer.resize(maxlength + 1, 0);
return &m_buffer[0];
}
private:
std::string & m_str;
std::vector<char> m_buffer;
};
And here's how you would use it:
// this is from a crusty old API that can't be changed
void GetString(char * str, int maxlength);
std::string mystring;
GetString(StringBuffer(mystring).Size(MAXLEN), MAXLEN);
When will the destructor for the temporary StringBuffer object get called? Is it:
Before the call to GetString?
After GetString returns?
Compiler dependent?
I know that C++ guarantees that a local temporary variable will be valid as long as there's a reference to it - does this apply to parent objects when there's a reference to a member variable?
Thanks.
The destructor for that sort of temporary is called at the end of the full-expression, that is, the outermost expression that is not part of any other expression. In your case that is after the function returns and the value is evaluated. So it will all work nicely.
It's in fact what makes expression templates work: they can hold references to that sort of temporary in an expression like
e = a + b * c / d
because every temporary will last until the expression
x = y
is evaluated completely. It's quite concisely described in 12.2 Temporary objects in the Standard.
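To make the ordering concrete, here is a minimal sketch in the spirit of the StringBuffer example. `Buffer`, `legacy_fill` and `g_log` are hypothetical names; the log shows that the temporary outlives the call it was created for and is destroyed before the next statement runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> g_log;

// Hypothetical temporary that owns a small buffer.
struct Buffer {
    char storage[8];
    Buffer() : storage{} {}
    ~Buffer() { g_log.push_back("buffer destroyed"); }
    char* get() { return storage; }
};

// Stand-in for an old C-style API that writes into a caller-supplied buffer.
void legacy_fill(char* out) {
    out[0] = 'x';  // safe: the temporary is still alive during the call
    g_log.push_back("inside call");
}

std::vector<std::string> order_of_events() {
    g_log.clear();
    legacy_fill(Buffer().get());  // temporary lives until the end of this full-expression
    g_log.push_back("after statement");
    return g_log;
}
```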
litb's answer is accurate. The lifetime of the temporary object (also known as an rvalue) is tied to the expression, and its destructor is called at the end of the full expression. When the destructor of StringBuffer is called, the destructor of m_buffer is also called, but not the destructor of m_str, since it is a reference.
Note that C++0x changes things just a little bit because it adds rvalue references and move semantics. Essentially, by using an rvalue reference parameter (notated with &&) I can 'move' the rvalue into the function (instead of copying it), and the lifetime of the rvalue's contents can be bound to the object it moves into, not the expression. There is a really good blog post from the MSVC team that walks through this in great detail and I encourage folks to read it.
The pedagogical example for moving rvalues is temporary strings, and I'll show initialization in a constructor. If I have a class MyType that contains a string member variable, it can be initialized with an rvalue in the constructor like so:
class MyType{
const std::string m_name;
public:
MyType(std::string&& name) : m_name(std::move(name)) {} // non-const && so the string can actually be moved from
};
This is nice because when I declare an instance of this class with a temporary object:
void foo(){
MyType instance("hello");
}
What happens is that we avoid copying the temporary's contents: the string built from "hello" is moved into the owning class instance's member variable rather than copied. If the object is heavier than a 'string', the extra copy can be significant.
After the call to GetString returns.
I wrote almost exactly the same class:
template <class C>
class _StringBuffer
{
typename std::basic_string<C> &m_str;
typename std::vector<C> m_buffer;
public:
_StringBuffer(std::basic_string<C> &str, size_t nSize)
: m_str(str), m_buffer(nSize + 1) { get()[nSize] = (C)0; }
~_StringBuffer()
{ commit(); }
C *get()
{ return &(m_buffer[0]); }
operator C *()
{ return get(); }
void commit()
{
if (m_buffer.size() != 0)
{
size_t l = std::char_traits<C>::length(get());
m_str.assign(get(), l);
m_buffer.resize(0);
}
}
void abort()
{ m_buffer.resize(0); }
};
template <class C>
inline _StringBuffer<C> StringBuffer(typename std::basic_string<C> &str, size_t nSize)
{ return _StringBuffer<C>(str, nSize); }
Prior to the standard each compiler did it differently. I believe the old Annotated Reference Manual for C++ specified that temporaries should clean up at the end of the scope, so some compilers did that. As late as 2003, I found that behaviour still existed by default on Sun's Forte C++ compiler, so StringBuffer didn't work. But I'd be astonished if any current compiler was still that broken.
StringBuffer is in the scope of GetString. It should get destroyed at the end of GetString's scope (i.e. when it returns). Also, I don't believe that C++ guarantees that a variable will exist as long as there is a reference to it.
The following ought to compile:
Object* obj = new Object;
Object& ref = *obj;
delete obj;
From cppreference:
All temporary objects are destroyed as the last step in evaluating the
full-expression that (lexically) contains the point where they were
created, and if multiple temporary objects were created, they are
destroyed in the order opposite to the order of creation. This is true
even if that evaluation ends in throwing an exception.