Deep-copy of struct with reference member in C++17

I'm still fairly new to C++ and am confused about references and move semantics. For a compiler I'm writing that generates C++17 code, I need to be able to have structs with fields that are other structs. Since the struct definitions will be generated from the user's code in the other language, they could potentially be very large, so I'm storing the inner struct as a reference. This is also necessary to deal with incomplete types that are declared at the beginning but defined later, which may happen in the generated code. (I avoided using pointers because adding * all over the place for dereferencing makes the code generation less straightforward.)
The language I'm compiling from has no aliasing, so something like Outer b = a should always be a "deep-copy". So in this case, b.inner should be a copy of a.inner and not a reference to it. But I can't figure out how to set up the constructors to create the deep-copy behavior in C++. I tried many different configurations of the constructors for Outer, and I tried both Inner& and Inner&& for storing inner.
Here is a mock example of how the generated code would look:
#include <iostream>
template<typename T>
T copy(T a) {
return a;
}
struct Inner;
struct Outer {
Inner&& inner;
Outer(Inner&& a);
Outer(Outer& a);
};
struct Inner {
int v;
};
Outer::Outer(Inner&& a) : inner(std::move(a)) {
std::cout << " -- Constructor 1 --" << std::endl;
}
// Copy the insides of the original object, then move that rvalue to the new object?
Outer::Outer(Outer& a) : inner(std::move(copy(a.inner))) {
std::cout << " -- Constructor 2 --" << std::endl;
}
int main() {
Outer a = {Inner {30}};
std::cout << a.inner.v << std::endl; // Should be: 30
a.inner.v += 1;
std::cout << a.inner.v << std::endl; // Should be: 31
Outer b = a; // Copy a to b
std::cout << a.inner.v << std::endl; // Should be: 31
std::cout << b.inner.v << std::endl; // Should be: 31
b.inner.v += 1;
std::cout << a.inner.v << std::endl; // Should be: 31
std::cout << b.inner.v << std::endl; // Should be: 32
return 0;
}
And this is what it currently outputs (it may vary by implementation):
-- Constructor 1 --
30
31
-- Constructor 2 --
297374876
32574
297374876
32574
Clearly this output is incorrect, and I think I must have a dangling reference somewhere, among other things. How should I set up Outer to get the proper behavior here?

References in C++ are (almost always) non owning aliases.
You do not want a non owning alias.
Thus, do not use references.
You could have an owning (smart) pointer and a reference alias to make some code generation easier. Do not do this. The result of doing it is a class with mixed semantics; there is no coherent, sensible operator= or copy/move constructor you can write in that case.
My advice would be to:
Write a value_ptr that inherits from unique_ptr but copies on assignment.
then either:
Generate code with ->
or
Add a helper method that returns *ptr reference, and generate code that does method().
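A minimal sketch of such a value_ptr, assuming C++17. The name value_ptr comes from the advice above; the clone helper, the converting constructor, and the Inner/Outer definitions are illustrative assumptions rather than an existing library type.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical value_ptr: owns its pointee like unique_ptr, but copying
// the pointer deep-copies the pointee.
template <typename T>
class value_ptr : public std::unique_ptr<T> {
    using base = std::unique_ptr<T>;

    // Illustrative helper: duplicate the pointee, or stay null.
    static base clone(const base& p) {
        if (p) return std::make_unique<T>(*p);
        return nullptr;
    }

public:
    value_ptr() = default;
    value_ptr(base p) noexcept : base(std::move(p)) {}  // adopt ownership

    value_ptr(const value_ptr& other) : base(clone(other)) {}
    value_ptr(value_ptr&&) noexcept = default;

    value_ptr& operator=(const value_ptr& other) {
        base copy = clone(other);          // may throw; *this untouched so far
        base::operator=(std::move(copy));  // release the old pointee
        return *this;
    }
    value_ptr& operator=(value_ptr&&) noexcept = default;
};

struct Inner { int v; };

// With value_ptr as the member, the implicitly generated copy operations
// of Outer are already deep copies.
struct Outer {
    value_ptr<Inner> inner;
};
```

Generated code could then write a.inner->v, or call a helper that returns *inner, as suggested above. Copying an Outer allocates a fresh Inner, so the two objects never alias.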

(I avoided using pointers because adding * all over the place for dereferencing makes the code generation less straightforward.)
Don't let your desired interface interfere so much with your implementation. Separation of interface and implementation is a powerful tool.
Your goal is a deep copy. Your temporaries will not live long enough. Something has to own the copied data so it both lives long enough (no dangling references) and does not live too long (no leaked memory). A reference does not own its data. Since the data will not be directly part of your structure, you need a pointer with ownership semantics.
This does not mean that the code has to add de-referencing "all over the place". To aid your interface, you could have a reference to the object owned by the pointer. Normally this would be wasted space, but it might serve a purpose in your project, assuming your assessment about code generation is accurate.
Example:
struct Outer {
// Order matters here! The pointer must be declared before the reference!
// (This should be less of a problem for generated code than it can be for
// code edited by human programmers.)
const std::unique_ptr<Inner> inner_ptr;
Inner & inner;
// The idea is that `inner` refers to `*inner_ptr`, and the `const` on
// `inner_ptr` will prevent `inner` from becoming a dangling reference.
// Copy constructor
Outer(const Outer& src) :
inner_ptr(std::make_unique<Inner>(src.inner)), // Make a copy
inner(*inner_ptr) // Reference to the copy
{}
// The compiler-generated assignment operator will be deleted because
// of the reference member, just as in the question's code
// (so having it deleted because of the `unique_ptr` is not an issue).
// However, to make this explicit:
Outer& operator=(const Outer&) = delete;
};
With the above setup, you could still access the members of the inner data via syntax like object.inner.field. While this is redundant with access via the object.inner_ptr->field syntax, you indicated that you have established a need for the former syntax.
For the benefit of future readers:
This approach has drawbacks that would normally cause me to recommend against it. It is a judgement call as to which drawbacks are greater – those in this approach or the "less straightforward" code generation. Sometimes machine-generated code needs a bit of inefficiency to ensure that corner cases function correctly. So this might be acceptable in this particular case.
If I may stray a bit from your desired syntax, a neater option would be to have an accessor function. Whether or not this is applicable in your situation depends on details that are appropriately out-of-scope for this question. It might be worth considering.
Instead of wasting space by storing a reference in the structure, you could generate the reference as needed via a member function. This has the side-effect of removing the need to mark the pointer const.
struct Outer {
// Note the lack of restrictions imposed on the data.
// All that might be needed is an assertion that inner_ptr will never be null.
std::unique_ptr<Inner> inner_ptr;
// Here, `inner` will be a member function instead of member data.
Inner & inner() { return *inner_ptr; }
// And a const version for good measure.
const Inner & inner() const { return *inner_ptr; }
// Copy constructor
Outer(const Outer& src) :
inner_ptr(std::make_unique<Inner>(src.inner())) // Make a copy
{}
// With this setup, the compiler-generated copy assignment
// operator is still deleted because of the `unique_ptr`.
// However, a compiler-generated *move* assignment is
// available if you specifically request it.
Outer& operator=(const Outer&) = delete;
Outer& operator=(Outer &&) = default;
};
With this setup, access to the members of the inner data could be done via syntax like object.inner().field. I don't know if the extra parentheses will cause the same issues as the asterisks would.

Deep copying only makes sense when the class has ownership. A reference isn't generally used for ownership.
Clearly this output is incorrect, and I think I must have a dangling reference somewhere among other things
You've guessed correctly. In the declaration: Outer a = {Inner {30}}; The instance of Inner is a temporary object and its lifetime extends until the end of that declaration. After that, the reference member is left dangling.
so I'm storing the inner struct as a reference
A reference doesn't store an object. A reference refers to an object that is stored somewhere else.
How should I set up Outer to get the proper behavior here?
It seems that a smart pointer might be useful for your use case:
struct Outer {
std::unique_ptr<Inner> inner;
};
You'll need to define a deep copy constructor and assignment operator though.
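As a rough sketch of what those two operations could look like (C++17 assumed; the constructor taking an Inner by value is an addition for illustration, and the code assumes inner is only ever null in a moved-from object):

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Inner { int v; };

struct Outer {
    std::unique_ptr<Inner> inner;

    // Illustrative constructor: take an Inner by value and own a heap copy.
    explicit Outer(Inner a) : inner(std::make_unique<Inner>(std::move(a))) {}

    // Deep copy: allocate a new Inner copied from the source's Inner.
    // Assumes src.inner is not null.
    Outer(const Outer& src) : inner(std::make_unique<Inner>(*src.inner)) {}

    Outer& operator=(const Outer& src) {
        if (this != &src) {
            inner = std::make_unique<Inner>(*src.inner);  // old Inner is freed
        }
        return *this;
    }

    // Moves just transfer ownership of the pointer.
    Outer(Outer&&) noexcept = default;
    Outer& operator=(Outer&&) noexcept = default;
};
```

Member access becomes a.inner->v instead of a.inner.v, which is the only change the code generator would need to account for in this variant.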

Related

Why not to accept std::unique_ptr by rvalue reference?

Can somebody explain why everybody passes std::unique_ptr by value instead of by rvalue reference?
From what I've observed, this required an additional move constructor to be invoked.
Here's an example of a class holding a "pointer". It takes 3 move-ctor calls to take it by value, versus 2 calls to take it by reference:
#include <memory>
#include <iostream>
class pointer {
public:
pointer()
{ std::cerr << "ctor" << std::endl; }
pointer(const pointer&)
{ std::cerr << "copy-ctor" << std::endl; }
pointer& operator=(const pointer&)
{ std::cerr << "copy-assignment" << std::endl; return *this; }
pointer(pointer&&)
{ std::cerr << "move-ctor" << std::endl; }
pointer& operator=(pointer&&)
{ std::cerr << "move-assignment" << std::endl; return *this; }
~pointer()
{ std::cerr << "dtor" << std::endl; }
};
class A {
public:
// V1 (keep only one of V1/V2 at a time; with both present, the calls in main are ambiguous)
A(pointer _ptr) : ptr(std::move(_ptr)) {}
// V2
A(pointer&& _ptr) : ptr(std::move(_ptr)) {}
private:
pointer ptr;
};
int main() {
// Three calls to move-ctor versus two calls if pass by rvalue reference
auto ptr = pointer();
A a(std::move(ptr));
// Two calls to move-ctor always
A b(pointer{});
}
Passing a unique_ptr by reference, rvalue or otherwise, doesn't actually move anything, so you can't know by just looking at the function declaration if a move will happen.
Passing a unique_ptr by value, on the other hand, guarantees that the passed-in pointer will be moved from, so without even having to look at the documentation you know that calling that function releases you from the pointer's ownership.
For the same reason people pass int instead of const int&.
std::unique_ptr is just an RAII wrapper around a single pointer value, so moving it is just copying a single register width value, then zeroing the source. That's so trivial there's no real benefit to avoiding the move. After all, the cost to pass the reference (when not inlined) is the cost of passing a pointer too, so passing by reference can be less efficient (because if not inlined, it has to follow the reference to the real memory, then pull out the value from there; the top of the stack is likely in L1 cache, who knows if the place it's stored is?).
In practice, much of this will be inlined with optimizations enabled, and both approaches would get the same result. Passing by value is a good default when there's no benefit to passing by reference, so why not do it that way?
why everybody passes the std::unique_ptr by value instead of rvalue reference?
It may be more common, but it's not "everybody".
The drawback of std::unique_ptr&& parameter is that it doesn't explicitly communicate to the caller whether the pointer will be moved from or not. It might always move, or it might depend on some condition. You would have to know the implementation or at least API documentation to know for sure. The corresponding benefit of std::unique_ptr parameter is that it alone tells the reader of the declaration that the function will take ownership of the pointer. For this reason, it may be a good choice to use std::unique_ptr parameter and probably part of the reason why it's more common.
The benefit of std::unique_ptr&& is avoiding the extra move. However, moving of a std::unique_ptr is a very fast operation. It's insignificant compared for example to the memory allocation itself. In most cases, it simply doesn't matter.
The difference between the two is fairly subtle. A std::unique_ptr&& parameter may be considered where you've measured the move to have significant cost, which is not very common, or where your API may be used in situations where that cost could be significant. It's hard to prove that this won't ever happen if you're writing a public API, so that is the more likely argument for using it.
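The difference in what the caller can rely on can be sketched as follows. Widget, sink_by_value and maybe_take are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget { int id; };  // hypothetical payload type

// By value: the parameter is move-constructed from the caller's argument,
// so the caller's pointer is null after the call no matter what the body does.
int sink_by_value(std::unique_ptr<Widget> w) {
    return w ? w->id : -1;
}

// By rvalue reference: nothing moves unless the body does it explicitly.
int maybe_take(std::unique_ptr<Widget>&& w, bool take) {
    if (!take) return -1;                          // caller keeps ownership
    std::unique_ptr<Widget> owned = std::move(w);  // the actual move
    return owned->id;
}
```

After maybe_take(std::move(p), false) the caller's p still owns its Widget, even though the call site looks identical to one that gives it away. After sink_by_value(std::move(p)) it is always null.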

What lasts after using std::move c++11

After using std::move in a variable that might be a field in a class like:
class A {
public:
vector<string>&& stealVector() {
return std::move(myVector);
}
void recreateMyVector() {
}
private:
vector<string> myVector;
};
How would I recreate the vector, like a clear one? What is left in myVector after the std::move?
The common mantra is that a variable that has been "moved-from" is in a valid, but unspecified state. That means that it is possible to destroy and to assign to the variable, but nothing else.
(Stepanov calls this "partially formed", I believe, which is a nice term.)
To be clear, this isn't a strict rule; rather, it is a guideline on how to think about moving: After you move from something, you shouldn't want to use the original object any more. Any attempt to do something non-trivial with the original object (other than assigning to it or destroying it) should be carefully thought about and justified.
However, in each particular case, there may be additional operations that make sense on a moved-from object, and it's possible that you may want to take advantage of those. For example:
The standard library containers describe preconditions for their operations; operations with no preconditions are fine. The only useful ones that come to mind are clear(), and perhaps swap() (but prefer assignment rather than swapping). There are other operations without preconditions, such as size(), but following the above reasoning, you shouldn't have any business inquiring after the size of an object which you just said you didn't want any more.
The unique_ptr<T, D> guarantees that after being moved-from, it is null, which you can exploit in a situation where ownership is taken conditionally:
std::unique_ptr<T> resource(new T);
std::vector<std::function<int(std::unique_ptr<T> &)>> handlers = /* ... */;
for (auto const & f : handlers)
{
int result = f(resource);
if (!resource) { return result; }
}
A handler looks like this:
int foo_handler(std::unique_ptr<T> & p)
{
if (some_condition)
{
another_container.remember(std::move(p));
return another_container.state();
}
return 0;
}
It would have been possible generically to have the handler return some other kind of state that indicates whether it took ownership from the unique pointer, but since the standard actually guarantees that moving-from a unique pointer leaves it as null, we can exploit that to transmit that information in the unique pointer itself.
Move the member vector to a local vector, clear the member, return the local by value.
std::vector<string> stealVector() {
auto ret = std::move(myVector);
myVector.clear();
return ret;
}
What is left in myVector after the std::move?
std::move doesn't move, it is just a cast. It can happen that myVector is intact after the call to stealVector(); see the output of the first a.show() in the example code below. (Yes, it is silly but valid code.)
If the guts of myVector are really stolen (see b = a.stealVector(); in the example code), it will be in a valid but unspecified state. Nevertheless, it must be assignable and destructible; in case of std::vector, you can safely call clear() and swap() as well. You really should not make any other assumptions concerning the state of the vector.
How would I recreate the vector, like a clear one?
One option is to simply call clear() on it. Then you know its state for sure.
The example code:
#include <initializer_list>
#include <iostream>
#include <string>
#include <vector>
using namespace std;
class A {
public:
A(initializer_list<string> il) : myVector(il) { }
void show() {
if (myVector.empty())
cout << "(empty)";
for (const string& s : myVector)
cout << s << " ";
cout << endl;
}
vector<string>&& stealVector() {
return std::move(myVector);
}
private:
vector<string> myVector;
};
int main() {
A a({"a", "b", "c"});
a.stealVector();
a.show();
vector<string> b{"1", "2", "3"};
b = a.stealVector();
a.show();
}
This prints the following on my machine:
a b c
(empty)
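If stealVector() is changed to return by value, the "move out, then reset" step can be written with std::exchange, which leaves the member in a known empty state instead of an unspecified one. This is a sketch of one possible variant, not the code from the answers above; the size() accessor is added only so the result can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class A {
public:
    explicit A(std::vector<std::string> v) : myVector(std::move(v)) {}

    // Returns the old contents by value and replaces the member with a
    // freshly constructed (empty) vector.
    std::vector<std::string> stealVector() {
        return std::exchange(myVector, {});
    }

    // Illustrative accessor, not part of the original class.
    std::size_t size() const { return myVector.size(); }

private:
    std::vector<std::string> myVector;
};
```

Because the member is assigned a new value inside stealVector(), there is nothing left to "recreate" afterwards: the object is immediately usable again.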
Since I feel Stepanov has been misrepresented in the answers so far, let me add a quick overview of my own:
For std types (and only those), the standard specifies that a moved-from object is left in the famous "valid, but unspecified" state. In particular, none of the std types use Stepanov's Partially-Formed State, which some, me included, think of as a mistake.
For your own types, you should strive for both the default constructor as well as the source object of a move to establish the Partially-Formed State, which Stepanov defined in Elements of Programming (2009) as a state in which the only valid operations are destruction and assignment of a new value. In particular, the Partially-Formed State need not represent a valid value of the object, nor does it need to adhere to normal class invariants.
Contrary to popular belief, this is nothing new. The Partially-Formed State has existed since the dawn of C/C++:
int i; // i is Partially-Formed: only going out of scope and
// assignment are allowed, and compilers understand this!
What this practically means for the user is to never assume you can do more with a moved-from object than destroy it or assign a new value to it, unless, of course, the documentation states that you can do more, which is typically possible for containers, which can often naturally, and efficiently, establish the empty state.
For class authors, it means that you have two choices:
First, you avoid the Partially-Formed State as the STL does. But for a class with Remote State, e.g. a pimpl'ed class, this means that to represent a valid value, either you accept nullptr as a valid value for pImpl, prompting you to define, at the public API level, what a nullptr pImpl means, incl. checking for nullptr in all member functions.
Or you need to allocate a new pImpl for the moved-from (and default-constructed) object, which, of course, is nothing any performance-conscious C++ programmer would do. A performance-conscious C++ programmer, however, would also not like to litter his code with nullptr checks just to support the minor use-case of a non-trivial use of a moved-from object.
Which brings us to the second alternative: Embrace the Partially-Formed State. That means, you accept nullptr pImpl, but only for default-constructed and moved-from objects. A nullptr pImpl represents the Partially-Formed State, in which only destruction and assignment of another value are allowed. This means that only the dtor and the assignment operators need to be able to deal with a nullptr pImpl, while all other members can assume a valid pImpl. This has another benefit: both your default ctor as well as the move operators can be noexcept, which is important for use in std::vector (so moves and not copies are used upon reallocation).
Example Pen class:
class Pen {
struct Private;
Private *pImpl = nullptr;
public:
Pen() noexcept = default;
Pen(Pen &&other) noexcept : pImpl{std::exchange(other.pImpl, {})} {}
Pen(const Pen &other) : pImpl{new Private{*other.pImpl}} {} // assumes valid `other`
Pen &operator=(Pen &&other) noexcept {
Pen(std::move(other)).swap(*this);
return *this;
}
Pen &operator=(const Pen &other) {
Pen(other).swap(*this);
return *this;
}
void swap(Pen &other) noexcept {
using std::swap;
swap(pImpl, other.pImpl);
}
int width() const { return pImpl->width; }
// ...
};

Reference member variables as class members

In my place of work I see this style used extensively:
#include <iostream>
using namespace std;
class A
{
public:
A(int& thing) : m_thing(thing) {}
void printit() { cout << m_thing << endl; }
protected:
const int& m_thing; //usually would be more complex object
};
int main(int argc, char* argv[])
{
int myint = 5;
A myA(myint);
myA.printit();
return 0;
}
Is there a name to describe this idiom? I am assuming it is to prevent the possibly large overhead of copying a big complex object?
Is this generally good practice? Are there any pitfalls to this approach?
Is there a name to describe this idiom?
In UML it is called aggregation. It differs from composition in that the member object is not owned by the referring class. In C++ you can implement aggregation in two different ways, through references or pointers.
I am assuming it is to prevent the possibly large overhead of copying a big complex object?
No, that would be a really bad reason to use this. The main reason for aggregation is that the contained object is not owned by the containing object and thus their lifetimes are not bound. In particular the referenced object lifetime must outlive the referring one. It might have been created much earlier and might live beyond the end of the lifetime of the container. Besides that, the state of the referenced object is not controlled by the class, but can change externally. If the reference is not const, then the class can change the state of an object that lives outside of it.
Is this generally good practice? Are there any pitfalls to this approach?
It is a design tool. In some cases it will be a good idea, in some it won't. The most common pitfall is that the lifetime of the object holding the reference must never exceed the lifetime of the referenced object. If the enclosing object uses the reference after the referenced object was destroyed, you will have undefined behavior. In general it is better to prefer composition to aggregation, but if you need it, it is as good a tool as any other.
It's called dependency injection via constructor injection: class A gets the dependency as an argument to its constructor and saves the reference to dependent class as a private variable.
There's an interesting introduction on Wikipedia.
For const-correctness I'd write:
using T = int;
class A
{
public:
A(const T &thing) : m_thing(thing) {}
// ...
private:
const T &m_thing;
};
but a problem with this class is that it accepts references to temporary objects:
T t;
A a1{t}; // this is ok, but...
A a2{T()}; // ... this is BAD.
It's better to add (requires C++11 at least):
class A
{
public:
A(const T &thing) : m_thing(thing) {}
A(const T &&) = delete; // prevents rvalue binding
// ...
private:
const T &m_thing;
};
Anyway if you change the constructor:
class A
{
public:
A(const T *thing) : m_thing(*thing) { assert(thing); }
// ...
private:
const T &m_thing;
};
it's pretty much guaranteed that you won't have a pointer to a temporary.
Also, since the constructor takes a pointer, it's clearer to users of A that they need to pay attention to the lifetime of the object they pass.
Somewhat related topics are:
Should I prefer pointers or references in member data?
Using reference as class members for dependencies
GotW #88
Forbid rvalue binding via constructor to member const reference
Is there a name to describe this idiom?
There is no name for this usage, it is simply known as "Reference as class member".
I am assuming it is to prevent the possibly large overhead of copying a big complex object?
Yes and also scenarios where you want to associate the lifetime of one object with another object.
Is this generally good practice? Are there any pitfalls to this approach?
Depends on your usage. Using any language feature is like "choosing horses for courses". It is important to note that every (almost all) language feature exists because it is useful in some scenario.
There are a few important points to note when using references as class members:
You need to ensure that the referred object is guaranteed to exist till your class object exists.
You need to initialize the member in the constructor member initializer list. You cannot have a lazy initialization, which could be possible in case of pointer member.
The compiler will not generate the copy assignment operator=() and you will have to provide one yourself. It is cumbersome to determine what action your = operator shall take in such a case. So basically your class becomes non-assignable.
References cannot be NULL or made to refer any other object. If you need reseating, then it is not possible with a reference as in case of a pointer.
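The non-assignability and reseating points can be checked at compile time. The sketch below (C++17) contrasts a plain reference member with std::reference_wrapper, which is one possible way to keep a non-owning member that can still be reassigned; RefHolder and Viewer are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// A plain reference member deletes the implicit copy assignment operator.
struct RefHolder {
    int& r;
};
static_assert(!std::is_copy_assignable_v<RefHolder>,
              "reference members make the class non-assignable");

// std::reference_wrapper is still non-owning, but assignment reseats it.
class Viewer {
public:
    explicit Viewer(int& thing) : m_thing(thing) {}
    int get() const { return m_thing.get(); }

private:
    std::reference_wrapper<int> m_thing;
};
static_assert(std::is_copy_assignable_v<Viewer>,
              "reference_wrapper members keep the class assignable");
```

Assigning one Viewer to another makes it refer to the other object's target; the integers themselves are not modified. The lifetime rule from the first point still applies unchanged.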
For most practical purposes (unless you are really concerned of high memory usage due to member size) just having a member instance, instead of pointer or reference member should suffice. This saves you a whole lot of worrying about other problems which reference/pointer members bring along though at expense of extra memory usage.
If you must use a pointer, make sure you use a smart pointer instead of a raw pointer. That would make your life much easier with pointers.
C++ provides a good mechanism to manage the life time of an object though class/struct constructs. This is one of the best features of C++ over other languages.
When you have member variables exposed through ref or pointer it violates the encapsulation in principle. This idiom enables the consumer of the class to change the state of an object of A without it(A) having any knowledge or control of it. It also enables the consumer to hold on to a ref/pointer to A's internal state, beyond the life time of the object of A. This is bad design. Instead the class could be refactored to hold a ref/pointer to the shared object (not own it) and these could be set using the constructor (Mandate the life time rules). The shared object's class may be designed to support multithreading/concurrency as the case may apply.
I wanted to add a point that was (somewhat) introduced in manilo's (great!) answer, with some code:
As David Rodríguez - dribeas mentioned (in his great answer as well!), there are two "forms" of aggregation: by pointer and by reference. Take into account that if the latter is used (by reference, as in your example), then the container class can NOT have a default constructor, because all class members of reference type MUST be initialized at construction time.
The code below will NOT compile if you uncomment the default ctor implementation (g++ version 11.3.0 outputs the error below):
error: uninitialized reference member in ‘class AggregatedClass&’ [-fpermissive]
MyClass()
#include <iostream>
using namespace std;
class AggregatedClass
{
public:
explicit AggregatedClass(int a) : m_a(a)
{
cout << "AggregatedClass::AggregatedClass - set m_a:" << m_a << endl;
}
void func1()
{
cout << "AggregatedClass::func1" << endl;
}
~AggregatedClass()
{
cout << "AggregatedClass::~AggregatedClass" << endl;
}
private:
int m_a;
};
class MyClass
{
public:
explicit MyClass(AggregatedClass& obj) : m_aggregatedClass(obj)
{
cout << "MyClass::MyClass(AggregatedClass& obj)" << endl;
}
/* this ctor can not be compiled
MyClass()
{
cout << "MyClass::MyClass()" << endl;
}
*/
void func1()
{
cout << "MyClass::func1" << endl;
m_aggregatedClass.func1();
}
~MyClass()
{
cout << "MyClass::~MyClass" << endl;
}
private:
AggregatedClass& m_aggregatedClass;
};
int main(int argc, char** argv)
{
cout << "main - start" << endl;
// first we need to create the aggregated object
AggregatedClass aggregatedObj(15);
MyClass obj(aggregatedObj);
obj.func1();
cout << "main - end" << endl;
return 0;
}
Member references are usually considered bad. They make life hard compared to member pointers. But it's not particularly unusual, nor is it some special named idiom or thing. It's just aliasing.

std::vector push_back and class constructor not being called?

I have class like this
class variable
{
public:
variable(int _type=0) : type(_type), value(NULL), on_pop(NULL)
{
}
virtual ~variable()
{
if (type)
{
std::cout << "Variable Deleted" <<std::endl;
on_pop(*this);
value=NULL;
}
}
int type;
void* value;
typedef void(*func1)(variable&);
func1 on_pop;
};
And then I push instances into a std::vector like this:
stack.push_back(variable(0));
I expect that the destructor of variable will be called, but that the if branch won't be entered until a value is assigned to type, because I expect the constructor I provide to be called when the instance is copied into the vector. But for some reason it is not.
After calling stack.push_back the destructor (of the copy?) is run and type has some random value, as if the constructor was never called.
I can't seem to figure what I am doing wrong. Please help! ^_^
EDIT:
Ok here is a self contained example to show what I mean:
#include <iostream>
#include <vector>
class variable
{
public:
variable(int _type=0) : type(_type), value(NULL), on_pop(NULL)
{
}
~variable()
{
if (type)
{
std::cout << "Variable Deleted" <<std::endl;
on_pop(*this);
value=NULL;
}
}
int type;
void* value;
typedef void(*func1)(variable&);
func1 on_pop;
};
static void pop_int(variable& var)
{
delete (int*)var.value;
}
static void push_int(variable& var)
{
var.type = 1;
var.value = new int;
var.on_pop = &pop_int;
}
typedef void(*func1)(variable&);
func1 push = &push_int;
int main()
{
std::vector<variable> stack;
stack.push_back(variable(0));
push(stack[stack.size()-1]);
stack.push_back(variable(0));
push(stack[stack.size()-1]);
stack.push_back(variable(0));
push(stack[stack.size()-1]);
return 0;
}
The program above outputs the following:
Variable Deleted
Variable Deleted
Variable Deleted
Variable Deleted
Variable Deleted
Variable Deleted
Process returned 0 (0x0) execution time : 0.602 s
Press any key to continue.
Welcome to RVO and NRVO. This basically means that the compiler can skip creating an object if it's redundant, even if its constructor and destructor have side effects. You cannot depend on an object which is immediately copied or moved to actually exist.
Edit: The actual value in the vector cannot be elided at all. Only the intermediate variable variable(0) can be elided. The object in the vector must still be constructed and destructed as usual. These rules only apply to temporaries.
Edit: Why are you writing your own resource management class? You could simply use unique_ptr with a custom deleter. And your own RTTI?
Every object that was destructed must have been constructed. There is no rule in the Standard that violates this. RVO and NRVO only become problematic when you start, e.g., modifying globals in your constructors/destructors. Else, they have no impact on the correctness of the program. That's why they're Standard. You must be doing something else wrong.
Ultimately, I'm just not sure exactly WTF is happening to you and why it's not working or what "working" should be. Post an SSCCE.
Edit: In light of your SSCCE, then absolutely nothing is going wrong whatsoever. This is entirely expected behaviour. You have not respected the Rule of Three- that is, you destroy the resource in your destructor but make no efforts to ensure that you actually own the resource in question. Your compiler-generated copy constructor is blowing up your logic. You must read about the Rule of Three, copy and swap and similar idioms for resource handling in C++, and preferably, use a smart pointer which is already provided as Standard like unique_ptr which does not have these problems.
After all, you create six instances of variable- three temporaries on the stack, and three inside the vector. All of these have their destructors called. The problem is that you never considered the copy operation or what copying would do or what would happen to these temporaries (hint: they get destructed).
Consider the equivalent example:
int main()
{
variable v(0);
push_int(v);
variable v2 = v;
return 0;
}
Variable v is constructed and allocates a new int and everything is dandy. But wait- then we copy it into v2. The compiler-generated constructor copies all the bits over. Then both v2 and v are destroyed- but they both point to the same resource because they both hold the same pointer. Double delete abounds.
You must define copy (shared ownership - std::shared_ptr) or move (unique ownership - std::unique_ptr) semantics.
Edit: Just a quick note. I observe that you actually don't push into items until after they're already in the vector. However, the same effect is observed when the vector must resize when you add additional elements and the fundamental cause is the same.
The destructor is called 6 times. A constructor is called six times. Just not the one you intended.
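One way to get unique-ownership semantics here is the unique_ptr with a custom deleter mentioned earlier. The sketch below is an assumption about how the question's variable could be reshaped, not the original design: live_ints, delete_int and demo_stack are hypothetical names, and push_int takes an extra argument for the stored value.

```cpp
#include <cassert>
#include <memory>
#include <vector>

static int live_ints = 0;  // hypothetical counter of outstanding allocations

// Type-erased deleter for an int stored behind a void*.
void delete_int(void* p) {
    delete static_cast<int*>(p);
    --live_ints;
}

struct variable {
    int type = 0;
    // Owns the value; copying is disabled, moving transfers ownership.
    std::unique_ptr<void, void (*)(void*)> value{nullptr, &delete_int};
};

void push_int(variable& var, int x) {
    var.type = 1;
    var.value.reset(new int(x));
    ++live_ints;
}

// Fills a vector the way the question does and reports how many ints are
// still allocated once the vector has been destroyed.
int demo_stack() {
    {
        std::vector<variable> stack;
        for (int i = 0; i < 3; ++i) {
            stack.push_back(variable{});  // temporary is moved in
            push_int(stack.back(), i);    // reallocation moves, never copies
        }
    }
    return live_ints;
}
```

Moved-from elements hold a null pointer, so their destructors do nothing, and each int is deleted exactly once when the vector goes away. No hand-written destructor, copy constructor or type check is needed.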
Ok. I've been reading some more about the internals of different containers and, apparently, the one that does the job I'm trying to accomplish here is std::deque.

Avoiding need for #define with expression templates

With the following code, "hello2" is not displayed as the temporary string created on Line 3 dies before Line 4 is executed. Using a #define as on Line 1 avoids this issue, but is there a way to avoid this issue without using #define? (C++11 code is okay)
#include <iostream>
#include <string>
class C
{
public:
C(const std::string& p_s) : s(p_s) {}
const std::string& s;
};
int main()
{
#define x1 C(std::string("hello1")) // Line 1
std::cout << x1.s << std::endl; // Line 2
const C& x2 = C(std::string("hello2")); // Line 3
std::cout << x2.s << std::endl; // Line 4
}
Clarification:
Note that I believe Boost uBLAS stores references, this is why I don't want to store a copy. If you suggest that I store by value, please explain why Boost uBLAS is wrong and storing by value will not affect performance.
Expression templates that do store by reference typically do so for performance, but with the caveat that they only be used as temporaries.
Taken from the documentation of Boost.Proto (which can be used to create expression templates):
Note An astute reader will notice that the object y defined above will be left holding a dangling reference to a temporary int. In the sorts of high-performance applications Proto addresses, it is typical to build and evaluate an expression tree before any temporary objects go out of scope, so this dangling reference situation often doesn't arise, but it is certainly something to be aware of. Proto provides utilities for deep-copying expression trees so they can be passed around as value types without concern for dangling references.
In your initial example this means that you should do:
std::cout << C(std::string("hello2")).s << std::endl;
That way the C temporary never outlives the std::string temporary. Alternatively, you could make s a non-reference member, as others pointed out.
Since you mention C++11, in the future I expect expression trees to store by value, using move semantics to avoid expensive copying and wrappers like std::reference_wrapper to still give the option of storing by reference. This would play nicely with auto.
A possible C++11 version of your code:
class C
{
public:
explicit
C(std::string const& s_): s { s_ } {}
explicit
C(std::string&& s_): s { std::move(s_) } {}
std::string const&
get() const& // notice lvalue *this
{ return s; }
std::string
get() && // notice rvalue *this
{ return std::move(s); }
private:
std::string s; // not const to enable moving
};
This would mean that code like C("hello").get() would only allocate memory once, but still play nice with
std::string clvalue("hello");
auto c = C(clvalue);
std::cout << c.get() << '\n'; // no problem here
but is there a way to avoid this issue without using #define?
Yes.
Define your class as: (don't store the reference)
class C
{
public:
C(const std::string & p_s) : s(p_s) {}
const std::string s; //store the copy!
};
Store the copy!
Demo : http://www.ideone.com/GpSa2
The problem with your code is that std::string("hello2") creates a temporary that stays alive only until the end of the full expression that initializes x2. After that the temporary is destroyed, but x2.s still refers to it (a dead object).
After your edit:
Storing by reference is dangerous and error-prone. You should do it only when you are 100% sure that the referenced variable will outlive the object holding the reference.
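C++ strings are heavily optimized, but how copies behave depends on the implementation. Some older, pre-C++11 standard libraries used copy-on-write, so until you changed a string value all copies referred to the same buffer. The C++11 rules effectively rule that out, and current implementations rely on the small-string optimization and move semantics instead. To test what yours does, you can replace the global operator new(std::size_t), put a debug statement in it, and see how many allocations happen for multiple copies of the same string.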
Your class definition should not store by reference, but by value:
class C {
const std::string s; // s is NOT a reference now
};
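If this question is meant in a general sense (not specific to strings), then the best way is to use dynamic allocation, so that the class owns the object:
class C {
public:
C() : p(new MyClass()) {} // just an example; the object can also be allocated outside
~C() { delete p; }
C(const C&) = delete; // copying the raw pointer would lead to a double delete
C& operator=(const C&) = delete;
private:
MyClass *p;
};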
Without looking at BLAS, expression templates typically make heavy use of temporary objects of types you aren't even supposed to know exist. If Boost is storing references like this within theirs, then they would suffer the same problem you see here. But as long as those temporary objects remain temporary, and the user doesn't store them for later, everything is fine because the temporaries they reference remain alive for as long as the temporary objects do. The trick is that you perform a deep copy when the intermediate object is turned into the final object that the user stores. You've skipped this last step here.
In short, it's a dangerous move, which is perfectly safe as long as the user of your library doesn't do anything foolish. I wouldn't recommend making use of it unless you have a clear need and you're well aware of the consequences. And even then, there might be a better alternative; I've never worked with expression templates in any serious capacity.
As an aside, since you tagged this C++0x, auto x = a + b; seems like it would be one of those "foolish" things users of your code can do to make your optimization dangerous.