Is getting address of rvalue reference before move considered safe? - c++

I'm playing around Move Semantics and [r|l]value references to learn how to use them in real-world programs. Consider following code:
// Item is a heavy class having move ctor and assignment but no copy.
std::map<std::string, Item*> lookup;
std::forward_list<Item> items;
void FooClass::addItem(Item&& d) {
if (lookup.find(d.getName()) == lookup.end()) {
lookup[d.getName()] = &d; //<== not safe after move?
items.push_front(std::move(d));
}
}
I'm getting address of an Item&& and store it in a pointer. Then move data that to a std::forward_list (items). I assume calling move assignment does not affect address of object. Is that correct? Though content of d is no more valid after move. That is content of lookup table (lookup) is incorrect.
I'm assuming that I have to re-order a) adding lookup item and b) move actual data. Above code is not sane. Is this correct?
Also I can't see why do I have to say std::move there. Compiler should know that d is a rvalue reference. So it should call std::forward_list<T>::push_front(T&&) and move assignment...

lookup.[d.getName()] = &d; //<== not safe after move?
This is completely unsafe, but not just because of the move. Contrary to your question's title, you are not taking the address of an rvalue reference, you are taking the address of an lvalue, but one which is probably going to go out of scope soon after the function returns, which will leave a dangling pointer. Consider:
FooClass f;
f.addItem( Item() );
This adds the address of a temporary to the map, so if you ever dereference the pointer your program has undefined behaviour, the epitome of unsafe.
The move on the next line might make things worse, because the object referred to by the pointer in the map might get modified by the move, leaving a pointer to a moved-from Item in the map, but that's nothing compared to the undefined behaviour that results from it going out of scope after the function returns.
It's trivial to make the code safe, so there is no reason to write it the way you have done.
items.push_front(std::move(d));
auto& item = items.front();
lookup[item.getName()] = &item;
Now the pointer in the map refers to an object which is not about to go out of scope. The pointer will be valid as long as the element is in the forward_list.

I would get rid of items and change lookup to store Item by value, as in:
using Lookup = std::map<std::string, Item>;
Lookup lookup;
void addItem(Item&& d)
{ lookup.insert(std::pair<std::string const&, Item&&>{d.getName(), std::move(d)}); }

Related

inserting temporary std::shared_ptr into std::map, is it bad?

I'm designing a class for my application that implements a lot of standard shared pointers and usage of standard containers such as std::map and std::vector
It's very specific question to the problem so I just copied a piece of code
from my header for clarification purposes..
here is a snapshot of that declarations from the header:
struct Drag;
std::map<short, std::shared_ptr<Drag>> m_drag;
typedef sigc::signal<void, Drag&> signal_bet;
inline signal_bet signal_right_top();
and here is one of the functions that uses the above declarations and a temporary shared_ptr which is intended to be used not only in this function but until some late time. that means after the function returns a shared pointer should be still alive because it will be assigned at some point to another shared_ptr.
void Table::Field::on_signal_left_top(Drag& drag)
{
m_drag.insert(std::make_pair(drag.id, std::make_shared<Drag>(this))); // THIS!
auto iter = m_drag.find(drag.id);
*iter->second = drag;
iter->second->cx = 0 - iter->second->tx;
iter->second->cy = 0 - iter->second->ty;
invalidate_window();
}
the above function first insert a new shared_ptr and then assigns the values from one object into another,
What I need from your answer is to tell whether is it safe to insert temporary shared_ptr into the map and be sure that it will not be a dangling or what ever bad thing.
According to THIS website the above function is not considered safe because it would much better to write it like so:
void Table::Field::on_signal_left_top(Drag& drag)
{
std::shared_ptr pointer = std::make_shared<Drag>(this);
m_drag.insert(std::make_pair(drag.id, pointer));
auto iter = m_drag.find(drag.id);
*iter->second = drag;
// etc...
}
well one line more in the function.
is it really required to type it like that and why ?
There's no difference between the two functions in regard to the std::shared_ptr, because the std::make_pair function will create a copy of the temporary object before the temporary object is destructed. That copy will in turn be copied into the std::map, and will then itself be destructed, leaving you with a copy-of-a-copy in the map. But because the two other objects have been destructed, the reference count of the object in the map will still be one.
As for handling the return value from insert, it's very simple:
auto result = m_drag.insert(...);
if (!result.second)
{
std::cerr << "Could not insert value\n";
return;
}
auto iter = result.first;
...
The code in the example given is different from your example code, because it is using the new operator instead of std::make_shared. The key part of their advice is here:
Since function arguments are evaluated in unspecified order, it is possible for new int(2) to be evaluated first, g() second, and we may never get to the shared_ptr constructor if g throws an exception.
std::make_shared eliminates this problem - any dynamic memory allocated while constructing an object within std::make_shared will be de-allocated if anything throws. You won't need to worry about temporary std::shared_ptrs in this case.

const parameter vs const reference parameter

Implementation 1:
foo(const Bar x);
Implementation 2:
foo(const Bar & x);
If the object will not be changed within the function, why would you ever copy it(implementation 1).
Will this be automatically optimized by the compiler?
Summary: Even though the object is declared as const in the function declaration, it is still possible that the object be edited via some other alias &.
If you are the person writing the library and know that your functions don't do that or that the object is big enough to justify the dereferencing cost on every operation, than
foo(const Bar & x); is the way to go.
Part 2:
Will this be automatically optimized by the compiler?
Since we established that they are not always equivalent, and the conditions for equivalence is non-trivial, it would generally be very hard for the compiler to ensure them, so almost certainly no
you ask,
“If the object will not be changed within the function, why would you ever copy it(implementation 1).”
well there are some bizarre situations where an object passed by reference might be changed by other code, e.g.
namespace g { int x = 666; }
void bar( int ) { g::x = 0; }
int foo( int const& a ) { assert( a != 0 ); bar( a ); return 1000/a; } // Oops
int main() { foo( g::x ); }
this has never happened to me though, since the mid 1990s.
so, this aliasing is a theoretical problem for the single argument of that type.
with two arguments of the same type it gets more of a real possibility. for example, an assignment operator might get passed the object that it's called on. when the argument is passed by value (as in the minimal form of the swap idiom) it's no problem, but if not then self-assignment generally needs to be avoided.
you further ask,
“Will this be automatically optimized by the compiler?”
no, not in general, for the above mentioned reason
the compiler can generally not guarantee that there will be no aliasing for a reference argument (one exception, though, is where the machine code of a call is inlined)
however, on the third hand, the language could conceivably have supported the compiler in this, e.g. by providing the programmer with a way to explicitly accept any such optimization, like, a way to say ”this code is safe to optimize by replacing pass by value with pass by reference, go ahead as you please, compiler”
Indeed, in those circumstances you would normally use method 2.
Typically, you would only use method 1 if the object is tiny, so that it's cheaper to copy it once than to pay to access it repeatedly through a reference (which also incurs a cost). In TC++PL, Stroustrup develops a complex number class and passes it around by value for exactly this reason.
It may be optimized in some circumstances, but there are plenty of things that can prevent it. The compiler can't avoid the copy if:
the copy constructor or destructor has side effects and the argument passed is not a temporary.
you take the address of x, or a reference to it, and pass it to some code that might be able to compare it against the address of the original.
the object might change while foo is running, for example because foo calls some other function that changes it. I'm not sure whether this is something you mean to rule out by saying "the object will not be changed within the function", but if not then it's in play.
You'd copy it if any of those things matters to your program:
if you want the side effects of copying, take a copy
if you want "your" object to have a different address from the user-supplied argument, take a copy
if you don't want to see changes made to the original during the running of your function, take a copy
You'd also copy it if you think a copy would be more efficient, which is generally assumed to be the case for "small" types like int. Iterators and predicates in standard algorithms are also taken by value.
Finally, if your code plans to copy the object anyway (including by assigning to an existing object) then a reasonable idiom is to take the copy as the parameter in the first place. Then move/swap from your parameter.
What if the object is changed from elsewhere?
void f(const SomeType& s);
void g(const SomeType s);
int main() {
SomeType s;
std::thread([&](){ /* s is non-const here, and we can modify it */}
// we get a const reference to the object which we see as const,
// but others might not. So they can modify it.
f(s);
// we get a const *copy* of the object,
// so what anyone else might do to the original doesn't matter
g(s);
}
What if the object is const, but has mutable members? Then you can still modify the object, and so it's very important whether you have a copy or a reference to the original.
What if the object contains a pointer to another object? If s is const, the pointer will be const, but what it points to is not affected by the constness of s. But creating a copy will (hopefully) give us a deep copy, so we get our own (const) object with a separate (const) pointer pointing to a separate (non-const) object.
There are a number of cases where a const copy is different than a const reference.

Inside the copy constructor of shared_ptr

I have some confusion about the shared_ptr copy constructor. Please consider the following 2 lines:
It is a "constant" reference to a shared_ptr object, that is passed to the copy constructor so that another shared_ptr object is initialized.
The copy constructor is supposed to also increment a member data - "reference counter" - which is also shared among all shared_ptr objects, due to the fact that it is a reference/pointer to some integer telling each shared_ptr object how many of them are still alive.
But, if the copy constructor attempts to increment the reference counting member data, does it not "hit" the const-ness of the shared_ptr passed by reference? Or, does the copy constructor internally use the const_cast operator to temporarily remove the const-ness of the argument?
The phenomenon you're experiencing is not special to the shared pointer. Here's a typical primeval example:
struct Foo
{
int * p;
Foo() : p(new int(1)) { }
};
void f(Foo const & x) // <-- const...?!?
{
*x.p = 12; // ...but this is fine!
}
It is true that x.p has type int * const inside f, but it is not an int const * const! In other words, you cannot change x.p, but you can change *x.p.
This is essentially what's going on in the shared pointer copy constructor (where *p takes the role of the reference counter).
Although the other answers are correct, it may not be immediately apparent how they apply. What we have is something like this:
template <class T>
struct shared_ptr_internal {
T *data;
size_t refs;
};
template <class T>
class shared_ptr {
shared_ptr_internal<T> *ptr;
public:
shared_ptr(shared_ptr const &p) {
ptr = p->ptr;
++(ptr->refs);
}
// ...
};
The important point here is that the shared_ptr just contains a pointer to the structure that contains the reference count. The fact that the shared_ptr itself is const doesn't affect the object it points at (what I've called shared_ptr_internal). As such, even when/if the shared_ptr itself is const, manipulating the reference count isn't a problem (and doesn't require a const_cast or mutable either).
I should probably add that in reality, you'd probably structure the code a bit differently than this -- in particular, you'd normally put more (all?) of the code to manipulate the reference count into the shared_ptr_internal (or whatever you decide to call it) itself, instead of messing with those in the parent shared_ptr class.
You'll also typically support weak_ptrs. To do this, you have a second reference count for the number of weak_ptrs that point to the same shared_ptr_internal object. You destroy the final pointee object when the shared_ptr reference count goes to 0, but only destroy the shared_ptr_internal object when both the shared_ptr and weak_ptr reference counts go to 0.
It uses an internal pointer which doesn't inherit the contests of the argument, like:
(*const_ref.member)++;
Is valid.
the pointer is constant, but not the value pointed to.
Wow, what an eye opener this has all been! Thanks to everyone that I have been able to pin down the source of confusion to the fact that I always assumed the following ("a" contains the address of "b") were all equivalent.
int const *a = &b; // option1
const int *a = &b; // option2
int * const a = &b; // option3
But I was wrong! Only the first two options are equivalent. The third is totally different.
With option1 or option2, "a" can point to anything it wants but cannot change the contents of what it points to.
With option3, once decided what "a" points to, it cannot point to anything else. But it is free to change the contents of what it is pointing to. So, it makes sense that shared_ptr uses option3.

c++ unique_ptr argument passing

Suppose I have the following code:
class B { /* */ };
class A {
vector<B*> vb;
public:
void add(B* b) { vb.push_back(b); }
};
int main() {
A a;
B* b(new B());
a.add(b);
}
Suppose that in this case, all raw pointers B* can be handled through unique_ptr<B>.
Surprisingly, I wasn't able to find how to convert this code using unique_ptr. After a few tries, I came up with the following code, which compiles:
class A {
vector<unique_ptr<B>> vb;
public:
void add(unique_ptr<B> b) { vb.push_back(move(b)); }
};
int main() {
A a;
unique_ptr<B> b(new B());
a.add(move(b));
}
So my simple question: is this the way to do it and in particular, is move(b) the only way to do it? (I was thinking of rvalue references but I don't fully understand them.)
And if you have a link with complete explanations of move semantics, unique_ptr, etc. that I was not able to find, don't hesitate to share it.
EDIT According to http://thbecker.net/articles/rvalue_references/section_01.html, my code seems to be OK.
Actually, std::move is just syntactic sugar. With object x of class X, move(x) is just the same as:
static_cast <X&&>(x)
These 2 move functions are needed because casting to a rvalue reference:
prevents function "add" from passing by value
makes push_back use the default move constructor of B
Apparently, I do not need the second std::move in my main() if I change my "add" function to pass by reference (ordinary lvalue ref).
I would like some confirmation of all this, though...
I am somewhat surprised that this is not answered very clearly and explicitly here, nor on any place I easily stumbled upon. While I'm pretty new to this stuff, I think the following can be said.
The situation is a calling function that builds a unique_ptr<T> value (possibly by casting the result from a call to new), and wants to pass it to some function that will take ownership of the object pointed to (by storing it in a data structure for instance, as happens here into a vector). To indicate that ownership has been obtained by the caller, and it is ready to relinquish it, passing a unique_ptr<T> value is in place. Ther are as far as I can see three reasonable modes of passing such a value.
Passing by value, as in add(unique_ptr<B> b) in the question.
Passing by non-const lvalue reference, as in add(unique_ptr<B>& b)
Passing by rvalue reference, as in add(unique_ptr<B>&& b)
Passing by const lvalue reference would not be reasonable, since it does not allow the called function to take ownership (and const rvalue reference would be even more silly than that; I'm not even sure it is allowed).
As far as valid code goes, options 1 and 3 are almost equivalent: they force the caller to write an rvalue as argument to the call, possibly by wrapping a variable in a call to std::move (if the argument is already an rvalue, i.e., unnamed as in a cast from the result of new, this is not necessary). In option 2 however, passing an rvalue (possibly from std::move) is not allowed, and the function must be called with a named unique_ptr<T> variable (when passing a cast from new, one has to assign to a variable first).
When std::move is indeed used, the variable holding the unique_ptr<T> value in the caller is conceptually dereferenced (converted to rvalue, respectively cast to rvalue reference), and ownership is given up at this point. In option 1. the dereferencing is real, and the value is moved to a temporary that is passed to the called function (if the calles function would inspect the variable in the caller, it would find it hold a null pointer already). Ownership has been transferred, and there is no way the caller could decide to not accept it (doing nothing with the argument causes the pointed-to value to be destroyed at function exit; calling the release method on the argument would prevent this, but would just result in a memory leak). Surprisingly, options 2. and 3. are semantically equivalent during the function call, although they require different syntax for the caller. If the called function would pass the argument to another function taking an rvalue (such as the push_back method), std::move must be inserted in both cases, which will transfer ownership at that point. Should the called function forget to do anything with the argument, then the caller will find himself still owning the object if holding a name for it (as is obligatory in option 2); this in spite of that fact that in case 3, since the function prototype asked the caller to agree to the release of ownership (by either calling std::move or supplying a temporary). In summary the methods do
Forces caller to give up ownership, and be sure to actually claim it.
Force caller to possess ownership, and be prepared (by supplying a non const reference) to give it up; however this is not explicit (no call of std::move required or even allowed), nor is taking away ownership assured. I would consider this method rather unclear in its intention, unless it is explicitly intended that taking ownership or not is at discretion of the called function (some use can be imagined, but callers need to be aware)
Forces caller to explicitly indicate giving up ownership, as in 1. (but actual transfer of ownership is delayed until after the moment of function call).
Option 3 is fairly clear in its intention; provided ownership is actually taken, it is for me the best solution. It is slightly more efficient than 1 in that no pointer values are moved to temporaries (the calls to std::move are in fact just casts and cost nothing); this might be especially relevant if the pointer is handed through several intermediate functions before its contents is actually being moved.
Here is some code to experiment with.
class B
{
unsigned long val;
public:
B(const unsigned long& x) : val(x)
{ std::cout << "storing " << x << std::endl;}
~B() { std::cout << "dropping " << val << std::endl;}
};
typedef std::unique_ptr<B> B_ptr;
class A {
std::vector<B_ptr> vb;
public:
void add(B_ptr&& b)
{ vb.push_back(std::move(b)); } // or even better use emplace_back
};
void f() {
A a;
B_ptr b(new B(123)),c;
a.add(std::move(b));
std::cout << "---" <<std::endl;
a.add(B_ptr(new B(4567))); // unnamed argument does not need std::move
}
As written, output is
storing 123
---
storing 4567
dropping 123
dropping 4567
Note that values are destroyed in the ordered stored in the vector. Try changing the prototype of the method add (adapting other code if necessary to make it compile), and whether or not it actually passes on its argument b. Several permutations of the lines of output can be obtained.
Yes, this is how it should be done. You are explicitly transferring ownership from main to A. This is basically the same as your previous code, except it's more explicit and vastly more reliable.
So my simple question: is this the way to do it and in particular, is this "move(b)" the only way to do it? (I was thinking of rvalue references but I don't fully understand it so...)
And if you have a link with complete explanations of move semantics, unique_ptr... that I was not able to find, don't hesitate.
Shameless plug, search for the heading "Moving into members". It describes exactly your scenario.
Your code in main could be simplified a little, since C++14:
a.add( make_unique<B>() );
where you can put arguments for B's constructor inside the inner parentheses.
You could also consider a class member function that takes ownership of a raw pointer:
void take(B *ptr) { vb.emplace_back(ptr); }
and the corresponding code in main would be:
a.take( new B() );
Another option is to use perfect forwarding for adding vector members:
template<typename... Args>
void emplace(Args&&... args)
{
vb.emplace_back( std::make_unique<B>(std::forward<Args>(args)...) );
}
and the code in main:
a.emplace();
where, as before, you could put constructor arguments for B inside the parentheses.
Link to working example

Is it wrong to dereference a pointer to get a reference?

I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value. And I feel dirty converting back to a reference, it just seems wrong.
Is it?
To clarify...
MyType *pObj = ...
MyType &obj = *pObj;
Isn't this 'dirty', since you can (even if only in theory since you'd check it first) dereference a NULL pointer?
EDIT: Oh, and you don't know if the objects were dynamically created or not.
Ensure that the pointer is not NULL before you try to convert the pointer to a reference, and that the object will remain in scope as long as your reference does (or remain allocated, in reference to the heap), and you'll be okay, and morally clean :)
Initialising a reference with a dereferenced pointer is absolutely fine, nothing wrong with it whatsoever. If p is a pointer, and if dereferencing it is valid (so it's not null, for instance), then *p is the object it points to. You can bind a reference to that object just like you bind a reference to any object. Obviously, you must make sure the reference doesn't outlive the object (like any reference).
So for example, suppose that I am passed a pointer to an array of objects. It could just as well be an iterator pair, or a vector of objects, or a map of objects, but I'll use an array for simplicity. Each object has a function, order, returning an integer. I am to call the bar function once on each object, in order of increasing order value:
void bar(Foo &f) {
// does something
}
bool by_order(Foo *lhs, Foo *rhs) {
return lhs->order() < rhs->order();
}
void call_bar_in_order(Foo *array, int count) {
std::vector<Foo*> vec(count); // vector of pointers
for (int i = 0; i < count; ++i) vec[i] = &(array[i]);
std::sort(vec.begin(), vec.end(), by_order);
for (int i = 0; i < count; ++i) bar(*vec[i]);
}
The reference that my example has initialized is a function parameter rather than a variable directly, but I could just have validly done:
for (int i = 0; i < count; ++i) {
Foo &f = *vec[i];
bar(f);
}
Obviously a vector<Foo> would be incorrect, since then I would be calling bar on a copy of each object in order, not on each object in order. bar takes a non-const reference, so quite aside from performance or anything else, that clearly would be wrong if bar modifies the input.
A vector of smart pointers, or a boost pointer vector, would also be wrong, since I don't own the objects in the array and certainly must not free them. Sorting the original array might also be disallowed, or for that matter impossible if it's a map rather than an array.
No. How else could you implement operator=? You have to dereference this in order to return a reference to yourself.
Note though that I'd still store the items in the STL container by value -- unless your object is huge, overhead of heap allocations is going to mean you're using more storage, and are less efficient, than you would be if you just stored the item by value.
My answer doesn't directly address your initial concern, but it appears you encounter this problem because you have an STL container that stores pointer types.
Boost provides the ptr_container library to address these types of situations. For instance, a ptr_vector internally stores pointers to types, but returns references through its interface. Note that this implies that the container owns the pointer to the instance and will manage its deletion.
Here is a quick example to demonstrate this notion.
#include <string>
#include <boost/ptr_container/ptr_vector.hpp>
void foo()
{
boost::ptr_vector<std::string> strings;
strings.push_back(new std::string("hello world!"));
strings.push_back(new std::string());
const std::string& helloWorld(strings[0]);
std::string& empty(strings[1]);
}
I'd much prefer to use references everywhere but the moment you use an STL container you have to use pointers unless you really want to pass complex types by value.
Just to be clear: STL containers were designed to support certain semantics ("value semantics"), such as "items in the container can be copied around." Since references aren't rebindable, they don't support value semantics (i.e., try creating a std::vector<int&> or std::list<double&>). You are correct that you cannot put references in STL containers.
Generally, if you're using references instead of plain objects you're either using base classes and want to avoid slicing, or you're trying to avoid copying. And, yes, this means that if you want to store the items in an STL container, then you're going to need to use pointers to avoid slicing and/or copying.
And, yes, the following is legit (although in this case, not very useful):
#include <iostream>
#include <vector>
// note signature, inside this function, i is an int&
// normally I would pass a const reference, but you can't add
// a "const* int" to a "std::vector<int*>"
void add_to_vector(std::vector<int*>& v, int& i)
{
v.push_back(&i);
}
int main()
{
int x = 5;
std::vector<int*> pointers_to_ints;
// x is passed by reference
// NOTE: this line could have simply been "pointers_to_ints.push_back(&x)"
// I simply wanted to demonstrate (in the body of add_to_vector) that
// taking the address of a reference returns the address of the object the
// reference refers to.
add_to_vector(pointers_to_ints, x);
// get the pointer to x out of the container
int* pointer_to_x = pointers_to_ints[0];
// dereference the pointer and initialize a reference with it
int& ref_to_x = *pointer_to_x;
// use the reference to change the original value (in this case, to change x)
ref_to_x = 42;
// show that x changed
std::cout << x << '\n';
}
Oh, and you don't know if the objects were dynamically created or not.
That's not important. In the above sample, x is on the stack and we store a pointer to x in the pointers_to_vectors. Sure, pointers_to_vectors uses a dynamically-allocated array internally (and delete[]s that array when the vector goes out of scope), but that array holds the pointers, not the pointed-to things. When pointers_to_ints falls out of scope, the internal int*[] is delete[]-ed, but the int*s are not deleted.
This, in fact, makes using pointers with STL containers hard, because the STL containers won't manage the lifetime of the pointed-to objects. You may want to look at Boost's pointer containers library. Otherwise, you'll either (1) want to use STL containers of smart pointers (like boost:shared_ptr which is legal for STL containers) or (2) manage the lifetime of the pointed-to objects some other way. You may already be doing (2).
If you want the container to actually contain objects that are dynamically allocated, you shouldn't be using raw pointers. Use unique_ptr or whatever similar type is appropriate.
There's nothing wrong with it, but please be aware that on machine-code level a reference is usually the same as a pointer. So, usually the pointer isn't really dereferenced (no memory access) when assigned to a reference.
So in real life the reference can be 0 and the crash occurs when using the reference - what can happen much later than its assignemt.
Of course what happens exactly heavily depends on compiler version and hardware platform as well as compiler options and the exact usage of the reference.
Officially the behaviour of dereferencing a 0-Pointer is undefined and thus anything can happen. This anything includes that it may crash immediately, but also that it may crash much later or never.
So always make sure that you never assign a 0-Pointer to a reference - bugs likes this are very hard to find.
Edit: Made the "usually" italic and added paragraph about official "undefined" behaviour.