shared_ptr assignment: order of reference counting - c++

When you use the copy assignment operator of a shared_ptr, conceptually, the shared_ptr on the left hand side of the assignment would need to decrement the reference count of the object it currently owns, and then increment the reference count of the object on the right-hand side of the assignment. (Assuming, of course, that both pointers are non-null.)
So an implementation might look something like the following pseudo code:
shared_ptr& operator = (const shared_ptr& rhs)
{
    decrement_reference_count(this->m_ptr);
    this->m_ptr = rhs.m_ptr;
    increment_reference_count(this->m_ptr);
    return *this;
}
But note that here we decrement the reference count of this before we increment the reference count of rhs. We could also do it the other way around. My question is, does the standard actually specify the order here?
Why it matters: the order could make a big difference if there is some kind of dependency between the reference count of this and the reference count of rhs. For example, suppose both are part of a linked list structure, where the next pointer in each node is a shared_ptr. Decrementing the reference count of any node in the structure could then trigger a destructor, which would set off a chain reaction that decrements the reference count of (and possibly destroys) every subsequent node in the chain.
So, supposing a situation where the reference count of rhs is affected by the reference count of this, it makes a big difference whether we first decrement this or first increment rhs. If we increment rhs before decrementing this, then we can be sure that rhs will not end up being destroyed when we decrement this.
But does the standard actually specify an order here? As far as I can see, the only thing the standard says is that the copy assignment operator is equivalent to the expression:
shared_ptr(rhs).swap(*this)
But I can't really wrap my head around the implications (if any) that this equivalency might have in regard to the order of decrementing/incrementing the reference counts.
So does the standard specify an order here? Or is this implementation defined behavior?

The Standard says [20.7.2.2.3] that
shared_ptr& operator=(const shared_ptr& r) noexcept;
has effects equivalent to
shared_ptr(r).swap(*this)
This means constructing a temporary, which increments r's reference count, then swapping its data with *this, then destroying the temporary, which means decrementing the reference count that used to belong to *this.
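The linked-list scenario from the question can be checked directly. The following is only a minimal sketch: Node, its alive counter, and advance_head() are hypothetical names made up for this illustration. The right-hand side (head->next) is owned solely by the node that head is about to release, and it survives because the temporary copy is made first.

```cpp
#include <cassert>
#include <memory>

// Hypothetical list node, used only for this illustration.
struct Node {
    int value;
    std::shared_ptr<Node> next;
    static int alive;                        // number of Node objects currently alive
    explicit Node(int v) : value(v) { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Builds a two-node list, advances head to the second node,
// and returns the value stored in the node head ends up owning.
int advance_head() {
    auto head = std::make_shared<Node>(1);
    head->next = std::make_shared<Node>(2);

    // The right-hand side lives inside the node that head currently owns.
    // Copy assignment behaves like shared_ptr(rhs).swap(*this): node 2 is
    // retained first, and only then is node 1 released and destroyed.
    head = head->next;

    assert(Node::alive == 1);                // only node 2 is left
    return head->value;
}
```

With the naive decrement-first order from the question, releasing node 1 would destroy node 2 as well before its count was incremented.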

It must increment the reference counter first, in case rhs is this. Otherwise it could inadvertently destroy the pointee when the reference counter is 1. It could check whether this == &rhs but this check is unnecessary if the reference counter increment is performed before the decrement.
shared_ptr(rhs).swap(*this) does not suffer from this issue because it creates a copy first, thus incrementing the reference counter first.
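To make the self-assignment argument concrete, here is a toy, single-threaded reference-counted holder. RcInt, Block, and self_assign_value() are invented for this sketch, and this is not how any real std::shared_ptr is implemented; it only mirrors the copy-and-swap wording of the standard.

```cpp
#include <cassert>
#include <utility>

// Toy reference-counted int holder (hypothetical, single-threaded, never null).
class RcInt {
    struct Block { int value; long count; };
    Block* b_;
public:
    explicit RcInt(int v) : b_(new Block{v, 1}) {}
    RcInt(const RcInt& o) noexcept : b_(o.b_) { ++b_->count; }   // increment
    ~RcInt() { if (--b_->count == 0) delete b_; }                 // decrement
    void swap(RcInt& o) noexcept { std::swap(b_, o.b_); }

    RcInt& operator=(const RcInt& rhs) noexcept {
        // The temporary copy increments rhs's count first; its destructor
        // then decrements the count of whatever *this used to own.
        RcInt(rhs).swap(*this);
        return *this;
    }

    int value() const { return b_->value; }
    long use_count() const { return b_->count; }
};

// Self-assignment through an alias: the count goes 1 -> 2 -> 1, never 0.
int self_assign_value() {
    RcInt a(42);
    RcInt& alias = a;
    a = alias;
    assert(a.use_count() == 1);
    return a.value();
}
```

No this == &rhs check is needed, because the count never reaches zero in between.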

Related

Will this C++ code always work as I expect, or is the execution order not guaranteed?

OK, I have some code that seems to work but I'm not sure it will always work. I'm moving a unique_ptr into a stl map using one of the members of the class as the map key, but I'm not sure whether the move might invalidate the pointer in some situations.
The code is along these lines:
struct a
{
    std::string s;
};
std::map<std::string, std::unique_ptr<a>> m;
std::unique_ptr<a> p = std::make_unique<a>();
// some code that does stuff
m[p->s] = std::move(p);
So this currently seems to work, but it seems to me it might be possible for p to become invalid before the string is used as the map key, and that would lead to a memory exception. Obviously I could create a temporary string before the move, or I could assign via an iterator, but I'd prefer not to if it isn't necessary.
This code has well-defined behaviour.
In C++17, std::move(p) will be evaluated before m[p->s]. Before C++17, std::move(p) could be evaluated either before or after m[p->s]. However, this doesn't matter because std::move(p) does not modify p. It is only the assignment that actually causes p to be moved-from.
The assignment operator that is called has the signature
unique_ptr& operator=(unique_ptr&& other);
and is called as if by
m[p->s].operator=(std::move(p));
This means that the modification of p is guaranteed to not take place until the body of operator= is entered (the initialization of the other parameter is merely a reference binding). And certainly the body of operator= cannot be entered until the object expression m[p->s] is evaluated.
So your code is well-defined in all versions of C++.
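A small check of this reasoning, as a sketch only: insert_by_member() is a hypothetical wrapper around the code from the question, and the key "key" is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct a
{
    std::string s;
};

// Runs the statement from the question and returns the resulting map size.
std::size_t insert_by_member() {
    std::map<std::string, std::unique_ptr<a>> m;
    std::unique_ptr<a> p = std::make_unique<a>();
    p->s = "key";

    // std::move is only a cast: r refers to p itself, and p still owns the object.
    std::unique_ptr<a>&& r = std::move(p);
    assert(&r == &p);
    assert(p != nullptr);

    // p is modified only inside unique_ptr::operator=, which cannot start
    // before m[p->s] has been evaluated.
    m[p->s] = std::move(p);

    assert(p == nullptr);                    // ownership has moved into the map
    assert(m.at("key")->s == "key");
    return m.size();
}
```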
The code is fine. In C++ 17, we were given strong guarantees on the sequencing, which makes this code 100% OK.
Prior to C++17 the standard has
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
But that still means the code is okay. We don't know which of m[p->s] and std::move(p) happens first, but since std::move doesn't actually do anything to p, p->s will be valid and will be resolved before p is moved into m[p->s].

what happens to lvalue passed in function as rvalue (c++)?

I have been wondering about that all day long and I can't find an answer to that specific case.
Main :
std::vector<MyObject*> myVector;
myVector.reserve(5);
myFunction(std::move(myVector));
myFunction :
void myFunction(std::vector<MyObject*> && givenVector){
    std::vector<MyObject*> otherVector = givenVector;
    std::cout << givenVector[0];
    // do some stuff
}
My questions are :
In main, is myVector destroyed by the function myFunction() because it is treated as an rvalue, or does the compiler know that it is also an lvalue and therefore make a copy before passing it to myFunction()? What happens if I try to use the vector after the call to myFunction()?
Inside myFunction(), is givenVector destroyed when it is assigned to otherVector? If so, what happens when I try to print it? If not, is there any point in taking an rvalue in this function?
Looks like duplicate.
myVector is not destroyed by the function myFunction(). In the general case, what happens to a class whose resources have been stolen is unspecified.
givenVector is not destroyed when it is assigned to otherVector. In the general case, what happens to a class whose resources have been stolen is unspecified.
For the call to compile, you have to apply std::move to your vector when passing it to the function (at least if no further overloads exist):
myFunction(std::move(myVector));
Then, inside the function, by
std::vector<MyObject*> otherVector = std::move(givenVector);
the move constructor of std::vector is called, which basically moves all the content out of the vector (note again the std::move on the right-hand side; without it you get a copy). The vector is not "destroyed" by this. Even after the move it is still alive, but in an unspecified state.
That means that member functions which place no precondition on the state of the vector may still be called, such as the destructor, size() and so on. A pop_back() or the dereferencing of a vector element, however, will likely fail.
See here for a more detailed explanation what you still can do with a moved-from object.
The call only compiles because of the std::move: an lvalue cannot bind to an rvalue reference, so you need to deliberately convert it to an rvalue:
myFunction(std::move(myVector));
Simply doing this won't "destroy" the object; what happens to it depends on what the function does. Generally, functions which take rvalue references do so in order to move from the argument, in which case they might leave it in some valid but "empty" state, but won't destroy it.
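As posted, the line std::vector<MyObject*> otherVector = givenVector; copies, because the named parameter givenVector is an lvalue. Written with std::move(givenVector), it would move the vector into the local otherVector, in practice leaving givenVector empty. Either way, the vector never had any elements (reserve() changes the capacity, not the size), so printing its first element is undefined behaviour.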
No copy is performed. What happens to myVector depends on what myFunction does with it. You should consider objects that have been moved from as either being the same or being empty. You can assign new values and keep using it or destroy it.
Inside myFunction(), givenVector is fine. It is an lvalue, and otherVector makes a copy of it. You most likely wanted to write otherVector = std::move(givenVector);, in which case givenVector should be empty afterwards. If you have an old implementation of the standard library (one that does not know about move semantics), a copy is performed and givenVector is not changed. Whether that makes sense is for you to decide. Moving a given vector into a new vector can be useful. Printing an empty vector is not so useful.
If a function gets an argument by rvalue-reference, that does not mean it will be destructively used, only that it can be.
Destructive use means that the passed object is thereafter in some unspecified but valid state, fit only for re-initializing, moving, copying or destruction.
In the function, the argument has a name and thus is an lvalue.
To mark the place(s) where you want to take advantage of the licence to ruthlessly plunder it, you have to convert it to an rvalue-reference on passing it on, for example with std::move or std::forward, the latter mostly for templates.
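The difference between "may be plundered" and "is plundered" can be sketched with two functions. take_and_copy() and take_and_move() are hypothetical names, and std::vector<int> stands in for the vector of pointers from the question.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// The parameter has a name, so it is an lvalue inside the function:
// this initialization copies and leaves the caller's vector untouched.
std::size_t take_and_copy(std::vector<int>&& given) {
    std::vector<int> other = given;
    return other.size();
}

// Only an explicit std::move hands the contents over to the move constructor.
std::size_t take_and_move(std::vector<int>&& given) {
    std::vector<int> other = std::move(given);
    return other.size();
}
```

After take_and_copy(std::move(v)) the caller's v still holds its elements. After take_and_move(std::move(v)) it is in a valid but unspecified state; with the mainstream standard library implementations it is left empty.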

C++ multiple access to rvalue reference in the same statement as perfect forwarding

Is the following code safe? Particularly, if vec is an rvalue reference, does the last line do what it should (namely a recursion in which the elements of vec are correctly summed up)?
template<typename VectorType>
auto recurse(VectorType&& vec, int i)
{
    if(i<0)
        return decltype(vec[i])();
    return vec[i] + recurse(std::forward<VectorType>(vec), i-1); //is this line safe?
}
My doubts are related to the fact that since the order of evaluation is unspecified, the vector could have been moved before operator[] is evaluated, and therefore the latter could fail.
Is this fear justified, or is there some rule that prevents this?
Consider the following:
std::vector<int> things;
// fill things here
const auto i = static_cast<int>(things.size()) - 1;
// "VectorType &&" resolves to "std::vector<int> &&" -- forwarded indirectly
recurse(move(things), i);
// "VectorType &&" resolves to "std::vector<int> &" -- also forwarded indirectly
recurse(things, i);
// "VectorType &&" resolves to "const std::vector<int> &" -- again, forwarded indirectly
recurse(static_cast<const std::vector<int> &>(things), i);
Even after all 3 calls to recurse in the example above, the things vector would still be intact.
If the recursion were changed to:
return vec[i] + recurse(forward<VectorType>(vec), --i);
the results would then be undefined, as vec[i] and --i could be evaluated in either order.
Function calls are like sequence points: the results of argument expressions must be computed before the function is called. The order in which the operands and arguments are evaluated, however, is unspecified, even with respect to sub-expressions within the same statement.
Forcing the construction of an intermediate within the recursion statement would also result in undefined behavior.
For example:
template<typename VectorType>
auto recurse(VectorType &&vec, int i)
{
    using ActualType = typename std::decay<VectorType>::type;
    if(i < 0){
        return decltype(vec[i]){};
    }
    return vec[i] + recurse(ActualType{forward<VectorType>(vec)}, i - 1);
}
Here, vec[i] and ActualType{forward<VectorType>(vec)} could be evaluated in either order. The latter would either copy-construct or move-construct a new instance of ActualType, which is why this is undefined: if the move happens first, vec[i] reads from a moved-from vector.
In Summary
Yes, your example will sum the contents of the vector.
There is no reason for the compiler to construct an intermediate, so successive invocations of recurse each receive an indirect reference to the same, unchanging instance.
Addendum
As pointed out in a comment, it is possible for a non-const operator[] to mutate the instance of VectorType, whatever that might be. In this case, the result of the recursion would be undefined.
You are correct, evaluation of vec[i] and the recursive call is indeterminately ordered (they can't actually overlap, but can happen in either of the two possible orders).
I don't see why you're taking an rvalue reference, though, because you don't ever actually move from vec.
Unless VectorType has operator[](index_type) &&, I don't see how the indeterminate execution actually leads to failure. (On the other hand, the infinite recursion ildjarn spotted will cause a failure). Actually, operator[](index_type) && would lead to failure with all execution orders.
This function should work just fine with a const& parameter type, and then you wouldn't be worrying.
std::forward won't mutate your object by itself. It will cast vec to the same value category that was deduced when the parent call to recurse was made. For that matter, std::move also won't by itself mutate its argument.
However, casting to the rvalue category can indirectly enable mutation (and this is the whole point: it enables some optimizations based on mutating the moved-from object). That only happens if the rvalue overload of the function you are calling actually mutates its argument.
In this case it looks like no-one is mutating things (as long as vec[i] or operator+ isn't mutating things), so I think you would be safe here no matter what the order of evaluation is.
As another answerer pointed out, passing by const ref means the compiler would complain if mutation could be happening somewhere (eg operator[] for the current vector type is non const, or operator+ is non-const, etc). This would allow you to worry less.
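A compilable variant of the recursion, as a sketch: the explicit trailing return type and the Value alias are my additions (the posted decltype(vec[i])() names a reference type and would not compile for std::vector), and the index cast assumes a container indexed by std::size_t.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Sums vec[0..i]. std::forward only casts; nothing here ever moves from vec,
// so the order in which the two operands of + are evaluated does not matter.
template <typename VectorType>
auto recurse(VectorType&& vec, int i) -> std::decay_t<decltype(vec[0])>
{
    using Value = std::decay_t<decltype(vec[0])>;
    if (i < 0)
        return Value{};
    return vec[static_cast<std::size_t>(i)]
         + recurse(std::forward<VectorType>(vec), i - 1);
}
```

Calling recurse(std::move(things), i) leaves things intact, because no move constructor or move assignment is ever invoked along the way.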

What is the difference between these two parameters in C++?

I am new to C++ and currently am learning about templates and iterators.
I saw some code implementing custom iterators and I'm curious to know what the difference between these two iterator parameters is:
iterator & operator=(iterator i) { ... i.someVar }
bool operator==(const iterator & i) { ... i.someVar }
They implement the = and == operators for the particular iterator. Assuming the iterator class has a member variable 'someVar', why is one operator implemented using "iterator i" and another with "iterator & i"? Is there any difference between the two "i.someVar" expressions?
I googled a little and found this question
Address of array - difference between having an ampersand and no ampersand
to which the answer was "the array is converted to a pointer and its value is the address of the first thing in the array." I'm not sure this is related, but it seems like the only valid explanation I could find.
Thank you!
operator= takes its argument by value (a.k.a. by copy). operator == takes its argument by const reference (a.k.a. by address, albeit with a guarantee that the object will not be modified).
An iterator may be/contain a pointer into an array but it is not itself an array.
The ampersand (&) has different contextual meanings. Used in an expression, it behaves as an operator. Used in a declaration such as iterator & i, it forms part of the type iterator & and indicates that i is a reference, as opposed to an object.
For more discussion (with pictures!), see Pass by Reference / Value in C++ and What's the difference between passing by reference vs. passing by value? (this one is language agnostic).
The assignment operator = takes the iterator i by value, which means a copy of the original iterator is made and passed to the function, so any changes applied to i inside the operator won't affect the original.
The comparison operator == takes a constant reference, which denotes that the original object can't/shouldn't be changed in the method. This makes sense, since a comparison operator usually only compares objects without changing them. The reference refers to the original iterator, which lives outside the method. This means that the actual object won't be copied, which is usually faster.
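The two passing styles can be compared in isolation. This is just an illustrative sketch: Iter, bump_copy() and read_ref() are hypothetical stand-ins for the iterator class and its operators.

```cpp
// Hypothetical stand-in for the iterator class from the question.
struct Iter {
    int someVar;
};

// By value: i is a private copy, so changing it does not affect the caller's object.
int bump_copy(Iter i) {
    ++i.someVar;
    return i.someVar;
}

// By const reference: i refers to the caller's object and cannot be modified here
// (uncommenting the next line would be a compile error).
int read_ref(const Iter& i) {
    // ++i.someVar;
    return i.someVar;
}
```

In both functions the expression i.someVar reads the same member; the only difference is whether i is a copy or a read-only alias of the caller's object.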
First, you don't have an address of an array here.
There's no semantic difference, unless you try to make a local change to the local variable i: iterator i will allow a local change, while const iterator & i will not.
Many people are used to writing const type & var for function parameters because passing by reference can be faster than by value, especially if type is big and expensive to copy, but in your case, an iterator should be small and cheap to copy, so there's no gain from avoiding copying. (Actually, having a local copy can enhance locality of reference and help optimization, so I would just pass small values by value (by copying).)

Move constructor and pre-increment vs post-increment

In C++, if you have a for loop that "copies" objects of a user defined type using a move constructor, does it make any difference if you use ++i or i++ as the loop counter?
I know this question seems rather vague, but I was (I believe) asked this in a phone interview. I wasn't sure if I understood the question correctly, and the interviewer took this as my not knowing the answer, and cut the interview short.
What could he have been getting at?
In C++, if you have a for loop that "copies" objects of a user defined type using a move constructor [...]
First of all, a move constructor is used for move-constructing, which usually means you are not "copying". You can implement moving as copying (in fact, a class which is copy-constructible is also move-constructible), but then why define a move constructor explicitly?
[...] does it make any difference if you use ++i or i++ as the loop counter?
It depends on what i is. If it is a scalar object, like an int, then there is no difference at all.
If i is a class-type iterator, on the other hand, ++i should be more efficient (on a purely theoretical ground), because the implementation of operator ++ will not have to create a copy of the iterator to be returned before the iterator itself is incremented.
Here, for instance, is how libstdc++ defines the increment operators for the iterator type of an std::list:
_Self&
operator++()
{
    _M_node = _M_node->_M_next;
    return *this;
}
_Self
operator++(int)
{
    _Self __tmp = *this;
    _M_node = _M_node->_M_next;
    return __tmp;
}
As you can see, the postfix version (the one accepting a dummy int) has more work to do: it needs to create a copy of the original iterator to be returned, then alter the iterator's internal pointer, then return the copy.
On the other hand, the prefix version just has to alter the internal pointer and return (a reference to) itself.
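The extra copy can be made visible with an iterator-like type that counts its own copy constructions. This is a sketch under assumptions: CountingIt and the two helper functions are hypothetical and have nothing to do with the libstdc++ code above.

```cpp
// Hypothetical iterator-like type that counts how often it is copy-constructed.
struct CountingIt {
    int pos = 0;
    static int copies;

    CountingIt() = default;
    CountingIt(const CountingIt& o) : pos(o.pos) { ++copies; }
    CountingIt& operator=(const CountingIt&) = default;

    CountingIt& operator++() { ++pos; return *this; }    // prefix: no copy
    CountingIt operator++(int) {                          // postfix: copies the old state
        CountingIt tmp = *this;
        ++pos;
        return tmp;
    }
};
int CountingIt::copies = 0;

// Number of copy constructions caused by `steps` prefix increments.
int copies_made_by_prefix(int steps) {
    CountingIt::copies = 0;
    CountingIt it;
    for (int k = 0; k < steps; ++k)
        ++it;
    return CountingIt::copies;
}

// Number of copy constructions caused by `steps` postfix increments.
int copies_made_by_postfix(int steps) {
    CountingIt::copies = 0;
    CountingIt it;
    for (int k = 0; k < steps; ++k)
        it++;
    return CountingIt::copies;
}
```

The prefix loop performs no copies, while the postfix loop performs at least one per step (possibly two if the compiler does not elide the copy on return). For a trivially copyable iterator an optimizer will usually remove the unused copy entirely.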
However, please keep in mind that where performance is concerned, all assumptions have to be backed up by measurement. In this case, I do not expect any noticeable difference between these two functions.