The general conception seems to be that std::unique_ptr has no time overhead compared to properly used owning raw pointers, given sufficient optimization.
But what about using std::unique_ptr in compound data structures, in particular std::vector<std::unique_ptr<T>>? For instance, resizing the underlying data of a vector, which can happen during push_back. To isolate the performance, I loop around pop_back, shrink_to_fit, emplace_back:
#include <chrono>
#include <vector>
#include <memory>
#include <iostream>
constexpr size_t size = 1000000;
constexpr size_t repeat = 1000;
using my_clock = std::chrono::high_resolution_clock;
template<class T>
auto test(std::vector<T>& v) {
    v.reserve(size);
    for (size_t i = 0; i < size; i++) {
        v.emplace_back(new int());
    }
    auto t0 = my_clock::now();
    for (int i = 0; i < repeat; i++) {
        auto back = std::move(v.back());
        v.pop_back();
        v.shrink_to_fit();
        if (back == nullptr) throw "don't optimize me away";
        v.emplace_back(std::move(back));
    }
    return my_clock::now() - t0;
}

int main() {
    std::vector<std::unique_ptr<int>> v_u;
    std::vector<int*> v_p;

    auto millis_p = std::chrono::duration_cast<std::chrono::milliseconds>(test(v_p));
    auto millis_u = std::chrono::duration_cast<std::chrono::milliseconds>(test(v_u));
    std::cout << "raw pointer: " << millis_p.count() << " ms, unique_ptr: " << millis_u.count() << " ms\n";

    for (auto p : v_p) delete p; // I don't like memory leaks ;-)
}
Compiling the code with -O3 -march=native -std=c++14 -g using gcc 7.1.0, clang 3.8.0, and icc 17.0.4 on Linux, on an Intel Xeon E5-2690 v3 @ 2.6 GHz (no turbo):
raw pointer: 2746 ms, unique_ptr: 5140 ms (gcc)
raw pointer: 2667 ms, unique_ptr: 5529 ms (clang)
raw pointer: 1448 ms, unique_ptr: 5374 ms (intel)
The raw pointer version spends all its time in an optimized memmove (intel seems to have a much better one than clang and gcc). The unique_ptr code seems to first copy over the vector data from one memory block to the other and assign the original one with zero - all in a horribly un-optimized loop. And then it loops over the original block of data again to see if any of those that were just zeroed are nonzero and need to be deleted. The full gory detail can be seen on godbolt. The question is not how the compiled code differs; that is pretty clear. The question is why the compiler fails to optimize what is generally regarded as a no-extra-overhead abstraction.
Trying to understand how the compilers reason about handling std::unique_ptr, I was looking a bit more at isolated code. For instance:
void foo(std::unique_ptr<int>& a, std::unique_ptr<int>& b) {
    a.release();
    a = std::move(b);
}
or the similar
a.release();
a.reset(b.release());
none of the x86 compilers seem to be able to optimize away the senseless if (ptr) delete ptr;. The Intel compiler even gives the delete branch a 28% chance. Surprisingly, the delete check is consistently omitted for:
auto tmp = b.release();
a.release();
a.reset(tmp);
These bits are not the main aspect of this question, but all of this makes me feel that I am missing something.
Why do various compilers fail to optimize reallocation within std::vector<std::unique_ptr<int>>? Is there anything in the standard that prevents generating code as efficient as with raw pointers? Is this an issue with the standard library implementation? Or are the compilers just not sufficiently clever (yet)?
What can one do to avoid performance impact compared to using raw pointers?
Note: Assume that T is polymorphic and expensive to move, so std::vector<T> is not an option.
The claim that unique_ptr performs as well as a raw pointer after optimization mostly applies only to the basic operations on a single pointer, such as creation, dereferencing, assignment of a single pointer and deletion. Those operations are defined simply enough that an optimizing compiler can usually make the required transformations such that the resulting code is equivalent (or nearly so) in performance to the raw version0.
One place this falls apart is in higher-level, source-based optimizations on array-based containers such as std::vector, as you have noted with your test. These containers typically use type traits to determine at compile time whether a type can safely be copied using a byte-wise copy such as memcpy, and delegate to such a method if so, or otherwise fall back to an element-wise copy loop.
To be safely copyable with memcpy an object must be trivially copyable. std::unique_ptr is not trivially copyable, since it fails several of the requirements - for one, its move constructor is non-trivial. The exact mechanism depends on the standard library involved, but in general a quality std::vector implementation will end up calling a specialized form of something like std::uninitialized_copy for trivially-copyable types that just delegates to memmove.
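The dispatch hinges on exactly this trait; a quick check with the standard traits confirms which side of the fast path each type lands on:

```cpp
#include <memory>
#include <type_traits>

// Raw pointers qualify for the byte-wise fast path; unique_ptr does not,
// because of its non-trivial move constructor and destructor.
static_assert(std::is_trivially_copyable<int*>::value,
              "raw pointers are trivially copyable");
static_assert(!std::is_trivially_copyable<std::unique_ptr<int>>::value,
              "unique_ptr is not trivially copyable");
```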
The typical implementation details are quite tortured, but for libstdc++ (used by gcc) you can see the high-level divergence in std::uninitialized_copy:
template<typename _InputIterator, typename _ForwardIterator>
  inline _ForwardIterator
  uninitialized_copy(_InputIterator __first, _InputIterator __last,
                     _ForwardIterator __result)
  {
    ...
    return std::__uninitialized_copy<__is_trivial(_ValueType1)
                                     && __is_trivial(_ValueType2)
                                     && __assignable>::
      __uninit_copy(__first, __last, __result);
  }
From there you can take my word that many of the std::vector "movement" methods end up here, and that __uninitialized_copy<true>::__uninit_copy(...) ultimately calls memmove while the <false> version doesn't - or you can trace through the code yourself (but you already saw the result in your benchmark).
Ultimately then, you end up with several loops that perform the required copy steps for non-trivial objects, such as calling the move constructor of the destination object, and subsequently calling the destructor of all the source objects. These are separate loops, and even modern compilers will pretty much not be able to reason about something like "OK, in the first loop I moved all the destination objects so their ptr member will be null, so the second loop is a no-op". Finally, to equal the speed of raw pointers, not only would compilers need to optimize across these two loops, they would need a transformation which recognizes that the whole thing can be replaced by memcpy or memmove2.
So one answer to your question is that compilers just aren't smart enough to do this optimization, but it's largely because the "raw" version has a lot of compile-time help to skip the need for this optimization entirely.
Loop Fusion
As mentioned the existing vector implementations implement a resize-type operation in two separate loops (in addition to non-loop work such as allocating the new storage and freeing the old storage):
Copying the source objects into the newly allocated destination array (conceptually using something like placement new calling the move constructor).
Destroying the source objects in the old region.
Conceptually you could imagine an alternative way: doing this all in one loop, copying each element and then immediately destroying it. It is possible that a compiler could even notice that the two loops iterate over the same set of values and fuse them into one. Apparently, however, gcc doesn't do any loop fusion today (https://gcc.gnu.org/ml/gcc/2015-04/msg00291.html), and nor do clang or icc if you believe this test.
So then we are left trying to put the loops together explicitly at the source level.
Now the two-loop implementation helps preserve the exception safety contract of the operation by not destroying any source objects until we know the construction part of the copy has completed, but it also helps to optimize the copy and destruction when we have trivially-copyable and trivially-destructible objects, respectively. In particular, with simple-traits based selection we can replace the copy with a memmove and the destruction loop can be elided entirely3.
So the two-loop approach helps when those optimizations apply, but it actually hurts in the general case of objects which are neither trivially copyable nor trivially destructible. It means you need two passes over the objects and you lose the opportunity to optimize and eliminate code between the copy of an object and its subsequent destruction. In the unique_ptr case you lose the ability for the compiler to propagate the knowledge that the source unique_ptr will have a NULL internal ptr member and hence skip the if (ptr) delete ptr check entirely4.
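A minimal sketch of what such a fused, single-pass relocation could look like (relocate_range is a hypothetical helper, not part of any standard library, and exception safety is deliberately ignored here):

```cpp
#include <memory>
#include <new>
#include <utility>

// Hypothetical fused relocation: move-construct each destination element,
// then immediately destroy its source. For unique_ptr, the source is null
// right after the move, so the compiler can see the delete is dead code.
template <class T>
void relocate_range(T* first, T* last, T* dest) {
    for (; first != last; ++first, ++dest) {
        ::new (static_cast<void*>(dest)) T(std::move(*first));
        first->~T();  // moved-from unique_ptr holds nullptr here
    }
}
```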
Trivially Movable
Now one might ask whether we could apply the same type-traits compile-time optimization to the unique_ptr case. For example, one might look at the trivially copyable requirements and see that they are perhaps too strict for the common move operations in std::vector. Sure, a unique_ptr is evidently not trivially copyable since a bit-wise copy would leave both the source and destination object owning the same pointer (and result in double-deletion), but it seems that it should be bit-wise movable: if you move a unique_ptr from one area of memory to another, such that you no longer consider the source as a live object (and hence won't call its destructor) it should "just work", for the typical unique_ptr implementation.
Unfortunately, no such "trivial move" concept exists, although you could try to roll your own. There seems to be an open debate about whether this is UB or not for objects that can be byte-wise copied and do not depend on their constructor or destructor behavior in the move scenario.
You could always implement your own trivially movable concept, which would be something like (a) the object has a trivial move constructor and (b) when used as the source argument of the move constructor the object is left in a state where its destructor has no effect. Note that such a definition is currently mostly useless, since "trivial move constructor" (basically element-wise copy and nothing else) is not consistent with any modification of the source object. So for example, a trivial move constructor cannot set the ptr member of the source unique_ptr to zero. So you'd need to jump through some more hoops such as introducing the concept of a destructive move operation which leaves the source object destroyed, rather than in a valid-but-unspecified state.
You can find some more detailed discussion of this "trivially movable" concept on this thread on the ISO C++ usenet discussion group. In particular, in the linked reply, the exact issue of vectors of unique_ptr is addressed:
It turns out many smart pointers (unique_ptr and shared_ptr included)
fall into all three of those categories and by applying them you can
have vectors of smart pointers with essentially zero overhead over raw
pointers even in non-optimized debug builds.
See also the relocator proposal.
0 Although the non-vector examples at the end of your question show that this isn't always the case. Here it is due to possible aliasing as zneak explains in his answer. Raw pointers will avoid many of these aliasing issues since they lack the indirection that unique_ptr has (e.g., you pass a raw pointer by value, rather than a structure with a pointer by reference) and can often omit the if (ptr) delete ptr check entirely.
2 This is actually harder than you might think, because memmove, for example, has subtly different semantics than an object copy loop when the source and destination overlap. Of course the high level type traits code that works for raw pointers knows (by contract) that there is no overlap, or the behavior of memmove is consistent even if there is overlap, but proving the same thing at some later arbitrary optimization pass may be much harder.
3 It is important to note that these optimizations are more or less independent. For example, many objects are trivially destructible that are not trivially copyable.
4 Although in my test neither gcc nor clang were able to suppress the check, even with __restrict__ applied, apparently due to insufficiently powerful aliasing analysis, or perhaps because std::move strips the "restrict" qualifier somehow.
I don't have a precise answer for what is biting you in the back with vectors; looks like BeeOnRope might already have one for you.
Luckily, I can tell you what's biting you in the back in your micro-example involving different ways to reset pointers: alias analysis. Specifically, the compilers are unable to prove (or unwilling to infer) that the two unique_ptr references don't overlap. They force themselves to reload the unique_ptr value in case the write to the first one has modified the second one. The third version doesn't suffer from it because the compiler can prove that neither parameter, in a well-formed program, could possibly alias with tmp, which has function-local automatic storage.
You can verify this by adding the __restrict__ keyword (which, as the double underscore somewhat implies, is not standard C++) to either unique_ptr reference parameter. That keyword informs the compiler that the reference is the only reference through which that memory can possibly be accessed, and therefore there is no risk that anything else can alias with it. When you do it, all three versions of your function compile to the same machine code and don't bother checking if the unique_ptr needs to be deleted.
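For illustration, the annotated version would look something like this (GCC/Clang spelling; __restrict__ is non-standard, and the codegen improvement is the claim made above, not something this snippet proves by itself):

```cpp
#include <memory>
#include <utility>

// With __restrict__, the compiler may assume a and b never alias, so it can
// drop the redundant reload and the dead `if (ptr) delete ptr` check.
// As in the original example, a.release() leaks any pointer a already owned.
void reset_ptr(std::unique_ptr<int>& __restrict__ a,
               std::unique_ptr<int>& __restrict__ b) {
    a.release();
    a = std::move(b);
}
```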
Related
Move operations should be noexcept; in the first place for intuitive and reasonable semantics. The second argument is runtime performance. From the Core Guidelines, C.66, "Make move operations noexcept":
A throwing move violates most people’s reasonably assumptions. A non-throwing move will be used more efficiently by standard-library and language facilities.
The canonical example for the performance part of this guideline is the case when std::vector::push_back or friends need to grow the buffer. The standard requires a strong exception guarantee here, and the elements can only be move-constructed into the new buffer if the move constructor is noexcept - otherwise, they must be copied. I get that, and the difference is visible in benchmarks.
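The selection is made via std::move_if_noexcept; a minimal illustration with two toy types (NothrowMove and ThrowingMove are made-up names) shows which value category the library sees during growth:

```cpp
#include <type_traits>
#include <utility>

// Two types differing only in whether the move constructor is noexcept.
struct NothrowMove {
    NothrowMove() = default;
    NothrowMove(NothrowMove&&) noexcept {}
    NothrowMove(const NothrowMove&) {}
};
struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(ThrowingMove&&) {}
    ThrowingMove(const ThrowingMove&) {}
};

// move_if_noexcept yields an rvalue only for the noexcept type, so vector
// growth moves NothrowMove elements but copies ThrowingMove elements.
static_assert(std::is_rvalue_reference<
    decltype(std::move_if_noexcept(std::declval<NothrowMove&>()))>::value, "");
static_assert(std::is_lvalue_reference<
    decltype(std::move_if_noexcept(std::declval<ThrowingMove&>()))>::value, "");
```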
However, apart from this, I have a hard time finding real-world evidence of the positive performance impact of noexcept move semantics. Skimming through the standard library (libcxx + grep), we see that std::move_if_noexcept exists, but it is hardly used within the library itself. Similarly, std::is_nothrow_swappable is merely used for fleshing out conditional noexcept qualifiers. This doesn't match existing claims, for example this one from "C++ High Performance" by Andrist and Sehr (2nd ed., p. 153):
All algorithms use std::swap() and std::move() when moving elements around, but only if the move constructor and move assignment are marked noexcept. Therefore, it is important to have these implemented for heavy objects when using algorithms. If they are not available and exception free, the elements will be copied instead.
To break my question into pieces:
Are there code paths in the standard library similar to the std::vector::push_back, that run faster when fed with std::is_nothrow_move_constructible types?
Am I correct to conclude that the cited paragraph from the book is not correct?
Is there an obvious example for when the compiler will reliably generate more runtime-efficient code when a type adheres to the noexcept guideline?
I know the third one might be a bit blurry. But if someone could come up with a simple example, this would be great.
Background: I refer to std::vector's use of noexcept as "the vector pessimization." I claim that the vector pessimization is the only reason anyone ever cared about putting a noexcept keyword into the language. Furthermore, the vector pessimization applies only to the element type's move constructor. I claim that marking your move-assignment or swap operations as noexcept has no "in-game effect"; leaving aside whether it might be philosophically satisfying or stylistically correct, you shouldn't expect it to have any effect on your code's performance.
Let's check a real library implementation and see how close I am to wrong. ;)
Vector reallocation. libc++'s headers use move_if_noexcept only inside __construct_{forward,backward}_with_exception_guarantees, which is used only inside vector reallocation.
Assignment operator for variant. Inside __assign_alt, the code tag-dispatches on is_nothrow_constructible_v<_Tp, _Arg> || !is_nothrow_move_constructible_v<_Tp>. When you do myvariant = arg;, the default "safe" approach is to construct a temporary _Tp from the given arg, and then destroy the currently emplaced alternative, and then move-construct that temporary _Tp into the new alternative (which hopefully won't throw). However, if we know that the _Tp is nothrow-constructible directly from arg, we'll just do that; or, if _Tp's move-constructor is throwing, such that the "safe" approach isn't actually safe, then it's not buying us anything and we'll just do the fast direct-construction approach anyway.
Btw, the assignment operator for optional does not do any of this logic.
Notice that for variant assignment, having a noexcept move constructor actually hurts (unoptimized) performance, unless you have also marked the selected converting constructor as noexcept! Godbolt.
(This experiment also turned up an apparent bug in libstdc++: #99417.)
string appending/inserting/assigning. This is a surprising one. string::append makes a call to __append_forward_unsafe under a SFINAE check for __libcpp_string_gets_noexcept_iterator. When you do s1.append(first, last), we'd like to do s1.resize(s1.size() + std::distance(first, last)) and then copy into those new bytes. However, this doesn't work in three situations: (1) If first, last point into s1 itself. (2) If first, last are exactly input_iterators (e.g. reading from an istream_iterator), such that it's known impossible to iterate the range twice. (3) If it's possible that iterating the range once could put it into a bad state where iterating the second time would throw. That is, if any of the operations in the second loop (++, ==, *) are non-noexcept. So in any of those three situations, we take the "safe" approach of constructing a temporary string s2(first, last) and then s1.append(s2). Godbolt.
I would bet money that the logic controlling this string::append optimization is incorrect. (EDIT: yes, it is.) See "Attribute noexcept_verify" (2018-06-12). Also observe in that godbolt that the operation whose noexceptness matters to libc++ is rv == rv, but the one it actually calls inside std::distance is lv != lv.
The same logic applies even harder in string::assign and string::insert. We need to iterate the range while modifying the string. So we need either a guarantee that the iterator operations are noexcept, or a way to "back out" our changes when an exception is thrown. And of course for assign in particular, there's not going to be any way to "back out" our changes. The only solution in that case is to copy the input range into a temporary string and then assign from that string (because we know string::iterator's operations are noexcept, so they can use the optimized path).
libc++'s string::replace does not do this optimization; it always copies the input range into a temporary string first.
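The input-iterator situation (case 2 above) is easy to reproduce; a small sketch (append_from_stream is a made-up name) where the range can only be traversed once, forcing the "safe" buffered path:

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// istream_iterator is exactly an input iterator: single-pass only, so an
// implementation cannot pre-compute std::distance(first, last) and must
// buffer into a temporary string instead of resizing up front.
std::string append_from_stream(std::string s, std::istream& in) {
    std::istream_iterator<char> first(in), last;
    s.append(first, last);
    return s;
}
```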
function SBO. libc++'s function uses its small buffer only when the stored callable object is_nothrow_copy_constructible (and of course is small enough to fit). In that case, the callable is treated as a sort of "copy-only type": even when you move-construct or move-assign the function, the stored callable will be copy-constructed, not move-constructed. function doesn't even require that the stored callable be move-constructible at all!
any SBO. libc++'s any uses its small buffer only when the stored object is_nothrow_move_constructible (and of course is small enough to fit). Unlike function, any treats "move" and "copy" as distinct type-erased operations.
Btw, libc++'s packaged_task SBO doesn't care about throwing move-constructors. Its noexcept move-constructor will happily call the move-constructor of a user-defined callable: Godbolt. This results in a call to std::terminate if the callable's move-constructor ever actually does throw. (Confusingly, the error message printed to the screen makes it look as if an exception is escaping out the top of main; but that's not actually what's happening internally. It's just escaping out the top of packaged_task(packaged_task&&) noexcept and being halted there by the noexcept.)
Some conclusions:
To avoid the vector pessimization, you must declare your move-constructor noexcept. I still think this is a good idea.
If you declare your move-constructor noexcept, then to avoid the "variant pessimization," you must also declare all your single-argument converting constructors noexcept. However, the "variant pessimization" merely costs a single move-construct; it does not degrade all the way into a copy-construct. So you can probably eat this cost safely.
Declaring your copy constructor noexcept can enable small-buffer optimization in libc++'s function. However, this matters only for things that are (A) callable and (B) very small and (C) not in possession of a defaulted copy constructor. I think this describes the empty set. Don't worry about it.
Declaring your iterator's operations noexcept can enable a (dubious) optimization in libc++'s string::append. But literally nobody cares about this; and besides, the optimization's logic is buggy anyway. I'm very much considering submitting a patch to rip out that logic, which will make this bullet point obsolete. (EDIT: Patch submitted, and also blogged.)
I'm not aware of anywhere else in libc++ that cares about noexceptness. If I missed something, please tell me! I'd also be very interested to see similar rundowns for libstdc++ and Microsoft.
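As a concrete check of the vector-pessimization point, an instrumented element type (Counter is a made-up type for the experiment) makes the copy-vs-move choice during reallocation observable:

```cpp
#include <vector>

// Counts how elements are transferred. Because the move constructor is
// noexcept, vector growth uses moves and never copies; removing the
// noexcept would make the same experiment observe copies instead.
struct Counter {
    static int copies, moves;
    Counter() = default;
    Counter(const Counter&) { ++copies; }
    Counter(Counter&&) noexcept { ++moves; }
};
int Counter::copies = 0;
int Counter::moves = 0;
```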
vector's push_back, resize, reserve, etc. is a very important case, as vector is expected to be the most used container.
Anyway, take a look at std::function as well; I'd expect it to take advantage of a noexcept move constructor for its small-object-optimization version.
That is, when the functor object is small and has a noexcept move constructor, it can be stored in a small buffer inside std::function itself, not on the heap. But if the functor doesn't have a noexcept move constructor, it has to live on the heap (and stay put when the std::function is moved).
Overall, there ain't too many cases indeed.
In future standards of C++, we will have the concept of "trivial relocatability", which means we can simply copy the bytes from one object to an uninitialized chunk of memory, and simply ignore/zero out the bytes of the original object. This way, we imitate the C-style way of copying/moving objects around.
In future standards, we will probably have something like std::is_trivially_relocatable<type> as a type trait. Currently, the closest thing we have is std::is_pod<type>, which will be deprecated in C++20.
My question is, do we have a way in the current standard (C++17) to figure out if the object is trivially relocatable?
For example, std::unique_ptr<type> can be moved around by copying its bytes to a new memory address and zeroing out the original bytes, but std::is_pod_v<std::unique_ptr<int>> is false.
Also, the standard currently mandates that every uninitialized chunk of memory must pass through a constructor in order to be considered a valid C++ object. Even if we can somehow figure out whether the object is trivially relocatable, just moving the bytes is still UB according to the standard.
So another question is: even if we can detect trivial relocatability, how can we implement trivial relocation without causing UB? Simply calling memcpy + memset(src,0,...) and casting the memory address to the right type is UB.
Thanks!
The whole point of trivial-relocatability would seem to be to enable byte-wise moving of objects even in the presence of a non-trivial move constructor or move assignment operator. Even in the current proposal P1144R3, this ultimately requires that a user manually mark types for which this is possible. For a compiler to figure out whether a given type is trivially-relocatable in general is most-likely equivalent to solving the halting problem (it would have to understand and reason about what an arbitrary, potentially user-defined move constructor or move assignment operator does)…
It is, of course, possible that you define your own is_trivially_relocatable trait that defaults to std::is_trivially_copyable_v and have the user specialize it for types that should specifically be considered trivially-relocatable. Even this is problematic, however, because there's going to be no way to automatically propagate this property to types that are composed of trivially-relocatable types…
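Such a hand-rolled trait could be sketched like this (the trait name mirrors the proposal, but the opt-in specializations are manual warrants that the compiler cannot verify for us):

```cpp
#include <memory>
#include <type_traits>

// Default: anything trivially copyable is trivially relocatable.
template <class T>
struct is_trivially_relocatable : std::is_trivially_copyable<T> {};

// Manual opt-in for unique_ptr with the default deleter - a claim we make,
// not something the language checks.
template <class T>
struct is_trivially_relocatable<std::unique_ptr<T>> : std::true_type {};
```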
Even for trivially-copyable types, you can't just copy the bytes of the object representation to some random memory location and cast the address to a pointer to the type of the original object. Since an object was never created, that pointer will not point to an object. And attempting to access the object that pointer doesn't point to will result in undefined behavior. Trivial copyability means you can copy the bytes of the object representation from one existing object to another existing object and rely on that making the value of the one object equal to the value of the other [basic.types]/3.
Doing this to trivially relocate some object would mean that you have to first construct an object of the given type at your target location, then copy the bytes of the original object into it, and then modify the original object in a way equivalent to what would have happened had you moved from it. Which is essentially a complicated way of just moving the object…
There's a reason a proposal to add the concept of trivial-relocatability to the language exists: because you currently just can't do it from within the language itself…
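For contrast, here is the narrow use of memcpy that [basic.types]/3 does bless: both objects already exist and the type is trivially copyable (Point and copy_via_memcpy are illustrative names):

```cpp
#include <cstring>

struct Point { int x, y; };  // trivially copyable

// Legal: the destination object already exists before the byte copy, and
// afterwards it holds the same value as the source.
Point copy_via_memcpy(const Point& src) {
    Point dst;
    std::memcpy(&dst, &src, sizeof(Point));
    return dst;
}
```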
Note that, despite all this, just because the compiler frontend cannot avoid generating constructor calls doesn't mean the optimizer cannot eliminate unnecessary loads and stores. Let's have a look at what code the compiler generates for your example of moving a std::vector or std::unique_ptr:
auto test1(void* dest, std::vector<int>& src)
{
    return new (dest) std::vector<int>(std::move(src));
}

auto test2(void* dest, std::unique_ptr<int>& src)
{
    return new (dest) std::unique_ptr<int>(std::move(src));
}
As you can see, just doing an actual move often already boils down to just copying and overwriting some bytes, even for non-trivial types…
Author of P1144 here; somehow I'm just seeing this SO question now!
std::is_trivially_relocatable<T> is proposed for some-future-version-of-C++, but I don't predict it'll get in anytime soon (definitely not C++23, I bet not C++26, quite possibly not ever). The paper (P1144R6, June 2022) ought to answer a lot of your questions, especially the ones where people are correctly answering that if you could already implement this in present-day C++, we wouldn't need a proposal. See also my 2019 C++Now talk.
Michael Kenzel's answer says that P1144 "ultimately requires that a user manually mark types for which [trivial relocation] is possible"; I want to point out that that's kind of the opposite of the point. The state of the art for trivial relocatability is manual marking ("warranting") of each and every such type; for example, in Folly, you'd say
struct Widget {
    std::string s;
    std::vector<int> v;
};
FOLLY_ASSUME_FBVECTOR_COMPATIBLE(Widget);
And this is a problem, because the average industry programmer shouldn't be bothered with trying to figure out if std::string is trivially relocatable on their library of choice. (The annotation above is wrong on 1.5 of the big 3 vendors!) Even Folly's own maintainers can't get these manual annotations right 100% of the time.
So the idea of P1144 is that the compiler can just take care of it for you. Your job changes from dangerously warranting things-you-don't-necessarily-know, to merely (and optionally) verifying things-you-want-to-be-true via static_assert (Godbolt):
struct Widget {
    std::string s;
    std::vector<int> v;
};
static_assert(std::is_trivially_relocatable_v<Widget>);

struct Gadget {
    std::string s;
    std::list<int> v;
};
static_assert(!std::is_trivially_relocatable_v<Gadget>);
In your (OP's) specific use-case, it sounds like you need to find out whether a given lambda type is trivially relocatable (Godbolt):
void f(std::list<int> v) {
    auto widget = [&]() { return v; };
    auto gadget = [=]() { return v; };
    static_assert(std::is_trivially_relocatable_v<decltype(widget)>);
    static_assert(!std::is_trivially_relocatable_v<decltype(gadget)>);
}
This is something you can't really do at all with Folly/BSL/EASTL, because their warranting mechanisms work only on named types at the global scope. You can't exactly FOLLY_ASSUME_FBVECTOR_COMPATIBLE(decltype(widget)).
Inside a std::function-like type, you're correct that it would be useful to know whether the captured type is trivially relocatable or not. But since you can't know that, the next best thing (and what you should do in practice) is to check std::is_trivially_copyable. That's the currently blessed type trait that literally means "This type is safe to memcpy, safe to skip the destructor of" — basically all the things you're going to be doing with it. Even if you knew that the type was exactly std::unique_ptr<int>, or whatever, it would still be undefined behavior to memcpy it in present-day C++, because the current standard says that you're not allowed to memcpy types that aren't trivially copyable.
(Btw, technically, P1144 doesn't change that fact. P1144 merely says that the implementation is allowed to elide the effects of relocation, which is a huge wink-and-nod to implementors that they should just use memcpy. But even P1144R6 doesn't make it legal for ordinary non-implementor programmers to memcpy non-trivially-copyable types: it leaves the door open for some compiler to implement, and some library implementation to use, a __builtin_trivial_relocate function that is in some magical sense distinguishable from a plain old memcpy.)
Finally, your last paragraph refers to memcpy + memset(src,0,...). That's wrong. Trivial relocation is tantamount to just memcpy. If you care about the state of the source object afterward — if you care that it's all-zero-bytes, for example — then that must mean you're going to look at it again, which means you aren't actually treating it as destroyed, which means you aren't actually doing the semantics of a relocate here. "Copy and null out the source" is more often the semantics of a move. The point of relocation is to avoid that extra work.
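To make the "check std::is_trivially_copyable" advice above concrete, here is a sketch of an SBO eligibility gate (the helper name and buffer size are made up, not from any real function-like type):

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical SBO gate: the object must fit the in-place buffer, be
// suitably aligned, and be trivially copyable - i.e. safe to memcpy and
// safe to skip the destructor of.
template <class T, std::size_t BufSize = 3 * sizeof(void*)>
constexpr bool fits_in_small_buffer() {
    return sizeof(T) <= BufSize
        && alignof(T) <= alignof(std::max_align_t)
        && std::is_trivially_copyable<T>::value;
}
```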
Given the following code:
template <typename T>
using storage_t = typename std::aligned_storage<sizeof(T), alignof(T)>::type;

// this moves the back of src to the back of dst:
template <typename T>
void push_popped(std::list<storage_t<T>>& dstLst, std::list<storage_t<T>>& srcLst)
{
    auto& src = srcLst.back();
    dstLst.push_back(storage_t<T>());
    auto& dst = dstLst.back();
    std::memcpy(&dst, &src, sizeof(T));
    srcLst.pop_back();
}
I'm aware of 3 reasons why this approach is not, in general, correct (even though it avoids calling src->~T() and so avoids double-reclamation of T's resources).
object members of type U* that point to other U members of the same object
hidden class members may need to be updated (vtable, for instance)
the system needs to record that no T exists anymore at src and that a T does now exist at dst
(These are mentioned here: http://www.gamedev.net/topic/655730-c-stdmove-vs-stdmemcpy/#entry5148523.)
Assuming that T is not a type whose memory address is a property of its state (std::mutex or std::condition_variable, for instance), are these the only issues with this approach? Or are there other things that could go wrong? I'd like a description of the unknown issues.
I'd like to think I have an "object relocation semantics" developed, but I'd rather not ask people to consider it if there's an obvious hole in it.
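Issue 1 from the list can be demonstrated concretely with a made-up self-referential type (SelfRef is illustrative; it is trivially copyable, so the byte copy itself is well-defined, yet the result is wrong):

```cpp
#include <cstring>

// An object that points into itself. A byte-wise copy duplicates the
// pointer unchanged, so the copy still points into the *source* object.
struct SelfRef {
    int value = 7;
    int* p = &value;  // self-referential member
};
```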
The concept of "trivially copyable" implies that a memcpy is safe; you can test for it with the std::is_trivially_copyable trait.
It also implies that destroying the object is a no-op. In your case, you want destruction not to be a no-op: skipped on the source, but still performed on the destination.
The concept of "move-and-destroy-source" has been proposed in the C++1z standardization process independent of the "trivially copyable" concept. It was proposed for exception safety; there are types for which a move-construct is not exception-safe, but a move-construct-and-destroy-source would be. And there are thorny problems involving exceptions and container allocations that make a noexcept move-ctor operation very valuable.
If that gets into the standard, then a trivially-copyable-if-you-don't-destroy-source concept could also be added to the standard, if it proves valuable.
It wouldn't apply to everything move semantics can enhance, and it may require effort on the part of programmers: having the compiler work out on its own that "it is OK to elide this destructor call" is not going to be easy; by Rice's theorem, all non-trivial semantic properties of program behavior are undecidable in general.
Why use a copy constructor instead of std::memcpy?
The move constructor/move assignment operator gives you an encapsulated opportunity to do other useful work when you move an object: logging, cleanup, and so on.
Performance-wise, in many cases the compiler can collapse many moves into just one (imagine a chain of functions, each simply returning an object to the next). With memcpy its ability to do so is much more restricted.
And finally, because C++ is not about moving bytes around; it's about using objects as the basis for your program.
I am basically trying to figure out: is the whole "move semantics" concept something brand new, or is it just making existing code simpler to implement? I am always interested in reducing the number of times I call copy constructors, but I usually pass objects by reference (possibly const reference) and ensure I always use initialiser lists. With this in mind (and having looked at the whole ugly && syntax), I wonder whether it is worth adopting these principles or simply coding as I already do. Is anything new being done here, or is it just "easier" syntactic sugar for what I already do?
TL;DR
This is definitely something new and it goes well beyond just being a way to avoid copying memory.
Long Answer: Why it's new and some perhaps non-obvious implications
Move semantics are just what the name implies--that is, a way to explicitly declare instructions for moving objects rather than copying. In addition to the obvious efficiency benefit, this also affords a programmer a standards-compliant way to have objects that are movable but not copyable. Objects that are movable and not copyable convey a very clear boundary of resource ownership via standard language semantics. This was possible in the past, but there was no standard/unified (or STL-compatible) way to do this.
This is a big deal because having a standard and unified semantic benefits both programmers and compilers. Programmers don't have to spend time potentially introducing bugs into a move routine that can reliably be generated by compilers (most cases); compilers can now make appropriate optimizations because the standard provides a way to inform the compiler when and where you're doing standard moves.
Move semantics is particularly interesting because it very well suits the RAII idiom, a long-standing cornerstone of C++ best practice. RAII encompasses much more than just this example, but my point is that move semantics is now a standard way to concisely express (among other things) movable-but-not-copyable objects.
You don't always have to explicitly define this functionality in order to prevent copying. A compiler feature known as "copy elision" will eliminate quite a lot of unnecessary copies from functions that pass by value.
Criminally-Incomplete Crash Course on RAII (for the uninitiated)
I realize you didn't ask for a code example, but here's a really simple one that might benefit a future reader who might be less familiar with the topic or the relevance of Move Semantics to RAII practices. (If you already understand this, then skip the rest of this answer)
// non-copyable class that manages lifecycle of a resource
// note: non-virtual destructor--probably not an appropriate candidate
// for serving as a base class for objects handled polymorphically.
class res_t {
    using handle_t = /* whatever */;
    handle_t* handle; // Pointer to owned resource
public:
    res_t( const res_t& src ) = delete; // no copy constructor
    res_t& operator=( const res_t& src ) = delete; // no copy-assignment
    res_t( res_t&& src ) noexcept; // Move constructor: takes src.handle, nulls it
    res_t& operator=( res_t&& src ) noexcept; // Move-assignment: likewise
    res_t(); // Default constructor
    ~res_t(); // Destructor: frees the resource if handle is non-null
};
Objects of this class will allocate/provision whatever resource is needed upon construction and then free/release it upon destruction. Since the resource pointed to by the data member can never accidentally be transferred to another object, the rightful owner of a resource is never in doubt. In addition to making your code less prone to abuse or errors (and easily compatible with STL containers), your intentions will be immediately recognized by any programmer familiar with this standard practice.
In the Turing Tar Pit, there is nothing new under the sun. Everything that move semantics does, can be done without move semantics -- it just takes a lot more code, and is a lot more fragile.
What move semantics does is takes a particular common pattern that massively increases efficiency and safety in a number of situations, and embeds it in the language.
It increases efficiency in obvious ways. Moving, be it via swap or move construction, is much faster for many data types than copying. You could always create special interfaces to indicate when things can be moved from, but honestly people didn't do that; with move semantics it becomes relatively easy. Compare the cost of moving a std::vector to copying it: a move copies roughly 3 pointers, while a copy requires a heap allocation, a copy of every element in the container, and the creation of 3 pointers.
Even more so, compare reserve on a move-aware std::vector to a copy-only aware one: suppose you have a std::vector of std::vector. In C++03, that was performance suicide if you didn't know the dimensions of every component ahead of time -- in C++11, move semantics makes it as smooth as silk, because it is no longer repeatedly copying the sub-vectors whenever the outer vector resizes.
Move semantics gives every "pImpl pattern" type blazing fast performance, which means you can start having complex objects that behave like values instead of having to deal with and manage pointers to them.
On top of these performance gains, and opening up complex-class-as-value, move semantics also opens up a whole host of safety measures and allows doing some things that were not very practical before.
std::unique_ptr is a replacement for std::auto_ptr. They both do roughly the same thing, but std::auto_ptr treated copies as moves. This made std::auto_ptr ridiculously dangerous to use in practice. Meanwhile, std::unique_ptr just works. It represents unique ownership of some resource extremely well, and transfer of ownership can happen easily and smoothly.
You know the problem whereby you take a foo* in an interface, and sometimes it means "this interface is taking ownership of the object" and sometimes it means "this interface just wants to be able to modify this object remotely", and you have to delve into API documentation and sometimes source code to figure out which?
std::unique_ptr actually solves this problem: interfaces that want to take ownership can now take a std::unique_ptr<foo>, and the transfer of ownership is obvious both at the API level and in the code that calls the interface. std::unique_ptr is an auto_ptr that just works, with the unsafe portions removed and replaced with move semantics. And it does all of this with nearly perfect efficiency.
std::unique_ptr is a transferable RAII representation of a resource whose value is represented by a pointer.
Once you have make_unique<T>(Args&&...), unless you are writing really low-level code, it is probably a good idea to never call new directly again. Move semantics has basically made new obsolete.
Other RAII representations are often non-copyable. A port, a print session, an interaction with a physical device -- all of these are resources for whom "copy" doesn't make much sense. Most every one of them can be easily modified to support move semantics, which opens up a whole host of freedom in dealing with these variables.
Move semantics also allows you to put your return values in the return part of a function. The pattern of taking return values by reference (and documenting "this one is out-only, this one is in/out", or failing to do so) can be somewhat replaced by returning your data.
So instead of void fill_vec( std::vector<foo>& ), you have std::vector<foo> get_vec(). This even works with multiple return values -- std::tuple< std::vector<A>, std::set<B>, bool > get_stuff() can be called, and you can load your data into local variables efficiently via std::tie( my_vec, my_set, my_bool ) = get_stuff().
Output parameters can be semantically output-only, with very little overhead (the above, in a worst case, costs 8 pointer and 2 bool copies, regardless of how much data we have in those containers -- and that overhead can be as little as 0 pointer and 0 bool copies with a bit more work), because of move semantics.
There is absolutely something new going on here. Consider unique_ptr which can be moved, but not copied because it uniquely holds ownership of a resource. That ownership can then be transferred by moving it to a new unique_ptr if needed, but copying it would be impossible (as you would then have two references to the owned object).
While many uses of moving may have positive performance implications, the movable-but-not-copyable types are a much bigger functional improvement to the language.
In short, use the new techniques where it indicates the meaning of how your class should be used, or where (significant) performance concerns can be alleviated by movement rather than copy-and-destroy.
No answer is complete without a reference to Thomas Becker's painstakingly exhaustive write up on rvalue references, perfect forwarding, reference collapsing and everything related to that.
see here: http://thbecker.net/articles/rvalue_references/section_01.html
I would say yes because a Move Constructor and Move Assignment operator are now compiler defined for objects that do not define/protect a destructor, copy constructor, or copy assignment.
This means that if you have the following code...
struct intContainer
{
    std::vector<int> v;
};

intContainer CreateContainer()
{
    intContainer c;
    c.v.push_back(3);
    return c;
}
The code above would be optimized simply by recompiling with a compiler that supports move semantics. Your container c will have a compiler-defined move constructor and thus will invoke std::vector's move operations without any changes to your code.
Since move semantics only apply in the presence of rvalue references, which are declared by a new token, &&, it seems very clear that they are something new.
In principle, they are purely an optimizing technique, which means that:
1. you don't use them until the profiler says it is necessary, and
2. in theory, optimizing is the compiler's job, and move semantics aren't any more necessary than register.
Concerning 1, we may, in time, end up with a ubiquitous heuristic as to how to use them: after all, passing an argument by const reference, rather than by value, is also an optimization, but the ubiquitous convention is to pass class types by const reference and all other types by value.
Concerning 2, compilers just aren't there yet. At least, the usual ones. The basic principles which could be used to make move semantics irrelevant are (well?) known, but to date they tend to result in unacceptable compile times for real programs.
As a result: if you're writing a low-level library, you'll probably want to consider move semantics from the start. Otherwise, they're just extra complication, and should be ignored until the profiler says otherwise.
If I use auto_ptr as the return value of a function that populates large vectors, this makes the function a source function (it will create an internal auto_ptr and hand over ownership when it returns a non-const auto_ptr). However, I cannot use this function with STL algorithms because, in order to access the data, I need to dereference the auto_ptr. A good example, I guess, would be a field of N vectors, each vector having 100 components. Whether the function returns each 100-component vector by value or by reference is not the same thing if N is large.
Also, when I try this very basic code:
class t
{
public:
    t() { std::cout << "ctor" << std::endl; }
    ~t() { std::cout << "dtor" << std::endl; }
};

t valueFun()
{
    return t();
}

std::auto_ptr<t> autoFun()
{
    return std::auto_ptr<t>(new t());
}
both autoFun and valueFun calls result in the output
ctor
dtor
so I cannot actually see the automatic variable that is created to be handed to the return statement. Does this mean that Return Value Optimization kicks in for the valueFun call? Does valueFun create two automatic objects at all in this case?
How do I then optimize a population of such a large data structure with a function?
There are many options for this, and dynamic allocation may not be the best.
Before we even delve in this discussion: is this a bottleneck ?
If you did not profile and ensure it is a bottleneck, then this discussion could be completely off... Remember that profiling debug builds is pretty much useless.
Now, in C++03 there are several options, from the most palatable to the least one:
trust the compiler: unnamed variables use RVO even in Debug builds in gcc, for example.
use an "out" parameter (pass by reference)
allocate on the heap and return a pointer (smart or not)
check the compiler output
Personally, I would trust my compiler on this unless a profiler proves I am wrong.
In C++11, move semantics helps us be more confident, because whenever there is a return statement, if RVO cannot kick in, then a move constructor (if available) can be used automatically; and the move constructor of vector is dirt cheap.
So it becomes:
trust the compiler: either RVO or move semantics
allocate on the heap and return a unique_ptr
but really the second point should be used only for those few classes where move semantics do not help much: the cost of a move is usually proportional to the sizeof of the returned type, so for example a std::array<T,10> has a size equal to 10*sizeof(T), which is not so cheap to move and might benefit from heap allocation + unique_ptr.
Tangent: you trust your compiler already. You trust it to warn you about errors, you trust it to warn you about dangerous/probably incorrect constructs, you trust it to correctly translate your code into machine assembly, you trust it to apply meaningful optimization to get a decent speed-up... Not trusting a compiler to apply RVO in obvious cases is like not trusting your heart surgeon with a $10 bill: it's the least of your worries. ;)
I am fairly sure that the compiler will do Return Value Optimization for valueFun. The main cases where return value optimization cannot be applied by the compiler are:
returning parameters
returning a different object based on a conditional
Thus the auto_ptr is not necessary, and would be even slower due to having to use the heap.
If you are still worried about the cost of moving around such a large vector, you might want to look into using the move semantics of C++11 (std::vector<T> aCopy(std::move(otherVector))). These are almost as fast as RVO and can be used anywhere (a move is also guaranteed to be used for return values when RVO cannot be applied).
I believe most modern compilers support move semantics (or, technically, rvalue references) at this point.