How is the swap function implemented in the STL? Is it as simple as this:
template<typename T> void swap(T& t1, T& t2) {
T tmp(t1);
t1=t2;
t2=tmp;
}
In other posts, they talk about specializing this function for your own class. Why would I need to do this? Why can't I use the std::swap function?
How is std::swap implemented?
Yes, the implementation presented in the question is the classic C++03 one.
A more modern (C++11) implementation of std::swap looks like this:
template<typename T> void swap(T& t1, T& t2) {
T temp = std::move(t1); // or T temp(std::move(t1));
t1 = std::move(t2);
t2 = std::move(temp);
}
This improves on the classic C++03 implementation because it moves resources instead of copying them, which avoids unneeded copies. The C++11 std::swap requires the type T to be MoveConstructible and MoveAssignable, which is what makes this implementation possible.
Why would I need to provide a custom implementation?
A custom implementation of swap, for a specific type, is usually advised when your implementation is more efficient or specific than the standard version.
A classic (pre-C++11) example of this is when your class manages a large amount of resources that would be expensive to copy and then delete. Instead, your custom implementation could simply exchange the handles or pointers required to effect the swap.
With the advent of std::move and movable types in C++11 and later (assuming you have implemented your type as movable), much of the original rationale falls away. Nevertheless, if a custom swap would be better than the standard one, implement it.
Generic code will generally be able to use your custom swap if it uses the ADL mechanism appropriately.
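To make the ADL point concrete, here is a minimal sketch. The names demo, Buffer, exchange_values and adl_swap_usage are made up for illustration and do not come from any library. Buffer is deliberately neither copyable nor movable, so the generic call only compiles because the unqualified swap finds demo::swap through ADL:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

namespace demo {  // hypothetical namespace, for illustration only

// Owns a heap array; swapping only exchanges the pointer and the size.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;             // not copyable (and so not movable),
    Buffer& operator=(const Buffer&) = delete;  // so the generic std::swap cannot be used

    void swap(Buffer& other) noexcept {
        std::swap(data_, other.data_);
        std::swap(size_, other.size_);
    }
    std::size_t size() const noexcept { return size_; }
    const int* data() const noexcept { return data_; }

private:
    int* data_;
    std::size_t size_;
};

// Free function in the same namespace as Buffer, so ADL can find it.
inline void swap(Buffer& a, Buffer& b) noexcept { a.swap(b); }

}  // namespace demo

// Generic code: std::swap is the fallback, ADL picks demo::swap for Buffer.
template <typename T>
void exchange_values(T& a, T& b) {
    using std::swap;
    swap(a, b);
}

// Usage: a user-defined type and a built-in type go through the same call.
inline void adl_swap_usage() {
    demo::Buffer a(3), b(5);
    const int* old_a = a.data();
    exchange_values(a, b);
    assert(a.size() == 5 && b.size() == 3 && b.data() == old_a);

    int x = 1, y = 2;
    exchange_values(x, y);
    assert(x == 2 && y == 1);
}
```

Calling std::swap(a, b) directly inside exchange_values would bypass demo::swap, which is why the using-declaration plus unqualified call is the usual recommendation.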
How is the swap function implemented in the STL?
Which implementation? It's a specification, not a single concrete library. If you mean how does my compiler's standard library do it, either tell us which compiler that is, or read the code yourself.
Is it as simple as this:
That's essentially the naive version pre-C++11.
This un-specialized implementation forces a copy: for T = std::vector<SomethingExpensive> in your example, the code translates as:
template<typename T> void swap(T& t1, T& t2) {
T tmp(t1); // duplicate t1, making an expensive copy of each element
t1=t2; // discard the original contents of t1,
// and replace them with an expensive duplicate of t2
t2=tmp; // discard the original contents of t2,
// and replace them with an expensive duplicate of tmp
} // implicitly destroy the expensive temporary copy of t1
so to exchange two vectors we essentially created three. There were three dynamic allocations and a lot of expensive objects copied, and any of those operations could throw, possibly leaving the arguments in an indeterminate state.
Since this was obviously awful, overloads were provided for expensive containers, and you were encouraged to write overloads for your own expensive types: eg. the std::vector specialization had access to the vector's internals, and could swap two vectors without all the copying:
template <typename T> void swap(vector<T> &v1, vector<T> &v2) { v1.swap(v2); }
template <typename T> void vector<T>::swap(vector<T>& other) {
swap(this->size_, other.size_); // cheap integer swap of allocated count
swap(this->used_, other.used_); // cheap integer swap of used count
swap(this->data_, other.data_); // cheap pointer swap of data ptr
}
Note that this involves no copies at all of anything expensive, no dynamic (de)allocation, and is guaranteed not to throw.
Now, the reason for this specialization is that vector::swap has access to vector's internals, and can safely and efficiently move them around without copying.
Why would I need to do this [specializing ... for your own class] ?
Pre-C++11, for the same reason as std::vector - to make swapping efficient and exception-safe.
Since C++11, you really don't - if you either provide move construction and assignment, or the compiler can generate sane defaults for you.
The new generic swap:
template <typename T> void swap(T& t1, T& t2) {
T temp = std::move(t1);
t1 = std::move(t2);
t2 = std::move(temp);
}
can use move construction/assignment to get essentially the same behaviour as the custom vector implementation above, without needing to write a custom implementation at all.
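If you want to see the difference rather than take it on faith, a small instrumented type works. This is only a sketch: Tracker, copy_swap, move_swap and count_operations are hypothetical names, and the two swap templates are hand-written copies of the C++03 and C++11 versions shown above, not the library's own code:

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts how often it is copied and moved.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
    Tracker& operator=(const Tracker&) { ++copies; return *this; }
    Tracker& operator=(Tracker&&) noexcept { ++moves; return *this; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// C++03-style swap: one copy construction and two copy assignments.
template <typename T>
void copy_swap(T& a, T& b) {
    T tmp(a);
    a = b;
    b = tmp;
}

// C++11-style swap: one move construction and two move assignments.
template <typename T>
void move_swap(T& a, T& b) {
    T tmp(std::move(a));
    a = std::move(b);
    b = std::move(tmp);
}

// Runs both swaps and checks the counters after each.
inline void count_operations() {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker a, b;

    copy_swap(a, b);
    assert(Tracker::copies == 3 && Tracker::moves == 0);

    Tracker::copies = 0;
    move_swap(a, b);
    assert(Tracker::copies == 0 && Tracker::moves == 3);
}
```

For a type like std::vector each of those three copies is an allocation plus an element-wise copy, while each move is a handful of pointer assignments.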
My teammates are writing a fixed-size implementation of std::vector for a safety-critical application. We're not allowed to use heap allocation, so they created a simple array wrapper like this:
template <typename T, size_t NUM_ITEMS>
class Vector
{
public:
void push_back(const T& val);
// ...more vector methods
private:
// Internal storage
T storage_[NUM_ITEMS];
// ...implementation
};
A problem we encountered with this implementation is that it requires element types to have default constructors (which std::vector does not require, and which created porting difficulties). I decided to hack on their implementation to make it behave more like std::vector and came up with this:
template <typename T, size_t NUM_ITEMS>
class Vector
{
public:
void push_back(const T& val);
// ...more vector methods
private:
// Internal storage
typedef T StorageType[NUM_ITEMS];
alignas(T) char storage_[NUM_ITEMS * sizeof(T)];
// Get correctly typed array reference
StorageType& get_storage() { return reinterpret_cast<T(&)[NUM_ITEMS]>(storage_); }
const StorageType& get_storage() const { return reinterpret_cast<const T(&)[NUM_ITEMS]>(storage_); }
};
I was then able to just search and replace storage_ with get_storage() and everything worked. An example implementation of push_back might then look like:
template <typename T, size_t NUM_ITEMS>
void Vector<T, NUM_ITEMS>::push_back(const T& val)
{
get_storage()[size_++] = val;
}
In fact, it worked so easily that it got me thinking: is this a good/safe use of reinterpret_cast? Is the code directly above a suitable alternative to placement new, or are there risks associated with copy/move assignment to an uninitialized object?
EDIT: In response to a comment by NathanOliver, I should add that we cannot use the STL, because we cannot compile it for our target environment, nor can we certify it.
The code you've shown is only safe for POD types (Plain Old Data), where the object's representation is trivial and thus assignment to an unconstructed object is ok.
If you want this to work in full generality (which I assume you do, since you are using a template), then for a type T it is undefined behavior to use the object prior to constructing it. That is, you must construct the object before you, for example, assign to that location. That means you need to call the constructor explicitly on demand. The following code block demonstrates an example of this:
template <typename T, size_t NUM_ITEMS>
void Vector<T, NUM_ITEMS>::push_back(const T& val)
{
// potentially an overflow test here
// explicitly call copy constructor to create the new object in the buffer
new (reinterpret_cast<T*>(storage_) + size_) T(val);
// in case that throws, only inc the size after that succeeds
++size_;
}
The above example demonstrates placement new, which takes the form new (void*) T(args...). It calls the constructor but does not actually perform an allocation. The visual difference is the inclusion of the void* argument to operator new itself, which is the address of the object to act on and call the constructor for.
And of course when you remove an element you'll need to destroy that explicitly as well. To do this for a type T, simply call the pseudo-method ~T() on the object. Under templated context the compiler will work out what this means, either an actual destructor call, or no-op for e.g. int or double. This is demonstrated below:
template<typename T, size_t NUM_ITEMS>
void Vector<T, NUM_ITEMS>::pop_back()
{
if (size_ > 0) // safety test, you might rather this throw, idk
{
// explicitly destroy the last item and dec count
// canonically, destructors should never throw (very bad)
reinterpret_cast<T*>(storage_)[--size_].~T();
}
}
Also, I would avoid returning a reference to an array in your get_storage() method, as the array type carries length information and would seem to imply that all elements are valid (constructed) objects, which of course they're not. I suggest you provide methods for getting a pointer to the start of the contiguous array of constructed objects, and another method for getting the number of constructed objects. These are the .data() and .size() methods of e.g. std::vector<T>, which would make use of your class less jarring to seasoned C++ users.
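Putting the pieces of this answer together, a stripped-down container could look roughly like the sketch below. StaticVector and NoDefault are hypothetical names, copying is simply disabled to keep it short, and push_back reports a full container with a bool, which is just one possible policy. Pedantic C++17 code may additionally want std::launder when re-deriving the element pointer from the byte buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

template <typename T, std::size_t NUM_ITEMS>
class StaticVector {  // hypothetical name; mirrors the Vector in the question
public:
    StaticVector() : size_(0) {}
    StaticVector(const StaticVector&) = delete;             // copying omitted in this sketch
    StaticVector& operator=(const StaticVector&) = delete;
    ~StaticVector() { while (size_ > 0) pop_back(); }       // destroy what was constructed

    bool push_back(const T& val) {
        if (size_ == NUM_ITEMS) return false;               // full: report instead of overflowing
        ::new (static_cast<void*>(slot(size_))) T(val);     // construct in place, no allocation
        ++size_;                                            // only after construction succeeded
        return true;
    }
    void pop_back() {
        if (size_ > 0) slot(--size_)->~T();                 // explicit destructor call
    }

    T* data() { return slot(0); }
    const T* data() const { return reinterpret_cast<const T*>(storage_); }
    std::size_t size() const { return size_; }

private:
    T* slot(std::size_t i) { return reinterpret_cast<T*>(storage_) + i; }

    alignas(T) unsigned char storage_[NUM_ITEMS * sizeof(T)];  // raw, unconstructed storage
    std::size_t size_;
};

// Element type without a default constructor, the case that motivated the question.
struct NoDefault {
    explicit NoDefault(int v) : value(v) {}
    int value;
};

inline void static_vector_usage() {
    StaticVector<NoDefault, 2> v;
    assert(v.push_back(NoDefault(7)));
    assert(v.push_back(NoDefault(9)));
    assert(!v.push_back(NoDefault(11)));   // capacity reached
    assert(v.size() == 2 && v.data()[1].value == 9);
}
```

Only the first size() elements behind data() are live objects, which is the reason for exposing a pointer plus a count instead of an array reference.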
Is this a good/safe use of reinterpret_cast?
Is the code directly above a suitable alternative to placement new
No. No.
or are there risks associated with copy/move assignment to an uninitialized object?
Yes. The behaviour is undefined.
Assuming memory is uninitialised, copying the vector has undefined behaviour.
No object of type T has started its lifetime at the memory location. This is super bad when T is not trivial.
The reinterpretation violates the strict aliasing rules.
First is fixed by value-initialising the storage. Or by making the vector non-copyable and non-movable.
Second is fixed by using placement new.
Third is technically fixed by using the pointer returned by placement new, but you can avoid storing that pointer by applying std::launder after reinterpreting the storage.
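For a single element, the second and third fixes could be sketched like this. Slot and its member names are hypothetical. std::launder only exists from C++17 on, so the sketch checks the __cpp_lib_launder feature-test macro and falls back to the plain cast on older standards:

```cpp
#include <cassert>
#include <new>
#include <string>

// One uninitialised slot, sized and aligned for a single T.
template <typename T>
struct Slot {
    alignas(T) unsigned char bytes[sizeof(T)];

    // Placement new starts the lifetime of a T in the buffer;
    // the pointer it returns is always safe to use.
    T* construct(const T& value) {
        return ::new (static_cast<void*>(bytes)) T(value);
    }

    // Re-derive a usable pointer from the raw bytes without having stored the one above.
    T* get() {
#if defined(__cpp_lib_launder)
        return std::launder(reinterpret_cast<T*>(bytes));   // C++17 and later
#else
        return reinterpret_cast<T*>(bytes);                 // pre-C++17 fallback
#endif
    }

    // End the lifetime explicitly; the bytes become raw storage again.
    void destroy() { get()->~T(); }
};

inline void slot_usage() {
    Slot<std::string> s;
    std::string* p = s.construct("hello");
    assert(*p == "hello");
    assert(s.get() == p);      // same address, obtained without storing p
    *s.get() = "world";
    assert(*p == "world");
    s.destroy();
}
```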
While looking at the documentation for std::swap, I see a lot of specializations.
It looks like every STL container, as well as many other std facilities have a specialized swap.
I thought with the aid of templates, we wouldn't need all of these specializations?
For example,
If I write my own pair it works correctly with the templated version:
template<class T1,class T2>
struct my_pair{
T1 t1;
T2 t2;
};
int main() {
my_pair<int,char> x{1,'a'};
my_pair<int,char> y{2,'b'};
std::swap(x,y);
}
So what is gained from specializing std::pair?
template< class T1, class T2 >
void swap( pair<T1,T2>& lhs, pair<T1,T2>& rhs );
I'm also left wondering if I should be writing my own specializations for custom classes,
or simply relying on the template version.
So what is gained from specializing std::pair?
Performance. The generic swap is usually good enough (since C++11), but rarely optimal (for std::pair, and for most other data structures).
I'm also left wondering if I should be writing my own specializations for custom classes, or simply relying on the template version.
I suggest relying on the template by default, but if profiling shows it to be a bottleneck, know that there is probably room for improvement. Premature optimization and all that...
std::swap is implemented along the lines of the code below:
template<typename T> void swap(T& t1, T& t2) {
T temp = std::move(t1);
t1 = std::move(t2);
t2 = std::move(temp);
}
(See "How does the standard library implement std::swap?" for more information.)
So what is gained from specializing std::pair?
The swap for std::pair can be implemented in the following way (simplified from libc++'s pair::swap member function):
void swap(pair& p) noexcept(is_nothrow_swappable<first_type>{} &&
is_nothrow_swappable<second_type>{})
{
using std::swap;
swap(first, p.first);
swap(second, p.second);
}
As you can see, swap is directly invoked on the elements of the pair using ADL: this allows customized and potentially faster implementations of swap to be used on first and second (those implementations can exploit the knowledge of the internal structure of the elements for more performance).
(See "How does using std::swap enable ADL?" for more information.)
Presumably this is for performance reasons in the case that the pair's contained types are cheap to swap but expensive to copy, like vector. Since it can call swap on first and second instead of doing a copy with temporary objects it may provide a significant improvement to program performance.
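Here is a self-contained sketch of that effect. my_pair is borrowed from the question but given first/second members, and lib, Heavy and pair_swap_usage are hypothetical names. The counter shows that the member-wise swap reaches the element's own swap through ADL:

```cpp
#include <cassert>
#include <utility>

namespace lib {  // hypothetical namespace

struct Heavy {
    int id;
    static int custom_swaps;   // counts calls to the custom swap below
};
int Heavy::custom_swaps = 0;

// Custom swap for Heavy, found by ADL.
inline void swap(Heavy& a, Heavy& b) noexcept {
    ++Heavy::custom_swaps;
    std::swap(a.id, b.id);
}

}  // namespace lib

template <class T1, class T2>
struct my_pair {
    T1 first;
    T2 second;

    // Member-wise swap: each element uses its own swap if it has one.
    void swap(my_pair& other) {
        using std::swap;
        swap(first, other.first);
        swap(second, other.second);
    }
};

// Non-member overload so that unqualified swap(x, y) finds the member-wise version.
template <class T1, class T2>
void swap(my_pair<T1, T2>& a, my_pair<T1, T2>& b) { a.swap(b); }

inline void pair_swap_usage() {
    lib::Heavy::custom_swaps = 0;
    my_pair<int, lib::Heavy> x{1, {10}};
    my_pair<int, lib::Heavy> y{2, {20}};
    swap(x, y);
    assert(x.first == 2 && x.second.id == 20);
    assert(y.first == 1 && y.second.id == 10);
    assert(lib::Heavy::custom_swaps == 1);   // the element's custom swap was used
}
```

With the plain generic std::swap on the whole pair, Heavy would be moved three times and lib::swap would never be called.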
The reason is performance, especially pre-C++11.
Consider something like a "Vector" type. The Vector has three fields: size, capacity and a pointer to the actual data. Its copy constructor and copy assignment copy the actual data. The C++11 version also has a move constructor and move assignment that steal the pointer, setting the pointer in the source object to null.
A dedicated Vector swap implementation can simply swap the fields.
A generic swap implementation based on the copy constructor, copy assignment and destructor will result in data copying and dynamic memory allocation/deallocation.
A generic swap implementation based on the move constructor, move assignment and destructor will avoid any data copying or memory allocation but it will leave some redundant nulling and null-checks which the optimiser may or may not be able to optimise away.
So why have a specialised swap implementation for "Pair"? For a pair of int and char there is no need. They are plain old data types so a generic swap is just fine.
But what if I have a pair of, say, Vector and String? I want to use the specialist swap operations for those types, and so I need a swap operation on the pair type that handles it by swapping its component elements.
The most efficient way to swap two pairs is not the same as the most efficient way to swap two vectors. The two types have a different implementation, different member variables and different member functions.
There is just no generic way to "swap" two objects in this manner.
I mean, sure, for a copyable type you could do this:
T tmp = a;
a = b;
b = tmp;
But that's horrendous.
For a moveable type you can add some std::move and prevent copies, but then you still need "swap" semantics at the next layer down in order to actually have useful move semantics. At some point, you need to specialise.
There is a rule (I think it comes from either Herb Sutter's Exceptional C++ or Scott Meyers' Effective C++ series) that if your type can provide a swap implementation that does not throw, or is faster than the generic std::swap function, it should do so as a member function void swap(T &other).
Theoretically, the generic std::swap() function could use template magic to detect the presence of a member-swap and call that instead of doing
T tmp = std::move(lhs);
lhs = std::move(rhs);
rhs = std::move(tmp);
but no-one seems to have thought about that one, yet, so people tend to add overloads of free swap in order to call the (potentially faster) member-swap.
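For what it's worth, the "template magic" is not very exotic. The following is a sketch of one way to do it in C++11, not something the standard library provides: has_member_swap, smart_swap and WithMember are hypothetical names, and the detection uses plain expression SFINAE:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detects whether `a.swap(b)` is well-formed for two lvalues of type T.
template <typename T, typename = void>
struct has_member_swap : std::false_type {};

template <typename T>
struct has_member_swap<T, decltype(void(std::declval<T&>().swap(std::declval<T&>())))>
    : std::true_type {};

// Chosen when T has a member swap: forward to it.
template <typename T>
typename std::enable_if<has_member_swap<T>::value>::type
smart_swap(T& lhs, T& rhs) {
    lhs.swap(rhs);
}

// Chosen otherwise: the generic three-move swap.
template <typename T>
typename std::enable_if<!has_member_swap<T>::value>::type
smart_swap(T& lhs, T& rhs) {
    T tmp = std::move(lhs);
    lhs = std::move(rhs);
    rhs = std::move(tmp);
}

// Example type with a member swap that records that it was called.
struct WithMember {
    int value;
    int member_swap_calls;
    void swap(WithMember& other) {
        std::swap(value, other.value);
        ++member_swap_calls;
    }
};

static_assert(has_member_swap<WithMember>::value, "member swap detected");
static_assert(!has_member_swap<int>::value, "int has no member swap");

inline void smart_swap_usage() {
    WithMember a{1, 0}, b{2, 0};
    smart_swap(a, b);
    assert(a.value == 2 && b.value == 1 && a.member_swap_calls == 1);

    int x = 3, y = 4;
    smart_swap(x, y);
    assert(x == 4 && y == 3);
}
```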
As far as I can tell, the requirements on an allocator to be used with STL
containers are laid out in Table 28 of section 17.6.3.5 of the C++11 standard.
I'm a bit confused about the interaction between some of these requirements.
Given a type X that is an allocator for type T, a type Y that is "the
corresponding allocator class" for type U, instances a, a1, and a2 of
X, and an instance b of Y, the table says:
The expression a1 == a2 evaluates to true only if storage allocated
from a1 can be deallocated by a2, and vice versa.
The expression X a1(a); is well-formed, doesn't exit via an exception,
and afterward a1 == a is true.
The expression X a(b) is well-formed, doesn't exit via an exception, and
afterward a == b.
I read this as saying that all allocators must be copy-constructible in such a
way that the copies are interchangeable with the originals. Worse, the same
is true across type boundaries. This seems to be a pretty onerous requirement; as
far as I can tell, it makes impossible a large number of types of allocators.
For example, say I had a freelist class that I wanted to use in my allocator,
in order to cache freed objects. Unless I'm missing something, I couldn't
include an instance of that class in the allocator, because the sizes or
alignments of T and U might differ and therefore the freelist entries are
not compatible.
My questions:
Are my interpretations above correct?
I've read in a few places that C++11 improved support for "stateful
allocators". How is that the case, given these restrictions?
Do you have any suggestions for how to do the sort of thing I'm trying to
do? That is, how do I include allocated-type-specific state in my allocator?
In general, the language around allocators seems sloppy. (For example, the
prologue to Table 28 says to assume that a is of type X&, but some of the
expressions redefine a.) Also, at least GCC's support is non-conformant.
What accounts for this weirdness around allocators? Is it just an infrequently
used feature?
Equality of allocators does not imply that they must have exactly the same internal state, only that they must both be able to deallocate memory that was allocated with either allocator. Cross-type equality of allocators a == b for an allocator a of type X and allocator b of type Y is defined in table 28 as "same as a == Y::template rebind<T>::other(b)". In other words, a == b if memory allocated by a can be deallocated by an allocator instantiated by rebinding b to a's value_type.
Your freelist allocators need not be able to deallocate nodes of arbitrary type, you only need to ensure that memory allocated by FreelistAllocator<T> can be deallocated by FreelistAllocator<U>::template rebind<T>::other. Given that FreelistAllocator<U>::template rebind<T>::other is the same type as FreelistAllocator<T> in most sane implementations, this is fairly easy to achieve.
Simple example (Live demo at Coliru):
#include <cstddef>
#include <iostream>
#include <new>
#include <type_traits>
template <typename T>
class FreelistAllocator {
union node {
node* next;
typename std::aligned_storage<sizeof(T), alignof(T)>::type storage;
};
node* list = nullptr;
void clear() noexcept {
auto p = list;
while (p) {
auto tmp = p;
p = p->next;
delete tmp;
}
list = nullptr;
}
public:
using value_type = T;
using size_type = std::size_t;
using propagate_on_container_move_assignment = std::true_type;
FreelistAllocator() noexcept = default;
FreelistAllocator(const FreelistAllocator&) noexcept {}
template <typename U>
FreelistAllocator(const FreelistAllocator<U>&) noexcept {}
FreelistAllocator(FreelistAllocator&& other) noexcept : list(other.list) {
other.list = nullptr;
}
FreelistAllocator& operator = (const FreelistAllocator&) noexcept {
// noop
return *this;
}
FreelistAllocator& operator = (FreelistAllocator&& other) noexcept {
clear();
list = other.list;
other.list = nullptr;
return *this;
}
~FreelistAllocator() noexcept { clear(); }
T* allocate(size_type n) {
std::cout << "Allocate(" << n << ") from ";
if (n == 1) {
auto ptr = list;
if (ptr) {
std::cout << "freelist\n";
list = list->next;
} else {
std::cout << "new node\n";
ptr = new node;
}
return reinterpret_cast<T*>(ptr);
}
std::cout << "::operator new\n";
return static_cast<T*>(::operator new(n * sizeof(T)));
}
void deallocate(T* ptr, size_type n) noexcept {
std::cout << "Deallocate(" << static_cast<void*>(ptr) << ", " << n << ") to ";
if (n == 1) {
std::cout << "freelist\n";
auto node_ptr = reinterpret_cast<node*>(ptr);
node_ptr->next = list;
list = node_ptr;
} else {
std::cout << "::operator delete\n";
::operator delete(ptr);
}
}
};
template <typename T, typename U>
inline bool operator == (const FreelistAllocator<T>&, const FreelistAllocator<U>&) {
return true;
}
template <typename T, typename U>
inline bool operator != (const FreelistAllocator<T>&, const FreelistAllocator<U>&) {
return false;
}
1) Are my interpretations above correct?
You are right that your free-list might not be a good fit for allocators: it needs to be able to handle multiple sizes (and alignments) to fit. That's a problem for the free-list to solve.
2) I've read in a few places that C++11 improved support for "stateful allocators". How is that the case, given these restrictions?
It is not so much improved as born. In C++03 the Standard only nudged implementers toward supporting allocators with non-equal instances, which effectively made stateful allocators non-portable.
3) Do you have any suggestions for how to do the sort of thing I'm trying to do? That is, how do I include allocated-type-specific state in my allocator?
Your allocator may have to be flexible, because you are not supposed to know exactly what memory (and what types) it is supposed to allocate. This requirement is necessary to insulate you (the user) from the internals of some of the containers that use the allocator, such as std::list, std::set or std::map.
You can still use such allocators with simple containers such as std::vector or std::deque.
Yes, it is a costly requirement.
4) In general, the language around allocators seems sloppy. (For example, the prologue to Table 28 says to assume that a is of type X&, but some of the expressions redefine a.) Also, at least GCC's support is non-conformant. What accounts for this weirdness around allocators? Is it just an infrequently used feature?
The Standard in general is not exactly easy to read, not only allocators. You do have to be careful.
To be pedantic, gcc does not support allocators (it's a compiler). I surmise that you are speaking about libstdc++ (the Standard Library implementation shipped with gcc). libstdc++ is old, and thus it was tailored to C++03. It has been adapted toward C++11, but is not fully conformant yet (still uses Copy-On-Write for strings, for example). The reason is that libstdc++ has a huge focus on binary compatibility, and a number of changes required by C++11 would break this compatibility; they must therefore be introduced carefully.
I read this as saying that all allocators must be copy-constructible in such a way that the copies are interchangeable with the originals. Worse, the same is true across type boundaries. This seems to be a pretty onerous requirement; as far as I can tell, it makes impossible a large number of types of allocators.
It is trivial to meet the requirements if allocators are a lightweight handle onto some memory resource. Just don't try to embed the resource inside individual allocator objects.
For example, say I had a freelist class that I wanted to use in my allocator, in order to cache freed objects. Unless I'm missing something, I couldn't include an instance of that class in the allocator, because the sizes or alignments of T and U might differ and therefore the freelist entries are not compatible.
[allocator.requirements] paragraph 9:
An allocator may constrain the types on which it can be instantiated and the arguments for which its construct member may be called. If a type cannot be used with a particular allocator, the allocator class or the call to construct may fail to instantiate.
It's OK for your allocator to refuse to allocate memory for anything except a given type T. That will prevent it being used in node-based containers such as std::list which need to allocate their own internal node types (not just the container's value_type) but it will work fine for std::vector.
That can be done by preventing the allocator being rebound to other types:
class T;
template<typename ValueType>
class Alloc {
static_assert(std::is_same<ValueType, T>::value,
"this allocator can only be used for type T");
// ...
};
std::vector<T, Alloc<T>> v; // OK
std::list<T, Alloc<T>> l; // Fails
Or you could only support types that can fit in sizeof(T):
template<typename ValueType>
class Alloc {
static_assert(sizeof(ValueType) <= sizeof(T),
"this allocator can only be used for types not larger than sizeof(T)");
static_assert(alignof(ValueType) <= alignof(T),
"this allocator can only be used for types with alignment not larger than alignof(T)");
// ...
};
Are my interpretations above correct?
Not entirely.
I've read in a few places that C++11 improved support for "stateful allocators". How is that the case, given these restrictions?
The restrictions before C++11 were even worse!
It is now clearly specified how allocators propagate between containers when copied and moved, and how various container operations behave when their allocator instance is replaced by a different instance that might not compare equal to the original. Without those clarifications it was not clear what was supposed to happen if e.g. you swapped two containers with stateful allocators.
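As one example of those propagation rules, an allocator can opt in to being swapped along with the container's contents via propagate_on_container_swap. The sketch below uses a hypothetical TaggedAlloc whose tag stands in for "which memory source"; it really just forwards to ::operator new, so the tag is purely illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical stateful allocator: the tag stands in for "which memory source".
template <typename T>
struct TaggedAlloc {
    using value_type = T;
    using propagate_on_container_swap = std::true_type;  // swap allocators along with contents

    int tag;

    explicit TaggedAlloc(int t) noexcept : tag(t) {}
    template <typename U>
    TaggedAlloc(const TaggedAlloc<U>& other) noexcept : tag(other.tag) {}

    T* allocate(std::size_t n) { return static_cast<T*>(::operator new(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// Allocators with different tags are treated as unequal.
template <typename T, typename U>
bool operator==(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) noexcept { return a.tag == b.tag; }
template <typename T, typename U>
bool operator!=(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b) noexcept { return !(a == b); }

inline void swap_propagates_allocator() {
    using Vec = std::vector<int, TaggedAlloc<int>>;
    TaggedAlloc<int> a1(1), a2(2);
    Vec v1(a1), v2(a2);
    v1.push_back(10);
    v2.push_back(20);

    v1.swap(v2);   // with propagate_on_container_swap, the allocators are exchanged too

    assert(v1.get_allocator().tag == 2 && v1[0] == 20);
    assert(v2.get_allocator().tag == 1 && v2[0] == 10);
}
```

Without that typedef, swapping containers whose allocators compare unequal is undefined behaviour, which is exactly the kind of case that used to be left unspecified.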
Do you have any suggestions for how to do the sort of thing I'm trying to do? That is, how do I include allocated-type-specific state in my allocator?
Don't embed it directly in the allocator; store it separately and have the allocator refer to it by a pointer (possibly a smart pointer, depending on how you design the lifetime management of the resource). The actual allocator object should be a lightweight handle on to some external source of memory (e.g. an arena, or pool, or something managing a freelist). Allocator objects that share the same source should compare equal. This is true even for allocators with different value types (see below).
I also suggest that you don't try to support allocation for all types if you only need to support it for one.
In general, the language around allocators seems sloppy. (For example, the prologue to Table 28 says to assume that a is of type X&, but some of the expressions redefine a.)
Yes, as you reported at https://github.com/cplusplus/draft/pull/334 (thanks).
Also, at least GCC's support is non-conformant.
It's not 100%, but will be in the next release.
What accounts for this weirdness around allocators? Is it just an infrequently used feature?
Yes. And there's a lot of historical baggage, and it's difficult to specify to be widely useful. My ACCU 2012 presentation has some details, I'll be very surprised if after reading that you think you can make it simpler ;-)
Regarding when allocators compare equal, consider:
MemoryArena m;
Alloc<T> t_alloc(&m);
Alloc<T> t_alloc_copy(t_alloc);
assert( t_alloc_copy == t_alloc ); // share same arena
Alloc<U> u_alloc(t_alloc);
assert( t_alloc == u_alloc ); // share same arena
MemoryArena m2;
Alloc<T> a2(&m2);
assert( a2 != t_alloc ); // using different arenas
The meaning of allocator equality is that the objects can free each other's memory, so if you allocate some memory from t_alloc and (t_alloc == u_alloc) is true, then it means you can deallocate that memory using u_alloc. If they're not equal, u_alloc can't deallocate memory that came from t_alloc.
If you just have a freelist where any memory can get added to any other freelist then maybe all your allocator objects would compare equal to each other.
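A minimal sketch of such a handle-style allocator follows. MemoryArena and Alloc reuse the names from the snippet above, but their bodies here are assumptions made for illustration: this arena merely counts live blocks and forwards to the global heap, where a real one would manage its own pool or freelist:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical memory resource; it only counts blocks and forwards to the global heap.
struct MemoryArena {
    std::size_t live_blocks = 0;
    void* allocate(std::size_t bytes) { ++live_blocks; return ::operator new(bytes); }
    void deallocate(void* p) noexcept { --live_blocks; ::operator delete(p); }
};

// Lightweight handle: the only state is a pointer to the shared arena.
template <typename T>
class Alloc {
public:
    using value_type = T;

    explicit Alloc(MemoryArena* arena) noexcept : arena_(arena) {}
    template <typename U>
    Alloc(const Alloc<U>& other) noexcept : arena_(other.arena()) {}  // rebinding keeps the arena

    T* allocate(std::size_t n) { return static_cast<T*>(arena_->allocate(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) noexcept { arena_->deallocate(p); }

    MemoryArena* arena() const noexcept { return arena_; }

private:
    MemoryArena* arena_;
};

// Equal exactly when both handles refer to the same arena, whatever the value types.
template <typename T, typename U>
bool operator==(const Alloc<T>& a, const Alloc<U>& b) noexcept { return a.arena() == b.arena(); }
template <typename T, typename U>
bool operator!=(const Alloc<T>& a, const Alloc<U>& b) noexcept { return !(a == b); }

inline void arena_alloc_usage() {
    MemoryArena m;
    Alloc<int> a(&m);
    {
        std::vector<int, Alloc<int>> v(a);
        v.push_back(1);
        assert(m.live_blocks >= 1);   // the vector's storage came from m
    }
    assert(m.live_blocks == 0);       // and was returned to m on destruction
}
```

Copies and rebound copies are cheap and trivially satisfy the equality requirements, because none of the per-type state lives in the allocator object itself.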
I could not sleep last night and started thinking about std::swap. Here is the familiar C++98 version:
template <typename T>
void swap(T& a, T& b)
{
T c(a);
a = b;
b = c;
}
If a user-defined class Foo uses external resources, this is inefficient. The common idiom is to provide a method void Foo::swap(Foo& other) and a specialization of std::swap<Foo>. Note that this does not work with class templates since you cannot partially specialize a function template, and overloading names in the std namespace is illegal. The solution is to write a template function in one's own namespace and rely on argument dependent lookup to find it. This depends critically on the client to follow the "using std::swap idiom" instead of calling std::swap directly. Very brittle.
In C++0x, if Foo has a user-defined move constructor and a move assignment operator, providing a custom swap method and a std::swap<Foo> specialization has little to no performance benefit, because the C++0x version of std::swap uses efficient moves instead of copies:
#include <utility>
template <typename T>
void swap(T& a, T& b)
{
T c(std::move(a));
a = std::move(b);
b = std::move(c);
}
Not having to fiddle with swap anymore already takes a lot of burden away from the programmer.
Current compilers do not generate move constructors and move assignment operators automatically yet, but as far as I know, this will change. The only problem left then is exception-safety, because in general, move operations are allowed to throw, and this opens up a whole can of worms. The question "What exactly is the state of a moved-from object?" complicates things further.
Then I was thinking, what exactly are the semantics of std::swap in C++0x if everything goes fine? What is the state of the objects before and after the swap? Typically, swapping via move operations does not touch external resources, only the "flat" object representations themselves.
So why not simply write a swap template that does exactly that: swap the object representations?
#include <cstring>
template <typename T>
void swap(T& a, T& b)
{
unsigned char c[sizeof(T)];
memcpy( c, &a, sizeof(T));
memcpy(&a, &b, sizeof(T));
memcpy(&b, c, sizeof(T));
}
This is as efficient as it gets: it simply blasts through raw memory. It does not require any intervention from the user: no special swap methods or move operations have to be defined. This means that it even works in C++98 (which does not have rvalue references, mind you). But even more importantly, we can now forget about the exception-safety issues, because memcpy never throws.
I can see two potential problems with this approach:
First, not all objects are meant to be swapped. If a class designer hides the copy constructor or the copy assignment operator, trying to swap objects of the class should fail at compile-time. We can simply introduce some dead code that checks whether copying and assignment are legal on the type:
template <typename T>
void swap(T& a, T& b)
{
if (false) // dead code, never executed
{
T c(a); // copy-constructible?
a = b; // assignable?
}
unsigned char c[sizeof(T)];
std::memcpy( c, &a, sizeof(T));
std::memcpy(&a, &b, sizeof(T));
std::memcpy(&b, c, sizeof(T));
}
Any decent compiler can trivially get rid of the dead code. (There are probably better ways to check the "swap conformance", but that is not the point. What matters is that it's possible).
Second, some types might perform "unusual" actions in the copy constructor and copy assignment operator. For example, they might notify observers of their change. I deem this a minor issue, because such kinds of objects probably should not have provided copy operations in the first place.
Please let me know what you think of this approach to swapping. Would it work in practice? Would you use it? Can you identify library types where this would break? Do you see additional problems? Discuss!
So why not simply write a swap template that does exactly that: swap the object representations?
There are many ways in which an object, once constructed, can break when you copy the bytes it resides in. In fact, one could come up with a seemingly endless number of cases where this would not do the right thing - even though in practice it might work in 98% of all cases.
That's because the underlying problem to all this is that, unlike in C, in C++ we must not treat objects as if they are mere raw bytes. That's why we have construction and destruction, after all: to turn raw storage into objects and objects back into raw storage. Once a constructor has run, the memory where the object resides is more than only raw storage. If you treat it as if it weren't, you will break some types.
However, essentially, moving objects shouldn't perform that much worse than your idea, because, once you start to recursively inline the calls to std::move(), you usually ultimately arrive at where built-ins are moved. (And if there's more to moving for some types, you'd better not fiddle with the memory of those yourself!) Granted, moving memory en bloc is usually faster than single moves (and it's unlikely that a compiler might find out that it could optimize the individual moves to one all-encompassing std::memcpy()), but that's the price we pay for the abstraction opaque objects offer us. And it's quite small, especially when you compare it to the copying we used to do.
You could, however, have an optimized swap() using std::memcpy() for aggregate types.
This will break class instances that have pointers to their own members. For example:
class SomeClassWithBuffer {
private:
enum {
BUFSIZE = 4096,
};
char buffer[BUFSIZE];
char *currentPos; // meant to point to the current position in the buffer
public:
SomeClassWithBuffer();
SomeClassWithBuffer(const SomeClassWithBuffer &that);
};
SomeClassWithBuffer::SomeClassWithBuffer():
currentPos(buffer)
{
}
SomeClassWithBuffer::SomeClassWithBuffer(const SomeClassWithBuffer &that)
{
memcpy(buffer, that.buffer, BUFSIZE);
currentPos = buffer + (that.currentPos - that.buffer);
}
Now, if you just do memcpy(), where would currentPos point? To the old location, obviously. This will lead to very funny bugs where each instance actually uses another's buffer.
Some types can be swapped but cannot be copied. Unique smart pointers are probably the best example. Checking for copyability and assignability is wrong.
If T isn't a POD type, using memcpy to copy/move is undefined behavior.
The common idiom is to provide a method void Foo::swap(Foo& other) and a specialization of std::swap<Foo>. Note that this does not work with class templates, …
A better idiom is a non-member swap and requiring users to call swap unqualified, so ADL applies. This also works with templates:
struct NonTemplate {};
void swap(NonTemplate&, NonTemplate&);
template<class T>
struct Template {
friend void swap(Template &a, Template &b) {
using std::swap;
#define S(N) swap(a.N, b.N);
S(each)
S(data)
S(member)
#undef S
}
};
The key is the using declaration for std::swap as a fallback. The friendship for Template's swap is nice for simplifying the definition; the swap for NonTemplate might also be a friend, but that's an implementation detail.
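For the calling side, a small sketch of what "call swap unqualified" means in generic code. The namespace lib, the type Handle, its swaps counter, and the function generic_swap are hypothetical names used only for this illustration:

```cpp
#include <utility>  // std::swap

namespace lib {
    // Hypothetical type with its own non-member swap.
    struct Handle {
        int id;
        int swaps;  // counts how often the custom swap ran
    };

    void swap(Handle& a, Handle& b) {
        int tmp = a.id;
        a.id = b.id;
        b.id = tmp;
        ++a.swaps;
        ++b.swaps;
    }
}

// Generic code: the using declaration makes std::swap a candidate,
// and the unqualified call lets ADL add lib::swap for lib::Handle.
template <class T>
void generic_swap(T& a, T& b) {
    using std::swap;
    swap(a, b);
}
```

With a lib::Handle argument the non-template lib::swap is preferred by overload resolution; with an int there is no associated namespace, so the call falls back to std::swap.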
I deem this a minor issue, because
such kinds of objects probably should
not have provided copy operations in
the first place.
That is, quite simply, a load of wrong. Classes that notify observers and classes that shouldn't be copied are completely unrelated. How about shared_ptr? It obviously should be copyable, but it also obviously notifies an observer: the reference count. Now it's true that in this case the reference count is the same after the swap, but that's definitely not true for all types. It's especially not true if multi-threading is involved, and it's not true in the case of a regular copy instead of a swap. This is especially wrong for classes that can be moved or swapped but not copied.
because in general, move operations
are allowed to throw
They are most assuredly not. It is virtually impossible to guarantee strong exception safety in pretty much any circumstance involving moves when the move might throw. The C++0x definition of the Standard library, from memory, explicitly states any type usable in any Standard container must not throw when moving.
This is as efficient as it gets
That is also wrong. You're assuming that the move of any object is purely the move of its member variables, but it might not involve all of them. I might have an implementation-based cache and I might decide that within my class, I should not move this cache. As an implementation detail it is entirely within my rights not to move any member variables that I deem are not necessary to be moved. You, however, want to move all of them.
Now, it's true that your sample code should be valid for a lot of classes. However, it's extremely very definitely not valid for many classes that are completely and totally legitimate, and more importantly, it's going to compile down to that operation anyway if the operation can be reduced to that. This is breaking perfectly good classes for absolutely no benefit.
Your swap version will cause havoc if someone uses it with polymorphic types.
Consider:
Base *b_ptr = new Base(); // Base and Derived contain definitions
Base *d_ptr = new Derived(); // of a virtual function called vfunc()
yourmemcpyswap( *b_ptr, *d_ptr );
b_ptr->vfunc(); //now calls Derived::vfunc, while it should call Base::vfunc
d_ptr->vfunc(); //now calls Base::vfunc while it should call Derived::vfunc
//...
This is wrong, because now *b_ptr contains the vtable pointer of the Derived type, so Derived::vfunc is invoked on an object which isn't of type Derived.
The normal std::swap only swaps the data members of Base, so this is OK with std::swap.
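A small sketch of the std::swap case, assuming Base carries one data member. The member value and the helper swap_base_parts are hypothetical; Base, Derived and vfunc follow the snippet above, with vfunc returning a char here so the result is easy to check:

```cpp
#include <utility>  // std::swap (C++11; <algorithm> in C++03)

struct Base {
    int value;
    explicit Base(int v) : value(v) {}
    virtual ~Base() {}
    virtual char vfunc() const { return 'B'; }
};

struct Derived : Base {
    explicit Derived(int v) : Base(v) {}
    virtual char vfunc() const { return 'D'; }
};

// Swaps through Base references. This goes through Base's copy
// constructor and copy assignment, so only Base::value is exchanged;
// the dynamic type of each object stays what it was.
void swap_base_parts(Base& a, Base& b) {
    std::swap(a, b);
}
```

After swap_base_parts(*b_ptr, *d_ptr) the two value members are exchanged, while b_ptr->vfunc() still dispatches to Base and d_ptr->vfunc() still dispatches to Derived. Any members declared only in Derived are left untouched, which may or may not be what the caller wanted.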
I've just come across a nice STL-like tree container class written by Kasper Peeters:
http://tree.phi-sci.com/
However, because it's STL-like, it's geared towards having a single class type in the tree; i.e. template <class T>. The trouble is, like STL lists, it suffers from the polymorphic class problem, in that objects in the tree that are pointers to heap-based objects (like pointers to a base class) aren't destroyed when nodes are deleted.
So, my options:
1: Use a tree of boost::shared_ptr, although this is more expensive/overkill than I'd like.
2: Write a little pointer wrapper like the one I've written below. The idea being that it wraps a pointer, which when it goes out of scope, deletes its pointer. It's not ref counted; it just guarantees the pointer's destruction.
template<class T>
class TWSafeDeletePtr
{
private:
T *_ptr;
public:
TWSafeDeletePtr() : _ptr(NULL) {}
TWSafeDeletePtr(T *ptr) : _ptr(ptr) {}
~TWSafeDeletePtr()
{
delete _ptr;
}
T &operator=(T *ptr)
{
assert(!_ptr);
delete _ptr;
_ptr=ptr;
return *ptr;
}
void set(T *ptr)
{
*this=ptr;
}
T &operator*() const { return *_ptr; }
T *operator->() const { return _ptr; }
};
3: Write my own allocator which allocates the node objects from a pool in the allocate() and deletes the pointed to memory in the deallocate().
4: Specialise the code to make a tree of pointers, avoiding the initial allocation and copy construction, plus innately knowing how to delete the pointed-to data.
I already have option 2 working, but I'm not really happy with it, because I have to actually insert an empty ptr to begin with, then set() the pointer when the insert returns an iterator. This is because the tree uses copy construction, and hence the temporary object passed on the stack will ultimately delete the pointer when it goes out of scope. So I set the pointer upon return. It works, it's hidden, but I don't like it.
Option 3 is looking like the best candidate, however I thought I'd ask if anyone else has already done this, or has any other suggestions?
Ok, I've decided to go with option 1 (tree of shared_ptrs), mainly because it's using standard libraries, but also because the extra refcount per node won't break the bank. Thanks for the replies everyone :)
Cheers,
Shane
I don't like the allocator version, because allocators are supposed to allocate memory, not construct objects. So there's no guarantee that the number of requested allocations to the allocator matches the number of objects to be constructed; it would depend on the implementation whether you get away with it.
The tree calls the copy constructor on an inserted or appended value after the allocator has allocated the memory for it, so you would be hard pressed to write something which worked with polymorphic objects - alloc_.allocate doesn't know the runtime type of x before the constructor is called (lines 886 on):
tree_node* tmp = alloc_.allocate(1,0);
kp::constructor(&tmp->data, x);
Also, looking at the code, it doesn't seem to use assignment at all, and your wrapper only supplies the default copy constructor, so I can't see any of your suggested mechanisms working. Consider what happens when a node is assigned the same value it already holds and this code is called (lines 1000 on):
template <class T, class tree_node_allocator>
template <class iter>
iter tree<T, tree_node_allocator>::replace(iter position, const T& x)
{
kp::destructor(&position.node->data);
kp::constructor(&position.node->data, x);
return position;
}
your smart pointer would destroy its referent when its destructor is called here; you may get away with it by passing a reference-counted pointer instead (so x doesn't destroy its referent until it goes out of scope, rather than the position.node->data destructor destroying it).
If you want to use this tree, then I would either use it as an index into data owned elsewhere rather than the tree owning the data, or stick with the shared_ptr approach.
[Edit] Shane has chosen to go with the boost::shared_ptr solution and has pointed out that he needs to store polymorphic base pointers. Should memory/processing efficiency ever become a concern (after profiling, of course), consider a base pointer wrapper with safe copying behavior (e.g., deep-copying the pointee through a clone method) and the fast swap idiom shown in #5. This would be similar to suggested solution #2, but safe and without making assumptions on the data type being used (ex: trying to use auto_ptr with containers).
I think you should consider option #5.
1: Use a tree of boost::shared_ptr,
although this is more
expensive/overkill than I'd like.
First of all, do you realize that any linked structure like std::list, std::set, std::map requires a separate memory allocation/deallocation per node but doesn't require copying nodes to do things like rebalance the tree? The only time the reference counter will amount to any processing overhead is when you insert to the tree.
2: Write a little pointer wrapper like
the one I've written below. The idea
being that it wraps a pointer, which
when it goes out of scope, deletes its
pointer. It's not ref counted, it's
just guarantees the pointer
destruction.
For this tree you might be able to get away with it since you have the tree implementation, but it's a heavy assumption. Consider at least making the pointer wrapper non-copyable so that you'll get a compiler error if you ever try to use it for something which does copy node elements.
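A minimal sketch of such a non-copyable wrapper, assuming C++11 is available for deleted functions (in C++03 the usual substitute is to declare the copy operations private and leave them undefined). The name ScopedDeletePtr is hypothetical:

```cpp
#include <type_traits>  // std::is_copy_constructible

// Hypothetical wrapper: owns a pointer, deletes it on destruction,
// and refuses to be copied so accidental copies fail to compile.
template <class T>
class ScopedDeletePtr {
public:
    explicit ScopedDeletePtr(T* ptr = nullptr) : ptr_(ptr) {}
    ~ScopedDeletePtr() { delete ptr_; }

    ScopedDeletePtr(const ScopedDeletePtr&) = delete;
    ScopedDeletePtr& operator=(const ScopedDeletePtr&) = delete;

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }

private:
    T* ptr_;
};

// Any container that copies its elements will now be rejected.
static_assert(!std::is_copy_constructible<ScopedDeletePtr<int> >::value,
              "ScopedDeletePtr must not be copyable");
```

A container that copy-constructs its elements, as the tree discussed here does, would then refuse to compile with this wrapper, which is the point: the double delete becomes a compile-time error instead of a runtime bug.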
3: Write my own allocator which
allocates the node objects from a pool
in the allocate() and deletes the
pointed to memory in the deallocate().
If it's an STL-compliant memory allocator, it should not be making such assumptions about the memory contents in deallocate. Nevertheless, writing a fast memory allocator which can assume fixed allocation sizes can really speed up any linked structure. Writing a fast memory allocator that consistently outperforms malloc/free is non-trivial work, however, and there are issues to consider like memory alignment.
4: Specialise the code to make a tree
of pointers, avoiding the initial
allocation and copy construction, plus
innately knowing how to delete the
pointed-to data.
Making a wrapper for the tree is probably the most robust solution. You'll have full control over when to insert and remove elements (nodes) and can do whatever you like in the mean time.
Option #5: just store the element directly in the tree and focus on making the element fast.
This is your best bet if you ask me. Instead of map<int, ExpensiveElement*> or map<int, shared_ptr<ExpensiveElement> >, consider simply map<int, ExpensiveElement>.
After all, you obviously want the tree to be the memory manager (deleting a node deletes the element). That happens when we avoid the indirection of a pointer to the element already.
However, your concern seems to be the overhead of the copy-in policy of insert (copy ctor overhead on ExpensiveElement). No problem! Just use operator[] instead of insert:
map<int, ExpensiveElement> my_map;
// default constructs ExpensiveElement
// and calls ExpensiveElement::do_something().
// No copies of ExpensiveElement are made!
my_map[7].do_something();
Tada! No copying, no need to worry about proper destruction, and no memory allocation/deallocation overhead per element.
If default constructing ExpensiveElement won't suffice, then consider making default construction super cheap (practically free) and implement a swap method.
map<int, ExpensiveElement> my_map;
// construct an ExpensiveElement and swap it into the map
// this avoids redundant work and copying and can be very
// efficient if the default constructor of ExpensiveElement
// is cheap to call
ExpensiveElement element(...);
my_map[7].swap(element);
To make the default construction super cheap and allow for a swap method, you could implement a fast pimpl on ExpensiveElement. You can make it so the default ctor doesn't even allocate the pimpl, making it a null pointer, while the swap method swaps the two pimpls of ExpensiveElement. Now you have super cheap default construction and a way to swap properly constructed ExpensiveElements into the map, avoiding the redundancy of deep copies altogether.
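A rough sketch of that pimpl arrangement, under the assumption that the expensive state can be represented by a single string. The Impl struct, the payload member and the empty() accessor are hypothetical, and Impl is defined inline only to keep the example in one piece (a real pimpl would normally hide it in the .cpp file):

```cpp
#include <map>
#include <string>

class ExpensiveElement {
public:
    // Default construction allocates nothing.
    ExpensiveElement() : impl_(nullptr) {}

    explicit ExpensiveElement(const std::string& payload)
        : impl_(new Impl(payload)) {}

    // Deep copy, so the class stays safe in any container.
    ExpensiveElement(const ExpensiveElement& other)
        : impl_(other.impl_ ? new Impl(*other.impl_) : nullptr) {}

    // Copy-and-swap assignment.
    ExpensiveElement& operator=(ExpensiveElement other) {
        swap(other);
        return *this;
    }

    ~ExpensiveElement() { delete impl_; }

    // Shallow swap: exchanges the two pimpl pointers only.
    void swap(ExpensiveElement& other) {
        Impl* tmp = impl_;
        impl_ = other.impl_;
        other.impl_ = tmp;
    }

    bool empty() const { return impl_ == nullptr; }

    // Precondition: !empty()
    const std::string& payload() const { return impl_->payload; }

private:
    struct Impl {
        explicit Impl(const std::string& p) : payload(p) {}
        std::string payload;
    };
    Impl* impl_;
};
```

With this in place, my_map[7].swap(element) default-constructs an empty element in the map and then exchanges two pointers; element is left empty and no deep copy is made.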
What if ExpensiveElement cannot have a default ctor?
Then make a wrapper which does. The approach can be similar to the pointer wrapper you suggested, except it will be a complete class with valid (safe) copying behavior. The copy ctor can deep copy the pointee, for example, if reference counting is undesired. The difference may sound subtle, but this way it's a very safe and complete solution which doesn't make assumptions about how the tree is implemented; safe and general like boost::shared_ptr but without the reference counting. Just provide a swap method as your one and only means to shallow swap data without requiring a deep copy and use it to swap things into the tree.
What if we need to store polymorphic base pointers?
See answer immediately above and modify the wrapper to call something like clone (prototype pattern) to deep copy the pointee.
First of all, you could benefit from move semantics here, if you have access to C++0x.
Otherwise, the Boost Pointer Container library has solved the issue of the STL containers of pointers by... recoding it all.
Another issue with containers of pointers that you did not mention is the copy of the container. In your case the original container and its copy both point to the same objects, so changing an object through one container also changes it for the other.
You can of course alleviate this by writing a proxy class which wraps the pointer and provides deep-copying semantics (via a clone method in the wrapped object). But you will then copy the data more often than if the container were pointer aware... it's less work, though.
/// Requirement: T is a model of Cloneable
template <class T>
class Proxy
{
template <class> friend class Proxy;
public:
// Constructor
Proxy(): mPointer(0) {}
explicit Proxy(T* t): mPointer(t) {}
template <class U>
explicit Proxy(std::auto_ptr<U> t): mPointer(t.release()) {}
template <class U>
explicit Proxy(std::unique_ptr<U> t): mPointer(t.release()) {}
// Copy Constructor
Proxy(Proxy const& rhs):
mPointer(rhs.mPointer ? rhs.mPointer->clone() : 0) {}
template <class U>
Proxy(Proxy<U> const& rhs):
mPointer(rhs.mPointer ? rhs.mPointer->clone() : 0) {}
// Assignment Operator
Proxy& operator=(Proxy const& rhs)
{
Proxy tmp(rhs);
this->swap(tmp);
return *this;
}
template <class U>
Proxy& operator=(Proxy<U> const& rhs)
{
Proxy tmp(rhs);
this->swap(tmp);
return *this;
}
// Move Constructor
Proxy(Proxy&& rhs): mPointer(rhs.release()) {}
template <class U>
Proxy(Proxy<U>&& rhs): mPointer(rhs.release()) {}
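// Move assignment (std::move requires <utility>)
Proxy& operator=(Proxy&& rhs)
{
Proxy tmp(std::move(rhs)); // rhs is an lvalue here; without std::move this would deep-copy
this->swap(tmp);
return *this;
}
template <class U>
Proxy& operator=(Proxy<U>&& rhs)
{
Proxy tmp(std::move(rhs));
this->swap(tmp);
return *this;
}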
// Destructor
~Proxy() { delete mPointer; }
void swap(Proxy& rhs)
{
T* tmp = rhs.mPointer;
rhs.mPointer = mPointer;
mPointer = tmp;
}
T& operator*() { return *mPointer; }
T const& operator*() const { return *mPointer; }
T* operator->() { return mPointer; }
T const* operator->() const { return mPointer; }
T* release() { T* tmp = mPointer; mPointer = 0; return tmp; }
private:
T* mPointer;
};
// Free functions
template <class T>
void swap(Proxy<T>& lhs, Proxy<T>& rhs) { lhs.swap(rhs); }
Note that as well as providing deep-copying semantics, it provides deep-constness. This may or may not be to your taste.
It would also be good taste to provide equivalent to static_cast and dynamic_cast operations, this is left as an exercise to the reader ;)
It seems that the cleanest solution would be a container adaptor in the style of Boost Pointer Container. This would smooth the syntax a lot as well. However, writing such an adaptor is tedious and repetitive, as you would have to "lift" typedefs and repeat every member function of the class that is to be adapted.
It looks like option 1 is probably the best. shared_ptr is very common and most people should know how it works. Is there a problem with the syntax map_obj[key].reset(new ValueType);?
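For reference, a small sketch of that syntax with polymorphic values. It uses std::shared_ptr from C++11 as a stand-in for boost::shared_ptr, on the assumption that reset() behaves the same way for this purpose; Shape, Triangle, ShapeMap and add_triangle are hypothetical names:

```cpp
#include <map>
#include <memory>

struct Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    int sides() const { return 3; }
};

typedef std::map<int, std::shared_ptr<Shape> > ShapeMap;

// operator[] default-constructs an empty shared_ptr in place,
// and reset() then hands it ownership of the new object.
void add_triangle(ShapeMap& shapes, int key) {
    shapes[key].reset(new Triangle);
}
```

The object is destroyed when the map entry (the last owner) goes away, so no explicit delete is needed.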
Unless you have measurement data that your wrapper for option 2 is a significant savings in use compared to shared_ptr, shared_ptr seems safer since people will know about its semantics right away.
Option three seems complex for what it provides. You need to implement the allocate/construct and deallocate/destruct pairs, and make sure that if a node is copied around it will not have deletion problems.
Option four is probably the second-best option. I wouldn't suggest using it unless profiling shows that the shared_ptr operations really are that expensive though, since this requires reinventing code that's already been written and debugged in the standard library.
I've decided to go with option 1 (tree of shared_ptrs), mainly because it's using standard libraries, but also because the extra refcount per node won't break the bank. Thanks for the replies everyone :)
1.
I already have option 1 working, but I'm not really happy with it, because I have to actually insert an empty ptr to begin with, then set() the pointer when the insert returns an iterator. This is because the tree uses copy construction, and hence the temporary object passed on the stack will ultimately delete the pointer when it goes out of scope. So I set the pointer upon return. It works, it's hidden, but I don't like it.
As long as at least one copy of the same shared_ptr exists, the pointed-to object won't be destroyed, so the problem you are describing does not occur.
2. Your class is pointless. Use shared_ptr.
3. The allocator would have to know what kind of object to create when asked for a block of bytes; this is not a good solution.
4. Too many complications.
I suggest solution 1.