Why have move semantics? - c++

Let me preface by saying that I have read some of the many questions already asked regarding move semantics. This question is not about how to use move semantics, it is asking what the purpose of it is - if I am not mistaken, I do not see why move semantics is needed.
Background
I was implementing a heavy class, which, for the purposes of this question, looked something like this:
class B;
class A
{
private:
std::array<B, 1000> b;
public:
// ...
};
When it came time to make a move assignment operator, I realized that I could significantly optimize the process by changing the b member to std::array<B, 1000> *b; - then movement could just be a deletion and pointer swap.
This led me to the following thought: now, shouldn't all non-primitive type members be pointers to speed up movement (corrected below [1] [2]) (there is a case to be made for cases where memory should not be dynamically allocated, but in these cases optimizing movement is not an issue since there is no way to do so)?
Here is where I had the following realization - why create a class A which really just houses a pointer b so swapping later is easier when I can simply make a pointer to the entire A class itself. Clearly, if a client expects movement to be significantly faster than copying, the client should be OK with dynamic memory allocation. But in this case, why does the client not just dynamically allocate the whole A class?
The Question
Can't the client already take advantage of pointers to do everything move semantics gives us? If so, then what is the purpose of move semantics?
Move semantics:
std::string f()
{
std::string s("some long string");
return s;
}
int main()
{
// super-fast pointer swap!
std::string a = f();
return 0;
}
Pointers:
std::string *f()
{
std::string *s = new std::string("some long string");
return s;
}
int main()
{
// still super-fast pointer swap!
std::string *a = f();
delete a;
return 0;
}
And here's the strong assignment that everyone says is so great:
template<typename T>
T& strong_assign(T *&t1, T *&t2)
{
delete t1;
// super-fast pointer swap!
t1 = t2;
t2 = nullptr;
return *t1;
}
#define rvalue_strong_assign(a, b) (auto ___##b = b, strong_assign(a, &___##b))
Fine - the latter in both examples may be considered "bad style" - whatever that means - but is it really worth all the trouble with the double ampersands? If an exception might be thrown before delete a is called, that's still not a real problem - just make a guard or use unique_ptr.
Edit [1] I just realized this wouldn't be necessary with classes such as std::vector which use dynamic memory allocation themselves and have efficient move methods. This just invalidates a thought I had - the question below still stands.
Edit [2] As mentioned in the discussion in the comments and answers below this whole point is pretty much moot. One should use value semantics as much as possible to avoid allocation overhead since the client can always move the whole thing to the heap if needed.

I thoroughly enjoyed all the answers and comments! And I agree with all of them. I just wanted to stick in one more motivation that no one has yet mentioned. This comes from N1377:
Move semantics is mostly about performance optimization: the ability
to move an expensive object from one address in memory to another,
while pilfering resources of the source in order to construct the
target with minimum expense.
Move semantics already exists in the current language and library to a
certain extent:
copy constructor elision in some contexts
auto_ptr "copy"
list::splice
swap on containers
All of these operations involve transferring resources from one object
(location) to another (at least conceptually). What is lacking is
uniform syntax and semantics to enable generic code to move arbitrary
objects (just as generic code today can copy arbitrary objects). There
are several places in the standard library that would greatly benefit
from the ability to move objects instead of copy them (to be discussed
in depth below).
I.e. in generic code such as vector::erase, one needs a single unified syntax to move values to plug the hole left by the erased value. One can't use swap because that would be too expensive when the value_type is int. And one can't use copy assignment as that would be too expensive when value_type is A (the OP's A). Well, one could use copy assignment, after all we did in C++98/03, but it is ridiculously expensive.
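To make the vector::erase point concrete, here is a minimal sketch of plugging the hole with move assignment. The helper name erase_at is hypothetical and not part of the standard library; it only illustrates that a single syntax (std::move) is cheap for int and for a heavy type alike. It assumes pos is a valid index.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a standard facility): erase the element at index
// `pos` by move-assigning each later element one slot to the left.
// For int the move is a plain copy; for std::string or a heavy class with a
// move assignment operator it transfers resources instead of duplicating them.
// Precondition (assumed): pos < v.size().
template <typename T>
void erase_at(std::vector<T>& v, std::size_t pos) {
    for (std::size_t i = pos + 1; i < v.size(); ++i) {
        v[i - 1] = std::move(v[i]);  // same syntax for cheap and expensive types
    }
    v.pop_back();  // destroy the moved-from element left at the end
}
```

The same template works unchanged for std::vector<int> and std::vector<std::string>, which is the "uniform syntax" the quoted paper asks for.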
shouldn't all non-primitive type members be pointers to speed up movement
This would be horribly expensive when the member type is complex<double>. Might as well color it Java.

Your example gives it away: your code is not exception-safe, and it makes use of the free-store (twice), which can be nontrivial. To use pointers, in many/most situations you have to allocate stuff on the free store, which is much slower than automatic storage, and does not allow for RAII.
They also let you more efficiently represent non-copyable resources, like sockets.
Move semantics aren't strictly necessary, as you can see from the fact that C++ existed for a while without them. They are simply a better way to represent certain concepts, and an optimization.
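As a rough illustration of the non-copyable resource point, here is a sketch of a move-only handle. The class name Handle and the close_count counter are hypothetical; no real socket API is called, and the counter merely stands in for the place where a real class would close its descriptor.

```cpp
#include <utility>

// Hypothetical move-only wrapper around an integer descriptor.
// No real socket API is used; close_count stands in for the actual close().
class Handle {
public:
    explicit Handle(int fd) : fd_(fd) {}
    Handle(const Handle&) = delete;             // copying would double-close
    Handle& operator=(const Handle&) = delete;
    Handle(Handle&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
    Handle& operator=(Handle&& other) noexcept {
        if (this != &other) {
            release();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }
    ~Handle() { release(); }

    int get() const { return fd_; }
    bool valid() const { return fd_ != -1; }

    static int close_count;  // number of descriptors "closed" so far

private:
    void release() {
        if (fd_ != -1) {
            ++close_count;  // a real class would call close(fd_) here
            fd_ = -1;
        }
    }
    int fd_;
};

int Handle::close_count = 0;
```

Moving a Handle leaves the source invalid, so the descriptor is released exactly once no matter how many times ownership changes hands.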

Can't the client already take advantage of pointers to do everything move semantics gives us? If so, then what is the purpose of move semantics?
Your second example gives one very good reason why move semantics is a good thing:
std::string *f()
{
std::string *s = new std::string("some long string");
return s;
}
int main()
{
// still super-fast pointer swap!
std::string *a = f();
delete a;
return 0;
}
Here, the client has to examine the implementation to figure out who is responsible for deleting the pointer. With move semantics, this ownership issue won't even come up.
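A small sketch of how the ownership question disappears: the function names make_value and make_owned are made up for illustration. In both cases the signature alone tells the caller who owns the result.

```cpp
#include <memory>
#include <string>
#include <utility>

// Returning by value: the caller owns the result and nothing needs deleting.
// The string is moved (or the copy is elided) out of the function.
std::string make_value() {
    std::string s("some long string");
    return s;
}

// If heap allocation is really wanted, std::unique_ptr states in the
// signature that ownership passes to the caller. unique_ptr is movable but
// not copyable, so it relies on move semantics itself.
std::unique_ptr<std::string> make_owned() {
    return std::unique_ptr<std::string>(new std::string("some long string"));
}
```

A caller can write `std::string a = make_value();` or `auto p = make_owned();` and never has to read the implementation to find out whether a delete is required.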
If an exception might be thrown before delete a is called, that's still not a real problem - just make a guard or use unique_ptr.
Again, the ugly ownership issue shows up if you don't use move semantics. By the way, how would you implement unique_ptr without move semantics?
I know about auto_ptr and there are good reasons why it is now deprecated.
is it really worth all the trouble with the double ampersands?
True, it takes some time to get used to it. After you are familiar and comfortable with it, you will be wondering how you could live without move semantics.

Your string example is great. The short string optimization means that short std::strings do not exist in the free store: instead they exist in automatic storage.
The new/delete version means that you force every std::string into the free store. The move version only puts large strings into the free store, and small strings stay (and are possibly copied) in automatic storage.
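One way to observe this is to check whether a string's character buffer lies inside the std::string object itself. The function stored_inline below is a hypothetical helper, and whether short strings are stored inline is an implementation detail that the standard does not require, so treat the result as informational only.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: returns true when the character buffer lives inside
// the std::string object itself (the short string optimization) rather than
// in separately allocated storage. The outcome for short strings depends on
// the standard library implementation.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.data();
    std::less<const char*> before;  // total order even for unrelated pointers
    return !before(data, begin) && before(data, end);
}
```

On common implementations, `stored_inline(std::string("hi"))` is expected to be true, while a string much longer than sizeof(std::string) cannot fit inside the object and must live elsewhere.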
On top of that your pointer version lacks exception safety, as it has non-RAII resource handles. Even if you do not use exceptions, naked pointer resource owners basically force single-exit-point control flow to manage cleanup. On top of that, use of naked pointer ownership leads to resource leaks and dangling pointers.
So the naked pointer version is worse in piles of ways.
Move semantics means you can treat complex objects as normal values. You move when you do not want duplicate state, and copy otherwise. Nearly normal types that cannot be copied can expose move only (unique_ptr), others can optimize for it (shared_ptr). Data stored in containers, like std::vector, can now include abnormal types because it is move aware. The std::vector of std::vector goes from ridiculously inefficient and hard to use to easy and fast at the stroke of a standard version.
Pointers place the resource management overhead into the clients, while good C++11 classes handle that problem for you. Move semantics makes this both easier to maintain and far less error-prone.
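The container points above can be sketched as follows. The function names make_owned_ints and append_row are illustrative only.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Builds a vector of move-only elements. Before C++11, containers required
// copyable element types, so this had no direct equivalent.
std::vector<std::unique_ptr<int>> make_owned_ints() {
    std::vector<std::unique_ptr<int>> v;
    for (int i = 0; i < 3; ++i) {
        v.push_back(std::unique_ptr<int>(new int(i)));  // moved in, never copied
    }
    return v;  // the whole vector is moved (or elided) out
}

// Appends a row to a vector of vectors by moving it in. The row's heap
// buffer is handed over, so no elements are copied.
void append_row(std::vector<std::vector<int>>& grid, std::vector<int>&& row) {
    grid.push_back(std::move(row));
}
```

The client code contains no new, no delete, and no ownership bookkeeping; the classes handle it.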

Related

How to enable Rust Ownership paradigm in C++

The system programming language Rust uses the ownership paradigm to ensure at compile time with zero cost for the runtime when a resource has to be freed.
In C++ we commonly use smart pointers to achieve the same goal of hiding the complexity of managing resource allocation. There are a couple of differences though:
In Rust there is always only one owner, whereas C++ shared_ptr can easily leak ownership.
In Rust we can borrow references we do not own, whereas C++ unique_ptr cannot be shared in a safe way via weak_ptr and lock().
Reference counting of shared_ptr is costly.
My question is: How can we emulate the ownership paradigm in C++ within the following constraints:
Only one owner at any time
Possibility to borrow a pointer and use it temporarily without fear of the resource going out of scope (observer_ptr is useless for this)
As many compile-time checks as possible.
Edit: Given the comments so far, we can conclude:
No compile-time support for this (I was hoping for some decltype/template magic unknown to me) in the compilers. Might be possible using static analysis elsewhere (taint?)
No way to get this without reference counting.
No standard implementation to distinguish shared_ptrs with owning or borrowing semantic
Could roll your own by creating wrapper types around shared_ptr and weak_ptr:
owned_ptr: non-copyable, move-semantics, encapsulates shared_ptr, access to borrowed_ptr
borrowed_ptr: copyable, encapsulates weak_ptr, lock method
locked_ptr: non-copyable, move-semantics, encapsulates shared_ptr from locking weak_ptr
You can't do this with compile-time checks at all. The C++ type system is lacking any way to reason about when an object goes out of scope, is moved, or is destroyed — much less turn this into a type constraint.
What you could do is have a variant of unique_ptr that keeps a counter of how many "borrows" are active at run time. Instead of get() returning a raw pointer, it would return a smart pointer that increments this counter on construction and decrements it on destruction. If the unique_ptr is destroyed while the count is non-zero, at least you know someone somewhere did something wrong.
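A minimal sketch of that borrow-counting idea, under several assumptions: the names CountedOwner and Borrow are invented here, the counter is a plain int (so this is not thread-safe), and the owner simply aborts if it is destroyed while borrows are outstanding.

```cpp
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical handle handed out instead of a raw pointer. It increments the
// owner's borrow counter on construction and decrements it on destruction.
template <typename T>
class Borrow {
public:
    Borrow(T* ptr, int* counter) : ptr_(ptr), counter_(counter) { ++*counter_; }
    Borrow(const Borrow& other) : ptr_(other.ptr_), counter_(other.counter_) { ++*counter_; }
    Borrow& operator=(const Borrow&) = delete;
    ~Borrow() { --*counter_; }

    T& operator*() const { return *ptr_; }

private:
    T* ptr_;
    int* counter_;
};

// Hypothetical unique owner that counts active borrows at run time.
// Not thread-safe: the counter is a plain int.
template <typename T>
class CountedOwner {
public:
    explicit CountedOwner(T value)
        : object_(new T(std::move(value))), counter_(new int(0)) {}
    CountedOwner(CountedOwner&&) = default;  // move-only, like unique_ptr
    CountedOwner& operator=(CountedOwner&&) = delete;

    ~CountedOwner() {
        // Destroying the object while it is still borrowed is a bug; abort
        // rather than let the borrowers dangle.
        if (counter_ && *counter_ != 0) std::abort();
    }

    Borrow<T> borrow() { return Borrow<T>(object_.get(), counter_.get()); }
    int active_borrows() const { return counter_ ? *counter_ : 0; }

private:
    std::unique_ptr<T> object_;
    std::unique_ptr<int> counter_;  // on the heap so borrows survive a move of the owner
};
```

This only detects misuse at run time, and only for borrows that go through borrow(); it does nothing about raw pointers obtained some other way.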
However, this is not a fool-proof solution. Regardless of how hard you try to prevent it, there will always be ways to get a raw pointer to the underlying object, and then it's game over, since that raw pointer can easily outlive the smart pointer and the unique_ptr. It will even sometimes be necessary to get a raw pointer, to interact with an API that requires raw pointers.
Moreover, ownership is not about pointers. Box/unique_ptr allows you to heap allocate an object, but it changes nothing about ownership, life time, etc. compared to putting the same object on the stack (or inside another object, or anywhere else really). To get the same mileage out of such a system in C++, you'd have to make such "borrow counting" wrappers for all objects everywhere, not just for unique_ptrs. And that is pretty impractical.
So let's revisit the compile time option. The C++ compiler can't help us, but maybe lints can? Theoretically, if you implement the whole life time part of the type system and add annotations to all APIs you use (in addition to your own code), that may work.
But it requires annotations for all functions used in the whole program, including private helper functions of third-party libraries, those for which no source code is available, and those whose implementations are too complicated for the linter to understand (from Rust experience, sometimes the reasons something is safe are too subtle to express in the static model of lifetimes, and it has to be written slightly differently to help the compiler). For the last two, the linter can't verify that the annotation is indeed correct, so you're back to trusting the programmer. Additionally, some APIs (or rather, the conditions for when they are safe) can't really be expressed very well in the lifetime system as Rust uses it.
In other words, a complete and practically useful linter for this would be substantial original research with the associated risk of failure.
Maybe there is a middle ground that gets 80% of the benefits with 20% of the cost, but since you want a hard guarantee (and honestly, I'd like that too), tough luck. Existing "good practices" in C++ already go a long way to minimizing the risks, by essentially thinking (and documenting) the way a Rust programmer does, just without compiler aid. I'm not sure if there is much improvement over that to be had considering the state of C++ and its ecosystem.
tl;dr Just use Rust ;-)
What follows are some examples of ways people have tried to emulate parts of Rust's ownership paradigm in C++, with limited success:
Lifetime safety: Preventing common dangling. The most thorough and rigorous approach, involving several additions to the language to support the necessary annotations. If the effort is still alive (last commit was in 2019), getting this analysis added to a mainstream compiler is probably the most likely route to "borrow checked" C++. Discussed on IRLO.
Borrowing Trouble: The Difficulties Of A C++ Borrow-Checker
Is it possible to achieve Rust's ownership model with a generic C++ wrapper?
C++Now 2017: Jonathan Müller “Emulating Rust's borrow checker in C++" (video) and associated code, about which the author says, "You're not actually supposed to use that, if you need such a feature, you should use Rust."
Emulating the Rust borrow checker with C++ move-only types and part II (which is actually more like emulating RefCell than the borrow checker, per se)
I believe you can get some of the benefits of Rust by enforcing some strict coding conventions (which is after all what you'd have to do anyway, since there's no way with "template magic" to tell the compiler not to compile code that doesn't use said "magic"). Off the top of my head, the following could get you...well...kind of close, but only for single-threaded applications:
Never use new directly; instead, use make_unique. This goes partway toward ensuring that heap-allocated objects are "owned" in a Rust-like manner.
"Borrowing" should always be represented via reference parameters to function calls. Functions that take a reference should never create any sort of pointer to the referred-to object. (It may in some cases be necessary to use a raw pointer as a parameter instead of a reference, but the same rule should apply.)
Note that this works for objects on the stack or on the heap; the function shouldn't care.
Transfer of ownership is, of course, represented via R-value references (&&) and/or R-value references to unique_ptrs.
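Put together, the conventions above might look like the following sketch. The Document type and the function names are hypothetical, and nothing here is enforced by the compiler beyond what unique_ptr already guarantees; the rest is discipline.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical resource type used only for illustration.
struct Document {
    std::string text;
};

// "Borrowing": the function takes a reference and, by convention, keeps no
// pointer to the object after it returns.
std::size_t length_of(const Document& doc) { return doc.text.size(); }

// Mutable borrow, following the same convention.
void append_to(Document& doc, const std::string& more) { doc.text += more; }

// Transfer of ownership: the caller must write std::move, so the hand-over
// is visible at the call site.
std::unique_ptr<Document> archive(std::unique_ptr<Document>&& doc) {
    std::unique_ptr<Document> stored = std::move(doc);
    stored->text += " [archived]";
    return stored;
}
```

A caller would create the object with std::make_unique<Document>() (C++14), pass *doc to the borrowing functions, and write archive(std::move(doc)) to give it away, after which doc is null.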
Unfortunately, I can't think of any way to enforce Rust's rule that mutable references can only exist anywhere in the system when there are no other extant references.
Also, for any kind of parallelism, you would need to start dealing with lifetimes, and the only way I can think of to permit cross-thread lifetime management (or cross-process lifetime management using shared memory) would be to implement your own "ptr-with-lifetime" wrapper. This could be implemented using shared_ptr, because here, reference-counting would actually be important; it's still a bit of unnecessary overhead, though, because reference-count blocks actually have two reference counters (one for all the shared_ptrs pointing to the object, another for all the weak_ptrs). It's also a little... odd, because in a shared_ptr scenario, everybody with a shared_ptr has "equal" ownership, whereas in a "borrowing with lifetime" scenario, only one thread/process should actually "own" the memory.
I think one could add a degree of compile-time introspection and custom sanitisation by introducing custom wrapper classes that track ownership and borrowing.
The code below is a hypothetical sketch, not a production solution, which would need a lot more tooling, e.g. compiling out the checks with #ifdef when not sanitising. It uses a very naive lifetime checker to 'count' borrow errors in ints, in this instance during compilation. static_asserts are not possible as the ints are not constexpr, but the values are there and can be interrogated before runtime. I believe this answers your 3 constraints, regardless of whether these are heap allocations, so I'm using a simple int type to demo the idea, rather than a smart pointer.
Try uncommenting the use cases in main() below (run in compiler explorer with -O3 to see boilerplate optimise away), and you'll see the warning counters change.
https://godbolt.org/z/Pj4WMr
// Hypothetical Rust-like owner / borrow wrappers in C++
// This wraps types with data which is compiled away in release
// It is not possible to static_assert, so this uses static ints to count errors.
#include <utility>
// Statics to track errors. Ideally these would be static_asserts
// but they depend on Owner::has_been_moved which changes during compilation.
static int owner_already_moved = 0;
static int owner_use_after_move = 0;
static int owner_already_borrowed = 0;
// This method exists to ensure static errors are reported in compiler explorer
int get_fault_count() {
return owner_already_moved + owner_use_after_move + owner_already_borrowed;
}
// Storage for ownership of a type T.
// Equivalent to mut usage in Rust
// Disallows move by value, instead ownership must be explicitly moved.
template <typename T>
struct Owner {
Owner(T v) : value(v) {}
Owner(Owner<T>& ov) = delete;
Owner(Owner<T>&& ov) {
if (ov.has_been_moved) {
owner_already_moved++;
}
value = std::move(ov.value);
ov.has_been_moved = true;
}
T& operator*() {
if (has_been_moved) {
owner_use_after_move++;
}
return value;
}
T value;
bool has_been_moved{false};
};
// Safely borrow a value of type T
// Implicit construction from Owner of same type to check borrow is safe
template <typename T>
struct Borrower {
Borrower(Owner<T>& v) : value(v.value) {
if (v.has_been_moved) {
owner_already_borrowed++;
}
}
const T& operator*() const {
return value;
}
T value;
};
// Example of function borrowing a value, can only read const ref
static void use(Borrower<int> v) {
(void)*v;
}
// Example of function taking ownership of value, can mutate via owner ref
static void use_mut(Owner<int> v) {
*v = 5;
}
int main() {
// Rather than just 'int', Owner<int> tracks the lifetime of the value
Owner<int> x{3};
// Borrowing value before mutating causes no problems
use(x);
// Mutating value passes ownership, has_been_moved set on original x
use_mut(std::move(x));
// Uncomment for owner_already_borrowed = 1
//use(x);
// Uncomment for owner_already_moved = 1
//use_mut(std::move(x));
// Uncomment for another owner_already_borrowed++
//Borrower<int> y = x;
// Uncomment for owner_use_after_move = 1;
//return *x;
}
The use of static counters is obviously not desirable, but it is not possible to use static_assert as owner_already_moved is non-const. The idea is that these statics give hints about errors appearing, and in final production code they could be compiled out with #ifdef.
You can use an enhanced version of a unique_ptr (to enforce a unique owner) together with an enhanced version of observer_ptr (to get a nice runtime exception for dangling pointers, i.e. if the original object maintained through unique_ptr went out of scope). The Trilinos package implements this enhanced observer_ptr, they call it Ptr. I have implemented the enhanced version of unique_ptr here (I call it UniquePtr): https://github.com/certik/trilinos/pull/1
Finally, if you want the object to be stack allocated, but still be able to pass safe references around, you need to use the Viewable class, see my initial implementation here: https://github.com/certik/trilinos/pull/2
This should allow you to use C++ just like Rust for pointers, except that in Rust you get a compile time error, while in C++ you get a runtime exception. Also, it should be noted, that you only get a runtime exception in Debug mode. In Release mode, the classes do not do these checks, so they are as fast as in Rust (essentially as fast as raw pointers), but then they can segfault. So one has to make sure the whole test suite runs in Debug mode.

C++ weak_ptr creation performance

I've read that creating or copying a std::shared_ptr involves some overhead (atomic increment of the reference counter, etc.).
But what about creating a std::weak_ptr from it instead:
Obj * obj = new Obj();
// fast
Obj * o = obj;
// slow
std::shared_ptr<Obj> a(o);
// slow
std::shared_ptr<Obj> b(a);
// slow ?
std::weak_ptr<Obj> c(b);
I was hoping for somewhat better performance, but I know that the shared pointer still has to increment the weak reference counter.
So is this still as slow as copying a shared_ptr into another?
This is from my days with game engines
The story goes:
We need a fast shared pointer implementation, one that doesn't thrash the cache (caches are smarter now btw)
A normal pointer:
XXXXXXXXXXXX....
^--pointer to data
Our shared pointer:
iiiiXXXXXXXXXXXXXXXXX...
^   ^---pointer stored in shared pointer
|
+---the start of the allocation, the allocation is sizeof(unsigned int)+sizeof(T)
The unsigned int* used for the count is at ((unsigned int*)ptr)-1
That way a "shared pointer" is pointer-sized, and the data it contains is the pointer to the actual data. So (because template=>inline and any compiler would inline an operator returning a data member) it was the same "overhead" for access as a normal pointer.
Creation of pointers took like 3 more CPU instructions than normal (access to location-4 is one operation, then the add of 1 and the write to location-4).
Now we'd only use weak-pointers when we were debugging (so we'd compile with DEBUG defined (macro definition)) because then we'd like to see all allocations and what's going on and such. It was useful.
The weak-pointers must know when what they point to is gone, NOT keep the thing they point to alive (in my case, if the weak pointer kept the allocation alive the engine would never get to recycle or free any memory, then it's basically a shared pointer anyway)
So each weak-pointer has a bool, alive or something, and is a friend of shared_pointer
When debugging our allocation looked like this:
vvvvvvvviiiiXXXXXXXXXXXXX.....
^       ^   ^ the pointer we stored (to the data)
|       +that pointer -4 bytes = ref counter
+Initial allocation now
 sizeof(linked_list<weak_pointer<T>*>)+sizeof(unsigned int)+sizeof(T)
The linked list structure you use depends on what you care about, we wanted to stay as close to sizeof(T) as we could (we managed memory using the buddy algorithm) so we stored a pointer to the weak_pointer and used the xor trick.... good times.
Anyway: the weak pointers to something shared_pointers point to are put in a list, stored somehow in the "v"s above.
When the reference count hits zero, you go through that list (which is a list of pointers to actual weak_pointers, they remove themselves when deleted obviously) and you set alive=false (or something) to each weak_pointer.
The weak_pointers now know what they point to is no longer there (so threw when de-referenced)
In this example
There is no overhead. (The alignment was 4 bytes on that system. 64-bit systems tend to like 8-byte alignment; union the ref-counter with an int[2] in there to pad it out in that case.) Remember this involves placement new (nobody downvote because I mentioned it :P) and such. You need to make sure the struct you impose on the allocation matches what you allocated and made. Compilers can align stuff for themselves (hence int[2], not int,int).
You can de-reference the shared_pointer with no overhead at all.
New shared pointers being made do not thrash the cache at all and require 3 CPU instructions, they are not very... pipe-line-able but the compiler will inline getters and setters always (if not probably always :P) and there'll be something around the call-site that can fill the pipeline.
The destructor of a shared pointer also does very little (decrements, that's it) so is great!
High performance note
If you have a situation like:
f() {
shared_pointer<T> ptr;
g(ptr);
}
There's no guarantee that the optimiser will dare to not do the adds and subtractions from passing shared_pointer "by value" to g.
This is where you'd use a normal reference (which is implemented as a pointer)
so you'd do g(ptr.extract_reference()); instead - again the compiler will inline the simple getter.
now you have a T&, because ptr's scope entirely surrounds g (assuming g has no side-effects and so forth) that reference will be valid for the duration of g.
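The same point can be shown with std::shared_ptr, which is swapped in here for the answer's own shared_pointer class; dereferencing with *ptr plays the role of the hypothetical extract_reference(). The two function names below are invented for the demonstration.

```cpp
#include <memory>

// Taking the shared_ptr by value copies it: the reference count goes up on
// entry and down again on exit. Returns the count observed inside the call.
long count_seen_by_value(std::shared_ptr<int> p) { return p.use_count(); }

// Taking a plain reference to the pointee touches no reference count at all.
// This is safe as long as the caller's shared_ptr outlives the call.
int read_by_reference(const int& value) { return value; }
```

With `auto p = std::make_shared<int>(7);`, the call `count_seen_by_value(p)` sees a count of 2, whereas `read_by_reference(*p)` leaves the count at 1 throughout.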
deleting references is very ugly and you probably couldn't do it by accident (we relied on this fact).
In hindsight
I should have created a type called "extracted_pointer" or something, it'd be really hard to type that by mistake for a class member.
The weak/shared pointers used by libstdc++
http://gcc.gnu.org/onlinedocs/libstdc++/manual/shared_ptr.html
Not as fast...
But don't worry about the odd cache miss unless you're making a game engine that isn't running a decent workload > 120fps easily :P Still miles better than Java.
The stdlib way is nicer. Each object has its own allocation and job. With our shared_pointer it was a true case of "trust me it works, try not to worry about how" (not that it is hard) because the code looked really messy.
If you undid the ... whatever they've done to the names of variables in their implementation it'd be far easier to read. See Boost's implementation, as that document says.
Other than variable names the GCC stdlib implementation is lovely. You can read it easily, it does its job properly (following the OO principle) but is a little slower and MAY thrash the cache on crappy chips these days.
UBER high performance note
You may be thinking, why not have XXXX...XXXXiiii (the reference count at the end)? Then you'll get the alignment that's best for the allocator!
Answer:
Because having to do pointer+sizeof(T) may not be one CPU instruction! (Subtracting 4 or 8 is something a CPU can do easily, simply because it makes sense; it'll be doing this a lot.)
In addition to Alec's very interesting description of the shared/weak_ptr system used in his previous projects, I wanted to give a little more detail on what is likely to be happening for a typical std::shared_ptr/weak_ptr implementation:
// slow
std::shared_ptr<Obj> a(o);
The main expense in the above construction is to allocate a block of memory to hold the two reference counts. No atomic operations need be done here (aside from what the implementation may or may not do under operator new).
// slow
std::shared_ptr<Obj> b(a);
The main expense in the copy construction is typically a single atomic increment.
// slow ?
std::weak_ptr<Obj> c(b);
The main expense in this weak_ptr constructor is typically a single atomic increment. I would expect the performance of this constructor to be nearly identical to that of the shared_ptr copy constructor.
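A short sketch of what the weak_ptr constructor does and does not change. The function name weak_expires_with_owner is made up; the calls used (use_count, lock, expired) are the standard ones.

```cpp
#include <memory>

// Demonstrates that constructing a weak_ptr leaves the strong count alone
// (it increments the separate weak count in the control block), so it never
// extends the object's lifetime. Returns true if every step behaves as expected.
bool weak_expires_with_owner() {
    std::weak_ptr<int> weak;
    {
        std::shared_ptr<int> strong = std::make_shared<int>(1);
        weak = strong;                              // strong count unchanged
        if (strong.use_count() != 1) return false;
        if (weak.lock() == nullptr) return false;   // object still alive
    }
    return weak.expired();                          // last shared_ptr is gone
}
```

The function only shows the observable counts; it says nothing about timing, which would need a benchmark on the implementation in question.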
Two other important constructors to be aware of are:
std::shared_ptr<Obj> d(std::move(a)); // shared_ptr(shared_ptr&&);
std::weak_ptr<Obj> e(std::move(c)); // weak_ptr(weak_ptr&&);
(And matching move assignment operators as well)
The move constructors do not require any atomic operations at all. They just copy the pointer to the reference counts from the rhs to the lhs, and make the rhs == nullptr.
The move assignment operators require an atomic decrement only if the lhs != nullptr prior to the assignment. The bulk of the time (e.g. within a vector<shared_ptr<T>>) the lhs == nullptr prior to a move assignment, and so there are no atomic operations at all.
The latter (the weak_ptr move members) are not actually C++11, but are being handled by LWG 2315. However I would expect it to already be implemented by most implementations (I know it is already implemented in libc++).
These move members will be used when scooting smart pointers around in containers, e.g. under vector<shared_ptr<T>>::insert/erase, and can have a measurable positive impact compared to the use of the smart pointer copy members.
I point it out so that you will know that if you have the opportunity to move instead of copy a shared_ptr/weak_ptr, it is worth the trouble to type the few extra characters to do so.
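For instance, a sink function can move its argument into a container instead of copying it. The function name store is hypothetical; the behaviour shown is that of std::shared_ptr's move constructor.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Stores a shared_ptr in a vector by moving it: the control block pointer is
// handed over and the reference count is not touched by the push_back.
void store(std::vector<std::shared_ptr<int>>& out, std::shared_ptr<int> p) {
    out.push_back(std::move(p));
}
```

Calling `store(v, std::move(a))` leaves `a` null and the stored pointer with a use count of 1, while `store(v, b)` copies once at the call site and leaves `b` sharing ownership with the stored element.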

malloc & placement new vs. new

I've been looking into this for the past few days, and so far I haven't really found anything convincing other than dogmatic arguments or appeals to tradition (i.e. "it's the C++ way!").
If I'm creating an array of objects, what is the compelling reason (other than ease) for using:
#define MY_ARRAY_SIZE 10
// ...
my_object * my_array=new my_object [MY_ARRAY_SIZE];
for (int i=0;i<MY_ARRAY_SIZE;++i) my_array[i]=my_object(i);
over
#define MEMORY_ERROR -1
#define MY_ARRAY_SIZE 10
// ...
my_object * my_array=(my_object *)malloc(sizeof(my_object)*MY_ARRAY_SIZE);
if (my_array==NULL) throw MEMORY_ERROR;
for (int i=0;i<MY_ARRAY_SIZE;++i) new (my_array+i) my_object (i);
As far as I can tell the latter is much more efficient than the former (since you don't initialize memory to some non-random value/call default constructors unnecessarily), and the only difference really is the fact that one you clean up with:
delete [] my_array;
and the other you clean up with:
for (int i=0;i<MY_ARRAY_SIZE;++i) my_array[i].~my_object();
free(my_array);
I'm out for a compelling reason. Appeals to the fact that it's C++ (not C) and therefore malloc and free shouldn't be used isn't -- as far as I can tell -- compelling as much as it is dogmatic. Is there something I'm missing that makes new [] superior to malloc?
I mean, as best I can tell, you can't even use new [] -- at all -- to make an array of things that don't have a default, parameterless constructor, whereas the malloc method can thusly be used.
I'm out for a compelling reason.
It depends on how you define "compelling". Many of the arguments you have thus far rejected are certainly compelling to most C++ programmers, as your suggestion is not the standard way to allocate naked arrays in C++.
The simple fact is this: yes, you absolutely can do things the way you describe. There is no reason that what you are describing will not function.
But then again, you can have virtual functions in C. You can implement classes and inheritance in plain C, if you put the time and effort into it. Those are entirely functional as well.
Therefore, what matters is not whether something can work, but what the costs are. It's much more error prone to implement inheritance and virtual functions in C than C++. There are multiple ways to implement it in C, which leads to incompatible implementations. Whereas, because they're first-class language features of C++, it's highly unlikely that someone would manually implement what the language offers. Thus, everyone's inheritance and virtual functions can cooperate with the rules of C++.
The same goes for this. So what are the gains and the losses from manual malloc/free array management?
I can't say that any of what I'm about to say constitutes a "compelling reason" for you. I rather doubt it will, since you seem to have made up your mind. But for the record:
Performance
You claim the following:
As far as I can tell the latter is much more efficient than the former (since you don't initialize memory to some non-random value/call default constructors unnecessarily), and the only difference really is the fact that one you clean up with:
This statement suggests that the efficiency gain is primarily in the construction of the objects in question. That is, which constructors are called. The statement presupposes that you don't want to call the default constructor; that you use a default constructor just to create the array, then use the real initialization function to put the actual data into the object.
Well... what if that's not what you want to do? What if what you want to do is create an empty array, one that is default constructed? In this case, this advantage disappears entirely.
Fragility
Let's assume that each object in the array needs to have a specialized constructor or something called on it, such that initializing the array requires this sort of thing. But consider your destruction code:
for (int i=0;i<MY_ARRAY_SIZE;++i) my_array[i].~my_object();
For a simple case, this is fine. You have a macro or const variable that says how many objects you have. And you loop over each element to destroy the data. That's great for a simple example.
Now consider a real application, not an example. How many different places will you be creating an array in? Dozens? Hundreds? Each and every one will need to have its own for loop for initializing the array. Each and every one will need to have its own for loop for destroying the array.
Mis-type this even once, and you can corrupt memory. Or not delete something. Or any number of other horrible things.
And here's an important question: for a given array, where do you keep the size? Do you know how many items you allocated for every array that you create? Each array will probably have its own way of knowing how many items it stores. So each destructor loop will need to fetch this data properly. If it gets it wrong... boom.
And then we have exception safety, which is a whole new can of worms. If one of the constructors throws an exception, the previously constructed objects need to be destructed. Your code doesn't do that; it's not exception-safe.
Now, consider the alternative:
delete[] my_array;
This can't fail. It will always destroy every element. It tracks the size of the array, and it's exception-safe. So it is guaranteed to work. It can't not work (as long as you allocated it with new[]).
Of course, you could say that you could wrap the array in an object. That makes sense. You might even template the object on the element type of the array. That way, all the destructor code is the same. The size is contained in the object. And maybe, just maybe, you realize that the user should have some control over the particular way the memory is allocated, so that it's not just malloc/free.
Congratulations: you just re-invented std::vector.
Which is why many C++ programmers don't even type new[] anymore.
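To make the "wrap it in an object" step concrete, here is a minimal sketch. FixedArray and Tracked are hypothetical names invented for this illustration, and the sketch assumes the element type can be constructed from its index, as my_object(i) is in the question. It is nowhere near a full std::vector; it only shows the size, the destructor loop and the exception cleanup living in one place.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper type that counts how many instances are alive.
struct Tracked {
    static int live;
    std::size_t id;
    explicit Tracked(std::size_t i) : id(i) { ++live; }
    Tracked(const Tracked& o) : id(o.id) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical minimal wrapper: owns raw storage plus the element count.
template <typename T>
class FixedArray {
public:
    // Constructs `count` elements, each as T(index).
    explicit FixedArray(std::size_t count)
        : data_(static_cast<T*>(::operator new(count * sizeof(T)))), size_(0) {
        try {
            for (; size_ < count; ++size_)
                new (data_ + size_) T(size_);
        } catch (...) {
            destroy_all();  // undo only the elements built so far
            throw;
        }
    }
    ~FixedArray() { destroy_all(); }

    FixedArray(const FixedArray&) = delete;
    FixedArray& operator=(const FixedArray&) = delete;

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }

private:
    // Destroys elements in reverse order, then releases the storage.
    void destroy_all() {
        while (size_ > 0) data_[--size_].~T();
        ::operator delete(data_);
        data_ = nullptr;
    }
    T* data_;
    std::size_t size_;
};
```

With this in place, FixedArray<Tracked> a(4); builds four elements, and every one of them is destroyed when a goes out of scope, with no loop at the call site.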
Flexibility
Your code uses malloc/free. But let's say I'm doing some profiling. And I realize that malloc/free for certain frequently created types is just too expensive. I create a special memory manager for them. But how to hook all of the array allocations to them?
Well, I have to search the codebase for any location where you create/destroy arrays of these types. And then I have to change their memory allocators accordingly. And then I have to continuously watch the codebase so that someone else doesn't change those allocators back or introduce new array code that uses different allocators.
If I were instead using new[]/delete[], I could use operator overloading. I simply provide an overload for operators new[] and delete[] for those types. No code has to change. It's much more difficult for someone to circumvent these overloads; they have to actively try to. And so forth.
So I get greater flexibility and reasonable assurance that my allocators will be used where they should be used.
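As a rough sketch of that hook, the class below provides its own operator new[] and operator delete[]. Particle and its counters are hypothetical names for this illustration, and std::malloc merely stands in for whatever special memory manager you would actually plug in.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical frequently allocated type.
struct Particle {
    double x, y;
    Particle() : x(0), y(0) {}

    // Counters exist only to make the routing visible.
    static int array_allocations;
    static int array_deallocations;

    // Every `new Particle[n]` in the program is routed through here.
    static void* operator new[](std::size_t bytes) {
        ++array_allocations;
        void* p = std::malloc(bytes);  // stand-in for a custom pool
        if (!p) throw std::bad_alloc();
        return p;
    }

    // Every `delete[]` on a Particle array is routed through here.
    static void operator delete[](void* p) noexcept {
        if (p) ++array_deallocations;
        std::free(p);
    }
};
int Particle::array_allocations = 0;
int Particle::array_deallocations = 0;
```

Call sites keep writing new Particle[8] and delete[] ps; none of them has to change when the allocator does.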
Readability
Consider this:
my_object *my_array = new my_object[10]; // 10 must equal MY_ARRAY_SIZE
for (int i=0; i<MY_ARRAY_SIZE; ++i)
    my_array[i]=my_object(i);
//... Do stuff with the array
delete [] my_array;
Compare it to this:
my_object *my_array = (my_object *)malloc(sizeof(my_object) * MY_ARRAY_SIZE);
if(my_array==NULL)
    throw MEMORY_ERROR;
int i;
try
{
    for(i=0; i<MY_ARRAY_SIZE; ++i)
        new(my_array+i) my_object(i);
}
catch(...) //Exception safety.
{
    for(; i>0; --i) //The i-th object was not successfully constructed
        my_array[i-1].~my_object();
    free(my_array); //Release the raw block before rethrowing
    throw;
}
//... Do stuff with the array
for(int j=MY_ARRAY_SIZE; j>0; --j) //Destroy in reverse order of construction
    my_array[j-1].~my_object();
free(my_array);
Objectively speaking, which one of these is easier to read and understand what's going on?
Just look at this statement: (my_object *)malloc(sizeof(my_object) * MY_ARRAY_SIZE). This is a very low level thing. You're not allocating an array of anything; you're allocating a hunk of memory. You have to manually compute the size of the hunk of memory to match the size of the object * the number of objects you want. It even features a cast.
By contrast, new my_object[10] tells the story. new is the C++ keyword for "create instances of types". my_object[10] is a 10 element array of my_object type. It's simple, obvious, and intuitive. There's no casting, no computing of byte sizes, nothing.
The malloc method requires learning how to use malloc idiomatically. The new method requires just understanding how new works. It's much less verbose and much more obvious what's going on.
Furthermore, after the malloc statement, you do not in fact have an array of objects. malloc simply returns a block of memory that you have told the C++ compiler to pretend is a pointer to an object (with a cast). It isn't an array of objects, because objects in C++ have lifetimes. And an object's lifetime does not begin until it is constructed. Nothing in that memory has had a constructor called on it yet, and therefore there are no living objects in it.
my_array at that point is not an array; it's just a block of memory. It doesn't become an array of my_objects until you construct them in the next step. This is incredibly unintuitive to a new programmer; it takes a seasoned C++ hand (one who probably learned from C) to know that those aren't live objects and should be treated with care. The pointer does not yet behave like a proper my_object*, because it doesn't point to any my_objects yet.
By contrast, you do have living objects in the new[] case. The objects have been constructed; they are live and fully-formed. You can use this pointer just like any other my_object*.
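The lifetime point can be observed directly by counting live objects at each stage. This is only a sketch: Widget, Stages and lifetime_stages are hypothetical names made up for the illustration.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical type that counts how many instances are currently alive.
struct Widget {
    static int live;
    int value;
    explicit Widget(int v) : value(v) { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Live-object counts observed at three points in the malloc approach.
struct Stages {
    int after_malloc;
    int after_construct;
    int after_destroy;
};

Stages lifetime_stages(int n) {
    Stages s;
    void* raw = std::malloc(sizeof(Widget) * n);
    if (!raw) throw std::bad_alloc();
    s.after_malloc = Widget::live;        // memory exists, objects do not

    Widget* w = static_cast<Widget*>(raw);
    for (int i = 0; i < n; ++i)
        new (w + i) Widget(i);            // lifetimes begin here
    s.after_construct = Widget::live;

    for (int i = n; i > 0; --i)
        w[i - 1].~Widget();               // lifetimes end here
    s.after_destroy = Widget::live;

    std::free(raw);
    return s;
}
```

For lifetime_stages(3) the counts come out as 0, 3 and 0: the block returned by malloc holds no Widget at all until placement new runs.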
Fin
None of the above says that this mechanism isn't potentially useful in the right circumstances. But it's one thing to acknowledge the utility of something in certain circumstances. It's quite another to say that it should be the default way of doing things.
If you do not want to get your memory initialized by implicit constructor calls, and just need an assured memory allocation for placement new then it is perfectly fine to use malloc and free instead of new[] and delete[].
The compelling reasons for using new over malloc are that new provides implicit initialization through constructor calls, saving you additional memset or related function calls after a malloc, and that with new you do not need to check for NULL after every allocation; enclosing exception handlers will do the job, saving you the redundant error checking that malloc requires.
Neither of these compelling reasons applies to your usage.
Which one is more efficient can only be determined by profiling; there is nothing wrong with the approach you have now. On a side note, I don't see a compelling reason to use malloc over new[] either.
I would say neither.
The best way to do it would be:
std::vector<my_object> my_array;
my_array.reserve(MY_ARRAY_SIZE);
for (int i=0;i<MY_ARRAY_SIZE;++i)
{
    my_array.push_back(my_object(i));
}
This is because internally vector is probably doing the placement new for you. It is also managing all the other problems associated with memory management that you are not taking into account.
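If you want to convince yourself that reserve only allocates and does not construct anything, a counting type makes it visible. Counted, constructed_by_reserve and build are hypothetical names for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts every construction.
struct Counted {
    static int constructed;
    int id;
    explicit Counted(int i) : id(i) { ++constructed; }
    Counted(const Counted& o) : id(o.id) { ++constructed; }
};
int Counted::constructed = 0;

// Returns how many Counted objects were constructed by reserve() alone.
int constructed_by_reserve(std::size_t n) {
    int before = Counted::constructed;
    std::vector<Counted> v;
    v.reserve(n);  // raw storage only, no elements yet
    return Counted::constructed - before;
}

// Same shape as the snippet above: reserve first, then push_back.
std::vector<Counted> build(int n) {
    std::vector<Counted> v;
    v.reserve(n);
    for (int i = 0; i < n; ++i)
        v.push_back(Counted(i));  // elements are created one at a time
    return v;
}
```

The reserve call constructs zero objects; the elements only come into existence as push_back is called.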
You've reimplemented new[]/delete[] here, and what you have written is pretty common in developing specialized allocators.
The overhead of calling simple constructors will take little time compared to the allocation. It's not necessarily 'much more efficient' -- it depends on the complexity of the default constructor, and of operator=.
One nice thing that has not been mentioned yet is that the array's size is known by new[]/delete[]. delete[] just does the right thing and destructs all elements when asked. Dragging an additional variable (or three) around so you know exactly how to destroy the array is a pain. A dedicated collection type would be a fine alternative, however.
new[]/delete[] are preferable for convenience. They introduce little overhead, and could save you from a lot of silly errors. Are you compelled enough to take away this functionality and use a collection/container everywhere to support your custom construction? I've implemented this allocator -- the real mess is creating functors for all the construction variations you need in practice. At any rate, you often have a more exact execution at the expense of a program which is often more difficult to maintain than the idioms everybody knows.
IMHO they're both ugly; it's better to use vectors. Just make sure to allocate the space in advance for performance.
Either:
std::vector<my_object> my_array(MY_ARRAY_SIZE);
If you want to initialize with a default value for all entries.
my_object basic;
std::vector<my_object> my_array(MY_ARRAY_SIZE, basic);
Or if you don't want to construct the objects but do want to reserve the space:
std::vector<my_object> my_array;
my_array.reserve(MY_ARRAY_SIZE);
Then if you need to access it as a C-style pointer array, just take the address of the first element (make sure you don't add elements while keeping the old pointer, but you couldn't do that with regular C-style arrays anyway):
my_object* carray = &my_array[0];
my_object* carray = &my_array.front(); // Or the C++ way
Access individual elements:
my_object value = my_array[i]; // The non-safe c-like faster way
my_object value = my_array.at(i); // With bounds checking, throws range exception
Typedef for pretty:
typedef std::vector<my_object> object_vect;
Pass them around functions with references:
void some_function(const object_vect& my_array);
EDIT:
In C++11 there is also std::array. The problem with it, though, is that its size is a template parameter, so you can't make different-sized ones at runtime and you can't pass it into functions unless they expect that exact same size (or are template functions themselves). But it can be useful for things like buffers.
std::array<int, 1024> my_array;
EDIT2:
Also in C++11 there is a new emplace_back as an alternative to push_back. This basically allows you to 'move' your object (or construct your object directly in the vector) and saves you a copy.
std::vector<SomeClass> v;
SomeClass bob {"Bob", "Ross", 10.34f};
v.emplace_back(bob); // bob is an lvalue, so this still copies it; v.emplace_back(std::move(bob)) would move
v.emplace_back("Another", "One", 111.0f); // <- Note this doesn't work with initialization lists ☹
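To see which of these calls actually copy, you can count copy-constructor invocations. This is a sketch with a hypothetical Person type standing in for SomeClass; the three helper functions are likewise made up for the illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SomeClass that counts copy constructions.
struct Person {
    static int copies;
    std::string first, last;
    float score;
    Person(std::string f, std::string l, float s)
        : first(std::move(f)), last(std::move(l)), score(s) {}
    Person(const Person& o)
        : first(o.first), last(o.last), score(o.score) { ++copies; }
    Person(Person&& o) noexcept
        : first(std::move(o.first)), last(std::move(o.last)), score(o.score) {}
};
int Person::copies = 0;

// Passing an existing (lvalue) object: the copy constructor runs.
int copies_when_emplacing_lvalue() {
    std::vector<Person> v;
    v.reserve(2);
    Person bob("Bob", "Ross", 10.34f);
    int before = Person::copies;
    v.emplace_back(bob);
    return Person::copies - before;
}

// Passing std::move(bob): the move constructor runs instead.
int copies_when_emplacing_moved() {
    std::vector<Person> v;
    v.reserve(2);
    Person bob("Bob", "Ross", 10.34f);
    int before = Person::copies;
    v.emplace_back(std::move(bob));
    return Person::copies - before;
}

// Passing constructor arguments: the element is built in place.
int copies_when_emplacing_args() {
    std::vector<Person> v;
    v.reserve(2);
    int before = Person::copies;
    v.emplace_back("Another", "One", 111.0f);
    return Person::copies - before;
}
```

Only the first variant copies. The saving comes from handing emplace_back either the constructor arguments or an rvalue, not from the function name alone.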
Oh well, I was thinking that given the number of answers there would be no reason to step in... but I guess I am drawn in as the others. Let's go
Why your solution is broken
C++11 new facilities for handling raw memory
Simpler way to get this done
Advice
1. Why your solution is broken
First, the two snippets you presented are not equivalent. new[] just works, yours fails horribly in the presence of Exceptions.
What new[] does under the cover is that it keeps track of the number of objects that were constructed, so that if an exception occurs during say the 3rd constructor call it properly calls the destructor for the 2 already constructed objects.
Your solution however fails horribly:
either you don't handle exceptions at all (and leak horribly)
or you just try to call the destructors on the whole array even though it's half built (likely crashing, but who knows with undefined behavior)
So the two are clearly not equivalent. Yours is broken.
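Here is a small sketch of that behaviour, using a hypothetical Fragile type whose third constructor call throws. The counters and try_allocate are only there to make the cleanup observable.

```cpp
#include <stdexcept>

// Hypothetical type whose third construction fails.
struct Fragile {
    static int constructed;
    static int destroyed;
    Fragile() {
        if (constructed == 2)
            throw std::runtime_error("third constructor fails");
        ++constructed;
    }
    ~Fragile() { ++destroyed; }
};
int Fragile::constructed = 0;
int Fragile::destroyed = 0;

// Returns true if new[] propagated the constructor's exception.
bool try_allocate() {
    try {
        Fragile* p = new Fragile[5];  // throws while building element 3
        delete[] p;                   // not reached in this sketch
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

After try_allocate() returns, two objects were constructed and exactly those two were destroyed, and the memory was released by new[] itself. The hand-written loop has to reproduce all of that manually.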
2. C++11 new facilities for handling raw memory
In C++11, the committee members have realized how much we liked fiddling with raw memory and they have introduced facilities to help us do so more efficiently, and more safely.
Check cppreference's <memory> brief. This example shows off the new goodies (*):
#include <iostream>
#include <iterator>
#include <string>
#include <memory>
#include <algorithm>
int main()
{
    const std::string s[] = {"This", "is", "a", "test", "."};
    std::string* p = std::get_temporary_buffer<std::string>(5).first;
    std::copy(std::begin(s), std::end(s),
              std::raw_storage_iterator<std::string*, std::string>(p));
    for(std::string* i = p; i!=p+5; ++i) {
        std::cout << *i << '\n';
        i->~basic_string<char>();
    }
    std::return_temporary_buffer(p);
}
Note that get_temporary_buffer is no-throw, it returns the number of elements for which memory has actually been allocated as a second member of the pair (thus the .first to get the pointer).
(*) Or perhaps not so new as MooingDuck remarked.
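One caveat for readers on newer compilers: std::get_temporary_buffer and std::raw_storage_iterator were deprecated in C++17 and removed in C++20. A rough equivalent that avoids them, assuming plain ::operator new for the raw storage and std::uninitialized_copy for the construction, might look like the sketch below. The function name and the Str alias are made up for the illustration.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Copies `n` strings into freshly allocated raw storage, sums their
// lengths, then destroys the copies and releases the storage.
std::size_t total_length_via_raw_buffer(const std::string* src, std::size_t n) {
    typedef std::string Str;  // alias so the destructor call reads cleanly
    void* raw = ::operator new(n * sizeof(Str));
    Str* p = static_cast<Str*>(raw);
    try {
        // Constructs the copies in place; if a copy throws, it destroys
        // the ones it already built before rethrowing.
        std::uninitialized_copy(src, src + n, p);
    } catch (...) {
        ::operator delete(raw);
        throw;
    }
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i].size();
        p[i].~Str();  // end each lifetime explicitly
    }
    ::operator delete(raw);
    return total;
}
```

For the five strings used above ("This", "is", "a", "test", ".") the function returns 12.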
3. Simpler way to get this done
As far as I am concerned, what you really seem to be asking for is a kind of typed memory pool, where some slots may not have been initialized.
Do you know about boost::optional ?
It is basically an area of raw memory that can fit one item of a given type (template parameter) but defaults to holding nothing. It has a similar interface to a pointer and lets you query whether or not the memory is actually occupied. Finally, using the In-Place Factories you can safely use it without copying objects if that is a concern.
Well, your use case really looks like a std::vector< boost::optional<T> > to me (or perhaps a deque?)
4. Advice
Finally, in case you really want to do it on your own, whether for learning or because no STL container really suits you, I do suggest you wrap this up in an object to avoid the code sprawling all over the place.
Don't forget: Don't Repeat Yourself!
With an object (templated) you can capture the essence of your design in one single place, and then reuse it everywhere.
And of course, why not take advantage of the new C++11 facilities while doing so :) ?
You should use vectors.
Dogmatic or not, that is exactly what ALL the STL containers do to allocate and initialize.
They use an allocator that allocates uninitialized space, and they initialize it by means of the container constructors.
If this (as many people like to say) "is not C++", how can the standard library be implemented just like that?
If you just don't want to use malloc / free, you can allocate "bytes" with just new char[]
myobject* pvext = reinterpret_cast<myobject*>(new char[sizeof(myobject)*vectsize]);
for(int i=0; i<vectsize; ++i) new(pvext+i) myobject(params);
...
for(int i=vectsize-1; i>=0; --i) (pvext+i)->~myobject();
delete[] reinterpret_cast<char*>(pvext);
This lets you take advantage of the separation between initialization and allocation, while still taking advantage of the new allocation exception mechanism.
Note that, putting my first and last lines into a myallocator<myobject> class and the second and second-to-last into a myvector<myobject> class, we have ... just reimplemented std::vector<myobject, std::allocator<myobject> >
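That split can be sketched with the standard allocator interface. TinyVector and join_demo are hypothetical names, the capacity is fixed at construction, and std::allocator is assumed as the allocator. This illustrates the division of labour, not how any real std::vector is written.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical fixed-capacity container: the allocator handles raw memory,
// the container decides when objects are constructed and destroyed.
template <typename T>
class TinyVector {
    typedef std::allocator<T> Alloc;
    typedef std::allocator_traits<Alloc> Traits;
public:
    explicit TinyVector(std::size_t capacity)
        : alloc_(), data_(Traits::allocate(alloc_, capacity)),
          capacity_(capacity), size_(0) {}

    ~TinyVector() {
        while (size_ > 0)
            Traits::destroy(alloc_, data_ + --size_);  // end lifetimes
        Traits::deallocate(alloc_, data_, capacity_);  // then free memory
    }

    TinyVector(const TinyVector&) = delete;
    TinyVector& operator=(const TinyVector&) = delete;

    void push_back(const T& value) {
        assert(size_ < capacity_);
        Traits::construct(alloc_, data_ + size_, value);  // placement new inside
        ++size_;
    }

    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    Alloc alloc_;  // declared first: data_ is initialized from it
    T* data_;
    std::size_t capacity_;
    std::size_t size_;
};

// Usage: storage for three strings is allocated, but only two are constructed.
std::string join_demo() {
    TinyVector<std::string> v(3);
    v.push_back("alpha");
    v.push_back("beta");
    return v[0] + "," + v[1];
}
```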
What you have shown here is actually the way to go when using a memory allocator different from the system general allocator - in that case you would allocate your memory using the allocator (alloc->malloc(sizeof(my_object))) and then use the placement new operator to initialize it. This has many advantages in efficient memory management and is quite common in the standard template library.
If you are writing a class that mimics functionality of std::vector or needs control over memory allocation/object creation (insertion in array / deletion etc.) - that's the way to go. In this case, it's not a question of "not calling default constructor". It becomes a question of being able to "allocate raw memory, memmove old objects there and then create new objects at the olds' addresses", question of being able to use some form of realloc and so on. Unquestionably, custom allocation + placement new are way more flexible... I know, I'm a bit drunk, but std::vector is for sissies... About efficiency - one can write their own version of std::vector that will be AT LEAST as fast ( and most likely smaller, in terms of sizeof() ) with most used 80% of std::vector functionality in, probably, less than 3 hours.
my_object * my_array=new my_object [10];
This will be an array with objects.
my_object * my_array=(my_object *)malloc(sizeof(my_object)*MY_ARRAY_SIZE);
This will be an array the size of your objects, but they may be "broken". If your class has virtual functions, for instance, then you won't be able to call those. Note that it's not just your member data that may be inconsistent; the entire object is actually "broken" (for lack of a better word).
I'm not saying it's wrong to do the second one, just as long as you know this.

c++: Excessive copying of large objects

While there are quite a few questions about copy constructors/assignment operators on SO already, I did not find an answer that fit my problem.
I have a class like
class Foo
{
// ...
private:
std::vector<int> vec1;
std::vector<int> vec2;
boost::bimap<unsigned int, unsigned int> bimap;
// And a couple more
};
Now it seems that there is some quite excessive copying going on (based on profile data).. So my question is how to best tackle this?
Should I implement custom copy constructor/assignment operator and use swap? Or should I define my own swap method and use that (where appropriate) instead of assignment?
As I am not a c++ expert, examples that show how to properly handle this situation are greatly appreciated.
UPDATE: It appears I was not terribly clear.. Let me try to explain. The program is basically an on-the-fly breadth-first search program, and for each step taken I need to store metadata about the step (which is the Foo class).. Now the problem is that there are (usually) exponentially many steps, so you can imagine a large number of these objects needs to be stored.. I do pass by (const) reference always as far as I know.. Each time I calculate a successor from a node in the graph I need to create and store ONE Foo object (however, some of the data members will be added to this one foo further on in the processing of this successor)..
My profile data shows roughly something like this (I don't have the actual numbers on this machine):
SearchStrategy::Search 13s
FooStore::Save 10s
So you can see I spend nearly as much time saving this meta data as I do searching through the graph.. Oh, and FooStore saves Foo in a google::sparse_hash_map<long long, Foo, boost::hash<long long> >.
Compiler is g++4.4 or g++4.5 (I'm not at my dev. machine, so I cannot check at the moment)..
UPDATE 2 I assign some of the members after construction to a Foo instance like
void SetVec1(const std::vector<int>& vec1) { this->vec1 = vec1; };
I guess tomorrow, I should change this to use the swap method, which should definitely improve this a bit..
I'm sorry if I'm not entirely clear about what semantics I'm trying to achieve, but the reason is that I am not quite sure.
Regards,
Morten
Everything depends on what copying this object means in your case :
1. it means copying its whole value
2. it means the copied object will refer to the same content
If it's 1, then this class seems correct. You're not very clear about which operations you say make a lot of copies, so I'm assuming you are trying to copy the whole object.
If it's 2, then you need to use something like shared_ptr to share the containers between the objects. Just using shared_ptr instead of real objects as members will implicitly allow the buffers to be referred to by both objects (the copy and the copied).
That's the easier way (using boost::shared_ptr or std::shared_ptr if you have a C++0x enabled compiler providing it).
There are harder ways but they will certainly become a problem later.
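For the second case, a minimal sketch of the shared_ptr approach could look like this. SharedFoo and its member functions are hypothetical, and only one of the containers is shown; std::shared_ptr is assumed, with boost::shared_ptr being the drop-in alternative on older compilers.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical variant of Foo whose copies share one underlying vector.
class SharedFoo {
public:
    SharedFoo() : vec1(std::make_shared<std::vector<int> >()) {}

    void Add(int x) { vec1->push_back(x); }
    std::size_t Size() const { return vec1->size(); }
    long Owners() const { return vec1.use_count(); }

private:
    // The compiler-generated copy constructor copies only this handle,
    // so copying a SharedFoo never copies the elements.
    std::shared_ptr<std::vector<int> > vec1;
};
```

After SharedFoo b = a; both objects refer to the same vector, so an Add through b is visible through a. That is exactly the "same content" semantics of case 2, and it is wrong if you need case 1.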
Of course, and everyone says this, don't optimize prematurely. Don't bother with this until and unless you prove a) that your program goes too slowly, and b) it would go faster if you didn't copy so much data.
If your program design requires you to hold multiple simultaneous copies of the data, there is nothing you can do. You just have to bite the bullet and copy the data. No, implementing a custom copy constructor and custom assignment operator won't make it go faster.
If your program doesn't require multiple simultaneous copies of this data, then you do have a couple of tricks to reduce the number of copies you perform.
Instrument your copy methods
If it were me, the first thing I would do, even before trying to improve anything, is to count the number of times my copy methods were invoked.
class Foo {
public:
    static int numberOfConstructors;
    static int numberOfCopyConstructors;
    static int numberOfAssignments;
    Foo() { ++numberOfConstructors; ...; }
    Foo(const Foo& f) : vec1(f.vec1), vec2(f.vec2), bimap(f.bimap) {
        ++numberOfCopyConstructors;
        ...;
    }
    Foo& operator=(const Foo& f) {
        ++numberOfAssignments;
        ...;
        return *this;
    }
};
Run your program with and without your improvements. Print out the value of those static members to see if your changes had any effect.
Avoid assignments in function calls by using references
If you pass objects of type Foo to functions, consider if you can do it by reference. If you don't change the passed copy, passing it by const reference is a no-brainer.
// WAS:
extern void SomeFunction(Foo f);
// EASY change -- if this compiles, you know that it is correct
extern void SomeFunction(const Foo& f);
// HARD change -- you have to examine your code to see if this is safe
extern void SomeFunction(Foo& f);
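Combining the counting idea with the reference change, a small self-contained experiment might look like this. It is a sketch: this Foo is a cut-down stand-in with a single vector, and SizeByValue / SizeByConstRef are hypothetical functions.

```cpp
#include <cstddef>
#include <vector>

// Cut-down stand-in for Foo with a copy counter.
class Foo {
public:
    static int copies;
    Foo() : vec1(1000, 0) {}
    Foo(const Foo& f) : vec1(f.vec1) { ++copies; }
    std::size_t Size() const { return vec1.size(); }
private:
    std::vector<int> vec1;
};
int Foo::copies = 0;

// Pass by value: the argument is copy-constructed on every call.
std::size_t SizeByValue(Foo f) { return f.Size(); }

// Pass by const reference: no copy is made.
std::size_t SizeByConstRef(const Foo& f) { return f.Size(); }
```

Calling SizeByValue(foo) with an existing foo bumps Foo::copies by one; calling SizeByConstRef(foo) leaves it unchanged.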
Avoid copies by using Foo::swap
If you use the copy methods (either explicitly or implicitly) a lot, consider whether the assigned-from item could give up its data, rather than copying it.
// Was:
vectorOfFoo.push_back(myFoo);
// maybe faster:
vectorOfFoo.push_back(Foo());
vectorOfFoo.back().swap(myFoo);
// Was:
newFoo = oldFoo;
// maybe faster
newFoo.swap(oldFoo);
Of course, this only works if myFoo and oldFoo no longer need access to their data. And you have to implement Foo::swap:
void Foo::swap(Foo& old) {
std::swap(this->vec1, old.vec1);
std::swap(this->vec2, old.vec2);
...
}
Whatever you do, measure your program before and after your change. Measure the number of times your copy methods are invoked, and the total time improvement in your program.
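A compact way to check that the swap trick really hands over the buffer instead of copying it is to compare element addresses before and after. This is a sketch: Foo is reduced to two public vectors, and append_by_swap is a hypothetical helper wrapping the two-line idiom.

```cpp
#include <vector>

// Reduced stand-in for Foo: just the two vectors and a swap.
class Foo {
public:
    void swap(Foo& other) {
        vec1.swap(other.vec1);
        vec2.swap(other.vec2);
    }
    std::vector<int> vec1;
    std::vector<int> vec2;
};

// Appends `item` to `store` by swapping it into a fresh empty element.
// Afterwards `item` holds the (empty) contents of that fresh element.
void append_by_swap(std::vector<Foo>& store, Foo& item) {
    store.push_back(Foo());   // cheap: an empty Foo
    store.back().swap(item);  // constant-time hand-over of both buffers
}
```

After the call, the stored element's vec1 occupies the very same memory that the original object's vec1 did, and the original is left empty.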
Your class doesn't seem that bad, but you do not show how you use it.
If there is lots of copying, then you need to pass objects of those class by reference (or if possible const reference).
If that class has to be copied, then you can not do anything.
If it's really a problem, you might consider implementing the pimpl idiom. But I doubt it's a problem, though I'd have to see your use of the class to be sure.
Copying huge vectors is unlikely to be cheap. The most promising approach is to copy less often. While it's quite easy (maybe too easy) in C++ to invoke a copy unintentionally, there are ways to avoid needless copying:
passing by const and non-const reference
move-constructors
smart pointers with ownership transfer
These techniques may leave only the copies that are required by the algorithm.
Sometimes it's possible to avoid even some of those copies. For example, if you need two objects where the second one is a reversed copy of the first one, a wrapper object may be created which acts like the reversed copy, but instead of storing an entire copy holds only a reference.
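The reversed-wrapper idea could be sketched like this. ReversedView is a hypothetical name, it is read-only, and it assumes the underlying vector outlives the view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only view that presents a vector in reverse order
// without copying any elements.
template <typename T>
class ReversedView {
public:
    explicit ReversedView(const std::vector<T>& v) : v_(v) {}

    std::size_t size() const { return v_.size(); }

    // Element i of the view is element (size - 1 - i) of the original.
    const T& operator[](std::size_t i) const {
        assert(i < v_.size());
        return v_[v_.size() - 1 - i];
    }

private:
    const std::vector<T>& v_;  // the view must not outlive the vector
};
```

For a vector holding 1, 2, 3 the view yields 3, 2, 1, and it costs one reference regardless of how large the vector is.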
The obvious way to reduce copying is to use something like a shared_ptr. With multithreading, however, this cure can be worse than the disease -- incrementing and decrementing reference counts needs to be done atomically, which can be quite expensive. If, however, you typically end up modifying the copies and need each copy to act unique (i.e., modifying a copy doesn't affect the original) you can end up with worse performance still, paying for the atomic increment/decrement for reference counting, and still doing lots of copies anyway.
There are a couple of obvious ways to avoid that. One is to move unique objects instead of copying at all -- this is great if you can make it work. Another is to use non-atomic reference counting most of the time, and do deep copies only when moving data between threads.
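As an illustration of the "move instead of copy" option, assuming a C++11 compiler: Node and store_by_move below are hypothetical names, and the sketch relies on the implicitly generated move constructor of a struct that only contains a vector.

```cpp
#include <utility>
#include <vector>

// Hypothetical unique object; its implicit move constructor moves payload.
struct Node {
    std::vector<int> payload;
};

// Moves `n` into the store: the vector's buffer is handed over,
// no element is copied.
void store_by_move(std::vector<Node>& store, Node&& n) {
    store.push_back(std::move(n));
}
```

The caller writes store_by_move(store, std::move(node)); and must not rely on node's contents afterwards. No reference counting is involved, so there is nothing atomic to pay for.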
There is no one answer that's universal and really clean though.

Container of Pointers vs Container of Objects - Performance

I was wondering if there is any difference in performance when you compare/contrast
A) Allocating objects on the heap, putting pointers to those objects in a container, operating on the container elsewhere in the code
Ex:
std::list<SomeObject*> someList;
// Somewhere else in the code
SomeObject* foo = new SomeObject(param1, param2);
someList.push_back(foo);
// Somewhere else in the code
while (itr != someList.end())
{
(*itr)->DoStuff();
//...
}
B) Creating an object, putting it in a container, operating on that container elsewhere in the code
Ex:
std::list<SomeObject> someList;
// Somewhere else in the code
SomeObject newObject(param1, param2);
someList.push_back(newObject);
// Somewhere else in the code
while (itr != someList.end())
{
itr->DoStuff();
...
}
Assuming the pointers are all deallocated correctly and everything works fine, my question is...
If there is a difference, what would yield better performance, and how great would the difference be?
There is a performance hit when inserting objects instead of pointers to objects.
std::list as well as other std containers make a copy of the parameter that you store (for std::map both key and value are copied).
As your someList is a std::list the following line copies your object:
Foo foo;
someList.push_back(foo); // copy foo object
It will get copied again when you retrieve it from the list. So you are making copies of the whole object, compared to making copies of a pointer when using:
Foo * foo = new Foo();
someList.push_back(foo); // copy of foo*
You can double check by inserting print statements into Foo's constructor, destructor, copy constructor.
EDIT: As mentioned in comments, pop_front does not return anything. You usually get reference to front element with front then you pop_front to remove the element from list:
Foo * fooB = someList.front(); // copy of foo*
someList.pop_front();
OR
Foo fooB = someList.front(); // front() returns reference to element but if you
someList.pop_front(); // are going to pop it from list you need to keep a
// copy so Foo fooB = someList.front() makes a copy
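Rather than print statements, you can also count the copies. The sketch below uses a hypothetical Item type with a copy counter. It suggests that push_back of an existing object copies once, that front() on its own copies nothing because it returns a reference, and that retrieval only copies when the result is stored by value.

```cpp
#include <list>

// Hypothetical element type that counts copy constructions.
struct Item {
    static int copies;
    int id;
    explicit Item(int i) : id(i) {}
    Item(const Item& o) : id(o.id) { ++copies; }
};
int Item::copies = 0;

// Copies made by inserting an existing object.
int copies_on_push_back() {
    std::list<Item> l;
    Item a(1);
    int before = Item::copies;
    l.push_back(a);
    return Item::copies - before;
}

// Copies made by binding front() to a reference.
int copies_on_front_by_reference() {
    std::list<Item> l;
    l.push_back(Item(1));
    int before = Item::copies;
    const Item& r = l.front();
    (void)r;
    return Item::copies - before;
}

// Copies made by storing front() in a new object before popping.
int copies_on_front_by_value() {
    std::list<Item> l;
    l.push_back(Item(1));
    int before = Item::copies;
    Item b = l.front();
    l.pop_front();
    (void)b;
    return Item::copies - before;
}
```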
Like most performance questions, this doesn't have one clear cut answer.
For one thing, it depends on what exactly you're doing with the list. Pointers might make it easier to do various operations (like sorting). That's because comparing pointers and swapping pointers is probably going to be faster than comparing/swapping SomeObject (of course, it depends on the implementation of SomeObject).
On the other hand, dynamic memory allocation tends to be worse than allocating on the stack. So, assuming you have enough memory on the stack for all the objects, that's another thing to consider.
In the end, I would personally recommend the best piece of advice I've ever gotten: It's pointless trying to guess what will perform better. Code it the way that makes the most sense (easiest to implement/maintain). If, and only if, you later discover there is a performance problem, run a profiler and figure out why. Chances are, most programs won't need all these optimizations, and this will turn out to be a moot point.
It depends on how you use the list. Do you just fill it with stuff and do lookups, or do you insert and remove data regularly? Lookups may be marginally faster without pointers, while adding and removing elements will be faster with pointers.
With objects it is going to be a memberwise copy (thus new object creation and a copy of the members), assuming there are no user-defined copy constructors or operator= overloads. Therefore, using pointers is more efficient. Smart pointers such as Boost's are better (note that std::auto_ptr cannot safely be stored in standard containers), but that is beyond the scope of this question.
If you still have to use object syntax, use a reference.
Some additional things to consider (You have already been made aware of the copy semantics of STL containers):
Are your objects really smaller than pointers to them? This becomes more relevant if you use any kind of smart pointer as those have a tendency to be larger.
Copy operations are (often?) optimized by the compiler to use memcpy(). This is probably not true for smart pointers in particular.
Additional dereferencing caused by pointers
All the things I have mentioned are micro-optimization considerations, and I'd discourage even thinking about them, let alone acting on them. On the other hand: a lot of my claims would need verification and would make for interesting test cases. Feel free to benchmark them.