Why is C++ "known" for making lots of copies? - c++

Quite recently, I ran across an old (but still funny) "If Programming Languages Were Essays" comic. I'm quite familiar with the majority of the languages on it, but I was a little confused about the one on C++.
Having just started C++ recently, I wasn't entirely sure why C++ is known for making tonnes of copies of objects. I went to do a little research, and found that when arguments are passed by value, a copy of the object is passed. However, plenty of languages do passing by value as a default so I don't think I'm hitting the right reason. As well, I got into copy constructors and how C++ has (unlike Java) a default copy constructor that does shallow copies, but that doesn't have me convinced either.
Can anybody shed some light on this conception of C++?

Pass by value and return by value is something C++ inherited from C. It was simple because data types in C (structs) were essentially packs of data. With some hacking with function pointers you can get "member functions", but it's not really the same as structs in C++.
However, since C++11, move semantics make it possible to move objects rather than copy them. Moving an object involves casting it to an rvalue reference with std::move and then passing it to a move constructor or move assignment operator, which takes an rvalue reference.
However, moving from an object leaves the original in a valid but unspecified state. For example, a moved-from std::string is typically left empty, although the standard does not guarantee that.
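To make the copy/move distinction concrete, here is a minimal sketch. Buffer and storage_of are hypothetical names made up for this illustration; the only library behaviour relied on is that moving a std::vector hands its heap storage over to the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type used only for this illustration.
struct Buffer {
    std::vector<int> data;

    Buffer() = default;
    explicit Buffer(std::size_t n) : data(n, 0) {}

    // Copy constructor: duplicates every element into new storage.
    Buffer(const Buffer& other) : data(other.data) {}

    // Move constructor: takes over the other's storage instead of copying it.
    Buffer(Buffer&& other) noexcept : data(std::move(other.data)) {}
};

// Returns the address of the element storage so a caller can see
// whether a new object reuses it (move) or has its own (copy).
const int* storage_of(const Buffer& b) { return b.data.data(); }
```

Constructing with Buffer b(std::move(a)) should leave b holding the storage a used to own, while Buffer c(b) allocates a separate block. After the move, a should only be assigned to or destroyed.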

Pass by value is not a problem for languages which default to making everything a reference.
But in C++, parameters default to value types unless you explicitly specify that they are taken by reference.
Furthermore, a full copy happens on every assignment unless the assignment operator is overloaded to do something else.
#include <cstdint>

class Foo {
    int a;
    double d;
    std::uint64_t z;
};

Foo foo;
Foo bar = foo; // just made a copy of all of the guts of Foo.
In Java, that would have been assigning a reference.
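A small sketch of how one might observe these copies. Tracked, by_value and by_const_ref are hypothetical names invented for the example; the counter simply records every copy construction and copy assignment.

```cpp
#include <cassert>

// Hypothetical class that counts how often it is copied.
struct Tracked {
    static int copies;
    int value = 0;

    Tracked() = default;
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};
int Tracked::copies = 0;

// The parameter is a fresh copy of the argument.
int by_value(Tracked t) { return t.value; }

// The parameter is an alias for the caller's object; nothing is copied.
int by_const_ref(const Tracked& t) { return t.value; }
```

Initializing one Tracked from another, assigning one to another, and calling by_value with a named object each bump the counter once; calling by_const_ref does not.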

Related

C++ pass-by-value with non-primitive types?

I must have a fundamental misunderstanding about C++11. My professors told me it wasn't possible to pass a non-primitive type to a function except by reference or pointer. However, the following code works just fine
#include <iostream>
#include <string>
using namespace std;
class MyClass
{
public:
    int field1;
};
void print_string(string s) {
    cout << s << endl;
}
void print_myclass(MyClass c) {
    cout << c.field1 << endl;
}
int main(int argc, char *argv[])
{
    string mystr("this is my string");
    print_string(mystr); // works
    MyClass m;
    m.field1=9;
    print_myclass(m);
    return 0;
}
Running the program yields the following output
this is my string
9
RUN SUCCESSFUL (total time: 67ms)
I'm using MinGW/g++ on Win7
Why does that work? I thought non-primitive types couldn't be passed by value?!
Non-primitive types can certainly be passed by value. (This is covered in section 5.2.2 [expr.call] of the C++ Standard.)
However, there are a few reasons why this is often discouraged, especially in C++03 code.
First, for large objects, it is less efficient to do so (when compared with passing by reference), as the whole object has to be copied, typically onto the stack. A reference typically takes one word, so passing an object that is much larger than one word by value will generally be slower.
Second, passing by value invokes the copy constructor (or, as @templatetypedef points out, potentially the move constructor in C++11). This additional processing could incur a certain amount of overhead.
Third, you may have intended to modify the passed-in object, but by passing in a copy (by value), any changes you make within the function will not affect the original object. So it is important to get the semantics correct (i.e. whether or not you want to modify the original). Hence this is a potential bug in some circumstances.
Finally, if a class is written without a copy constructor or assignment operator, the compiler will automatically generate default ones for you. These perform a shallow (memberwise) copy, which can cause problems such as double deletion or memory leaks when the class owns resources through raw pointers. This is yet another good reason why it is very important to implement these special methods. Full details are in this article:
The Rule of Three in C++
In general for C++03 code, you would normally pass by a const& reference if you don't intend to modify the object, or by normal & reference if you need to modify the object. Use a pointer if the parameter is optional.
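The three conventions just described might look like this in practice. The function names are hypothetical, and nullptr is C++11 (C++03 code would compare against 0 or NULL instead).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Read-only access: const reference, no copy is made.
std::size_t length_of(const std::string& s) { return s.size(); }

// The function modifies the caller's object: non-const reference.
void append_exclamation(std::string& s) { s += '!'; }

// Optional parameter: a pointer, where a null pointer means "not supplied".
std::string greet(const std::string* name) {
    return name ? "Hello, " + *name : std::string("Hello");
}
```

At the call site, greet(&s) passes a name and greet(nullptr) omits it, which makes the optional nature of the argument visible to the reader.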
Some good answers and discussion are also found in these questions, especially the discussion on move semantics:
Pass by Reference / Value in C++
C++: Reasons for passing objects by value
What are move semantics?
What's the difference between passing by reference vs. passing by value?
A complete answer for C++11 is more complicated:
Is pass-by-value a reasonable default in C++11?
Probably the best summary of which approach to use:
How to pass objects to functions in C++?
Your professor is just flat out wrong; maybe he was thinking of Java or C#? In C++, everything is passed by value by default. To pass something by reference, you need to declare the parameter with the & modifier.
Non-primitive types can indeed be passed by value in C++. If you try to do this, C++ will use a special function called the copy constructor (or in some cases in C++11, the move constructor) to initialize the parameter as a copy of the argument. Writing copy constructors and assignment operators is known to be a tricky part of C++ (getting it wrong is easy and getting it right is hard), so it may be the case that the professors were trying to discourage you from doing so. Failing to write a copy constructor or doing so incorrectly can easily lead to program crashes and is a common source of confusion for new C++ programmers.
I'd suggest doing a Google search for "C++ Rule of 3" or "copy constructor assignment operator" to learn more about how to write functions that copy objects intelligently. It takes a bit to get up to speed with how to do this, but once you understand the concepts it's not too hard.
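As a rough illustration of the Rule of Three, here is a sketch of a class that owns a heap buffer and therefore defines all three special members. Text and set_first are hypothetical names invented for this example, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical string-like class that owns a heap buffer.
class Text {
public:
    explicit Text(const char* s)
        : size_(std::strlen(s)), data_(new char[size_ + 1]) {
        std::memcpy(data_, s, size_ + 1);
    }

    // 1. Copy constructor: allocate a separate buffer (deep copy).
    Text(const Text& other)
        : size_(other.size_), data_(new char[other.size_ + 1]) {
        std::memcpy(data_, other.data_, size_ + 1);
    }

    // 2. Copy assignment: build the new buffer first, then release the old one.
    Text& operator=(const Text& other) {
        if (this != &other) {
            char* fresh = new char[other.size_ + 1];
            std::memcpy(fresh, other.data_, other.size_ + 1);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    // 3. Destructor: release the buffer exactly once.
    ~Text() { delete[] data_; }

    const char* c_str() const { return data_; }
    void set_first(char c) { if (size_ > 0) data_[0] = c; }

private:
    std::size_t size_;  // declared before data_, so it is initialized first
    char* data_;
};
```

With the compiler-generated versions instead, two Text objects would share one buffer after a copy, and both destructors would try to delete it.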
Hope this helps!

Does D have something akin to C++0x's move semantics?

A problem of "value types" with external resources (like std::vector<T> or std::string) is that copying them tends to be quite expensive, and copies are created implicitly in various contexts, so this tends to be a performance concern. C++0x's answer to this problem is move semantics, which is conceptually based on the idea of resource pilfering and technically powered by rvalue references.
Does D have anything similar to move semantics or rvalue references?
I believe that there are several places in D (such as returning structs) that D manages to make them moves whereas C++ would make them a copy. IIRC, the compiler will do a move rather than a copy in any case where it can determine that a copy isn't needed, so struct copying is going to happen less in D than in C++. And of course, since classes are references, they don't have the problem at all.
But regardless, copy construction already works differently in D than in C++. Generally, instead of declaring a copy constructor, you declare a postblit constructor: this(this). It does a full memcpy before this(this) is called, and you only make whatever changes are necessary to ensure that the new struct is separate from the original (such as doing a deep copy of member variables where needed), as opposed to creating an entirely new constructor that must copy everything. So, the general approach is already a bit different from C++. It's also generally agreed upon that structs should not have expensive postblit constructors - copying structs should be cheap - so it's less of an issue than it would be in C++. Objects which would be expensive to copy are generally either classes or structs with reference or COW semantics.
Containers are generally reference types (in Phobos, they're structs rather than classes, since they don't need polymorphism, but copying them does not copy their contents, so they're still reference types), so copying them around is not expensive like it would be in C++.
There may very well be cases in D where it could use something similar to a move constructor, but in general, D has been designed in such a way as to reduce the problems that C++ has with copying objects around, so it's nowhere near the problem that it is in C++.
I think all answers completely failed to answer the original question.
First, as stated above, the question is only relevant for structs. Classes have no meaningful move. Also stated above, for structs, a certain amount of move will happen automatically by the compiler under certain conditions.
If you wish to get control over the move operations, here's what you have to do. You can disable copying by annotating this(this) with @disable. Next, you can get the equivalent of C++'s move constructor, constructor(constructor &&that), by defining this(Struct that). Likewise, you can handle move assignment with opAssign(Struct that). In both cases, you need to make sure that you destroy the values of that.
For assignment, since you also need to destroy the old value of this, the simplest way is to swap them. An implementation of C++'s unique_ptr would, therefore, look something like this:
struct UniquePtr(T) {
    private T* ptr = null;
    @disable this(this); // This disables both copy construction and opAssign
    // The obvious constructor, destructor and accessor
    this(T* ptr) {
        if(ptr !is null)
            this.ptr = ptr;
    }
    ~this() {
        freeMemory(ptr);
    }
    inout(T)* get() inout {
        return ptr;
    }
    // Move operations
    this(UniquePtr!T that) {
        this.ptr = that.ptr;
        that.ptr = null;
    }
    ref UniquePtr!T opAssign(UniquePtr!T that) { // Notice no "ref" on "that"
        swap(this.ptr, that.ptr); // We change it anyways, because it's a temporary
        return this;
    }
}
Edit:
Notice I did not define opAssign(ref UniquePtr!T that). That is the copy assignment operator, and if you try to define it, the compiler will error out because you declared, in the @disable line, that you have no such thing.
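For comparison, here is a sketch of how the same move-only idea is usually spelled on the C++ side. This UniquePtr template is a hypothetical, simplified stand-in written for this illustration (real code would normally use std::unique_ptr), and it assumes the pointer came from new.

```cpp
#include <cassert>
#include <utility>

// Hypothetical move-only owner of a single heap-allocated T.
template <typename T>
class UniquePtr {
public:
    explicit UniquePtr(T* ptr = nullptr) : ptr_(ptr) {}
    ~UniquePtr() { delete ptr_; }

    // Copying is disabled, playing the role of @disable this(this) in D.
    UniquePtr(const UniquePtr&) = delete;
    UniquePtr& operator=(const UniquePtr&) = delete;

    // Move constructor: steal the pointer and leave the source empty.
    UniquePtr(UniquePtr&& that) noexcept : ptr_(that.ptr_) {
        that.ptr_ = nullptr;
    }

    // Move assignment: swap, so the old pointer is freed when `that` dies.
    UniquePtr& operator=(UniquePtr&& that) noexcept {
        std::swap(ptr_, that.ptr_);
        return *this;
    }

    T* get() const { return ptr_; }

private:
    T* ptr_;
};
```

The swap in move assignment mirrors the D opAssign above: the previous contents of the target end up in the temporary and are released by its destructor.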
D has separate value and object semantics:
if you declare your type as a struct, it has value semantics by default
if you declare your type as a class, it has object semantics.
Now, assuming you don't manage the memory yourself, which is the default case in D (it uses a garbage collector), you have to understand that objects of types declared as class are automatically pointers (or "references" if you prefer) to the real object, not the real object itself.
So, when passing vectors around in D, what you pass is the reference/pointer. Automatically. No copy involved (other than the copy of the reference).
That's why D, C#, Java and other languages don't "need" move semantics (as most types have object semantics and are manipulated by reference, not by copy).
Maybe they could implement it, I'm not sure. But would they really get a performance boost as in C++? By nature, it doesn't seem likely.
I somehow have the feeling that actually the rvalue references and the whole concept of "move semantics" is a consequence that it's normal in C++ to create local, "temporary" stack objects. In D and most GC languages, it's most common to have objects on the heap, and then there's no overhead with having a temporary object copied (or moved) several times when returning it through a call stack - so there's no need for a mechanism to avoid that overhead too.
In D (and most GC languages) a class object is never copied implicitly and you're only passing the reference around most of the time, so this may mean that you don't need any rvalue references for them.
OTOH, struct objects are NOT supposed to be "handles to resources", but simple value types behaving similar to builtin types - so again, no reason for any move semantics here, IMHO.
This would yield a conclusion - D doesn't have rvalue refs because it doesn't need them.
However, I haven't used rvalue references in practice, I've only had a read on them, so I might have skipped some actual use cases of this feature. Please treat this post as a bunch of thoughts on the matter which hopefully would be helpful for you, not as a reliable judgement.
I think if you need the source to lose the resource you might be in trouble. However, being GC'ed, you can often avoid needing to worry about multiple owners, so it might not be an issue in most cases.

Why does void setOutputFormat(ostream out, int decimal_places) cause an error?

If I change it to void setOutputFormat(ostream& out, int decimal_places),
with a call by reference, it works. I don't understand why though?
What is the difference between a struct and a class, besides struct members are by default public, and class members are by default private?
You're right that there is no difference between class and struct, except the default public vs private access.
The problem here is that ostream doesn't have an accessible copy constructor, so you can't pass it by value.
When you attempt to pass the ostream by value, you attempt to make a copy of the stream, which is not valid because stream objects are noncopyable: their copy constructor is inaccessible (private in C++03, deleted in C++11). When you pass the stream by reference, however, the function receives a modifiable alias to the ostream instance. Take for instance:
void increment(int n) {
// Increment local copy of value.
++n;
}
int x = 5;
increment(x);
// x is still 5.
Versus:
void increment(int& n) {
// Increment value itself.
++n;
}
int x = 5;
increment(x);
// x is now 6.
So passing the stream by reference is the only way that makes sense, since you want setOutputFormat to modify the original stream in-place. Hope this clarifies the issue somewhat.
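Applied to the function from the question, a sketch could look like the following. The question never shows the body of setOutputFormat, so the one here (fixed notation plus a precision) is purely an assumption for illustration.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>

// Takes the stream by reference, so the formatting flags are set on the
// caller's stream rather than on a (forbidden) copy.
// The body is an assumed implementation; the original post does not show one.
void setOutputFormat(std::ostream& out, int decimal_places) {
    out << std::fixed << std::setprecision(decimal_places);
}
```

Because the parameter is std::ostream&, the same function works with std::cout, a file stream, or a std::ostringstream.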
As others said, you're trying to create a copy of a noncopyable object (the stream), which results in that error.
In C++, when you pass a variable as a parameter, you make a copy of it (as opposed to C#, where, for reference types, you're always implicitly passing a reference to it).
By default C++ provides a memberwise copy constructor for every class, but often that's not what is required. Think, for example, of a class that owns a resource handle: if you make a perfect clone of an object of that type, you'll have two objects that each think they own the resource, and both will try to destroy it at their destruction, which clearly isn't nice.
Because of this, C++ lets you provide a copy constructor for each class, which is called when a copy of an object has to be created. Since for many objects (streams included) creating copies isn't desired (because it makes no sense, because it's not convenient or because the trouble isn't worth the work) often the copy constructor is disabled (by marking it as private or protected), and you can't create copies of such objects.
Moreover, in general you must be careful with assignments and copies by value of objects belonging to complicated class hierarchies, because you may run into object slicing and other subtle problems. Actually, it's common practice to block copy and assignment in classes intended to be base classes.
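A minimal sketch of object slicing, using hypothetical Shape and Circle types invented for the example:

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// By value: only the Shape part of the argument is copied, so the
// Circle override is lost ("sliced off").
std::string name_by_value(Shape s) { return s.name(); }

// By reference: no copy is made and the dynamic type is preserved.
std::string name_by_ref(const Shape& s) { return s.name(); }
```

Passing a Circle to name_by_value yields "shape", while name_by_ref yields "circle".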
The solution, in most cases (including yours) is to pass such objects by reference, thus avoiding making copies at all; see @Jon Purdy's answer for an example.
By the way, often even with copyable objects (e.g. std::strings) it's better to just pass references, to avoid all the work associated with copying; if you're passing a reference just for the sake of efficiency but you don't want to have your object modified, the best solution usually is a const reference.
Copies are also used in some other places in C++. I advise you to have a look at the Wikipedia page about copy constructors to understand a bit better what's going on but, above all, to grab a C++ book and read it: C# is different from C++ in a lot of ways, and there are many false similarities that may confuse you.

Would this constructor be acceptable practice?

Let's assume I have a C++ class that has a properly implemented copy constructor and an overloaded = operator. By properly implemented I mean they work and perform a deep copy:
Class1::Class1(const Class1 &class1)
{
// Perform copy
}
Class1& Class1::operator=(const Class1 *class1)
{
// perform copy
return *this;
}
Now let's say I have this constructor as well:
Class1::Class1(Class1 *class1)
{
*this = *class1;
}
My question is: would the above constructor be acceptable practice? This is code that I've inherited and am maintaining.
I would say, "no", for the following reasons:
A traditional copy constructor accepts its argument as a const reference, not as a pointer.
Even if you were to accept a pointer as a parameter, it really ought to be const Class1* to signify that the argument will not be modified.
This copy constructor is inefficient (or won't work!) because all members of Class1 are default-initialized, and then copied using operator=
operator= has the same problem; it should accept a reference, not a pointer.
The traditional way to "re-use" the copy constructor in operator= is the copy-and-swap idiom. I would suggest implementing the class that way.
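A sketch of the copy-and-swap idiom, for reference. The question does not show the members of Class1, so the owned int array and the accessors here are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the question's Class1; the int array is assumed.
class Class1 {
public:
    explicit Class1(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Copy constructor: the single place where the deep copy is implemented.
    Class1(const Class1& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    ~Class1() { delete[] data_; }

    friend void swap(Class1& a, Class1& b) {
        using std::swap;
        swap(a.size_, b.size_);
        swap(a.data_, b.data_);
    }

    // The argument is taken by value, so the copy constructor makes the copy;
    // the contents are then swapped and the old state is destroyed with `other`.
    Class1& operator=(Class1 other) {
        swap(*this, other);
        return *this;
    }

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};
```

With this shape there is no need for a constructor or assignment operator taking a pointer; a caller holding a Class1* simply writes Class1 copy(*ptr); or target = *ptr;.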
Personally, I don't think it's good practice.
For the constructor, it's hard to think of a place where an implicit conversion from a pointer to an object to the object itself would be useful.
There's no reason for the pointer to be to non-const, and if you have available pointer to the class it is not hard to dereference it, and so clearly state your intention of wanting to copy the object using the copy constructor.
Similarly, for the non-standard assignment operator why allow assignment from a pointer when correctly dereferencing at the call site is clearer and more idiomatic?
I believe a somewhat more important issue than what has been discussed so far is that your non-standard assignment operator does not stop the compiler from generating the standard one. Since you've decided that you need to create an assignment operator (good bet since you made the copy constructor), the default is almost certainly not sufficient. Thus a user of this class could fall prey to this problem during what would seem very basic and standard use of an object to almost anyone.
Objects and pointers to objects are two very different things. Typically, when you're passing objects around, you expect that they're going to be copied (though, ideally functions would take const refs where possible to reduce/eliminate unnecessary copies). When you're passing a pointer around, you don't expect any copying to take place. You're passing around a pointer to a specific object and, depending on the code, it could really matter that you deal with that specific object and not a copy of it.
Assignment operators and constructors that take pointers to the type - especially constructors which can be used for implicit conversion - are really going to muddle things and stand a high chance of creating unintended copies, which not only could be a performance issue, but it could cause bugs.
I can't think of any good reason why you would ever want to make the conversion between a pointer to a type and the type itself implicit - or even explicit. The built-in way to do that is to dereference the object. I suppose that there might be some set of specific circumstances which I can't think of where this sort of thing might be necessary or a good idea, but I really doubt it. Certainly, I would strongly advise against doing it unless you have a specific and good reason for doing so.

Is there anything wrong with returning default constructed values?

Suppose I have the following code:
class some_class{};
some_class some_function()
{
return some_class();
}
This seems to work pretty well and saves me the trouble of having to declare a variable just to make a return value. But I don't think I've ever seen this in any kind of tutorial or reference. Is this a compiler-specific thing (visual C++)? Or is this doing something wrong?
No, this is perfectly valid. It can also be more efficient, as the compiler is able to optimise away the temporary.
Returning objects from a function call is the "Factory" Design Pattern, and is used extensively.
However, you will want to be careful whether you return objects, or pointers to objects. The former of these will introduce you to copy constructors / assignment operators, which can be a pain.
It is valid, but performance may not be ideal depending on how it is called.
For example:
A a;
a = fn();
and
A a = fn();
are not the same.
In the first case the default constructor is called, and then the assignment operator is invoked on a which requires a temporary variable to be constructed.
In the second case the copy constructor is used.
An intelligent enough compiler will work out what optimizations are possible. But, if the copy constructor is user supplied then I don't see how the compiler can optimize out the temporary variable. It has to invoke the copy constructor, and to do that it has to have another instance.
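One way to check what actually happens is to count the special member calls. In this sketch the counters and the two wrapper functions are hypothetical names added for illustration; A and fn follow the snippet above.

```cpp
#include <cassert>

// Hypothetical class that records which special members run.
struct A {
    static int default_ctors;
    static int copy_ctors;
    static int assignments;

    A() { ++default_ctors; }
    A(const A&) { ++copy_ctors; }
    A& operator=(const A&) { ++assignments; return *this; }

    static void reset() { default_ctors = copy_ctors = assignments = 0; }
};
int A::default_ctors = 0;
int A::copy_ctors = 0;
int A::assignments = 0;

A fn() { return A(); }

// Form 1: default-construct, then assign from the returned temporary.
void assign_after_construction() {
    A a;
    a = fn();
}

// Form 2: initialize directly from the return value.
void initialize_directly() {
    A a = fn();
    (void)a;  // silence unused-variable warnings
}
```

The first form runs two default constructors and one assignment; the second runs a single default constructor and no assignment. Compilers are allowed to elide the copy in the second form even when the copy constructor is user-supplied, and since C++17 that elision is guaranteed for a return statement like the one in fn.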
The difference between Rob Walker's two examples comes down to Return Value Optimisation (RVO), if you want to google for it.
Incidentally, if you want to ensure your object gets returned in the most efficient manner, create the object on the heap (i.e. via new), hold it in a shared_ptr, and return the shared_ptr instead. Only the pointer gets returned, and the reference count is maintained correctly.
That is perfectly reasonable C++.
This is perfectly legal C++ and any compiler should accept it. What makes you think it might be doing something wrong?
That's the best way to do it if your class is pretty lightweight - I mean that it isn't very expensive to make a copy of it.
One side effect of that method though is that it does tend to make it more likely to have temporary objects created, although that can depend on how well the compiler can optimize things.
For more heavyweight classes that you want to make sure are not copied (say for example a large bitmap image) then it is a good idea to pass stuff like that around as a reference parameter which then gets filled in, just to make absolutely sure that there won't be any temporary objects created.
Overall, simplifying the syntax and expressing things more directly can have the side effect of creating more temporary objects in expressions, which is something to keep in mind when designing interfaces for more heavyweight objects.
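The "reference parameter which then gets filled in" approach could be sketched like this. Bitmap and make_blank are hypothetical names standing in for the large bitmap image mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heavyweight type standing in for a large bitmap image.
struct Bitmap {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<unsigned char> pixels;
};

// Fills a caller-owned object through a reference parameter, so no
// Bitmap temporary is created by the call.
void make_blank(Bitmap& out, std::size_t width, std::size_t height) {
    out.width = width;
    out.height = height;
    out.pixels.assign(width * height, 0);
}
```

The trade-off is that the caller has to declare the object first and the call reads less directly than Bitmap b = make_blank(...); with copy elision and move semantics, returning by value is often just as cheap.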