Consider the sample application below. It demonstrates what I would call a flawed class design.
#include <iostream>
using namespace std;

struct B
{
    B() : m_value(1) {}
    long m_value;
};

struct A
{
    const B& GetB() const { return m_B; }

    void Foo(const B &b)
    {
        // assert(this != &b);
        m_B.m_value += b.m_value;
        m_B.m_value += b.m_value;
    }

protected:
    B m_B;
};

int main(int argc, char* argv[])
{
    A a;
    cout << "Original value: " << a.GetB().m_value << endl;
    cout << "Expected value: 3" << endl;
    a.Foo(a.GetB());
    cout << "Actual value: " << a.GetB().m_value << endl;
    return 0;
}
Output:
Original value: 1
Expected value: 3
Actual value: 4
Obviously, the programmer is fooled by the constness of b. By mistake, b is an alias for the object's own m_B, which yields the undesired behavior.
My question: What const-rules should you follow when designing getters/setters?
My suggestion: Never return a reference to a member variable if it can be set by reference through a member function. Hence, either return by value or pass parameters by value. (Modern compilers will optimize away the extra copy anyway.)
Obviously, the programmer is fooled by the constness of b
As someone once said: "You keep using that word. I do not think it means what you think it means."
Const means that you cannot change the value. It does not mean that the value cannot change.
If the programmer is fooled by the fact that some other code can change something that they cannot, they need a better grounding in aliasing.
If the programmer is fooled by the fact that the token 'const' sounds a bit like 'constant' but means 'read only', they need a better grounding in the semantics of the programming language they are using.
So if you have a getter which returns a const reference, then it is an alias for an object you don't have the permission to change. That says nothing about whether its value is immutable.
Ultimately, this comes down to a lack of encapsulation, and not applying the Law of Demeter. In general, don't mutate the state of other objects. Send them a message to ask them to perform an operation, which may (depending on their own implementation details) mutate their state.
If you make B.m_value private, then you can't write the Foo you have. You either make Foo into:
void Foo(const B &b)
{
    m_B.increment_by(b);
    m_B.increment_by(b);
}

void B::increment_by(const B& b)
{
    // assert(this != &b) if you like
    m_value += b.m_value;
}
or, if you want to ensure that the value is constant, use a temporary
void Foo(B b)
{
    m_B.increment_by(b);
    m_B.increment_by(b);
}
Now, incrementing a value by itself may or may not be reasonable, and is easily tested for within B::increment_by. You could also test whether &m_B == &b in A::Foo, though once you have a couple of levels of objects, and objects with references to other objects rather than values (so &a1.b.c == &a2.b.c does not imply that &a1.b == &a2.b or &a1 == &a2), then you really have to just be aware that any operation is potentially aliased.
Aliasing means that incrementing by an expression twice is not the same as incrementing by the value the expression had the first time you evaluated it; there's no real way around it, and in most systems the cost of copying the data isn't worth paying just to avoid the alias.
Passing in arguments which have the least structure also works well. If Foo() took a long rather than an object which it has to get a long from, then it would not suffer aliasing, and you wouldn't need to write a different Foo() to increment m_b by the value of a C.
I propose a slightly different solution to this that has several advantages (especially in an increasingly multi-threaded world). It's a simple idea to follow: "commit" your changes last.
To explain via your example you would simply change the 'A' class to:
struct A
{
    const B& GetB() const { return m_B; }

    void Foo(const B &b)
    {
        // copy out what we are going to change
        long itm_value = m_B.m_value;
        // perform operations on the copy, not our internal value
        itm_value += b.m_value;
        itm_value += b.m_value;
        // copy over the final result
        m_B.m_value = itm_value;
    }

protected:
    B m_B;
};
The idea here is to place all assignments to memory visible outside the current function at the end, where they pretty much can't fail. This way, if an error is thrown in the middle of the operation (say there was a division between those two operations and the divisor happened to be 0), we aren't left with half-baked data.
Furthermore, in a multi-threading situation, you can do all of the operation, and then just check at the end if anything has changed before your "commit" (an optimistic approach, which will usually pass and usually yield much better results than locking the structure for the entire operation), if it has changed, you simply discard the values and try again (or return a value saying it has failed if there is something it can do instead).
On top of this, the compiler can usually optimise this better, because it is no longer required to write the variable being modified out to memory (we only force one read of the value to be changed and one write). The compiler then has the option of keeping the relevant data in a register, saving L1 cache accesses if not cache misses. Otherwise it would probably have to write to memory after each operation, since it doesn't know what aliasing might be taking place; with locals it knows there can be no aliasing, because the current function is the only one that knows about them.
There are a lot of different things that can happen with the original code posted. I wouldn't be surprised if some compilers (with optimizations enabled) actually produce code that yields the "expected" result, whereas others don't. All of this is simply because the point at which variables that aren't 'volatile' are actually written to or read from memory isn't well defined within the C++ standard.
The real problem here is atomicity. The precondition of the Foo function is that its argument doesn't change while in use.
If, e.g., Foo had been specified with a value argument instead of a reference argument, the problem would not have shown up.
Frankly, A::Foo() rubs me the wrong way more than your original problem. Any way I look at it, it should really be B::Foo(). And inside B::Foo(), a check against this wouldn't be that outlandish.
Otherwise I do not see how one can specify a generic rule to cover that case. And keep teammates sane.
From past experience, I would treat that as a plain bug and would differentiate two cases: (1) B is small and (2) B is large. If B is small, then simply make A::getB() to return a copy. If B is large, then you have no choice but to handle the case that objects of B might be both rvalue and lvalue in the same expression.
If you have such problems constantly, I'd say the simpler rule would be to always return a copy of an object instead of a reference, because quite often, if an object is large, you have to handle it differently from the rest anyway.
My stupid answer, I leave it here just in case someone else comes up with the same bad idea:
The problem, I thought, was that the object referred to is not const (B const & vs. const B &) and that only the reference is const in your code. But in fact those two spellings are equivalent, which is why this idea is bad.
Related
I come from a java background but am now working on large C++ code bases. I often see this pattern:
void function(int value, int& result);
And above method is called like so:
int result = 0;
function(42, result);
std::cout << "Result is " << result << std::endl;
In java, the following would be more common:
int result = function(42);
Although the above is perfectly possible in C++, how come the former appears more common (in the codebase I'm working on at least)? Is it stylistic or something more?
First, this used to be an established technique to have more than one output of a function. E.g. in this signature,
int computeNumberButMightFail(int& error_code);
you would have both the payload int as the return value and a reference to some error variable that is set from within the function to signal an error. It is clear these days that there are better techniques: e.g. std::optional<T> is a good return value, there may be a more flexible std::expected<T, ...>, and with newer C++ standards we can return multiple values with std::make_tuple and destructure them at the call site with structured bindings. For exceptional error scenarios, the usual approach is to use... well... exceptions.
Second, this is an optimization technique from the days in which (N)RVO wasn't widely available: if the output of a function is an object that is expensive to copy, you wanted to make sure no unnecessary copies are made:
void fillThisHugeBuffer(std::vector<LargeType>& output);
means we pass a reference to the data in order to avoid an unnecessary copy when returning it by value. However, this is outdated, too, and returning large objects by value is usually considered the more idiomatic approach, because C++17 guarantees something called materialization of temporaries, and (named) return value optimization is implemented by all major compilers.
See also the core guidelines:
F.20 - "For “out” output values, prefer return values to output parameters".
As far as I know, this case is not common in C++, at least not with primitive data types as return values. There are a few cases to consider:
If you are working with plain C, or in a very restricted context where C++ exceptions are not allowed (like realtime applications), then the return value of a function is often used to indicate the success of the function. An example in C could be:
#include <stdio.h>
#include <errno.h>

int func(int arg, int* res) {
    if (arg > 10) {
        return EINVAL; // this is an error code from errno.h
    }
    ... // do stuff
    *res = my_result;
    return 0; // success
}
This is sometimes used in C++ as well, and so the result must be assigned by reference/pointer.
When your result is a struct or object which exists before the call of your function, and the purpose of your function is to modify attributes inside the struct or object. This is a common pattern because you have to pass the argument by reference (to avoid a copy) anyway, so it is not necessary to return the same object you pass to the function. An example in C++ could be:
#include <cstdlib>

struct Point {
    int x = 0;
    int y = 0;
};

void fill_point(Point& p, int x, int y) {
    p.x = x;
    p.y = y;
}

int main() {
    Point p; // note: "Point p();" would declare a function instead
    fill_point(p, 1, 2);
    return EXIT_SUCCESS;
}
However, this is a trivial example and there are better solutions, like defining the fill function as a method of the object. But sometimes, with regard to the single-responsibility principle, this pattern is common under more complex circumstances.
In Java you can't control where your objects live. Every object you define is allocated on the heap and is accessed through references, which are passed to functions by value. In C++ you have the choice of where you want your object stored (heap or stack) and how to pass the object to a function. It is important to keep in mind that passing an object by value copies it, and returning an object from a function by value also copies it. For returning an object by reference you have to ensure that its lifetime exceeds the scope of your function, by placing it on the heap or by passing it into the function by reference.
Modifiable parameters that receive values as a side effect of a function call are called out parameters. They are generally accepted as a bit archaic, and have fallen somewhat out of fashion as better techniques are available in C++. As you suggested, returning computed values from functions is the ideal.
But real-world constraints sometimes drive people toward out parameters:
returning objects by value is too expensive due to the cost of copying large objects or those with non-trivial copy constructors
returning multiple values, and creating a tuple or struct to contain them is awkward, expensive, or not possible.
When objects cannot be copied (possibly a private or deleted copy constructor) but must be created "in place"
These issues mostly affect legacy code, because C++11 gained "move semantics" and C++17 gained "guaranteed copy elision", which obviate most of these cases.
In any new code, it's usually considered bad style or a code smell to use out parameters, and most likely an acquired habit that carried over from the past (when this was a more relevant technique.) It's not wrong, but one of those things we try to avoid if it's not strictly necessary.
There are several reasons why an out parameter might be used in a C++ codebase.
For example:
You have multiple outputs:
void compute(int a, int b, int &x, int &y) { x=a+b; y=a-b; }
You need the return value for something else: For example, in PEG parsing you might find something like this:
if (parseSymbol(pos,symbolName) && parseToken(pos,"=") && parseExpression(pos,exprNode)) {...}
where the parse functions look like
bool parseSymbol(int &pos, string &symbolName);
bool parseToken(int &pos, const char *token);
and so on.
To avoid object copies.
The programmer didn't know better.
But basically I think any answer is opinion-based, because it's a matter of style and coding policies whether and how out-parameters are used.
A local variable (say an int) can be stored in a processor register, at least as long as its address is not needed anywhere. Consider a function computing something, say, a complicated hash:
int foo(int const* buffer, int size)
{
    int a; // local variable
    // perform heavy computations involving frequent reads and writes to a
    return a;
}
Now assume that the buffer does not fit into memory. We write a class for computing the hash from chunks of data, calling foo multiple times:
struct A
{
    void foo(int const* buffer, int size)
    {
        // perform heavy computations involving frequent reads and writes to a
    }
    int a;
};

A object;
while (...more data...)
{
    object.foo(buffer, size);
}
// do something with object.a
The example may be a bit contrived. The important difference here is that a was a local variable in the free function and now is a member variable of the object, so the state is preserved across multiple calls.
Now the question: would it be legal for the compiler to load a at the beginning of the foo method into a register and store it back at the end? In effect this would mean that a second thread monitoring the object could never observe an intermediate value of a (synchronization and undefined behavior aside). Provided that speed is a major design goal of C++, this seems to be reasonable behavior. Is there anything in the standard that would keep a compiler from doing this? If no, do compilers actually do this? In other words, can we expect a (possibly small) performance penalty for using a member variable, aside from loading and storing it once at the beginning and the end of the function?
As far as I know, the C++ language itself does not even specify what a register is. However, I think the question is clear anyway. Wherever this matters, I appreciate answers for a standard x86 or x64 architecture.
The compiler can do that if (and only if) it can prove that nothing else will access a during foo's execution.
That's a non-trivial problem in general; I don't think any compiler attempts to solve it.
Consider the (even more contrived) example
struct B
{
    B(int& y) : x(y) {}
    void bar() { x = 23; }
    int& x;
};

struct A
{
    int a;
    void foo(B& b)
    {
        a = 12;
        b.bar();
    }
};
Looks innocent enough, but then we say
A baz;
B b(baz.a);
baz.foo(b);
"Optimising" this would leave 12 in baz.a, not 23, and that is clearly wrong.
Short answer to "Can a member variable (attribute) reside in a register?": yes.
When iterating through a buffer and writing the temporary result to any sort of primitive, wherever it resides, keeping the temporary result in a register would be a good optimization. This is done frequently in compilers. However, it is implementation based, even influenced by passed flags, so to know the result, you should check the generated assembly.
Compare the following two pieces of code, the first using a reference to a large object, and the second has the large object as the return value. The emphasis on a "large object" refers to the fact that repeated copies of the object, unnecessarily, is wasted cycles.
Using a reference to a large object:
void getObjData( LargeObj& a )
{
    a.reset() ;
    a.fillWithData() ;
}

int main()
{
    LargeObj a ;
    getObjData( a ) ;
}
Using the large object as a return value:
LargeObj getObjData()
{
    LargeObj a ;
    a.fillWithData() ;
    return a ;
}

int main()
{
    LargeObj a = getObjData() ;
}
The first snippet of code does not require copying the large object.
In the second snippet, the object is created inside the function, and so in general, a copy is needed when returning the object. In this case, however, in main() the object is being declared. Will the compiler first create a default-constructed object, then copy the object returned by getObjData(), or will it be as efficient as the first snippet?
I think the second snippet is easier to read but I am afraid it is less efficient.
Edit: Typically, I am thinking of cases LargeObj to be generic container classes that, for the sake of argument, contains thousands of objects inside of them. For example,
typedef std::vector<HugeObj> LargeObj ;
so directly modifying/adding methods to LargeObj isn't a directly accessible solution.
The second approach is more idiomatic, and expressive. It is clear when reading the code that the function has no preconditions on the argument (it does not have an argument) and that it will actually create an object inside. The first approach is not so clear for the casual reader. The call implies that the object will be changed (pass by reference) but it is not so clear if there are any preconditions on the passed object.
About the copies: the code you posted is not using the assignment operator, but rather copy construction. The C++ standard defines the return value optimization, which is implemented in all major compilers. If you are not sure, you can run the following snippet in your compiler:
#include <iostream>

class X
{
public:
    X() { std::cout << "X::X()" << std::endl; }
    X( X const & ) { std::cout << "X::X( X const & )" << std::endl; }
    X& operator=( X const & ) { std::cout << "X::operator=(X const &)" << std::endl; return *this; }
};

X f() {
    X tmp;
    return tmp;
}

int main() {
    X x = f();
}
With g++ you will get a single line, X::X(). The compiler reserves space on the stack for the x object, then calls the function, which constructs tmp directly over x (in fact, tmp is x). The operations inside f() are applied directly on x, making this equivalent to your first code snippet (pass by reference).
If you were not using the copy constructor (had you written: X x; x = f();) then it would create both x and tmp and apply the assignment operator, yielding a three-line output: X::X() / X::X() / X::operator=. So it could be a little less efficient in some cases.
Use the second approach. It may seem that to be less efficient, but the C++ standard allows the copies to be evaded. This optimization is called Named Return Value Optimization and is implemented in most current compilers.
Yes in the second case it will make a copy of the object, possibly twice - once to return the value from the function, and again to assign it to the local copy in main. Some compilers will optimize out the second copy, but in general you can assume at least one copy will happen.
However, you could still use the second approach for clarity even if the data in the object is large without sacrificing performance with the proper use of smart pointers. Check out the suite of smart pointer classes in boost. This way the internal data is only allocated once and never copied, even when the outer object is.
The way to avoid any copying is to provide a special constructor. If you can re-write your code so it looks like:
LargeObj getObjData()
{
    return LargeObj( fillsomehow() );
}
If fillsomehow() returns the data (perhaps a "big string"), then have a constructor that takes a "big string". If you have such a constructor, then the compiler will very likely construct a single object and not make any copies at all to perform the return. Of course, whether this is useful in real life depends on your particular problem.
A somewhat idiomatic solution at the time would be (note that std::auto_ptr has since been deprecated in favour of std::unique_ptr):
std::auto_ptr<LargeObj> getObjData()
{
    std::auto_ptr<LargeObj> a(new LargeObj);
    a->fillWithData();
    return a;
}

int main()
{
    std::auto_ptr<LargeObj> a(getObjData());
}
Alternatively, you can avoid this issue all together by letting the object get its own data, i. e. by making getObjData() a member function of LargeObj. Depending on what you are actually doing, this may be a good way to go.
Depending on how large the object really is and how often the operation happens, don't get too bogged down in efficiency when it will have no discernible effect either way. Optimization at the expense of clean, readable code should only happen when it is determined to be necessary.
The chances are that some cycles will be wasted when you return by copy. Whether it's worth worrying about depends on how large the object really is, and how often you invoke this code.
But I'd like to point out that if LargeObj is a large and non-trivial class, then in any case its empty constructor should be initializing it to a known state:
LargeObj::LargeObj() :
    m_member1(),
    m_member2(),
    ...
{}
That wastes a few cycles too. Re-writing the code as
LargeObj::LargeObj()
{
    // (The body of fillWithData should ideally be re-written into
    // the initializer list...)
    fillWithData() ;
}

int main()
{
    LargeObj a ;
}
would probably be a win-win for you: you'd have the LargeObj instances getting initialized into known and useful states, and you'd have fewer wasted cycles.
If you don't always want to use fillWithData() in the constructor, you could pass a flag into the constructor as an argument.
UPDATE (from your edit & comment) : Semantically, if it's worthwhile to create a typedef for LargeObj -- i.e., to give it a name, rather than referencing it simply as typedef std::vector<HugeObj> -- then you're already on the road to giving it its own behavioral semantics. You could, for example, define it as
class LargeObj : public std::vector<HugeObj> {
public:
    // constructor that fills the object with data
    LargeObj() ;
    // ... other standard methods ...
};
Only you can determine if this is appropriate for your app. My point is that even though LargeObj is "mostly" a container, you can still give it class behavior if doing so works for your application.
Your first snippet is especially useful when you do things like have getObjData() implemented in one DLL and call it from another, where the two DLLs are built with different languages or with different versions of the compiler for the same language. When they are compiled with different compilers, they often use different heaps, and you must allocate and deallocate memory from within the same heap, or else you will corrupt memory.
But if you don't do something like that, I would normally simply return a pointer (or smart pointer) to memory your function allocates:
LargeObj* getObjData()
{
    LargeObj* ret = new LargeObj;
    ret->fillWithData() ;
    return ret;
}
...unless I have a specific reason not to.
In a project I maintain, I see a lot of code like this for simple get/set methods
const int & MyClass::getFoo() { return m_foo; }
void MyClass::setFoo(const int & foo) { m_foo = foo; }
What is the point in doing that instead of the following?
int MyClass::getFoo() { return m_foo; } // Removed 'const' and '&'
void MyClass::setFoo(const int foo) { m_foo = foo; } // Removed '&'
Passing a reference to a primitive type should require the same (or more) effort as passing the type's value itself, right?
It's just a number after all...
Is this just some attempted micro-optimization or is there a true benefit?
The difference is that if you bind the result to a reference yourself, you can observe later changes to the integer member variable through your own variable name, without calling the function again.
const int& x = myObject.getFoo();
cout << x << endl;
//...
cout << x << endl; // x might have changed
It's probably not the best design choice, and it is dangerous to return a reference (const or not) to a variable that can go out of scope. So if you return a reference, be careful to be sure it does not refer to a variable that goes out of scope.
There is a slight difference for the modifier too, but again probably not something that is worth doing or that was intended.
void test1(int x)
{
    cout << x << endl; // prints 1
}

void test2(const int &x)
{
    cout << x << endl; // prints 1, or possibly something else: another thread could have changed x
}

int main(int argc, char** argv)
{
    int x = 1;
    test1(x);
    //...
    test2(x);
    return 0;
}
So the end result is that with a reference you can observe changes made even after the parameter was passed.
To me, passing a const reference for primitives is a mistake. Either you need to modify the value, and in that case you pass a non-const reference, or you just need to access the value and in that case you pass a const.
Const references should only be used for complex classes, when copying objects could be a performance problem. In the case of primitives, unless you need to modify the value of the variable, you shouldn't pass a reference. The reason is that a reference involves an extra indirection: the program must follow the stored address to reach the object. Only when this indirection is cheaper than copying the object are references an improvement.
Generally, ints and addresses have the same byte length in low-level implementations, so the cost of copying an int as a return value is equivalent to the cost of copying an address. But when an int is returned by value, no indirection is needed, and performance is better.
The main difference between returning a value and returning a const reference is that you can then const_cast that reference and alter the value.
It's an example of bad design: an attempt at a clever design where an easy and concise design would be more than enough. Instead of just returning a value, the author makes readers of the code wonder what his intention might have been.
There is not much benefit. I have seen this in framework or macro generated getters and setters before. The macro code did not distinguish between primitive and non-POD types and just used const type& across the board for setters. I doubt that it is an efficiency issue or a genuine misunderstanding; chances are this is a consistency issue.
I think this type of code is written by people who have misunderstood the concept of references and use them for everything, including primitive data types. I've also seen code like this and can't see any benefit to it.
There is no point and no benefit, except that with either
void MyClass::setFoo(const int foo)
or
void MyClass::setFoo(const int& foo)
you won't be able to reuse the 'foo' variable inside the setFoo implementation. And I believe the 'int&' is just because the author is used to passing everything by const reference; there is nothing wrong with that.
Why would one use func( const Class &value ) rather than just func( Class value )? Surely modern compilers will do the most efficient thing using either syntax. Is this still necessary or just a hold over from the days of non-optimizing compilers?
Just to add, gcc will produce similar assembler code output for either syntax. Perhaps other compilers do not?
Apparently, this is just not the case. I had the impression from some code long ago that gcc did this, but experimentation proves this wrong. Credit is due to Michael Burr, whose answer to a similar question would be nominated if given here.
There are two large semantic differences between the two signatures.
The first is the use of & in the type name. This signals that the value is passed by reference. Removing it causes the object to be passed by value, which essentially passes a copy of the object into the function (via the copy constructor). For operations which simply need to read data (typical for a const &), making a full copy of the object creates unnecessary overhead. For classes which are not small, or which are collections, this overhead is not trivial.
The second is the use of const. This prevents the function from accidentally modifying the contents of value via the reference. It gives the caller some measure of assurance that the value will not be mutated by the function. Granted, passing a copy gives the caller an even stronger assurance of this in many cases.
The first form doesn't create a copy of the object, it just passes a reference (pointer) to the existing copy. The second form creates a copy, which can be expensive. This isn't something that is optimized away: there are semantic differences between having a copy of an object vs. having the original, and copying requires a call to the class's copy constructor.
For very small classes (say, under 16 bytes) with no copy constructor, it is probably more efficient to use the value syntax rather than pass references. This is why you see void foo(double bar) and not void foo(const double &bar). But in the interests of not micro-optimizing code that doesn't matter, as a general rule you should pass all real-deal objects by reference and only pass built-in types like int and void * by value.
There is a huge difference which nobody has mentioned yet: object slicing. In some cases, you may need const& (or &) to get correct behavior.
Consider another class Derived which inherits from Class. In client code, you create an instance of Derived which you pass to func(). If you have func(const Class&), that same instance will get passed. As others have said, func(Class) will make a copy, you will have a new (temporary) instance of Class (not Derived) in func.
This difference in behavior (not performance) can be important if func in turn does a downcast. Compare the results of running the following code:
#include <cstdio>
#include <typeinfo>

struct Class
{
    virtual void Foo() {};
};

class Derived : public Class {};

void f(const Class& value)
{
    printf("f()\n");
    try
    {
        const Derived& d = dynamic_cast<const Derived&>(value);
        printf("dynamic_cast<>\n");
    }
    catch (const std::bad_cast&)
    {
        fprintf(stderr, "bad_cast\n");
    }
}

void g(Class value)
{
    printf("g()\n");
    try
    {
        const Derived& d = dynamic_cast<const Derived&>(value);
        printf("dynamic_cast<>\n");
    }
    catch (const std::bad_cast&)
    {
        fprintf(stderr, "bad_cast\n");
    }
}

int main(int argc, char* argv[])
{
    Derived d;
    f(d); // dynamic_cast succeeds
    g(d); // the copy sliced off Derived, so dynamic_cast throws bad_cast
    return 0;
}
Surely modern compilers will do the most efficient thing using either syntax
The compiler doesn't compile what you "mean", it compiles what you tell it to. Compilers are only smart for lower level optimizations and problems the programmer overlooks (such as computation inside a for loop, dead code etc).
What you tell the compiler to do in the second example is to make a copy of the class, which it will do without question, even if you never use the copy: that's what you asked the compiler to do.
The first example explicitly asks the compiler to use the same variable, conserving space and precious cycles (no copy is needed). The const is there to guard against mistakes, since a plain Class &value can be written to (sometimes that is desired).
Here are the differences between some parameter declarations:
                          copied   out   modifiable
func(Class value)           Y       N        Y
func(const Class value)     Y       N        N
func(Class &value)          N       Y        Y
func(const Class &value)    N       N        N
where:
copied: a copy of the input parameter is made when the function is called
out: value is an "out" parameter, which means modifications made within func() will be visible outside the function after it returns
modifiable: value can be modified within func()
So the differences between func(Class value) and func(const Class &value) are:
The first one makes a copy of the input parameter (by calling the Class copy constructor), and allows code inside func() to modify value
The second one does not make a copy, and does not allow code inside func() to modify value
If you use the latter and then, by accident, try to change value, the compiler will give you an error.
If you use the former and try to change value, it won't.
Thus the latter makes it easier to catch mistakes.
The first example is pass by reference. Rather than pass the object, C++ passes a reference to it (generally, references are implemented as pointers, so likely something the size of a pointer is passed). In the second example, the object is passed by value; if it is a big, complex object then this is likely a fairly heavyweight operation, as it involves copy construction of a new Class.
The reason that an optimizing compiler can't handle this for you is the issue of separate compilation. In C++, when the compiler is generating code for a caller, it may not have access to the code of the function itself. The most common calling convention that I know of usually has the caller invoke the copy-constructor which means it's not possible for the compilation of the function itself to prevent the copy constructor if it's not necessary.
The only time that passing a parameter by value is preferable is when you are going to copy the parameter anyway.
std::string toUpper( const std::string &value ) {
    std::string retVal(value);
    std::transform(retVal.begin(), retVal.end(), retVal.begin(), charToUpper());
    return retVal;
}
Or
std::string toUpper( std::string value ) {
    std::transform(value.begin(), value.end(), value.begin(), charToUpper());
    return value;
}
In this case the second example is the same speed as the first if the value parameter is a regular lvalue, but faster if the value parameter is an rvalue.
Although most compilers will do this optimisation already, I don't expect to rely on this feature until C++0x, especially since I expect it could confuse most programmers, who would probably change it back.
See "Want Speed? Pass by Value." for a better explanation than I could give.