I come from a Java background but am now working on large C++ code bases. I often see this pattern:
void function(int value, int& result);
And the above function is called like so:
int result = 0;
function(42, result);
std::cout << "Result is " << result << std::endl;
In Java, the following would be more common:
int result = function(42);
Although the above is perfectly possible in C++, how come the former appears more common (in the codebase I'm working on at least)? Is it stylistic or something more?
First, this used to be an established technique to have more than one output of a function. E.g. in this signature,
int computeNumberButMightFail(int& error_code);
you would have both the payload int as the return value, and a reference to some error variable that is set from within the function to signal an error. It is clear these days that there are better techniques: std::optional<T> is a good return value, std::expected<T, ...> (C++23) is a more flexible one, and with newer C++ standards we can return multiple values with std::make_tuple and destructure them at the call site with structured bindings. For exceptional error scenarios, the usual approach is to use... well... exceptions.
Second, this is an optimization technique from the days in which (N)RVO wasn't widely available: if the output of a function is an object that is expensive to copy, you wanted to make sure no unnecessary copies are made:
void fillThisHugeBuffer(std::vector<LargeType>& output);
means we pass a reference to the data in order to avoid an unnecessary copy when returning it by value. However, this is outdated, too, and returning large objects by value is usually considered the more idiomatic approach, because C++17 guarantees copy elision when returning a temporary (through what the standard calls temporary materialization), and named return value optimization (NRVO) is implemented by all major compilers.
See also the core guidelines:
F.20 - "For “out” output values, prefer return values to output parameters".
As far as I know, this case is not common in C++, at least not with primitive data types as return values. There are a few cases to consider:
If you are working with plain C or in a very restricted context where C++ exceptions are not allowed (like realtime applications), the return value of a function is often used to indicate the success of the function. An example in C could be:
#include <stdio.h>
#include <errno.h>
int func(int arg, int* res) {
    if (arg > 10) {
        return EINVAL; /* this is an error code from errno.h */
    }
    ... /* do stuff that computes my_result */
    *res = my_result;
    return 0; /* success */
}
This is sometimes used in C++ as well, and so the result must be assigned through a reference or pointer.
When your result is a struct or object that exists before the call to your function, and the purpose of your function is to modify attributes inside it. This is a common pattern because you have to pass the argument by reference anyway (to avoid a copy), so it is not necessary to return the same object that you passed to the function. An example in C++ could be:
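#include <cstdlib>
#include <iostream>
struct Point {
    int x = 0;
    int y = 0;
};
void fill_point(Point& p, int x, int y) {
    p.x = x;
    p.y = y;
}
int main() {
    Point p;              // "Point p();" would declare a function instead
    fill_point(p, 1, 2);  // arbitrary example values
    std::cout << p.x << ", " << p.y << std::endl;
    return EXIT_SUCCESS;
}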
However, this is a trivial example, and there are better solutions, like defining the fill function as a method of the object. But sometimes, with regard to the single-responsibility principle, this pattern is common under more complex circumstances.
In Java you can't control where objects live. Every object you create is on the heap, and functions receive a reference to it. In C++ you have the choice of where you want your object stored (heap or stack) and how to pass the object to a function. It is important to keep in mind that passing an object by value copies it, and returning an object by value conceptually copies it as well (although the compiler can often elide the copy or move the object instead). To return an object by reference you have to ensure that its lifetime exceeds the scope of your function, by placing it on the heap or by having the caller pass it to the function by reference.
Modifiable parameters that receive values as a side effect of a function call are called out parameters. They are generally accepted as a bit archaic, and have fallen somewhat out of fashion as better techniques are available in C++. As you suggested, returning computed values from functions is the ideal.
But real-world constraints sometimes drive people toward out parameters:
Returning objects by value is too expensive, due to the cost of copying large objects or those with non-trivial copy constructors.
Returning multiple values, where creating a tuple or struct to contain them is awkward, expensive, or not possible.
Objects cannot be copied (possibly because of a private or deleted copy constructor) but must be created "in place".
Most of these issues face legacy code, because C++11 gained "move semantics" and C++17 gained "guaranteed copy elision" which obviate most of these cases.
In any new code, it's usually considered bad style or a code smell to use out parameters, and most likely an acquired habit that carried over from the past (when this was a more relevant technique.) It's not wrong, but one of those things we try to avoid if it's not strictly necessary.
There are several reasons why an out parameter might be used in a C++ codebase.
For example:
You have multiple outputs:
void compute(int a, int b, int &x, int &y) { x=a+b; y=a-b; }
You need the return value for something else: For example, in PEG parsing you might find something like this:
if (parseSymbol(pos,symbolName) && parseToken(pos,"=") && parseExpression(pos,exprNode)) {...}
where the parse functions look like
bool parseSymbol(int &pos, string &symbolName);
bool parseToken(int &pos, const char *token);
and so on.
To avoid object copies.
The programmer didn't know better.
But basically I think any answer is opinion-based, because it's a matter of style and coding policy whether and how out parameters are used.
This question already has answers here:
Why should I use reference variables at all? [closed]
I am brand new to C++. We have recently begun exploring reference variables in class, and I am very confused about them. Not necessarily how to do them, as I understand that they switch variable values, but more along the lines of WHY a developer would want to do such a thing? What do they accomplish? Do they save memory? Do they avoid having to return information?
Here is part of the project we are working on. We need to include at least one reference variable. I can see how I would write the program without the reference variable, but I don't see where a reference variable would be useful or necessary.
"The user may wish to get an estimate for one to many rooms. The rates are based on the square footage of the walls and/or ceiling. The company estimates that it takes 2.5 hours to paint 200 SF of wall space and 3.2 hours to paint the same area on a ceiling. The labor rate is $40 per hour. If the job for painting WALLS totals more than 1400 SF of space, then the customer receives a 15% discount for all square footage above 1400 square feet. There is no discount for painting ceilings.
The program shall print out a final report of the estimated costs in a professional format.
The program shall ask the user if they want to make more calculations before exiting."
I'm not looking for you guys to do my homework for me, and for reference, we have only just finished with learning functions. I'm pretty good, but there are a LOT of things reading through these sites that I do not understand.
And, essentially, studentID would be set to 21654. Am I understanding this correctly?
Let us try this again:
I have reviewed this suggested duplication. While it does cover the basics of the pros/cons of using reference variables instead of pointers and discusses multitudes of reasons for using both, I am still questioning the basic idea of when (when it is appropriate vs. not necessary) and why (why it is appropriate in certain circumstances, what advantages does it give to the program?)
I should use such variables as well as how (the actual syntax and placement). Almost everyone here has been great, and I have learned so much on the subject through my interactions with you. Even as much of this is repetitive and irritating to seasoned coders, it is all new to me, and I needed to be involved in the conversation as much as I needed the information. I have used Stack Overflow for many projects, learning about Java's newString.equalsIgnoreCase(), for instance, and I admire your knowledge. I can only tell you the truth, if that is not good enough then it is what it is.
Alright, let me review my understanding so far:
Reference variables tend to cut down on unwanted modification of variables within a function and/or program.
Reference variables are used to modify existing variables within functions
This is useful as it "moves" values around while minimizing copying of those values.
Reference variables modify existing variables within functions/programs
I don't know if you guys can still read this or not since it has been flagged a duplicate. I've been playing with a few of the mini-programs you guys have given me, re-read portions of my book, done further research, etc., and I think I understand on a rudimentary level. These reference variables allow you to alter and/or use other variables within your code without pulling them directly into your code. I can't remember which user was using the foo(hubble, bubble) example, but it was his/her code that finally made it click. Instead of just using the value, you are actually using and/or reassigning the variable.
A reference variable is nothing but an alias for another variable. You would use it when you want to pass the variable itself around instead of copying it into memory at a different location. So, by using a reference, the copy can be avoided, which saves memory.
According to Bjarne Stroustrup's FAQ:
C++ inherited pointers from C, so I couldn't remove them without
causing serious compatibility problems. References are useful for
several things, but the direct reason I introduced them in C++ was to
support operator overloading. For example:
void f1(const complex* x, const complex* y) // without references
{
complex z = *x+*y; // ugly
// ...
}
void f2(const complex& x, const complex& y) // with references
{
complex z = x+y; // better
// ...
}
More generally, if you want to have both the functionality of pointers
and the functionality of references, you need either two different
types (as in C++) or two different sets of operations on a single
type. For example, with a single type you need both an operation to
assign to the object referred to and an operation to assign to the
reference/pointer. This can be done using separate operators (as in
Simula). For example:
Ref<My_type> r :- new My_type;
r := 7; // assign to object
r :- new My_type; // assign to reference
Alternatively, you could rely on type checking (overloading). For
example:
Ref<My_type> r = new My_type;
r = 7; // assign to object
r = new My_type; // assign to reference
Also, read this Stack Overflow question about the differences between a pointer variable and a reference variable.
I will give three reasons, but there are many more.
Avoiding unnecessary copies.
Suppose you write a function like so:
double price(std::vector<Room> rooms)
{
...
}
Now, every time you call it, the vector of Room will be copied. If you only compute the prices of a few rooms that's fine, but if you want to compute the cost of repainting the entirety of the offices of the Empire State Building, you will start to copy huge objects, and this takes time.
It is better in this case to use a constant reference that provides read-only access to the data:
double price(const std::vector<Room>& rooms) { ... }
Using polymorphism
Suppose you now have different types of rooms, perhaps a CubicRoom and a CylindricalRoom, that both inherit from the same base class, Room.
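Passing by value compiles, but it slices the object: only the Room base part is copied, so the behavior of the derived class is lost. So you should not write: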
double price(Room room) { ... }
and then call
price(CylindricalRoom());
//or
price(CubicRoom());
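but you can if you define price to take a reference (a const reference, so that it can also bind to the temporaries above):
double price(const Room& room);
The call syntax is then the same as if you passed by value, but virtual functions are dispatched to the derived class.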
Avoiding returns
Suppose that each time you compute a price, you want to add a formatted quote to a report. In C++ you can only return a single object from a function, so you can not write:
return price, fmtQuote
However, you can do:
double price(Room room, std::vector<std::string>& quotes)
{
...
quotes.push_back(fmtQuote);
return price;
}
Obviously, you could return a pair of objects, std::pair<double, std::string>, but this means that the caller has to unpack the result. If you intend to call the above function often, this will quickly become ugly. This also ties in to the first point: the log of all quotes will grow, and you do not want to copy it for each call.
This is a typical access pattern for shared resources: you want a few functions/objects to get a handle on a resource, not a copy of that resource.
You're mixing up two completely separate things here. Three examples to show how the two things work, individually and then together...
A function can take a parameter passed by value, and return a value.
double foo (double y)
{
y = y + 200.0;
return y;
}
int main()
{
double hubble = 50.0;
double bubble = 100.0;
hubble = foo(bubble);
std::cout << "hubble=" << hubble << ", bubble=" << bubble << std::endl;
}
Note that because this is passed by value, even though foo() changes y, bubble does not change. hubble is set to the value returned by foo().
Then you get
hubble=300, bubble=100
A function can take a parameter passed by reference, and modify that parameter.
void foo (double& y)
{
y = y + 200.0;
}
int main()
{
double hubble = 50.0;
double bubble = 100.0;
foo(bubble);
std::cout << "hubble=" << hubble << ", bubble=" << bubble << std::endl;
}
Then you get
hubble=50, bubble=300
Of course hubble hasn't changed. But because bubble was passed by reference, the change to y inside foo() changes bubble, because that change is happening on the actual variable passed and not on a copied value.
Note that you do not have a "return" statement here. The function does not return anything - it simply modifies the variable which is passed to it.
And of course you can use both together.
double foo (double& y)
{
y = y + 200.0;
return y + 400.0;
}
int main()
{
double hubble = 50.0;
double bubble = 100.0;
hubble = foo(bubble);
std::cout << "hubble=" << hubble << ", bubble=" << bubble << std::endl;
}
Then you get
hubble=700, bubble=300
As before, changing y inside foo() changes bubble. But now the function is returning a value as well, which sets hubble.
Why would you choose to return a value, or to modify the value passed in, or to do both? That entirely depends on how you write your code.
I agree with you that you don't have to use a pass-by-reference here. Myself, I'd probably just return a value. But this is a learning exercise, and you've been told to do it that way, so you've got to. Suppose your pass-by-reference is the discount? So a function "void discount(double& value)" takes the value passed and multiplies it by 0.85. It's a bit artificial, but it would demonstrate the principle.
Reference variables are a safer alternative to pointers. Usually, when dealing with pointers you don't really care about the pointer (ptr) so much as what it points to (*ptr); and yet, all the time programmers screw up and manipulate ptr instead of *ptr and so on. Consider this code:
void zeroize_by_pointer(int* p)
{
p = 0; // this compiles, but doesn't do what you want
}
Compare to the reference version,
void zeroize_by_reference(int& p)
{
p = 0; // works fine
}
There are many other reasons why references are a good idea, but for someone starting out in C++ I'd suggest focusing on this one: it makes it slightly harder to shoot yourself in the foot. Whenever you deal with pointers you're going to be dealing on some level with the machine's memory model, and that's a good thing to avoid when possible.
There is another, more general advantage of references that pointers do not provide. References by their very nature allow you to express through the function signature that the object referred to must exist at the time the function is called. No nulls allowed.
The caller cannot reasonably expect a function that takes a reference to check the validity of that reference.
Pointers, on the other hand, may validly be null. If I write a function that accepts a pointer...
void increment(int* val)
{
(*val)++;
}
...and the caller supplies null, my program is probably going to crash. I can write all the documentation I want stating that the pointer must not be null but the fact is it's pretty easy for someone to pass it in accidentally. So if I want to be safe, I must check for it.
But write this function with a reference and the intent is clear. No nulls allowed.
References were introduced primarily to support operator overloading. Using pointers for "passing via reference" would give you unacceptable syntax according to Bjarne Stroustrup. They also allow aliasing.
In addition, they allow object-oriented programming with a nicer syntax than using pointer explicitly. If you are using classes you must pass references to avoid object slicing.
In summary, you should always prefer using references over bare pointers.
You could almost always use reference variables (instead of ever passing by value): for example ...
// this function creates an estimate
// input parameter is the Rooms to be painted
// passed as a const reference because this function doesn't modify the rooms
// return value is the estimated monetary cost
Money createEstimate(const Rooms& rooms)
{
...
}
// this function adds paint to the rooms
// input parameter is the Rooms to be painted
// passed as a non-const reference because this function modifies the rooms
void paintRooms(Rooms& rooms)
{
...
}
When you pass-by-value instead of pass-by-reference then you implicitly create and pass a copy of the thing ...
// creates and passes a copy of the Rooms to the createEstimate function
Money createEstimate(Rooms rooms)
{
...
}
... which (creating a copy) is (often, slightly) slower than passing by reference (furthermore, creating a copy may have side-effects).
As a possible slight performance optimization, and by convention (because people don't care), it's common to pass-by-value instead of pass-by-reference when the type is small and simple (a.k.a. a "primitive" type), for example:
// passes a copy of the x and y values
// returns the sum
int add(int x, int y)
{
...
}
... instead of ...
// passes a reference to x and y
// returns the sum
int add(const int& x, const int& y)
{
...
}
See also Passing a modifiable parameter to c++ function as well as Why have pointer parameters?
There are also different kinds of references. We have lvalue and rvalue references, designated by & and &&, respectively. Generally, a reference tells us something about the lifetime of the object it references, a pointer does not. Compare
void foo(int* i);
void foo(int& i);
void foo(int&& i);
In the first case, i might point to an object we can assign to, but more importantly, it may also be a nullptr or point to one-past-the-end of an array. Thus, dereferencing it may lead to undefined behaviour. Checking for a nullptr is easy enough, the other check is not.
In the second and third cases, i must always reference a valid int we can assign to.
The difference between rvalue and lvalue references is that rvalue/&& references convey the meaning that the referenced value is not needed by anyone else and as such, allows for optimizations. Read up on std::move and move constructors to see what I mean.
To summarize: references tell us something about the object's lifetime. Sure, this could be stated in the documentation, but with pointers, violations of that contract might be hard to catch. References enforce the contract (to a high degree) at compile time and as such provide documentation to the code implicitly. This allows for some quick, uncomplicated optimizations by using e.g. move constructors or perfect forwarding in some cases.
Reference arguments are mostly used when you pass an object as an argument. That way you don't copy the whole variable; usually they come with a const modifier like:
void printDescription(const Person& person) { ... }
That way you don't copy the object.
Sometimes the return type is also a reference. That way you are returning the same object (and not a copy of it). Have a look at the << operator of ostream: ostream& operator<< (streambuf* sb);.
With variables you can think about the case where you can swap values.
void swap(int & a, int & b) {
    int aux = a;
    a = b;    // assign to the parameters; "int a = b;" would redeclare them
    b = aux;
}
This case in Java, for example, has to be done in a more complex way.
Reference variables are pointers without a * and practically without pointer arithmetic.
C++ does not strictly need them; they are only syntactic sugar around pointers.
The initial idea of the creators was probably to make C++ code easier to understand, although they achieved the exact opposite.
My opinion is that a C++ program is better if it entirely misses reference variables and it uses only pointers.
Your function in the form
double foo (double* y)
{
*y = 21654;
return *y;
}
...would do exactly the same, but it would actually be easier to understand.
I need a once-and-for-all clarification on passing by value/pointer/reference.
If I have a variable such as
int SomeInt = 10;
And I want to pass it to a function like
void DoSomething(int Integer)
{
Integer = 1;
}
In my current scenario, when passing SomeInt to DoSomething() I want SomeInt's value to be updated based on whatever we do to it inside of DoSomething(), and I want this to be as efficient as possible on memory and performance, so I'm not copying the variable around. That being said, which of the following prototypes would accomplish this task?
void DoSomething(int* Integer);
void DoSomething(int& Integer);
How would I actually pass the variable into the function? What is the difference between the previous two prototypes?
Finally if using a function within a class
class SomeClass
{
int MyInteger;
public:
void ChangeValue(int& NewValue)
{
MyInteger = NewValue;
}
};
If I pass an integer into ChangeValue, when the integer I passed in gets deleted, will that mean that when I try to use MyInteger from within the class it will no longer be usable?
Thank you all for your time, I know this is kind of a basic question but the explanations I keep running into confuse me further.
Functionally, all three of these work:
pass an int and change the return type to int so you can return the new value, usage: x = f(x);
when you plan to set the value without needing to read the initial value, it's much better to use a function like int DoSomething(); so the caller can just say int x = f(); without having to create x on an earlier line and wondering/worrying whether it needs to be initialised to anything before the call.
pass an int& and set it inside the function, usage: int x; x = ? /* if an input */; f(x);
pass an int* and set the pointed-to int inside the function, usage: int x; x = ?; f(&x);
most efficient on memory and performance so I'm not copying the variable around
Given the C++ Standard doesn't dictate how references should be implemented by the compiler, it's a bit dubious trying to reason about their characteristics. If you care, compile your code to assembly or machine code and see how it works out on your particular compiler (for specific compiler command-line options etc.). If you need a rule of thumb, assume that references have identical performance characteristics to pointers unless profiling or generated-code inspection suggests otherwise.
For an int you can expect the first version above to be no slower than the pointer version, and possibly be faster, because the int parameter can be passed and returned in a register without ever needing a memory address.
If/when/where the by-pointer version is inlined there's more chance that the potentially slow "needing a memory address so we can pass a pointer" / "having to dereference a pointer to access/update the value" aspect of the pass-by-pointer version can be optimised out (if you've asked the compiler to try), leaving both versions with identical performance....
Still, if you need to ask a question like this I can't imagine you're writing code where these are the important optimisation choices, so a better aim is to do what gives you the cleanest, most intuitive and robust usage for the client code... now - whether that's x = f(x); (where you might forget the leading x =), or f(x) (where you might not realise x could be modified), or f(&x) (where some caller might think they can pass nullptr) is a reasonable question in its own right, but separate from your performance concerns. FWIW, the C++ FAQ Lite recommends references over pointers for this kind of situation, but I personally reject its reasoning and conclusions - it all boils down to familiarity with either convention, and how often you need to pass const pointer values, or pointer values where nullptr is a valid sentinel, that could be confused with the you-may-modify-me implication hoped for in your scenario... that depends a lot on your coding style, libraries you use, problem domain etc.
Both of your examples
void DoSomething(int* Integer);
void DoSomething(int& Integer);
will accomplish the task. In the first case - with pointer - you need to call the function with DoSomething(&SomeInt);, in the second case - with reference - simpler as DoSomething(SomeInt);
The recommended way is to use references whenever they are sufficient, and pointers only if they are necessary.
You can use either. Function call for first prototype would be
DoSomething(&SomeInt);
and for second prototype
DoSomething(SomeInt);
As was already said before, you can use both. The advantage of the
void DoSomething(int* Integer)
{
*Integer=0xDEADBEEF;
}
DoSomething(&myvariable);
pattern is that it becomes obvious from the call that myvariable is subject to change.
The advantage of the
void DoSomething(int& Integer)
{
Integer=0xDEADBEEF;
}
DoSomething(myvariable);
pattern is that the code in DoSomething is a bit cleaner, DoSomething has a harder time to mess with memory in bad ways and that you might get better code out of it. Disadvantage is that it isn't immediately obvious from reading the call that myvariable might get changed.
Consider the sample application below. It demonstrates what I would call a flawed class design.
#include <iostream>
using namespace std;
struct B
{
B() : m_value(1) {}
long m_value;
};
struct A
{
const B& GetB() const { return m_B; }
void Foo(const B &b)
{
// assert(this != &b);
m_B.m_value += b.m_value;
m_B.m_value += b.m_value;
}
protected:
B m_B;
};
int main(int argc, char* argv[])
{
A a;
cout << "Original value: " << a.GetB().m_value << endl;
cout << "Expected value: 3" << endl;
a.Foo(a.GetB());
cout << "Actual value: " << a.GetB().m_value << endl;
return 0;
}
Output:
Original value: 1
Expected value: 3
Actual value: 4
Obviously, the programmer is fooled by the constness of b. By mistake, b refers to this->m_B, which yields the undesired behavior.
My question: What const-rules should you follow when designing getters/setters?
My suggestion: Never return a reference to a member variable if it can be set by reference through a member function. Hence, either return by value or pass parameters by value. (Modern compilers will optimize away the extra copy anyway.)
Obviously, the programmer is fooled by the constness of b
As someone once said, You keep using that word. I do not think it means what you think it means.
Const means that you cannot change the value. It does not mean that the value cannot change.
If the programmer is fooled by the fact that some other code can change something that they cannot, they need a better grounding in aliasing.
If the programmer is fooled by the fact that the token 'const' sounds a bit like 'constant' but means 'read only', they need a better grounding in the semantics of the programming language they are using.
So if you have a getter which returns a const reference, then it is an alias for an object you don't have the permission to change. That says nothing about whether its value is immutable.
Ultimately, this comes down to a lack of encapsulation, and not applying the Law of Demeter. In general, don't mutate the state of other objects. Send them a message to ask them to perform an operation, which may (depending on their own implementation details) mutate their state.
If you make B.m_value private, then you can't write the Foo you have. You either make Foo into:
void Foo(const B &b)
{
m_B.increment_by(b);
m_B.increment_by(b);
}
void B::increment_by (const B& b)
{
// assert ( this != &b ) if you like
m_value += b.m_value;
}
or, if you want to ensure that the value is constant, use a temporary
void Foo(B b)
{
m_B.increment_by(b);
m_B.increment_by(b);
}
Now, incrementing a value by itself may or may not be reasonable, and is easily tested for within B::increment_by. You could also test whether &m_B == &b in A::Foo, though once you have a couple of levels of objects and objects with references to other objects rather than values (so &a1.b.c == &a2.b.c does not imply that &a1.b == &a2.b or &a1 == &a2), then you really have to just be aware that any operation is potentially aliased.
Aliasing means that incrementing by an expression twice is not the same as incrementing twice by the value the expression had the first time you evaluated it; there's no real way around it, and in most systems copying the data just to avoid the alias isn't worth the cost.
Passing in arguments which have the least structure also works well. If Foo() took a long rather than an object which it has to get a long from, then it would not suffer aliasing, and you wouldn't need to write a different Foo() to increment m_B by the value of a C.
I propose a slightly different solution to this that has several advantages (especially in an ever-increasing multi-threaded world). It's a simple idea to follow, and that is to "commit" your changes last.
To explain via your example you would simply change the 'A' class to:
struct A
{
const B& GetB() const { return m_B; }
void Foo(const B &b)
{
// copy out what we are going to change;
long itm_value = m_B.m_value;
// perform operations on the copy, not our internal value
itm_value += b.m_value;
itm_value += b.m_value;
// copy over final results
m_B.m_value = itm_value ;
}
protected:
B m_B;
};
The idea here is to place all assignment to memory viewable above the current function at the end, where they pretty much can't fail. This way, if an error is thrown (say there was a divide in the middle of those 2 operations, and if it just happens to be 0) in the middle of the operation, then we aren't left with half baked data in the middle.
Furthermore, in a multi-threading situation, you can do all of the operation, and then just check at the end if anything has changed before your "commit" (an optimistic approach, which will usually pass and usually yield much better results than locking the structure for the entire operation), if it has changed, you simply discard the values and try again (or return a value saying it has failed if there is something it can do instead).
On top of this, the compiler can usually optimise this better, because it is no longer required to write the variables being modified to memory (we are only forcing one read of the value to be changed and one write). This way, the compiler has the option of just keeping the relevant data in a register, saves L1 cache access if not cache misses. Otherwise the compiler will probably make it write to the memory as it doesn't know what aliasing might be taking place (so it can't ensure those values stay the same, if they are all local, it knows it can't be aliasing because the current function is the only one that knows about it).
There are a lot of different things that can happen with the original code posted. I wouldn't be surprised if some compilers (with optimizations enabled) actually produce code that gives the "expected" result, whereas others don't. All of this is simply because the point at which variables that aren't 'volatile' are actually written to or read from memory isn't well defined within the C++ standard.
The real problem here is atomicity. The precondition of the Foo function is that its argument doesn't change while in use.
If, for example, Foo had been specified with a value argument instead of a reference argument, no problem would have shown up.
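To make the difference concrete, here is a small sketch. The original question's code is not reproduced in this thread, so the definition of B, the initial value of 1, and the FooByRef / FooByValue names are assumptions chosen to match the two additions discussed above.

```cpp
// Assumed shape of B; the initial value 1 is purely illustrative.
struct B {
    int m_value = 1;
};

struct A {
    const B& GetB() const { return m_B; }

    // Reference parameter: if `b` aliases m_B, the second addition
    // reads the value already modified by the first one.
    void FooByRef(const B& b) {
        m_B.m_value += b.m_value;
        m_B.m_value += b.m_value;
    }

    // Value parameter: `b` is a private snapshot taken at the call,
    // so it cannot change while the function runs.
    void FooByValue(B b) {
        m_B.m_value += b.m_value;
        m_B.m_value += b.m_value;
    }

private:
    B m_B;
};
```

Calling a.FooByRef(a.GetB()) on a fresh A leaves m_value at 4 (1, then 2, then 4), while a.FooByValue(a.GetB()) leaves it at 3, which is probably what the caller expected.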
Frankly, A::Foo() rubs me the wrong way more than your original problem. Any way I look at it, it must be B::Foo(). And a check for this inside B::Foo() wouldn't be that outlandish.
Otherwise I do not see how one can specify a generic rule to cover that case. And keep teammates sane.
From past experience, I would treat that as a plain bug and would differentiate two cases: (1) B is small and (2) B is large. If B is small, then simply make A::getB() return a copy. If B is large, then you have no choice but to handle the case where objects of B might be both an rvalue and an lvalue in the same expression.
If you have such problems constantly, I'd say a simpler rule would be to always return a copy of an object instead of a reference, because quite often, if the object is large, you have to handle it differently from the rest anyway.
My stupid answer, I leave it here just in case someone else comes up with the same bad idea:
The problem is I think that the object referred to is not const (B const & vs const B &), only the reference is const in your code.
Compare the following two pieces of code: the first uses a reference to a large object, and the second has the large object as the return value. The emphasis on a "large object" refers to the fact that repeatedly copying the object unnecessarily wastes cycles.
Using a reference to a large object:
void getObjData( LargeObj& a )
{
a.reset() ;
a.fillWithData() ;
}
int main()
{
LargeObj a ;
getObjData( a ) ;
}
Using the large object as a return value:
LargeObj getObjData()
{
LargeObj a ;
a.fillWithData() ;
return a ;
}
int main()
{
LargeObj a = getObjData() ;
}
The first snippet of code does not require copying the large object.
In the second snippet, the object is created inside the function, and so in general, a copy is needed when returning the object. In this case, however, in main() the object is being declared. Will the compiler first create a default-constructed object, then copy the object returned by getObjData(), or will it be as efficient as the first snippet?
I think the second snippet is easier to read but I am afraid it is less efficient.
Edit: Typically, I am thinking of cases where LargeObj is a generic container class that, for the sake of argument, contains thousands of objects. For example,
typedef std::vector<HugeObj> LargeObj ;
so directly modifying/adding methods to LargeObj isn't a directly accessible solution.
The second approach is more idiomatic, and expressive. It is clear when reading the code that the function has no preconditions on the argument (it does not have an argument) and that it will actually create an object inside. The first approach is not so clear for the casual reader. The call implies that the object will be changed (pass by reference) but it is not so clear if there are any preconditions on the passed object.
About the copies: the code you posted does not use the assignment operator, but rather copy construction. The C++ standard permits the return value optimization, which is implemented in all major compilers. If you are not sure, you can run the following snippet with your compiler:
#include <iostream>
class X
{
public:
X() { std::cout << "X::X()" << std::endl; }
X( X const & ) { std::cout << "X::X( X const & )" << std::endl; }
X& operator=( X const & ) { std::cout << "X::operator=(X const &)" << std::endl; return *this; }
};
X f() {
X tmp;
return tmp;
}
int main() {
X x = f();
}
With g++ you will get a single line: X::X(). The compiler reserves space on the stack for the x object, then calls the function, which constructs tmp directly in x (in fact, tmp is x). The operations inside f() are applied directly to x, which is equivalent to your first code snippet (pass by reference).
If you were not using the copy constructor (had you written: X x; x = f();) then it would create both x and tmp and apply the assignment operator, yielding a three-line output: X::X() / X::X() / X::operator=. So it could be a little less efficient in some cases.
Use the second approach. It may seem to be less efficient, but the C++ standard allows the copies to be elided. This optimization is called Named Return Value Optimization and is implemented in most current compilers.
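One way to check this on your own compiler is to count constructor calls. The sketch below assumes C++11 or later; Tracked and makeTracked are hypothetical names. Whether the transfer is elided entirely depends on the compiler, but when a move constructor is available, returning a named local never falls back to the copy constructor.

```cpp
// Hypothetical instrumented type that counts how it gets constructed.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Returns a named local by value. NRVO may elide the transfer entirely;
// if it does not, the move constructor is selected, not the copy constructor.
Tracked makeTracked() {
    Tracked t;
    return t;
}
```

After Tracked x = makeTracked();, Tracked::copies is still 0; Tracked::moves is 0 when the compiler applies NRVO and 1 otherwise.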
Yes in the second case it will make a copy of the object, possibly twice - once to return the value from the function, and again to assign it to the local copy in main. Some compilers will optimize out the second copy, but in general you can assume at least one copy will happen.
However, you could still use the second approach for clarity even if the data in the object is large without sacrificing performance with the proper use of smart pointers. Check out the suite of smart pointer classes in boost. This way the internal data is only allocated once and never copied, even when the outer object is.
The way to avoid any copying is to provide a special constructor, if you can rewrite your code so it looks like:
LargeObj getObjData()
{
return LargeObj( fillsomehow() );
}
If fillsomehow() returns the data (perhaps a "big string"), then have a constructor that takes a "big string". If you have such a constructor, then the compiler will very likely construct a single object and not make any copies at all to perform the return. Of course, whether this is useful in real life depends on your particular problem.
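A rough sketch of that constructor approach, assuming the "big string" is a std::string. The fillSomehow body, the m_data member, and the 10000-character payload are made up for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for fillsomehow(); returns the "big string".
std::string fillSomehow() {
    return std::string(10000, 'x');
}

struct LargeObj {
    // Takes the data by value and moves it into place, so the buffer
    // produced by fillSomehow() is not copied.
    explicit LargeObj(std::string data) : m_data(std::move(data)) {}

    std::string m_data;
};

// Returns an unnamed temporary; compilers construct it directly in the
// caller's object (this elision is mandatory since C++17).
LargeObj getObjData() {
    return LargeObj(fillSomehow());
}
```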
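A somewhat idiomatic solution would be the following (note that std::auto_ptr was deprecated in C++11 and removed in C++17; std::unique_ptr is its replacement):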
std::auto_ptr<LargeObj> getObjData()
{
std::auto_ptr<LargeObj> a(new LargeObj);
a->fillWithData();
return a;
}
int main()
{
std::auto_ptr<LargeObj> a(getObjData());
}
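On a current compiler the same idea would be written with std::unique_ptr. This is a sketch assuming C++14 (for std::make_unique); the LargeObj definition here is a hypothetical stand-in for the one in the question.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for the LargeObj in the question.
struct LargeObj {
    std::vector<int> data;
    void fillWithData() { data.assign(1000, 42); }
};

// Ownership of the heap object is transferred to the caller. Only the
// pointer is moved; the LargeObj itself is never copied.
std::unique_ptr<LargeObj> getObjData() {
    auto a = std::make_unique<LargeObj>();  // requires C++14
    a->fillWithData();
    return a;
}
```

The caller writes auto a = getObjData(); and the object is released automatically when a goes out of scope.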
Alternatively, you can avoid this issue altogether by letting the object get its own data, i.e. by making getObjData() a member function of LargeObj. Depending on what you are actually doing, this may be a good way to go.
Depending on how large the object really is and how often the operation happens, don't get too bogged down in efficiency when it will have no discernible effect either way. Optimization at the expense of clean, readable code should only happen when it is determined to be necessary.
The chances are that some cycles will be wasted when you return by copy. Whether it's worth worrying about depends on how large the object really is, and how often you invoke this code.
But I'd like to point out that if LargeObj is a large and non-trivial class, then in any case its empty constructor should be initializing it to a known state:
LargeObj::LargeObj() :
m_member1(),
m_member2(),
...
{}
That wastes a few cycles too. Re-writing the code as
LargeObj::LargeObj()
{
// (The body of fillWithData should ideally be re-written into
// the initializer list...)
fillWithData() ;
}
int main()
{
LargeObj a ;
}
would probably be a win-win for you: you'd have the LargeObj instances getting initialized into known and useful states, and you'd have fewer wasted cycles.
If you don't always want to use fillWithData() in the constructor, you could pass a flag into the constructor as an argument.
UPDATE (from your edit & comment) : Semantically, if it's worthwhile to create a typedef for LargeObj -- i.e., to give it a name, rather than referring to it simply as std::vector<HugeObj> -- then you're already on the road to giving it its own behavioral semantics. You could, for example, define it as
class LargeObj : public std::vector<HugeObj> {
public:
// constructor that fills the object with data
LargeObj() ;
// ... other standard methods ...
};
Only you can determine if this is appropriate for your app. My point is that even though LargeObj is "mostly" a container, you can still give it class behavior if doing so works for your application.
Your first snippet is especially useful when getObjData() is implemented in one DLL and called from another, and the two DLLs are written in different languages or built with different versions of the compiler for the same language. The reason is that modules built with different compilers often use different heaps. You must allocate and deallocate memory within the same heap, or you will corrupt memory.
But if you don't do something like that, I would normally simply return a pointer (or smart pointer) to memory your function allocates:
LargeObj* getObjData()
{
LargeObj* ret = new LargeObj;
ret->fillWithData() ;
return ret;
}
...unless I have a specific reason not to.
In a project I maintain, I see a lot of code like this for simple get/set methods
const int & MyClass::getFoo() { return m_foo; }
void MyClass::setFoo(const int & foo) { m_foo = foo; }
What is the point in doing that instead of the following?
int MyClass::getFoo() { return m_foo; } // Removed 'const' and '&'
void MyClass::setFoo(const int foo) { m_foo = foo; } // Removed '&'
Passing a reference to a primitive type should require the same (or more) effort as passing the type's value itself, right?
It's just a number after all...
Is this just some attempted micro-optimization or is there a true benefit?
The difference is that if you get that result into a reference yourself you can track the changes of the integer member variable in your own variable name without recalling the function.
const int& x = myObject.getFoo();
cout<<x<<endl;
//...
cout<<x<<endl;//x might have changed
It's probably not the best design choice, and returning a reference (const or not) is very dangerous if it refers to a variable that goes out of scope. So if you return a reference, make sure the object it refers to outlives every use of that reference.
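The tracking behaviour can be sketched as follows. The class body is an assumption modelled on the getter and setter in the question; getFooRef and getFooCopy are hypothetical names used to show both variants side by side.

```cpp
// Hypothetical class mirroring the getter/setter pair from the question.
class MyClass {
public:
    const int& getFooRef() const { return m_foo; }  // alias to the member
    int getFooCopy() const { return m_foo; }        // snapshot of the member
    void setFoo(int foo) { m_foo = foo; }

private:
    int m_foo = 0;
};
```

If a caller binds const int& r = obj.getFooRef(); and later calls obj.setFoo(2), reading r yields 2, whereas a value obtained earlier from getFooCopy() keeps whatever it held at the time of the call. The reference is only valid for as long as obj is alive.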
There is a slight difference for the modifier too, but again probably not something that is worth doing or that was intended.
void test1(int x)
{
cout<<x<<endl;//prints 1
}
void test2(const int &x)
{
cout<<x<<endl;//prints 1 or something else possibly, another thread could have changed x
}
int main(int argc, char**argv)
{
int x = 1;
test1(x);
//...
test2(x);
return 0;
}
So the end result is that you obtain changes even after the parameters are passed.
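The same effect can be shown without threads. In this sketch a global is modified while the callee is running, standing in for the "another thread" case; g_value, bump, readByValue and readByConstRef are hypothetical names.

```cpp
// Hypothetical global that is modified while the callee is running,
// standing in for "another thread" in a single-threaded sketch.
int g_value = 1;

void bump() { ++g_value; }

// Receives a private copy: later changes to the caller's variable are invisible.
int readByValue(int x) {
    bump();
    return x;
}

// Receives an alias: a change made during the call is observed.
int readByConstRef(const int& x) {
    bump();
    return x;
}
```

With g_value set to 1 before each call, readByValue(g_value) returns 1 and readByConstRef(g_value) returns 2, because the const reference only prevents the callee from writing through x, not the underlying variable from changing.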
To me, passing a const reference for primitives is a mistake. Either you need to modify the value, and in that case you pass a non-const reference, or you just need to read the value, and in that case you pass it by value.
Const references should only be used for complex classes, when copying objects could be a performance problem. In the case of primitives, unless you need to modify the value of the variable, you shouldn't pass a reference. The reason is that accessing a value through a reference costs an extra indirection: a reference is typically implemented as an address, so the program has to follow that address to reach the object. References are an improvement only when this indirection is cheaper than the copy.
Generally, an int is no larger than an address in low-level implementations, so copying an int as a function's return value costs no more than copying an address. But when an int is returned, no indirection is needed afterwards, so performance is better.
The main difference between returning a value and returning a const reference is that you then can const_cast that reference and alter the value.
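For illustration only, this is what such a cast looks like. Widget and forceSetFoo are hypothetical names; the write is well-defined here only because the underlying object is not const.

```cpp
// Hypothetical class; m_foo is a non-const member of a non-const object,
// which is what makes the write below well-defined.
class Widget {
public:
    const int& getFoo() const { return m_foo; }

private:
    int m_foo = 1;
};

// Bypasses the read-only interface by casting away const.
// This would be undefined behavior if `w` referred to an object declared const.
void forceSetFoo(Widget& w, int value) {
    const_cast<int&>(w.getFoo()) = value;
}
```

A getter that returns int by value offers no such back door, since the caller only ever receives a copy.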
It's an example of bad design and an attempt to create a smart design where an easy and concise design would be more than enough. Instead of just returning a value, the author makes readers of the code wonder what intention he might have had.
There is not much benefit. I have seen this in framework or macro generated getters and setters before. The macro code did not distinguish between primitive and non-POD types and just used const type& across the board for setters. I doubt that it is an efficiency issue or a genuine misunderstanding; chances are this is a consistency issue.
I think this type of code is written by people who have misunderstood the concept of references and use them for everything, including primitive data types. I've also seen some code like this and can't see any benefit in doing this.
There is no point or benefit, except that with either of
void MyClass::setFoo(const int foo)
void MyClass::setFoo(const int& foo)
you won't be able to reassign the 'foo' variable inside the 'setFoo' implementation. And I believe the 'int&' is there just because the author got used to passing everything by const reference, and there is nothing wrong with that.