Returning a Static Local Reference - c++

Suppose I have a function that will return a large data structure, with the intention that the caller will immediately copy the return value:
Large large()
{
    return Large();
}
Now suppose I do not want to rely on any kind of compiler optimizations such as return value optimization etc. Also suppose that I cannot rely on the C++11 move constructor. I would like to gather some opinions on the "correctness" of the following code:
const Large& large()
{
    static Large large;
    large = Large();
    return large;
}
It should work as intended, but is it poor style to return a reference to a static local even if it is const qualified?

It all depends on what "work as intended" means. In this case all callers will share references to the exact same variable. Also note that if callers copy the result, you are effectively disabling RVO (Return Value Optimization), which works in all current compilers [*].
I would stay away from that approach as much as possible, it is not idiomatic and will probably cause confusion in many cases.
[*] The calling convention in all compilers I know of determines that a function that returns a large (i.e. does not fit a register) variable receives a hidden pointer to the location in which the caller has allocated the space for the variable. That is, the optimization is forced by the calling convention.
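To make the sharing concrete, here is a minimal sketch. The Large type with a fill constructor and the names largeShared, largeByValue and sharedReferencesAlias are hypothetical, invented for this illustration; the sketch only shows that every caller holding the returned reference sees the latest assignment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the question's Large type.
struct Large {
    std::vector<int> data;
    explicit Large(int fill = 0) : data(3, fill) {}
};

// The pattern from the question: every call reassigns the same static object.
const Large& largeShared(int fill) {
    static Large large;
    large = Large(fill);
    return large;
}

// The plain alternative: each caller gets its own object.
Large largeByValue(int fill) {
    return Large(fill);
}

// Returns true when two "results" turn out to be one and the same object.
bool sharedReferencesAlias() {
    const Large& first = largeShared(1);
    const Large& second = largeShared(2);  // also changes what `first` refers to
    assert(first.data[0] == 2);            // the value 1 is gone
    return &first == &second;
}
```

A caller that copies immediately (Large mine = largeShared(1);) is safe; a caller that keeps the reference is not.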

I don't think there's any issue with doing this, so long as this code base is, and forever will be, single threaded.
Do this on a multithreaded piece of code, and you might never be able to figure out why your data are occasionally being randomly corrupted.

Related

function returning - unique_ptr VS passing result as parameter VS returning by value

In C++, what is the preferred/recommended way to create an object in a function/method and return it to be used outside the creation function's scope?
In most functional languages, option 3 (and sometimes even option 1) would be preferred, but what's the C++ way of best handling this?
Option 1 (return unique_ptr)
pros: function is pure and does not change input params
cons: is this an unnecessarily complicated solution?
std::unique_ptr<SomeClass> createSometing(){
    auto s = std::make_unique<SomeClass>();
    return s;
}
Option 2 (pass result as a reference parameter)
pros: simple and does not involve pointers
cons: input parameter is changed (makes function less pure and more unpredictable - the result reference param could be changed anywhere within the function and it could get hard/messy to track in larger functions).
void createSometing(SomeClass& result){
    SomeClass s;
    result = s;
}
Option 3 (return by value - involves copying)
pros: simple and clear
cons: involves copying an object - which could be expensive. But is this ok?
SomeClass createSometing(){
    SomeClass s;
    return s;
}
In modern C++, the rule is that the compiler is smarter than the programmer. Said differently, the programmer is expected to write code that is easy to read and maintain. Unless profiling has proven that there is an unacceptable bottleneck, low-level concerns should be left to the optimizing compiler.
For that reason, and unless profiling has proven that another way is required, I would first try option 3 and return a plain object. If the object is movable, moving it is generally not too expensive. Furthermore, most compilers fully elide the copy/move operation when they can. If I remember correctly, copy elision is even required starting with C++17 for statements like this:
T foo = functionReturningT();
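One way to check this on a given compiler is to count copies and moves. The sketch below is only an illustration: the Counted type and the function names are hypothetical, and it assumes the compiler's default settings (copy elision not switched off).

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Counted {
    static int copies;
    static int moves;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Returns a temporary: since C++17 no copy or move is allowed to happen here.
Counted makeTemporary() {
    return Counted();
}

// Returns a named local: elision (NRVO) is permitted but not required,
// and the fallback is a move, not a copy.
Counted makeNamed() {
    Counted c;
    return c;
}

// Exercises both factories; the copy counter should stay at zero.
void checkNoCopies() {
    Counted a = makeTemporary();
    Counted b = makeNamed();
    (void)a;
    (void)b;
    assert(Counted::copies == 0);
}
```

If the move counter ends up non-zero, the compiler declined to apply NRVO in makeNamed, which is still far cheaper than a copy for most types.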
This is a loaded question, because the matter involves a decision to create the object on the heap vs not creating it on the heap. In C++, it’s ideal to have objects that can be passed around as values cheaply. std::string is a good example of that. It’s generally a premature pessimization to allocate std::string on the heap. On the other hand, the object you may be creating may be large and expensive to copy. In that case, putting it on the heap would be preferable. But that assumes that a copy would have to take place. By default, the copy is elided! But also: figure out if the type could be made cheaper to copy.
So there’s no “one way suits all”. In my experience, legacy code tends to overuse the heap.
In most cases, returning by value is preferable, since all mainstream compilers will have the function instantiate the object in the storage where it’ll reside, without moves nor copies.
Then, the object can be copy-constructed on the heap by the user of the function, if they so desire, and the compiler will get rid of that copy as well.
Micromanagement of this stuff, without looking at actual generated code, is typically a waste of time, since the code declares intent and not the implementation. Compilers these days literally produce code that has equivalent meaning, taking the C++ source’s semantics, but not necessarily using the source to dictate identical implementation at the machine level.
Thus, in most instances, returning by value is the sensible default, unless the type is borked and doesn’t support that. Unfortunately, some widely used types are in this camp, e.g. Qt’s QObject.
TL;DR: Given MyType myFactoryFunction();, the statement auto obj = std::make_unique<MyType>(myFactoryFunction()); will not copy nor move on modern compilers in the release build, if the type is designed well.
There isn't a single right answer and it depends on the situation and personal preference to some extent. Here are pros and cons of different approaches.
Just declare it
SomeClass foo(arg1, arg2);
Factory functions should be relatively uncommon and only needed if the code creating the object doesn't have all the necessary information to create it (or shouldn't, due to encapsulation reasons). Perhaps it's more common in other languages to have factory functions for everything, but instantiating objects directly should be the first pick.
Return by value
SomeClass createSomeClass();
The first question is whether you want the resulting object to live on the stack or the heap. The default for small objects is the stack, since it's more efficient as you skip the call to malloc(). With Return Value Optimization usually there's no copy.
Return by pointer
std::unique_ptr<SomeClass> createSomeClass();
or
SomeClass* createSomeClass();
Reasons you might pick this include: it is a large object that you want to be heap allocated; the object is created out of some data store and the caller won't own the memory; you want a nullable return type to signal errors.
Out parameter
bool createSomeClass(SomeClass&);
The main benefit of out parameters comes when you have multiple return values. For example, you might want to return true/false for whether the object creation succeeded (e.g. if your object doesn't have a valid "unset" state, like an integer). You might also have a factory function that returns multiple things, e.g.
void createUserAndToken(User& user, Token& token);
In summary, I'd say by default, go with return by value. Do you need to signal failure? Out parameter or pointer. Is it a large object that lives on the heap, or some other data structure and you're giving out a handle? Return by pointer. If you don't strictly need a factory function, just declare it.
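As a rough side-by-side of the three shapes summarized above, here is a small sketch. The User type and the function names createUser, createUserPtr and createGuest are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical type used only for this illustration.
struct User {
    std::string name;
};

// Out parameter: the bool reports success, `user` receives the result.
bool createUser(const std::string& name, User& user) {
    if (name.empty()) return false;  // `user` is left untouched on failure
    user.name = name;
    return true;
}

// Pointer return: a null pointer signals failure.
std::unique_ptr<User> createUserPtr(const std::string& name) {
    if (name.empty()) return nullptr;
    std::unique_ptr<User> user(new User);
    user->name = name;
    return user;
}

// Return by value: the default when creation cannot fail.
User createGuest() {
    User user;
    user.name = "guest";
    return user;
}

// Small self-check of the failure paths.
void checkCreateUser() {
    User u;
    assert(createUser("ann", u));
    assert(!createUser("", u));
    assert(!createUserPtr(""));
}
```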

Is it faster to return a value or modify a parameter passed by reference?

In a programme I am writing, I have to pass large data structures (images) between functions. I need my code to be as fast as possible, on different OSs (thus, I can't profile all test cases). I frequently have code of the form...
void foo() {
    ImageType img = getCustomImage();
}
ImageType getCustomImage() {
    ImageType custom_img;
    //lots of code
    return custom_img;
}
AFAIK, the line ImageType img = getCustomImage(); will result in a copy constructor being called for img with the return value from custom_img as its parameter. Wikipedia says that some compilers will even do this operation again, for an initial temporary variable!
My question: Is it faster in general to thus bypass this overhead (copy constructors for images are expensive) by using pass by reference rather than a return value...
void foo() {
    ImageType img;
    getCustomImage(img);
}
void getCustomImage(ImageType &img) {
    //code operating directly on img
}
I've been told that if the compiler supports return value optimisation then there should be no difference. Is this true? Can I (within reason) assume this nowadays, and how should I structure my programmes when speed is important?
You should write code that is maintainable; compilers are really good at doing the right thing for performance in most cases. If you feel that things go slowly, then measure the performance and, after you have located the bottleneck, try to figure out how to improve it.
You are right in that logically the code triggers different copy constructions: from custom_img to the returned temporary and then to the img object in the caller code, but the fact is that both copies will be elided.
In the particular case of return by value versus default-construct + pass-by-reference, all calling conventions that I know of implement return by value by having the caller allocate the memory and pass a hidden pointer to the callee, which effectively implements what you would be trying to do. So from a performance point of view, they are basically equivalent.
I wrote about this (value semantics in function arguments and return values) in the past in these two blog entries:
Named Return Value Optimization
Copy Elision
EDIT: I have intentionally avoided discussing the cases where NRVO cannot be applied by the compiler. The reason is that any function f that takes a reference to the object for processing, void f( T & out ) { /* code */ }, can be trivially converted into a function that returns by value, and for which NRVO is trivial for the compiler to implement, by a simple transformation: T f() { T out; /* code */ return out; }
Since your images are big data structures, I would perhaps suggest that the function should return pointers to images. You could use references also (which at the machine level are pointers), but I think pointers fit better for that purpose.
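The transformation described in the EDIT can be sketched as follows. The Image type and the names fillImage and makeImage are hypothetical stand-ins for the question's ImageType and getCustomImage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical image type standing in for the question's ImageType.
struct Image {
    std::vector<unsigned char> pixels;
};

// Out-parameter form: the caller constructs the object, the callee fills it.
void fillImage(Image& out, std::size_t size, unsigned char value) {
    out.pixels.assign(size, value);
}

// Return-by-value form: declare the object locally, run the same code,
// and return it by name. This is a candidate for NRVO; if the compiler
// does not apply it, the object is moved (C++11 and later).
Image makeImage(std::size_t size, unsigned char value) {
    Image out;
    out.pixels.assign(size, value);
    return out;
}

// Both forms should produce the same image.
void checkSameResult() {
    Image a;
    fillImage(a, 4, 9);
    Image b = makeImage(4, 9);
    assert(a.pixels == b.pixels);
}
```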
I am more familiar with C than with C++, so I could be wrong.
The important issue is when and by whom should your images be de-allocated.
At least if you're targeting reasonably current compilers for the reasonably typical OSes like Windows, MacOS, Linux, or *BSD, you can pretty well count on their implementing RVO/NRVO. IOW, you'd have to look pretty hard to find cases where there was enough difference to care about -- or most likely any at all.
Depending on how you're using the data involved, if there is a speed difference, it could just about as easily favor passing/returning objects as using a reference. You might want to read David Abrahams's article about this.
Seeing the question "What is faster?", I generally advise actually measuring it yourself, in your compiler and environment, and then finding out why that is so.

Pass reference to output location vs using return

Which is better for performance when calling a function that provides a simple datatype -- having it fill in a memory location (passed by pointer) or having it return the simple data?
I've oversimplified the example returning a static value of 5 here, but assume the lookup/functionality that determines the return value would be dynamic in real life...
Conventional logic would tell me the first approach is quicker since we are operating by reference instead of having to return a copy as in the 2nd approach... But, I'd like others' opinions.
Thanks
void func(int *a) {
    *a = 5;
}
or...
int func() {
    return 5;
}
In general, if your function acts like a function (that is, returning a single logical value), then it's probably best to use int func(). Even if the return value is a complex C++ object, there's a common optimisation called Return Value Optimisation that avoids unnecessary object copying and makes the two forms roughly equivalent in runtime performance.
Most compilers will return a value in a register as long as what you're returning is small enough to fit in a register. It's pretty unusual (and often nearly impossible) for anything else to be more efficient than that.
For PODs, there is no or almost no difference and I'd always go with a return value as I find those cleaner and easier to read.
For non-PODs the answer is "it depends" - a lot of compilers use Return Value Optimisation in this sort of scenario which tends to create an implicit reference parameter.
However unless you have measured - not "know", but actually measured with a profiler - that returning the results of the function using a return value is actually a bottleneck in your software, go for the more readable version of the code.
In my opinion, always go with return unless you know of a reason not to, or you have to return more than one value from the function. Returning a built-in type is very efficient, and whatever the difference vs. returning via pointer, it must be negligible. But the real benefit here is using return is clearer and simpler for those who read the code later.
Returning a simple value is just something like one instruction in assembly (i.e. MOV eax,xxxx); passing a parameter introduces a little more overhead. In any case you should not worry about that; the differences are hard to notice.
Another important point is that a function that returns its result to the left-hand side of an assignment is generally cleaner in terms of design, and preferred when possible.
This is a low level thing, where it would be hard to see any difference.
Easy answer: it depends.
It depends on the types being used, whether they can be copied cheaply or not (or at all), whether the compiler can use RVO in some circumstances or not, inline things better with one form or another...
Use what makes sense in the context.

Performance when accessing class members

I'm writing something performance-critical and wanted to know if it could make a difference if I use:
int test( int a, int b, int c )
{
    // Do millions of calculations with a, b, c
}
or
class myStorage
{
public:
    int a, b, c;
};
int test( myStorage values )
{
    // Do millions of calculations with values.a, values.b, values.c
}
Does this basically result in similar code? Is there an extra overhead of accessing the class members?
I'm sure that this is clear to an expert in C++ so I won't try and write an unrealistic benchmark for it right now
The compiler will probably equalize them. If it has any brains at all, it will copy values.a, values.b, and values.c into local variables or registers, which is also what happens in the simple case.
The relevant maxims:
Premature optimization is the root of much evil.
Write it so you can read it at 1am six months from now and still understand what you were trying to do.
Most of the time significant optimization comes from restructuring your algorithm, not small changes in how variables are accessed. Yes, I know there are exceptions, but this probably isn't one of them.
This sounds like premature optimization.
That being said, there are some differences and opportunities but they will affect multiple calls to the function rather than performance in the function.
First of all, in the second option you may want to pass MyStorage as a constant reference.
As a result of that, your compiled code will likely be pushing a single value into the stack (to allow you to access the container), rather than pushing three separate values. If you have additional fields (in addition to a-c), sending MyStorage not as a reference might actually cost you more because you will be invoking a copy constructor and essentially copying all the additional fields. All of this would be costs per-call, not within the function.
If you are doing tons of calculations with a b and c within the function, then it really doesn't matter how you transfer or access them. If you passed by reference, the initial cost might be slightly more (since your object, if passed by reference, could be on the heap rather than the stack), but once accessed for the first time, caching and registers on your machine will probably mean low-cost access. If you have passed your object by value, then it really doesn't matter, since even initially, the values will be nearby on the stack.
For the code you provided, if these are the only fields, there will likely not be a difference. the "values.variable" is merely interpreted as an offset in the stack, not as "lookup one object, then access another address".
Of course, if you don't buy these arguments, just define local variables as the first step in your function, copy the values from the object, and then use these variables. If you really use them multiple times, the initial cost of this copy wouldn't matter :)
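Putting the two suggestions from this answer together (pass by const reference, then copy the fields into locals once) might look like the sketch below. The loop body is a made-up placeholder for the real calculations, and the names testPlain, testStorage and sameResult are hypothetical.

```cpp
#include <cassert>

// Same fields as the question's myStorage class.
struct myStorage {
    int a, b, c;
};

// The three-parameter form; the formula is a placeholder.
int testPlain(int a, int b, int c) {
    int sum = 0;
    for (int i = 0; i < 1000; ++i) {
        sum += a * i + b - c;
    }
    return sum;
}

// Pass by const reference (one pointer-sized argument), then copy the
// fields into locals once so the loop works on plain ints.
int testStorage(const myStorage& values) {
    const int a = values.a;
    const int b = values.b;
    const int c = values.c;
    int sum = 0;
    for (int i = 0; i < 1000; ++i) {
        sum += a * i + b - c;
    }
    return sum;
}

// Sanity check that both forms compute the same result.
bool sameResult() {
    const myStorage values = {2, 3, 4};
    const bool same = testPlain(2, 3, 4) == testStorage(values);
    assert(same);
    return same;
}
```

Whether the explicit local copies change anything is compiler dependent; an optimizing compiler will often generate the same code with or without them.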
No, your cpu would cache the variables you use over and over again.
I think there is some overhead, but it may not be much, because the memory address of the object will be stored on the stack, pointing to the object in heap memory, and then you access the instance variable.
If you store the int variable on the stack, it would be faster, because the value is already on the stack and the machine just goes to the stack to get it for the calculation :).
It also depends on if you store the class's instance variable value on stack or not. If inside the test(), you do like:
int a = objA.a;
int b = objA.b;
int c = objA.c;
I think it would be almost the same performance
If you're really writing performance-critical code and you think one version should be faster than the other, write both versions and test the timing (with the code compiled with the right optimization switches). You may even want to look at the generated assembly code. Many quite subtle things can affect the speed of a code snippet, like register spilling.
You can also start your function with
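A minimal timing harness along these lines, using std::chrono::steady_clock, could look like this. It is a sketch rather than a rigorous benchmark: the workloads sumPlain and sumStorage are invented placeholders, a single run is noisy, and an optimizer may simplify such small loops.

```cpp
#include <cassert>
#include <chrono>

struct myStorage {
    int a, b, c;
};

// Placeholder workloads; replace with the real calculations.
long long sumPlain(int a, int b, int c, int rounds) {
    long long sum = 0;
    for (int i = 0; i < rounds; ++i) sum += a * (i % 100) + b - c;
    return sum;
}

long long sumStorage(myStorage values, int rounds) {
    long long sum = 0;
    for (int i = 0; i < rounds; ++i) sum += values.a * (i % 100) + values.b - values.c;
    return sum;
}

// Runs `work` once and returns the elapsed time in microseconds.
template <typename Work>
long long elapsedMicroseconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

struct Timing {
    long long plainUs;
    long long storageUs;
    bool sameResult;
};

// Times both versions and checks that they agree on the result.
Timing compareVersions(int rounds) {
    const myStorage values = {2, 3, 4};
    long long r1 = 0;
    long long r2 = 0;
    Timing t;
    t.plainUs = elapsedMicroseconds([&] { r1 = sumPlain(2, 3, 4, rounds); });
    t.storageUs = elapsedMicroseconds([&] { r2 = sumStorage(values, rounds); });
    t.sameResult = (r1 == r2);
    assert(t.sameResult);
    return t;
}
```

For numbers worth trusting, repeat each measurement many times and compare the distributions rather than two single readings.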
int & a = values.a;
int & b = values.b;
although the compiler should be smart enough to do that for you behind the scenes. In general I prefer to pass around structures or classes: this often makes it clearer what the function is meant to do, and you don't have to change the signatures every time you want to take another parameter into account.
As with your previous, similar question: it depends on the compiler and platform. If there is any difference at all, it will be very small.
Both values on the stack and values in an object are commonly accessed using a pointer (the stack pointer, or the this pointer) and some offset (the location in the function's stack frame, or the location inside the class).
Here are some cases where it might make a difference:
Depending on your platform, the stack pointer might be held in a CPU register, whereas the this pointer might not. If this is the case, accessing this (which is presumably on the stack) would require an extra memory lookup.
Memory locality might be different. If the object in memory is larger than one cache line, the fields are spread out over multiple cache lines. Bringing only the relevant values together in a stack frame might improve cache efficiency.
Do note, however, how often I used the word "might" here. The only way to be sure is to measure it.
If you can't profile the program, print out the assembly language for the code fragments.
In general, less assembly code means fewer instructions to execute, which tends to speed things up. This is a technique for getting a rough estimate of performance when a profiler is not available.
An assembly language listing will allow you to see differences, if any, between implementations.

How to avoid out parameters?

I've seen numerous arguments that using a return value is preferable to out parameters. I am convinced of the reasons why to avoid them, but I find myself unsure if I'm running into cases where it is unavoidable.
Part One of my question is: What are some of your favorite/common ways of getting around using an out parameter? Stuff along the lines: Man, in peer reviews I always see other programmers do this when they could have easily done it this way.
Part Two of my question deals with some specific cases I've encountered where I would like to avoid an out parameter but cannot think of a clean way to do so.
Example 1:
I have a class with an expensive copy that I would like to avoid. Work can be done on the object and this builds up the object to be expensive to copy. The work to build up the data is not exactly trivial either. Currently, I will pass this object into a function that will modify the state of the object. This to me is preferable to new'ing the object internal to the worker function and returning it back, as it allows me to keep things on the stack.
class ExpensiveCopy //Defines some interface I can't change.
{
public:
    ExpensiveCopy(); // default constructor, needed by the calling code below
    // A copy constructor must take its argument by reference.
    ExpensiveCopy(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ }
    ExpensiveCopy& operator=(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ return *this; }
    void addToData(SomeData);
    SomeData getData();
};
class B
{
public:
    static void doWork(ExpensiveCopy& ec_out, int someParam);
    //or
    // Your Function Here.
};
Using my function, I get calling code like this:
const int SOME_PARAM = 5;
ExpensiveCopy toModify;
B::doWork(toModify, SOME_PARAM);
I'd like to have something like this:
ExpensiveCopy theResult = B::doWork(SOME_PARAM);
But I don't know if this is possible.
Second Example:
I have an array of objects. The objects in the array are a complex type, and I need to do work on each element, work that I'd like to keep separated from the main loop that accesses each element. The code currently looks like this:
std::vector<ComplexType> theCollection;
for(std::size_t index = 0; index < theCollection.size(); ++index)
{
    doWork(theCollection[index]);
}
void doWork(ComplexType& ct_out)
{
    //Do work on the individual element.
}
Any suggestions on how to deal with some of these situations? I work primarily in C++, but I'm interested to see if other languages facilitate an easier setup. I have encountered RVO as a possible solution, but I need to read up more on it and it sounds like a compiler specific feature.
I'm not sure why you're trying to avoid passing references here. It's pretty much these situations that pass-by-reference semantics exist.
The code
static void doWork(ExpensiveCopy& ec_out, int someParam);
looks perfectly fine to me.
If you really want to modify it then you've got a couple of options:
Move doWork so that it's a member of ExpensiveCopy (which you say you can't do, so that's out)
return a (smart) pointer from doWork instead of copying it. (which you don't want to do as you want to keep things on the stack)
Rely on RVO (which others have pointed out is supported by pretty much all modern compilers)
Every useful compiler does RVO (return value optimization) if optimizations are enabled, thus the following effectively doesn't result in copying:
Expensive work() {
    // ... no branched returns here
    return Expensive(foo);
}
Expensive e = work();
In some cases compilers can apply NRVO, named return value optimization, as well:
Expensive work() {
    Expensive e; // named object
    // ... no branched returns here
    return e; // return named object
}
This however isn't exactly reliable, only works in more trivial cases and would have to be tested. If you're not up to testing every case, just use out-parameters with references in the second case.
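To test a particular case rather than guess, a copy counter can be added to the type. The sketch below is illustrative only: this Expensive class with its counter and the function names are hypothetical, and it assumes a C++11 or later compiler, where a named local that cannot be elided is moved instead of copied.

```cpp
#include <cassert>

// Hypothetical type that counts copies.
struct Expensive {
    static int copies;
    int value;
    explicit Expensive(int v) : value(v) {}
    Expensive(const Expensive& other) : value(other.value) { ++copies; }
    Expensive(Expensive&& other) noexcept : value(other.value) {}
};
int Expensive::copies = 0;

// RVO: the return statement constructs the result directly.
Expensive workUnnamed(int foo) {
    return Expensive(foo);
}

// NRVO candidate: one named object, one return.
Expensive workNamed(int foo) {
    Expensive e(foo);
    e.value += 1;
    return e;
}

// Branched returns of different named objects: NRVO usually cannot apply,
// so the returned object is moved rather than constructed in place.
Expensive workBranched(bool flag) {
    Expensive a(1);
    Expensive b(2);
    if (flag) return a;
    return b;
}

// None of the three factories should trigger the copy constructor.
void checkNoCopies() {
    Expensive x = workUnnamed(5);
    Expensive y = workNamed(5);
    Expensive z = workBranched(false);
    assert(x.value == 5 && y.value == 6 && z.value == 2);
    assert(Expensive::copies == 0);
}
```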
IMO the first thing you should ask yourself is whether copying ExpensiveCopy really is so prohibitively expensive. And to answer that, you will usually need a profiler. Unless a profiler tells you that the copying really is a bottleneck, simply write the code that's easier to read: ExpensiveCopy obj = doWork(param);.
Of course, there are indeed cases where objects cannot be copied for performance or other reasons. Then Neil's answer applies.
In addition to all the comments here, I'd mention that in C++0x you'd rarely use an output parameter for optimization purposes, because of move constructors.
Unless you are going down the "everything is immutable" route, which doesn't sit too well with C++, you cannot easily avoid out parameters. The C++ Standard Library uses them, and what's good enough for it is good enough for me.
As to your first example: return value optimization will often allow the returned object to be created directly in-place, instead of having to copy the object around. All modern compilers do this.
What platform are you working on?
The reason I ask is that many people have suggested Return Value Optimization, which is a very handy compiler optimization present in almost every compiler. Additionally Microsoft and Intel implement what they call Named Return Value Optimization which is even more handy.
In standard Return Value Optimization your return statement is a call to an object's constructor, which tells the compiler to eliminate the temporary values (not necessarily the copy operation).
In Named Return Value Optimization you can return a value by its name and the compiler will do the same thing. The advantage to NRVO is that you can do more complex operations on the created value (like calling functions on it) before returning it.
While neither of these really eliminate an expensive copy if your returned data is very large, they do help.
In terms of avoiding the copy the only real way to do that is with pointers or references because your function needs to be modifying the data in the place you want it to end up in. That means you probably want to have a pass-by-reference parameter.
Also I figure I should point out that pass-by-reference is very common in high-performance code for specifically this reason. Copying data can be incredibly expensive, and it is often something people overlook when optimizing their code.
As far as I can see, the reasons to prefer return values to out parameters are that it's clearer, and it works with pure functional programming (you can get some nice guarantees if a function depends only on input parameters, returns a value, and has no side effects). The first reason is stylistic, and in my opinion not all that important. The second isn't a good fit with C++. Therefore, I wouldn't try to distort anything to avoid out parameters.
The simple fact is that some functions have to return multiple things, and in most languages this suggests out parameters. Common Lisp has multiple-value-bind and multiple-value-return, in which a list of symbols is provided by the bind and a list of values is returned. In some cases, a function can return a composite value, such as a list of values which will then get deconstructed, and it isn't a big deal for a C++ function to return a std::pair. Returning more than two values this way in C++ gets awkward. It's always possible to define a struct, but defining and creating it will often be messier than out parameters.
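For the two-value case, returning a std::pair might look like this next to the out-parameter equivalent. The function names are hypothetical and the sketch assumes a non-zero divisor.

```cpp
#include <cassert>
#include <utility>

// Hypothetical example: quotient and remainder returned together.
// Assumes divisor != 0.
std::pair<int, int> quotientAndRemainder(int dividend, int divisor) {
    return std::make_pair(dividend / divisor, dividend % divisor);
}

// The same computation with out parameters, for comparison.
void quotientAndRemainderOut(int dividend, int divisor,
                             int& quotient, int& remainder) {
    quotient = dividend / divisor;
    remainder = dividend % divisor;
}

// Both forms should agree.
void checkBothForms() {
    const std::pair<int, int> result = quotientAndRemainder(17, 5);
    int q = 0;
    int r = 0;
    quotientAndRemainderOut(17, 5, q, r);
    assert(result.first == q && result.second == r);
}
```

The pair version keeps the call site free of pre-declared variables, at the cost of the uninformative member names first and second.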
In some cases, the return value gets overloaded. In C, getchar() returns an int, with the idea being that there are more int values than char (true in all implementations I know of, false in some I can easily imagine), so one of the values can be used to denote end-of-file. atoi() returns an integer, either the integer represented by the string it's passed or zero if there is none, so it returns the same thing for "0" and "frog". (If you want to know whether there was an int value or not, use strtol(), which does have an out parameter.)
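The atoi/strtol contrast can be sketched as below. The wrapper parseLong is a hypothetical helper written for this illustration; std::strtol, its end-pointer out parameter, and ERANGE are the standard library's.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parses a whole string as a base-10 long.
// Returns true on success and writes the value to `out`.
bool parseLong(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    if (end == text) return false;      // no digits consumed, e.g. "frog"
    if (*end != '\0') return false;     // trailing junk, e.g. "12abc"
    if (errno == ERANGE) return false;  // value does not fit in a long
    out = value;
    return true;
}

// atoi cannot tell "0" from "frog"; the strtol-based helper can.
void checkParseLong() {
    long v = -1;
    assert(std::atoi("0") == std::atoi("frog"));
    assert(parseLong("0", v) && v == 0);
    assert(!parseLong("frog", v));
}
```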
There's always the technique of throwing an exception in case of an error, but not all multiple return values are errors, and not all errors are exceptional.
So, overloaded return values cause problems, multiple value returns aren't easy to use in all languages, and single returns don't always exist. Throwing an exception is often inappropriate. Using out parameters is very often the cleanest solution.
Ask yourself why you have some method that performs work on this expensive-to-copy object in the first place. Say you have a tree: would you send the tree off into some building method, or give the tree its own building method? Situations like this come up constantly when your design is a little bit off, but tend to fold into themselves when you have it down pat.
I know in practicality we don't always get to change every object at all, but passing in out parameters is a side effect operation, and it makes it much harder to figure out what's going on, and you never really have to do it (except as forced by working within others' code frameworks).
Sometimes it is easier, but it's definitely not desirable to use it for no reason (if you've suffered through a few large projects where there's always half a dozen out parameters you'll know what I mean).