Let's say I have a function where the parameter is passed by value instead of by const reference. Further, let's assume that the value is only read inside the function, i.e. the function doesn't try to modify it. In that case, will the compiler be able to figure out that it can pass the value by const reference (for performance reasons) and generate the code accordingly? Is there any compiler that does that?
If you pass a variable instead of a temporary, the compiler is not allowed to optimize away the copy if its copy constructor does anything you would notice when running the program ("observable behavior": input/output, or accesses to volatile variables).
Apart from that, the compiler is free to do whatever it wants, as long as the observable behavior is the same as if it had not optimized at all (the "as-if" rule).
Only when the argument is an rvalue (typically a temporary) is the compiler allowed to elide the copy into the by-value parameter even if the copy constructor has observable side effects.
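To make the lvalue/rvalue distinction concrete, here is a minimal sketch. The type Counted and the helper functions are hypothetical names invented for illustration; the copy constructor increments a counter so that the copy is an observable side effect.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }   // observable side effect
};
int Counted::copies = 0;

// Takes its parameter by value and never modifies it.
void by_value(Counted) {}

// Number of copies made when an lvalue is passed.
int copies_for_lvalue() {
    Counted::copies = 0;
    Counted c;
    by_value(c);            // c is an lvalue: the copy constructor has to run
    return Counted::copies;
}

// Number of copies made when a temporary is passed.
int copies_for_temporary() {
    Counted::copies = 0;
    by_value(Counted{});    // temporary: the copy may be elided (guaranteed since C++17)
    return Counted::copies;
}
```

With a compiler that performs the elision (mandatory from C++17 on, and the default behavior of the common compilers before that), copies_for_lvalue() yields 1 and copies_for_temporary() yields 0.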
Only if the function is not exported is there a chance that the compiler will convert call-by-reference to call-by-value (or vice versa).
Otherwise, due to the calling convention, the function must keep its call-by-value/reference semantics.
I'm not aware of any general guarantee that this will be done, but if the called function is inlined, the compiler can see that an unnecessary copy is being made, and if the optimization level is high enough, the copy operation will be eliminated. GCC can do this, at least.
You might want to think about whether the class of this parameter has a user-defined copy constructor or not. If it doesn't, then the performance difference between pass-by-value and pass-by-const-ref is probably negligible.
On the other hand, if the class does have a copy constructor that does stuff, then the optimization you are hoping for probably will not happen, because the compiler cannot remove the call to the constructor: it cannot know that the side effects of the constructor are not important to you.
You might be able to get more useful answers if you say what the class of the parameter is, or if it is a custom class, describe what fields it has and whether it has a copy constructor.
With all optimisations the answer is generally "maybe". The only way to check is to examine the output assembly and see what it's really doing. If the standard allows it, whether or not it really happens is down to the whims of the compiler. You should not rely on it happening because an arbitrary change elsewhere in your codebase may change the heuristics used by the optimizer which might cause it to stop performing a certain optimization.
Play it safe: code it how you intend - pass by reference if that's what you want. However, if you're writing templated code which could work on types of any size, the choice is not so clear. Personally I'd side with passing by const reference - the compiler could also perform a different optimisation, where a small type which can fit inside the size of a reference is passed by value, rather than by const reference. But again, it might happen, it might not.
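For the templated case, one option is to pick the parameter type at compile time instead of hoping for the optimizer. This is only a sketch under my own assumptions: param_t and is_default are hypothetical names, and the "fits in a pointer" threshold is an arbitrary choice, not a rule from the standard.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: small trivially copyable types are passed by value,
// everything else by const reference.
template <typename T>
using param_t = typename std::conditional<
    std::is_trivially_copyable<T>::value && sizeof(T) <= sizeof(void*),
    T,
    const T&>::type;

// Example use: compares the argument with a default-constructed T.
// Note that T cannot be deduced through param_t, so callers must name it.
template <typename T>
bool is_default(param_t<T> x) {
    return x == T{};
}
```

The price is that the template argument has to be spelled out at the call site, e.g. is_default<int>(0), which is one reason why plain const reference remains the usual default.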
This post is an excellent reference to this kind of optimization:
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
Related
When returning a container, I always had to decide whether to use a return value or an output parameter. If performance mattered, I chose the second option; otherwise I always chose the first because it is more intuitive.
Frankly, I have always strongly objected to output parameters, possibly because of my mathematical background, but it was kind of OK to use them when I had no other option.
However, things change completely when it comes to generic programming. I have encountered situations where a function cannot know whether the object it returns is a huge container or just a simple value.
Consistently using output parameters may be a solution, but I want to know if I can avoid it. It is just so awkward if I have to do
int a;
f(a, other_arguments);
compared to
auto a = f(other_arguments);
Furthermore, sometimes the return type of f() has no default constructor. If output parameters are used, there is no graceful way to deal with that case.
I wonder if it is possible to return a "modifier object", a functor taking output parameters to modify them appropriately. (Perhaps this is a kind of lazy evaluation?) Well, returning such objects is not a problem, but the problem is I can't insert an appropriate overload of the assignment operator (or constructor) that takes such an object and triggers it to do its job, when the return type belongs to a library that I can't touch, e.g., std::vector. Of course, conversion operators are not helpful as they have no access to existing resources prepared for the target object.
Some people might ask why not use assign(); define a "generator object" which has begin() & end(), and pass those iterators to std::vector::assign. It is not a solution. For the first reason, the "generator object" does not have the full access to the target object and this may limit what could be done. For the second and more important reason, the call site of my function f() may also be a template which does not know the exact return type of f(), so it cannot determine which of the assignment operator or the assign() member function should be used.
I think that the "modifier object" approach to modifying containers must already have been discussed in the past, as it is not at all a new idea.
To sum up,
Is it possible to use return values to simulate what would happen when output parameters are used instead, in particular when the outputs are containers?
If not, has adding such support to the standard been discussed before? If it has, what were the issues? Is it a terrible idea?
Edit
The code example I've put above is misleading. The function f() may be used to initialize a local variable, but it may be also used to modify existing variables defined elsewhere. For the first case, as Rakete1111 mentioned, there is no problem with return by value as copy elision comes into play. But for the second case, there may be unnecessary resource releasing/acquiring.
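The second case can be made visible with std::vector. The sketch below uses hypothetical function names (make_values, fill_values) and checks whether the vector still owns the same heap buffer afterwards; it assumes the usual std::vector behavior with the default allocator, where clear() keeps the capacity and move assignment adopts the source's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return-value style: builds a fresh vector.
std::vector<int> make_values(std::size_t n) {
    std::vector<int> result;
    for (std::size_t i = 0; i < n; ++i) result.push_back(static_cast<int>(i));
    return result;
}

// Output-parameter style: refills an existing vector.
void fill_values(std::vector<int>& out, std::size_t n) {
    out.clear();   // keeps the existing capacity
    for (std::size_t i = 0; i < n; ++i) out.push_back(static_cast<int>(i));
}

// True if the output-parameter style reused the existing buffer.
bool out_param_reuses_buffer() {
    std::vector<int> v(1000);
    const int* before = v.data();
    fill_values(v, 10);
    return v.data() == before;
}

// True if the return-value style replaced the buffer.
bool return_value_replaces_buffer() {
    std::vector<int> v(1000);
    const int* before = v.data();
    v = make_values(10);   // move assignment: old buffer released, new one adopted
    return v.data() != before;
}
```

So nothing is copied element by element in either style, but the return-value style does pay for one allocation and one deallocation that the output parameter avoids.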
I don't think your "modifier object" was ever proposed (AFAIK). And it will never go into the standard. Why? Because we already have a way to get rid of the expensive copy, and that is return by value (plus compiler optimizations).
Before C++17, compilers were already allowed to do almost the same thing. This optimization is known as (N)RVO, which optimizes away the copy when returning a temporary or a named local variable from a function.
auto a = f(other_arguments);
This will not return a temporary and then copy it into a. The compiler will optimize the copy away entirely, as it is not needed. In theory, you cannot assume that your compiler supports this, but the three major ones do (clang, gcc, MSVC), so there is no need to worry. I don't know about ICC and the others, so I can't say.
So, as there is no copy (or move) involved, there is no performance penalty for using return values instead of output parameters (most probably; if for some reason your compiler doesn't support it, you'll get a move most of the time). You should always use return values if possible, and only use output parameters or some other technique if you measure that you get significantly better performance otherwise.
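A quick way to convince yourself is to count constructor calls. The sketch below uses a hypothetical Tracker type; the names are mine, not from any library.

```cpp
#include <cassert>

// Hypothetical type that counts copies and moves.
struct Tracker {
    static int copies;
    static int moves;
    int payload = 0;
    Tracker() = default;
    Tracker(const Tracker& o) : payload(o.payload) { ++copies; }
    Tracker(Tracker&& o) noexcept : payload(o.payload) { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

Tracker make_tracker() {
    Tracker t;
    t.payload = 42;
    return t;   // NRVO candidate; if not elided, the move constructor is used
}

// Number of copy-constructor calls caused by "auto a = f();".
// Returns -1 if the value did not arrive intact.
int copies_when_returning_by_value() {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker a = make_tracker();
    return a.payload == 42 ? Tracker::copies : -1;
}
```

The copy count is 0 whether or not NRVO kicks in, because the fallback is a move. With the usual compilers at default settings, the move count should be 0 as well, but that part is not guaranteed for a named local.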
(Edited, based on comments)
You are right that you should avoid output parameters if possible, because the code using them is harder to read and to debug.
Since C++11 we have a feature called move constructors (see reference). You can use std::move on primitive types and on all STL container types. For containers it is efficient (question: Efficiency difference between copy and move constructor), because the elements are not actually copied; only the internal pointers are transferred. For primitive types a move is simply a copy. For your own complex types, you can write your own move constructor.
On the other hand, the only other thing you could do is return a reference to a temporary, which is undefined behavior, for example:
#include <iostream>
int &&add(int initial, int howMany) {
return std::move(initial + howMany);
}
int main() {
std::cout << add(5, 2) << std::endl;
return 0;
}
Its output could be:
7
You could avoid the problem of temporary variables by using global or static ones, but that is bad too. @Rakete1111 is right, there is no good way of achieving it.
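For contrast with the dangling-reference version, here is a sketch of the same function returning by value, plus a container variant (repeated is a hypothetical name). Neither needs std::move in the return statement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sum by value: no reference to a temporary escapes the function.
int add(int initial, int how_many) {
    return initial + how_many;
}

// The same pattern for a container: the local is returned by value and is
// either constructed in place in the caller or moved, never copied.
std::vector<int> repeated(int value, std::size_t count) {
    std::vector<int> result(count, value);
    return result;
}
```

Here add(5, 2) yields 7 with well-defined behavior.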
One commonly known compiler optimisation is the so-called return value optimisation. This optimisation basically allows the compiler to construct a local variable that is being returned from a function directly in the caller's storage, instead of copying it.
However, I was wondering if the same is also possible for passing arguments to a function by value if it is known that the return value of the function will overwrite the original argument.
Here is an example. Let's assume we have the following function:
std::vector<Foo> modify(std::vector<Foo> data) {
/* Do some funny things to data */
return data;
}
This function is then used in the following way:
std::vector<Foo> bigData = /* big data */;
bigData = modify(bigData); // Here copying the data into the function could be omitted
Now, in this case it can be clearly determined that the return value of the function call will overwrite the argument that is passed into the function by value. My question is whether current compilers are able to optimise this code so that the argument data is not copied when passed to the function, or if this might even be part of the so-called return value optimisation.
Update
Let's take C++11 into account. I wonder if the following understanding is correct: If the value passed to a function parameter by value is an r-value, and the type of the parameter has a move-constructor, the move constructor will be used instead of the copy constructor.
For example:
std::vector<Foo> bigData = /* big data */;
bigData = modify(std::move(bigData));
If this assumption is correct, this eliminates the copy operation when passing the value. From the answers already given it seems that the optimisation I referred to earlier is not commonly performed. Looking at this manual approach I don't really understand why, as it appears to be pretty straightforward to apply.
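The understanding in the update can be checked by watching the vector's buffer pointer. This is a sketch using double instead of Foo; the helper names are hypothetical, and it assumes the usual std::vector behavior with the default allocator, where move construction and move assignment hand over the existing buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

std::vector<double> modify(std::vector<double> data) {
    std::sort(data.begin(), data.end());
    return data;   // a by-value parameter is moved, not copied, on return
}

// True if the vector still owns the same buffer after the round trip.
bool round_trip_keeps_buffer() {
    std::vector<double> big = {3.0, 1.0, 2.0};
    const double* before = big.data();
    big = modify(std::move(big));   // move in, move out, move-assign back
    return big.data() == before && big.front() == 1.0;
}

// True if passing the lvalue made a copy in a new buffer.
bool lvalue_call_copies() {
    std::vector<double> big = {3.0, 1.0, 2.0};
    const double* before = big.data();
    big = modify(big);              // copy in, then move out and move-assign
    return big.data() != before && big.front() == 1.0;
}
```

With std::move the elements never leave their original allocation; without it, exactly one copy into a new buffer is made for the parameter.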
It's hard to say for sure because in principle compilers can optimize many things, as long as they are certain it has the same behavior. However, in my experience, this optimization will not occur without inlining. Consider the following code:
#include <algorithm>
#include <vector>
__attribute__((noinline)) std::vector<double> modify(std::vector<double> data) {
std::sort(data.begin(), data.end());
return data;
}
std::vector<double> blah(std::vector<double> v) {
v = modify(v);
return v;
}
You can look at the assembly generated for this by various compilers; here I have clang 4.0 with -O3 optimization: https://godbolt.org/g/xa2Dhf. If you look at the assembly carefully, you'll see a call to operator new in blah. This proves that blah is indeed performing a copy in order to call modify.
Of course, if inlining occurs, it should be pretty trivial for the compiler to remove the copy.
In C++11 the compiler could determine that bigData is reassigned after its use in the function and pass it as an rvalue, but there is no guarantee of that, unlike for RVO (from C++17).
For std::vector at least, you can make sure this happens by calling the function as modify(std::move(bigData)), which constructs the parameter of modify from the rvalue reference. The return cannot be optimized with RVO afaik, because data is a function parameter, which is explicitly excluded from this optimization (3rd point here). However, the compiler should understand that the return value is an rvalue, and move it into bigData again.
Whether some compilers elide a move from an object into a function and out of the function back into the object I don't know for sure, but I know of nothing that explicitly allows it, and since the move constructor could have observable side effects, that probably means that it is not allowed (cf. the Notes section in the above link).
That is really compiler specific and depends on what operations you perform on the data (whether you modify it or not). Mostly you shouldn't expect the compiler to do this kind of optimization unless you have actually benchmarked it. I did some tests with the VS2012 compiler, and it performs the copy even though the function doesn't modify the data.
Please have a look at this post (Does the compiler optimize the function parameters passed by value?), which I hope will give you a better idea.
I want to construct an object from another one using an rvalue.
class BigDataClass{
public:
BigDataClass(); //some default BigData
BigDataClass(BigDataClass&& anotherBigData);
private:
BigDataClass(BigDataClass& anotherBigData);
BigDataPtr m_data;
};
So now I want to do something like:
BigDataClass someData;
BigDataClass anotherData(std::move(someData));
So now anotherData gets an rvalue. It's an eXpiring value in fact, so as http://en.cppreference.com/w/cpp/utility/move states, the compiler now has an opportunity to optimize the initialization of anotherData by moving someData into it.
In my opinion we can in fact get 2 different things:
Optimized approach: data moved. It's optimized, fast and we're happy
Nonoptimized approach: data not moved. We have to copy the data from one object to the other AND delete the data from the first one (as far as I know, after turning an object into an rvalue we cannot use it any more, because it no longer owns the data it held). In fact it can be even slower than initialization from an lvalue reference, due to the deletion.
Can we really end up with such an unoptimized way of initializing the data?
You said:
So now anotherData gets an rvalue. It's an eXpiring value in fact, so as http://en.cppreference.com/w/cpp/utility/move states, the compiler now has an opportunity to optimize the initialization of anotherData by moving someData into it.
Actually, what it stated was:
Code that receives such an xvalue has the opportunity to optimize away unnecessary overhead by moving data out of the argument, leaving it in a valid but unspecified state.
That is, it's the code that's responsible for optimization here, not the compiler. All std::move(someData) does is cast its argument to an rvalue reference. Given that BigDataClass has a constructor that takes an rvalue reference, that constructor is preferred, and that constructor will be the one that is called. There isn't any room for change here from the compiler's point of view. Thus the code will do whatever the BigDataClass(BigDataClass&&) constructor does.
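A small sketch of that point: std::move only changes the value category, and ordinary overload resolution then picks the constructor. Probe is a hypothetical stand-in for BigDataClass that records which constructor ran.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for BigDataClass.
struct Probe {
    enum class Built { Default, Copy, Move };
    Built how = Built::Default;
    Probe() = default;
    Probe(const Probe&) : how(Built::Copy) {}
    Probe(Probe&&) noexcept : how(Built::Move) {}
};

// std::move does no work itself: it is a cast to an rvalue reference.
static_assert(
    std::is_same<decltype(std::move(std::declval<Probe&>())), Probe&&>::value,
    "std::move yields an rvalue reference");

Probe::Built built_from_lvalue() {
    Probe source;
    Probe target(source);              // lvalue argument: copy constructor
    return target.how;
}

Probe::Built built_from_moved() {
    Probe source;
    Probe target(std::move(source));   // rvalue argument: move constructor
    return target.how;
}
```

Whether the "move" is actually cheap depends entirely on what the body of the move constructor does; here it does nothing special, which is exactly the point.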
It looks like you are confusing what is an optimization and what is not. Using the move constructor (when available) is not an optimization; it is mandated by the standard. It is not that the compiler has this opportunity, it has to do this.
On the other hand, copy elision is an optimization which the compiler has an opportunity to perform. How reliable it is depends on your compiler (though they apply it pretty uniformly) and on the actual function code.
You are thinking about what the optimizer can do with move semantics. Simply nothing by itself! You, the coder, have to implement the code that makes the move constructor an optimization compared to the constructor taking a const ref.
The question can go to the opposite:
If the compiler already knows that you have an rvalue which is passed as a const ref to a constructor, it is able to do the construction as if the value were generated in the constructor itself. Copy elision is done very often by up-to-date compilers. The question here (for me) is how much effort I should spend on writing rvalue reference constructors to get the same result that the compiler already builds on the fly for me.
OK, in C++11 you have a lot of opportunities to handle forwarding and moving in your own algorithms. And yes, some benefit can be gained. But I see the benefit only for templated code where I need to move/forward some of the parameters to (meta)template functions.
And on the other side: rvalue references must be handled with care, and the meaning of a "valid but unspecified state" raises questions for every user of your interface. See also: What can I do with a moved-from object?
What kind of optimizations does rvalue guarantee
Simply nothing. You have to implement it!
One of the goals of C++ is to allow user-defined types to behave as nicely as built-in types. One place where this seems to fail is in compiler optimization. If we assume that a const nonvolatile member function is the moral equivalent of a read (for a user-defined type), then why not allow a compiler to eliminate repeated calls to such a function? For example
class C {
...
public:
int get() const;
};
int main() {
C c;
int x{c.get()};
x = c.get(); // why not allow the compiler to eliminate this call
}
The argument for allowing this is the same as the argument for copy elision: while it changes the operational semantics, it should work for code that follows good semantic practice, and provides substantial improvement in efficiency/modularity. (In this example it is obviously silly, but it becomes quite valuable in, say, eliminating redundant iterative safety checks when functions are inlined.)
Of course it wouldn't make sense to allow this for functions that return non-const references, only for functions that return values or const references.
My question is whether there is a fundamental technical argument against this that doesn't equally apply to copy elision.
Note: just to be clear, I am not suggesting the compiler look inside of the definition of get(). I'm saying that the declaration of get() by itself should allow the compiler to elide the extra call. I'm not claiming that it preserves the as-if rule; I'm claiming that, just as in copy elision, this is a case where we want to allow the compiler to violate the as-if rule. If you are writing code where you want a side effect to be semantically visible, and don't want redundant calls to be eliminated, you shouldn't declare your method as const.
New answer based on clarification on the question
C::get would need a stronger annotation than const. As it stands today, the const is a promise that the method doesn't (conceptually) modify the object. It makes no guarantees about interaction with global state or side effects.
Thus if the new version of the C++ standard carved out another exception to the as-if rule, as it did for copy elision, based solely on the fact that a method is marked const, it would break a lot of existing code. The standards committee seems to try pretty hard not to break existing code.
(Copy elision probably broke some code, too, but I think it's actually a pretty narrow exception compared to what you're proposing.)
You might argue that we should re-specify what const means on a method declaration, giving it this stronger meaning. That would mean you could no longer have a C::print method that's const, so it seems this approach would also break a lot of existing code.
So we would have to invent a new annotation, say pure_function. To get that into the standard, you'd have to propose it and probably convince at least one compiler maker to implement it as an extension to illustrate that it's feasible and useful.
I suspect that the incremental utility is pretty low. If your C::get were trivial (no interaction with global state and no observable side effects), then you may as well define it in the class definition, thus making it available for inlining. I believe inlining would allow the compiler to generate code as optimal as a pure_function tag on the declaration (and maybe even more so), so I wouldn't expect the incremental benefit of a pure_function tag to be significant enough to convince the standards committee, compiler makers, and language users to adopt it.
Original answer
C::get could depend on global state and it might have observable side effects, either of which would make it a mistake to elide the second call. It would violate the as-if rule.
The question is whether the compiler knows this at the time it's optimizing at the call site. As your example is written, only the declaration of C::get is in scope. The definition is elsewhere, presumably in another compilation unit. Thus the compiler must assume the worst when it compiles and optimizes the calling code.
Now if the definition of C::get were both trivial and in view, then I suppose it's theoretically possible for the compiler to realize there are no side effects or non-deterministic behavior, but I doubt most optimizers get that aggressive. Unless C::get were inlined, I imagine there would be an exponential growth in the paths to analyze.
And if you want to skip the entire assignment statement (as opposed to just the second call of C::get), then the compiler would also have to examine the assignment operator for side effects and reliance on global state in order to ensure the optimization wouldn't violate the as-if rule.
First of all, const-ness of methods (or of references) is totally irrelevant for the optimizer, because constness can be cast away legally (using const_cast) and because, in the case of references, there could be aliasing. Const correctness has been designed to help programmers, not the optimizer (whether it really helps or not is a separate, unrelated discussion).
Moreover, to elide a call to a function the optimizer would also need to be sure that the result doesn't depend on, and doesn't influence, global state.
Compilers sometimes have a way to declare that a function is "pure", i.e. that the result depends only on the arguments and doesn't influence global state (like sin(x)), but how you declare them is implementation dependent because the C++ standard doesn't cover this semantic concept.
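As an example of such an implementation-specific annotation, GCC and Clang accept __attribute__((pure)). The sketch below wraps it in a hypothetical macro DEMO_PURE so that it still compiles elsewhere; square and twice_square are illustrative names.

```cpp
#include <cassert>

// GCC and Clang spell the annotation __attribute__((pure)); on other
// compilers the (hypothetical) macro expands to nothing.
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_PURE __attribute__((pure))
#else
#  define DEMO_PURE
#endif

// The result depends only on the argument and nothing is modified.
DEMO_PURE int square(int x);

int square(int x) { return x * x; }

int twice_square(int x) {
    // With the annotation the compiler is allowed, but not required,
    // to evaluate square(x) only once, even without seeing its body.
    return square(x) + square(x);
}
```

The annotation is a promise made by the programmer; the compiler does not verify it, so putting it on a function with side effects leads to wrong code rather than to a diagnostic.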
Note also that the word const in const reference describes a property of the reference, not of the referenced object. Nothing is known about the const-ness of an object that you're given a const reference of and the object can indeed change or even go out of existence while you have the reference still in your hands. A const reference means simply that you cannot change the object using that reference, not that the object is constant or that it will be constant for a while.
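A minimal sketch of that aliasing problem, with hypothetical function names: the same object is visible through a const reference and through a non-const one, so the value read through the const reference changes.

```cpp
#include <cassert>

// Reads through the const reference, modifies the object through the other
// reference, then reads again. Returns the observed difference.
int observed_change(const int& view, int& alias) {
    int first = view;
    alias += 1;          // legal: alias is a non-const path to the object
    int second = view;   // has to be re-read, the object may have changed
    return second - first;
}

// Passes the same object for both parameters.
int aliasing_demo() {
    int value = 10;
    return observed_change(value, value);
}
```

Because a call like this is legal, the compiler cannot cache view across the write to alias just because view is a const reference.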
For a description of why a const reference and a value are two very different semantic concepts and of the subtle bugs you can meet if you confuse them see this more detailed answer.
The first answer to your question from Adrian McCarthy was just about as clear as possible:
The const-ness of a member function is a promise that no modification of externally visible state will be made (barring mutable variables in an object instance, for example).
You would expect a const member function which just reported the internal state of an object to always return the same answer. However, it could also interact with the ever changing real world and return a different answer every time.
What if it is a function to return the current time?
Let us put this into an example.
This is a class which converts a timestamp (double) into a human readable string.
class time_str {
// time and its format
double time;
string time_format;
public:
void set_format(const string& time_format);
void set_time(double time);
string get_time() const;
string get_current_time() const;
};
And it is used (clumsily) like so:
time_str a;
a.set_format("hh:mm:ss");
a.set_time(89.432);
cout << a.get_time() << endl;
So far so good. Each invocation to a.get_time(); will return the same result.
However, at some point, we decide to introduce a convenience function which returns the current time in the same format:
cout << a.get_time() << " is different from " << a.get_current_time() << endl;
It is const because it doesn't change the state of the object in any way (though it accesses the time format). However, obviously each call to get_current_time() must return a different answer.
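The same effect can be reproduced deterministically without a real clock. In the sketch below, g_ticks is a hypothetical global counter standing in for the ever-changing real world, and tick_reader plays the role of time_str.

```cpp
#include <cassert>

int g_ticks = 0;   // hypothetical stand-in for a clock

class tick_reader {
    int offset_ = 100;
public:
    // Only reports internal state: same answer on every call.
    int get_offset() const { return offset_; }
    // Also const, since the object itself is untouched, yet the answer
    // differs on every call because it depends on global state.
    int get_current() const { return offset_ + ++g_ticks; }
};

bool offset_is_stable() {
    tick_reader r;
    return r.get_offset() == r.get_offset();
}

bool current_differs() {
    tick_reader r;
    int a = r.get_current();
    int b = r.get_current();
    return a != b;
}
```

Eliding the second get_current() call based on const alone would change the program's result.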
This is probably a simple question, but this came across my mind. It is regarding the difference between the two functions below:
T func_one(T obj) { //for the purpose of this question,
return obj + obj; //T is a large object and has an overloaded '+' operator
}
T func_two(T obj) {
T output = obj + obj;
return output;
}
In func_one(), rather than creating an object T, assigning it a value and then returning the object, I just return the value itself without creating a new object. If T was a large object, would func_one() be more efficient than func_two() or does func_one() make an object T anyways when returning the sum of the two objects?
The compiler would optimize func_two into something similar to func_one, which would then be optimized into something else. Long story short, you need not worry about this; if you really do need to worry about it, you can look at the asm output.
Short answer: We can't know
Long answer: it depends highly on how T works and on your compiler's support for return value optimization.
Any function which returns by value can have RVO or NRVO optimization applied to it.
This means that it will construct the return value directly into the calling function, eliminating the copy constructor. As this is the problem with returning large objects by value, this will mean a substantial gain in performance.
The difference between func_one and func_two is that func_one returns an anonymous temporary value, an r-value; this means RVO can trivially be used. func_two returns a named value, an l-value, so NRVO, a much harder optimization, will be used. However, func_two is trivial, so it will almost certainly have NRVO applied, and both functions will be basically identical.
This is assuming you have a modern or even semi-modern compiler; if not, it will depend highly on how you implemented T.
If T has move semantics, your compiler will be able to move rather than copy when elision does not happen. This applies to both functions: func_one returns a temporary, and since C++11 a local variable named in a return statement, as in func_two, is treated as an rvalue first, so the move constructor is selected there too.
Finally, it depends on how the + and = operators are implemented. If, for example, they were implemented as expression templates, then func_two still requires an assignment, which will slow it down, whereas func_one will simply return a highly optimized temporary.
In Summary
In almost all practical contexts, these are identical. In the vanishingly small window where your compiler is acting very strange, func_one is almost universally faster.
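To check the "identical" claim for a concrete type, one can count copy-constructor calls. Big is a hypothetical stand-in for the large T, with a counting copy constructor and a cheap move constructor.

```cpp
#include <cassert>

// Hypothetical "large" type that counts copies.
struct Big {
    static int copies;
    long value;
    explicit Big(long v) : value(v) {}
    Big(const Big& o) : value(o.value) { ++copies; }
    Big(Big&& o) noexcept : value(o.value) {}
    friend Big operator+(const Big& a, const Big& b) {
        return Big(a.value + b.value);
    }
};
int Big::copies = 0;

Big func_one(Big obj) { return obj + obj; }

Big func_two(Big obj) {
    Big output = obj + obj;
    return output;
}

// Copies made by one call with an lvalue argument (-1 on a wrong result).
int copies_in_func_one() {
    Big::copies = 0;
    Big in(21);
    Big out = func_one(in);   // the only copy initializes the parameter
    return out.value == 42 ? Big::copies : -1;
}

int copies_in_func_two() {
    Big::copies = 0;
    Big in(21);
    Big out = func_two(in);   // same: the named local is elided or moved
    return out.value == 42 ? Big::copies : -1;
}
```

Both helpers report exactly one copy, the one needed to initialize the by-value parameter; the extra named variable in func_two adds none.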
Modern compilers can transform the version with the extra variable to the one without (named return value optimization, this is quite a frequent source of questions here on SO, Why isn't the copy-constructor called when returning LOCAL variable for example). So this is not the overhead you should worry about.
The overhead you should worry about is the function call overhead. An integer addition typically takes a modern CPU a single cycle. A function call takes roughly 10 to 20 cycles, depending on the number of arguments.
I am a bit unsure what you mean with T in your question (is it a template parameter? is it a class? is it a placeholder for a type that you didn't want to disclose in your question?). However, the question whether you have a function call overhead problem depends on that type. And it depends on whether your compiler can inline your function.
Obviously, if it's inlined, you're fine, there's no function call overhead.
If T is a complex type with an expensive operator+() overload, then you are fine as well.
However, if T is int, for instance, and your function is not inlined, then you have roughly 90% overhead in your function.