value semantics vs output params with large data structures - c++

2013 Keynote: Chandler Carruth: Optimizing the Emergent Structures of C++
42:45
You don't need output parameters, we have value semantics in C++. ... Anytime you see someone arguing that nonono I'm not going to return by value because copy would cost too much, someone working on an optimizer says they're wrong. All right? I have never yet seen a piece of code where that argument was correct. ... People don't realize how important value semantics are to the optimizer because it completely clarifies the aliasing scenarios.
Can anyone put this in the context of this answer: https://stackoverflow.com/a/14229152
I hear that repeated over and over but, well, for me a function returning something is a source. Output parameters taken by reference remove that characteristic from a function, and removing such a hard-coded characteristic allows one to manage from the outside how the output will be stored/reused.
My question is, even in the context of that SO answer, is there a way to tell, by restructuring the code in some other equivalent way, "OK, now see, value semantics done this way doesn't lose to the output-param version", or were Chandler's comments specific to some contrived situations? I have even seen Andrei Alexandrescu arguing this point in a talk, saying you can't escape by-reference output parameters for better performance.
For another take on Andrei's comments see Eric Niebler: Out Parameters, Move Semantics, and Stateful Algorithms.

It's either an overstatement, a generalization, a joke, or Chandler's idea of "Perfectly Reasonable Performance" (using modern C++ toolchains/libs) is unacceptable for my programs.
I find it a fairly narrow scope of optimizations. Penalties exist beyond that scope which cannot be ignored, due to the actual complexities and designs found in programs - the heap allocations in the getline example, for instance. The specific optimizations may or may not be applicable to the program in question, despite your attempts to reduce such penalties. Real-world structures will reference memory which may alias. You can reduce that, but it's not practical to believe that you can eliminate aliasing (from an optimizer's perspective).
Of course, RBV (return by value) can be a great thing - it's just not appropriate for all cases. Even the link you referenced pointed out how one can avoid a ton of allocations/frees. Real programs and the data structures found in them are far more complex.
Later in the talk, he goes on to criticize the use of member functions (ref: S::compute()). Sure, there is a point to take away, but is it really reasonable to avoid using these language features entirely because it makes the optimizer's job easier? No. Will it always result in more readable programs? No. Will these code transformations always result in measurably faster programs? No. Are the changes required to transform your codebase worth the time you invest? Sometimes. Can you take away some points and make better informed decisions which impact some of your existing or future codebase? Yes.
Sometimes it helps to break down how exactly your program would execute, or what it would look like in C.
The optimizer will not solve all performance problems, and you should not rewrite programs with the assumption that the programs you are dealing with are "completely brain dead and broken designs", nor should you believe that using RBV will always result in "Perfectly Reasonable Performance". You can utilize new language features and make the optimizer's job easier, but though there is much to gain there, there are often more important optimizations to invest your time in.
It's fine to consider the proposed changes; ideally you would measure the impact of such changes in real world execution times and impact on your source code before adopting these suggestions.
For your example: even copying and assigning large structures by value can have significant costs. Beyond the cost of running constructors and destructors (along with their associated creation/cleanup of the resources they acquire and own, as pointed out in the link you reference), even things as simple as avoiding unnecessary structure copies can save you a ton of CPU time if you use references (where appropriate). The structure copy may be as simple as a memcpy. These aren't contrived problems; they appear in actual programs, and the complexity can increase greatly with your program's complexity. Is reducing aliasing of some memory and other optimizations worth the costs, and does it result in "Perfectly Reasonable Performance"? Not always.
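For illustration, a minimal sketch of the copy cost in question (the BigRecord type and its size are hypothetical): passing a large, trivially copyable structure by value costs a full copy (essentially a memcpy) on every call, while a const reference costs nothing.
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical large structure; real ones often carry even more state.
struct BigRecord {
    std::array<std::uint8_t, 64 * 1024> payload; // 64 KiB of inline data
};

// By value: every call copies the full 64 KiB before the loop even starts.
std::size_t checksum_by_value(BigRecord r) {
    std::size_t sum = 0;
    for (auto b : r.payload) sum += b;
    return sum;
}

// By const reference: no copy; the function reads the caller's object.
std::size_t checksum_by_ref(const BigRecord &r) {
    std::size_t sum = 0;
    for (auto b : r.payload) sum += b;
    return sum;
}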

The problem with output parameters as described in the linked question is that they generally make the common calling case (i.e., you don't have a vector storage to use already) much more verbose than normal. For example, if you used return by value:
auto a = split(s, r);
if you used output parameters:
std::vector<std::string> a;
split(s, r, a);
The second one looks much less pretty to my eyes. Also, as Chandler mentioned, the optimizer can do much more with the first one than the second, depending on the rest of your code.
Is there a way we can get the best of both worlds? Emphatically yes, using move semantics:
std::vector<std::string> split(const std::string &s, const std::regex &r, std::vector<std::string> v = {})
{
    auto rit = std::sregex_token_iterator(s.begin(), s.end(), r, -1);
    auto rend = std::sregex_token_iterator();
    v.clear();
    while(rit != rend)
    {
        v.push_back(*rit);
        ++rit;
    }
    return v;
}
Now, in the common case, we can call split as normal (i.e., the first example) and it will allocate a new vector storage for us. In the important but rare case when we have to split repeatedly and we want to re-use the same storage, we can just move in a storage that persists between calls:
int main()
{
    const std::regex r(" +");
    std::vector<std::string> a;
    for(auto i = 0; i < 1000000; ++i)
        a = split("a b c", r, std::move(a));
    return 0;
}
This runs just as fast as the output-argument method and what's going on is quite clear. You don't have to make your function hard to use all the time just to get good performance some of the time.

I was about to implement a solution using string_view and ranges and then found this:
std::split(): An algorithm for splitting strings
This endorses value-semantics output by return, and I accept it as prettier than the current choice in the referred SO answer. This design also takes the "source" characteristic out of the function even though it returns: the result is a lazy range, so, as a simple illustration, one could have an outer reserved vector being repetitively filled from a returned range (see the usage sketch after the sample solution below).
In any case, I'm not sure whether such a version of split would help the optimizer in any sense (I'm talking in the context of Chandler's talk here).
Notice
An output-param version enforces the existence of a named variable at the call site, which may be ugly to the eyes, but may make debugging the call site easier.
Sample solution
Until std::split arrives, I've exercised the value-semantics output-by-return version this way:
#include <string>
#include <string_view>
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/iterator/transform_iterator.hpp>
using namespace std;
using namespace std::experimental;
using namespace boost;
string_view stringfier(const cregex_token_iterator::value_type &match) {
    return {match.first, static_cast<size_t>(match.length())};
}

using string_view_iterator =
    transform_iterator<decltype(&stringfier), cregex_token_iterator>;

iterator_range<string_view_iterator> split(string_view s, const regex &r) {
    return {
        string_view_iterator(
            cregex_token_iterator(s.begin(), s.end(), r, -1),
            stringfier
        ),
        string_view_iterator()
    };
}

int main() {
    const regex r(" +");
    for (size_t i = 0; i < 1000000; ++i) {
        split("a b c", r);
    }
}
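And here is the usage sketch promised above (a variant of the main above; it assumes an additional #include <vector>): an outer reserved vector is repetitively filled from the lazily returned range, so the result container is allocated once rather than per call.
#include <vector>

int main() {
    const regex r(" +");
    vector<string_view> out;
    out.reserve(8); // storage allocated once, outside the loop

    for (size_t i = 0; i < 1000000; ++i) {
        out.clear(); // keeps the capacity across iterations
        for (auto piece : split("a b c", r))
            out.push_back(piece);
    }
}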
I have used Marshall Clow's string_view libc++ implementation found at https://github.com/mclow/string_view.
I have posted the timings at the bottom of the referred answer.

Related

Should we always rely on compiler optimizations nowadays?

It used to be a good practice to write code like this:
int var = 0, var1 = 0; // declare variables outside the loop so they are not created every time we enter the for
auto var2 = somefunc(/*some params*/);
for (int i = 0, size = vec.size(); i < size; ++i)
{
    // do some calculation, use var, var1, var2
}
But now all(?) modern compilers will optimize it for you, and if you write:
for (int i = 0; i < vec.size(); ++i)
{
    // do something
    int var = /*some usage*/; // declare where you need it for better readability
    // do something
    int var1 = /*some usage with a call to*/ somefunc(/*some params*/); // declare where you need it for better readability
    // do something
}
The compiler will optimize it to be the same as the first snippet (or even better).
So, should we always rely on the compiler to optimize variable allocation, etc, to write code that is easier to read for other programmers?
Disclaimer: this is not an opinion-based question. I don't have experience with many compilers, and I don't have experience in using compiler optimization options, so I expect people with experience to answer here, something like "I've worked with so many compilers, yes, nowadays they are all smart and we don't need to think about where to declare variables", or "oh, I've run into some compilers that didn't do those kind of optimizations, so we still need to think about it", or something about experience of usage of optimization options.
Variables don't get "created" like you think they do. In practice they're either positions on the stack, or registers reserved for a particular purpose. In most cases there's absolutely zero cost to scoping them inside the loop.
As always, look at the assembly output if you ever want to be sure.
If you're used to languages like Python, Ruby or JavaScript where creating a variable is an operation with an actual cost this might be why you're thinking this way, but in an optimized C++ build it's a whole different game, as is with any language that goes through a compiler pass, even a JIT.
The call to somefunc is only going to be hoisted out of the loop by a fragile optimization, if that happens at all.
But in general, write for readability, avoid premature pessimization, determine where performance bottlenecks are after writing it, and expend effort on measured improvements where the performance bottleneck is.
Code that is hard to read is hard to debug and hard to make correct and hard to improve. All of which you'll often spend more time on than writing the code in the first place.
Variables of trivial type, like int or double, do not exist outside of debug builds the way you might imagine. More complex types can have a more fundamental existence, in that where you create/destroy them can matter. But compilers continue to improve, so even here, worrying too much without knowing that the code is a bottleneck is a bad plan.
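A minimal sketch of that distinction (the loop body is hypothetical): the trivial int is effectively free to scope inside the loop, while the std::string is constructed and destroyed on every iteration and may allocate each time.
#include <string>
#include <vector>

void process(const std::vector<int> &vec) {
    for (std::size_t i = 0; i < vec.size(); ++i) {
        int var = vec[i] * 2;       // trivial: a register or stack slot, effectively free
        std::string label(32, 'x'); // non-trivial: constructed/destroyed each iteration,
                                    // and may allocate each time
        // ... use var and label ...
        (void)var;
        (void)label;
    }
}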
Compiler optimizations are performance safety nets. In other words, they aren't something you rely on. They are a fallback for programmers unaware of penalties in their code.
That being said, you shouldn't trust optimizations by default because not all compilers share the same behavior. In the same sense that no two implementations of anything share the same behavior. If you work in embedded systems, especially on exotic hardware, you might get stuck with a bare minimal C compiler that does very little to help you.
Also, just because something is optimized, doesn't necessarily mean you have to sacrifice readability.

Why does TensorFlow recommend the "functional style for constructing operations"?

In TensorFlow's documentation, it is possible to find the following text:
// Not recommended
MatMul m(scope, a, b);
// Recommended
auto m = MatMul(scope, a, b);
I see no obvious benefit from using the "recommended" style. The first version is shorter at least. Also the "recommended" version might include more actions related to the unnecessary assignment operation.
I have read that documentation page no less than six times and still cannot get the rationale behind their reasoning.
Is this recommendation just a matter of style or may the second version have some benefits?
Also the "recommended" version might include more actions related to the unnecessary assignment operation.
There is no assignment. It's initialization. And any extra object that may exist in principle is elided entirely in any compiler worth using.
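A minimal sketch of that point (Widget is a hypothetical type): since C++17 this elision is guaranteed, and compilers routinely performed it long before that.
#include <cstdio>

struct Widget {
    Widget() { std::puts("constructed"); }
    Widget(const Widget &) { std::puts("copied"); }
};

int main() {
    auto w = Widget(); // prints "constructed" once; no "copied" line appears
    (void)w;
}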
That recommendation you cite is in line with Herb Sutter's "Almost Always Auto" advice. The strongest point in favor of this style (which, in full disclosure, I do not follow) is that it makes it impossible to leave a variable uninitialized.
auto foo; // ill-formed.
int foo; // indeterminate value
auto foo = int(); // a zero integer
Granted, a good static analysis tool can warn about it too, but it's still a very strong point that requires no additional analysis by a compiler or external tools.
Beyond that, the stylistic argument is that it also keeps your code consistent with cases where you use auto for sanity's sake, such as
auto it = myVec.begin();
But YMMV on all counts. It is ultimately going to be a stylistic choice, and regardless of the chosen style, exceptions to both styles exist.

Should I use const for local variables for better code optimization?

I often use const for local variables that are not being modified, like this:
const float height = person.getHeight();
I think it can make the compiled code potentially faster, allowing the compiler to do some more optimization. Or am I wrong, and compilers can figure out by themselves that the local variable is never modified?
Or am I wrong, and compilers can figure out by themselves that the local variable is never modified?
Most compilers are smart enough to figure this out themselves.
You should rather use const for ensuring const-correctness and not for micro-optimization.
Const correctness lets the compiler help you guard against honest mistakes, so you should use const wherever possible, but for maintainability reasons and to keep yourself from making careless mistakes, not for performance.
It is good to understand the performance implications of the code we write, but excessive micro-optimization should be avoided. With regard to performance, one should follow the
80-20 Rule:
Identify the 20% of your code which uses 80% of your resources, through profiling on representative data sets and only then attempt to optimize those bottlenecks.
This performance difference will almost certainly be negligible; however, you should be using const whenever possible for code-documentation reasons. Oftentimes, compilers can figure this out for you anyway and make the optimizations automatically. const is really more about code readability and clarity than performance.
If there is a value type on the left-hand side, you may safely assume that it will have a negligible effect, or none at all. It's not going to influence overload resolution, and what is actually const can easily be deduced from the scope.
It's an entirely different matter with reference types:
std::vector<int> v(1);
const auto& a = v[0];
auto& b = v[0];
These two declarations resolve to two entirely different operator[] overloads, and similar overload pairs are found in many libraries besides the STL. Even in this simple example, optimizations which depend on v having been immutable for the scope of b are already no longer trivial and less likely to be found.
The STL is still quite tame in these terms though, in that at least the behavior doesn't change based on whether the const_reference overload is chosen or not. For most of the STL, the const_reference overload is tied only to the object itself being const.
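A minimal sketch of that last point (standard containers only): which operator[] overload is chosen follows directly from the constness of the vector object, and both overloads hand back the same element with the same behavior.
#include <vector>

int main() {
    std::vector<int> v(1);
    const std::vector<int> &cv = v; // a const view of the same vector

    auto &a = v[0];  // non-const overload: int&
    auto &b = cv[0]; // const overload: const int&, same element, same cost
    a = 42;          // fine
    // b = 42;       // would not compile: b came from the const overload
    (void)b;
}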
Some other libraries (e.g. Qt) make heavy use of copy-on-write semantics. In these, const correctness with references is no longer optional, but necessary:
QVector<int> v1(1);
auto v2 = v1; // Backing storage of v2 and v1 is still linked
const auto& a = v1[0]; // Still linked
const auto& b = v2[0]; // Still linked
auto& c = v2[0]; // Deep copy from v1 to v2 is happening now :(
// Even worse, &b != &c
Copy-on-write semantics are something commonly found in large-matrix or image manipulation libraries, and something to watch out for.
It's also something where the compiler is no longer able to save you: the overload resolution is mandated by the C++ standard, and there is no leeway for eliminating the costly side effects.
I don't think it is a good practice to make local variables, including function parameters, constant by default.
The main reason is brevity. Good coding practices allow you to make your code short; this one doesn't.
Quite similarly, you can write void foo(void) in your function declarations, and you can justify it by increased clarity, being explicit about not intending to pass a parameter to the function, etc, but it is essentially a waste of space, and eventually almost fell out of use. I think the same thing will happen to the trend of using const everywhere.
Marking local variables with a const qualifier is not very useful for most of the collaborators working with the code you create. Unlike class members, global variables, or the data pointed to by a pointer, a local variable doesn't have any external effects, and no one would ever be restricted by the qualifier of the local variable or learn anything useful from it (unless they are going to change the particular function where the local variable lives).
If they need to change your function, the task should not normally require them to deduce valuable information from the const qualifier of a variable there. The function should not be so large or difficult to understand; if it is, you probably have more serious problems with your design and should consider refactoring. One exception is functions that implement some hardcore math calculations, but for those you would need to put some details or a link to the paper in your comments anyway.
You might ask why not still put the const qualifier if it doesn't cost you much effort. Unfortunately, it does. If I had to put all const qualifiers, I would most likely have to go through my function after I am done and put the qualifiers in place - a waste of time. The reason for that is that you don't have to plan the use of local variables carefully, unlike with the members or the data pointed to by pointers.
Local variables are mostly a convenience tool; technically, most of them can be avoided by either factoring them into expressions or reusing variables. So, since they are a convenience tool, the very existence of a particular local variable is merely a matter of taste.
Particularly, I can write:
int d = foo(b) + c;
const int a = foo(b); int d = a + c;
int a = foo(b); a += c;
Each of the variants is identical in every respect, except that the variable a is either constant or not, or doesn't exist at all. It is hard to commit to some of the choices early.
There is one major problem with local constant values - as shown in the code below:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// const uint32_t const_dummy = 0;

void func1(const uint32_t *ptr);

int main(void) {
    const uint32_t const_dummy = 0;
    func1(&const_dummy);
    printf("0x%x\n", const_dummy);
    return EXIT_SUCCESS;
}

void func1(const uint32_t *ptr) {
    uint32_t *tmp = (uint32_t *)ptr;
    *tmp = 1;
}
This code was compiled on Ubuntu 18.04.
As you can see, const_dummy's value can be modified in this case! (Formally, writing through a pointer to a const object is undefined behavior; this is just what happens to occur on this platform.)
But if you modify the code and set const_dummy's scope to global - by commenting out the local definition and uncommenting the global one - you will get an exception and your program will crash - which is good, because you can debug it and find the problem.
What is the reason? Well, global const values are located in the read-only data section of the program, and the OS protects this area using the MMU.
It is not possible to do this for constants defined on the stack.
On systems that don't use an MMU, you will not even "feel" that there is a problem.

Use of const double for intermediate results

I am writing a simulation program and am wondering whether const double is of any use when storing intermediate results. Consider this snippet:
double DoSomeCalculation(const AcModel &model) {
    (...)
    const double V = model.GetVelocity();
    const double m = model.GetMass();
    const double cos_gamma = cos(model.GetFlightPathAngleRad());
    (...)
    return m*V*cos_gamma*Chi_dot;
}
Note that the sample is there only to illustrate -- it might not make too much sense from the engineering side of things. The motivation for storing, for example, cos_gamma in a variable is that this cosine is used many times in other expressions covered by (...), and I feel that the code gets more readable when using
cos_gamma
rather than
cos(model.GetFlightPathAngleRad())
in various expressions. Now the actual question is this: since I expect the cosine to be the same throughout the code section, and I actually created the variable only as a placeholder and for convenience, I tend to declare it const. Is there an established opinion on whether this is good or bad practice, or whether it might bite me in the end? Does the compiler make any use of this additional information, or am I actually hindering the compiler from performing useful optimizations?
Arne
I am not sure about the optimization part, but I think it is good to declare it as const. This is because, if the code is large and somebody incorrectly does cos_gamma = 1 somewhere in between, you will get a compiler error instead of a run-time surprise.
You can certainly help things by only computing the cosine once and using it everywhere. Making that result const is a great way to ensure you (or someone else) don't try to change it somewhere down the road.
A good rule of thumb here is to make it correct and readable first. Don't worry about any optimizations the compiler might or might not make. Only after profiling and discovering that a piece of code is indeed too slow should you worry about helping the compiler optimize things.
Given your code:
const double V = model.GetVelocity();
const double m = model.GetMass();
const double cos_gamma = cos(model.GetFlightPathAngleRad());
I would probably leave cos_gamma as it is. I'd consider changing V and m to references though:
const double &V = model.GetVelocity();
const double &m = model.GetMass();
This way you're making it clear that these are strictly placeholders. It does, however, raise the possibility of lifetime issues -- if you use a reference, you clearly have to ensure that what it refers to has sufficient lifetime. At least from the looks of things, this probably won't be a problem though. First of all, GetVelocity() and GetMass() probably return values, not references (in which case you're initializing the references with temporaries, and the lifetime of the temporary is extended to the lifetime of the reference it initializes). Second, even if you return an actual reference, it's apparently to a member of the model, which (at a guess) will exist throughout the entire calculation in question anyway.
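A minimal sketch of the lifetime-extension point (GetVelocity here is a hypothetical free function returning by value):
// Hypothetical: returns by value, as the getters above probably do.
double GetVelocity() { return 42.0; }

void example() {
    // The temporary returned by GetVelocity() is bound to a const reference,
    // so its lifetime is extended to match V's scope.
    const double &V = GetVelocity();
    // ... V remains valid for the rest of the scope ...
    (void)V;
}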
I wish I worked with more code written like this!
Anything you can do to make your code more readable can only be a good thing. Optimize only when you need to optimize.
My biggest beef with C++ is that values are not const by default. Use const liberally and keep your feet bullet-hole free!
Whether the compiler makes use of this is an interesting question - once you're at the stage of optimizing a fully working program. Until then, you write the program mostly for the humans coming later and having to look at the code. The compiler will happily swallow (objectively) very unreadable code, lacking spaces, newlines, and sporting hilarious indentation. Humans won't.
The most important question is, how readable is the code, and how easy it is to make a mistake changing it. Here, const helps tremendously, because it makes the compiler bark at everyone who mistakenly changes anything that shouldn't change. I always make everything const unless I really, really want it to be changeable.

Is putting a function within an if statement efficient? (C++)

I've seen statements like this
if(SomeBoolReturningFunc())
{
    //do some stuff
    //do some more stuff
}
and am wondering if putting a function in an if statement is efficient, or if there are cases when it would be better to leave them separate, like this
bool AwesomeResult = SomeBoolReturningFunc();
if(AwesomeResult)
{
    //do some other, more important stuff
}
...?
I'm not sure what makes you think that assigning the result of the expression to a variable first would be more efficient than evaluating the expression itself, but it's never going to matter, so choose the option that enhances the readability of your code. If you really want to know, look at the output of your compiler and see if there is any difference. On the vast majority of systems out there this will likely result in identical machine code.
Either way, it should not matter. The basic idea is that the result will be stored in a temporary no matter what, whether you name it or not. Readability is more important nowadays because computers are normally so fast that small tweaks like this don't matter as much.
I've certainly seen if (f()) {blah;} produce more efficient code than bool r = f(); if (r) {blah;}, but that was many years ago on a 68000.
These days, I'd definitely pick the code that's simpler to debug. Your inefficiencies are far more likely to be your algorithm vs. the code your compiler generates.
As others have said, it basically makes no real difference in performance. Generally in C++, most performance gains at a pure-code level (as opposed to the algorithm) are made by loop unrolling.
Also, avoiding branching altogether can give way more performance if the condition is in a loop,
for instance by having separate loops for each conditional case, or by having statements that inherently take the condition into account (perhaps by multiplying a term by 0 if it's not wanted),
and then you can get more by unrolling that loop.
Templating your code can help a lot with this in a "semi" clean way (see the sketch below).
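For illustration, a minimal sketch of the templating idea (the accumulate functions and their parameters are hypothetical): the run-time branch is taken once at the call site, and inside each instantiation the condition is a compile-time constant, so the dead branch is eliminated from the loop body.
#include <vector>

// The condition is a template parameter: inside the loop it is a
// compile-time constant, so each instantiation gets a branch-free body.
template <bool ApplyScale>
void accumulate(std::vector<double> &out, const std::vector<double> &in, double scale) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (ApplyScale)
            out[i] += in[i] * scale;
        else
            out[i] += in[i];
    }
}

// The run-time branch happens once, not on every iteration.
void accumulate(std::vector<double> &out, const std::vector<double> &in,
                double scale, bool applyScale) {
    if (applyScale)
        accumulate<true>(out, in, scale);
    else
        accumulate<false>(out, in, scale);
}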
There is also the possibility of
if (bool AwesomeResult = SomeBoolReturningFunc()) { ... }
:)
IMO, the kind of statements you have seen are clearer to read, and less error prone:
bool result = Foo();
if (result) {
    //some stuff
}
bool other_result = Bar();
if (result) { //should be other_result? Hopefully caught by "unused variable" warning...
    //more stuff
}
Both variants usually produce the same machine code and run exactly the same. Very rarely will there be a performance difference, and even in those cases it will unlikely be the bottleneck (which translates into: don't bother prior to profiling).
The significant difference is in debugging and readability. With a temporary variable it's easier to debug. Without the variable the code is shorter and perhaps more readable.
If you want code that is both easy to debug and easy to read, you can declare the variable as const:
const bool AwesomeResult = SomeBoolReturningFunc();
if(AwesomeResult)
{
    //do some other, more important stuff
}
This way it's clearer that the variable is never assigned to again and that there's no other logic behind its declaration.
Putting aside any debugging ease or readability problems, and as long as the function's returned value is not used again in the if-block, it seems to me that assigning the returned value to a variable only causes an extra use of the = operator and an extra bool variable stored on the stack - and I might speculate further that an extra variable on the stack could cause latency in further stack accesses (not sure, though).
The thing is, these are really minor problems and as long as the compiler has an optimization flag on, should cause no inefficiency. A different case I'd consider would be an embedded system - then again, how much damage could a single, 8-bit variable cause? (I have absolutely no knowledge concerning embedded systems, so maybe someone else could elaborate on this?)
The AwesomeResult version can be faster if SomeBoolReturningFunc() is fairly slow and you are able to use AwesomeResult more than once rather than calling SomeBoolReturningFunc() again.
Placing the function inside or outside the if-statement doesn't matter. There is no performance gain or loss. This is because the compiler will automatically create a place on the stack for the return value - whether or not you've explicitly defined a variable.
In the performance tuning I've done, this would only be thought about in the final stage of cycle-shaving after cleaning out a series of significant performance issues.
I seem to recall that one statement per line was the recommendation of the book Code Complete, where it was argued that such code is easier to understand. Make each statement do one and only one thing so that it is very easy to see very quickly at a glance what is happening in the code.
I personally like having the return types in variables to make them easier to inspect (or even change) in the debugger.
One answer stated that a difference was observed in generated code. I sincerely doubt that with an optimizing compiler.
A disadvantage, prior to C++11, of the multiple-line form is that you have to know the return type for the variable declaration. For example, if the return is changed from bool to int, then depending on the types involved you could have a truncated value in the local variable (which could make the if malfunction). If compiling with C++11 enabled, this can be dealt with by using the auto keyword, as in:
auto AwesomeResult = SomeBoolReturningFunc();
if ( AwesomeResult )
{
    //do some other, more important stuff
}
Stroustrup's C++ 4th edition recommends putting the variable declaration in the if statement itself, as in:
if ( auto AwesomeResult = SomeBoolReturningFunc() )
{
    //do some other, more important stuff
}
His argument is that this limits the scope of the variable to the greatest extent possible. Whether or not that is more readable (or debuggable) is a judgement call.