I have a pointer int* p, and do some operations in a loop. I do not modify the memory, just read. If I add const to the pointer (both cases, const int* p, and int* const p), can it help a compiler to optimize the code?
I know other merits of const, like safety or self-documentation, I ask about this particular case. Rephrasing the question: can const give the compiler any useful (for optimization) information, ever?
While this is obviously specific to the implementation, it is hard to see how changing a pointer from int* to int const* could ever provide any additional information that the compiler would not otherwise have known.
In both cases, the value pointed to can change during the execution of the loop.
Therefore it probably will not help the compiler optimize the code.
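To make that concrete, here is a minimal sketch (the function names are hypothetical) of why the compiler cannot treat *p as constant just because p is const-qualified:
int g = 42;

void opaque(void);  // hypothetical: defined in another translation unit

int sum(const int* p)  // const promises *we* won't write through p...
{
    int s = 0;
    for (int i = 0; i < 4; ++i) {
        s += *p;    // ...but *p may still change, so it must be reloaded
        opaque();   // opaque() might write to the pointee (e.g. to g)
    }
    return s;
}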
No. Using const like that will not provide the compiler with any information that can be used for optimization.
Theoretically, for it to be a help to your compiler, the optimizer must be able to prove that nobody will ever use const_cast on your const pointer yet be unable to otherwise prove that the variable is never written to. Such a situation is highly unlikely.
Herb Sutter covers this in more depth in one of his Guru of the Week columns.
It can help or it can make no difference or it can make it worse. The only way to know is to try both and inspect the emitted machine code.
Modern compilers are very smart, so they can often deduce that memory is unchanged without any qualifiers (or deduce that many other optimizations are possible without the code being written in a manner that is easier to analyze). Yet they are rather complex, have a lot of deficiencies, and often can't optimize every possible thing at every opportunity.
I think the compiler can't do much in your scenario. The fact that your pointer is declared as const int * const p doesn't guarantee that the memory can't be changed externally, e.g. by another thread or through another pointer. Therefore the compiler must generate code that reads the memory value on each iteration of your loop.
But if you are not going to write to the memory location and you know that no other piece of code will, then you can create a local variable and use it similar to this:
const int * p = ...
...
int val = *p;
/* use the value in a loop */
for (int i = 0; i < BAZILLION; i++)
{
    use_value(val);
}
Not only do you help potential readers of your code see that val is not changed in the loop, but you also give the compiler a chance to optimize (keep val in a register, for instance).
Using const is, as everyone else has said, unlikely to help the compiler optimize your loop.
It may, however, help optimise code outside the loop, or at the site of a call to a const-qualified method, or to a function taking const arguments.
This is likely to depend on whether the compiler can prove it's allowed to eliminate redundant loads, move them around, or cache calculated values rather than re-calculating them.
The only way to prove this is still to profile and/or check the assembly, but that's where you should probably be looking.
You don't say which compiler you are using. But if you are both reading and writing to memory, you could benefit from restrict or similar. The compiler does not know whether your pointers alias the same memory, so any store often forces other values to be reloaded. restrict tells the compiler that no aliasing of the pointer is happening, so it can keep using values loaded before a subsequent write. Another way to avoid the aliasing issue is to load your values into local variables; then the compiler is not forced to reload them after a write.
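As a rough sketch of the idea (in C this is the standard restrict keyword; most C++ compilers accept __restrict or __restrict__ as an extension; the function is made up):
void scale(float* __restrict dst, const float* __restrict src,
           float k, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;  // dst and src are promised not to alias,
                              // so stores to dst never force src reloads
}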
I wrote this basic code for a DSP/audio application I'm making:
double input = 0.0;
for (int i = 0; i < nChannels; i++) {
    input = inputs[i];
    // ... process input ...
}
and a DSP engineering expert told me: "you should not declare it outside the loop, otherwise it creates a dependency and the compiler can't deal with it as efficiently as possible."
He's talking about the variable input, I think. Why is that? Isn't it better to declare it once and overwrite it?
Maybe it has something to do with a different memory location being used, i.e. a register instead of the stack?
Good old K&R C compilers in the early eighties used to produce code as near as possible to what the programmer wrote, and programmers used to do their best to produce optimized source code. Modern optimizing compilers can rework things, provided the resulting code has the same observable effects as the original. So here, assuming the input variable is not used outside the loop, an optimizing compiler could optimize out the line double input = 0.0;, because there are no observable effects until the next assignment, input = inputs[i];. And it could likewise hoist the declaration out of the loop (whether the source declares it inside or not), for the same reason.
Short story: unless you want to produce code for one specific compiler with one specific set of parameters (in which case you should thoroughly examine the generated assembly), you should never worry about those low-level optimizations. Some people say the compiler is smarter than you; others say the compiler will produce its own code no matter how you write yours.
What matters is readability and variable scoping. Here input is functionally local to the loop, so it should be declared inside the loop. Full stop. Any other optimization consideration is useless, unless you have special requirements for low-level optimization (profiling showing that these lines require special processing).
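For illustration, the scoped form would look like this (using nChannels and inputs from the question):
for (int i = 0; i < nChannels; i++) {
    double input = inputs[i];  // scoped to one iteration
    // ... process input ...
}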
Many people think that declaring a variable allocates some memory for you to use. It does not work like that. It does not allocate a register either.
It only creates for you a name (and an associated type) that you can use to link consumers of values with their producers.
On a 50 year old compiler (or one written by students in their 3rd year Compiler Construction course), that may indeed be implemented by allocating some memory for the variable on the stack and using that every time the variable is referenced. It's simple, it works, and it's horribly inefficient. A good step up is putting local variables in registers when possible, but that uses registers inefficiently, and it's not where compilers are today (and haven't been for some time).
Linking consumers with producers creates a data flow graph. In most modern compilers, it's the edges in that graph that receive registers. This is completely removed from any variables as you declared them. They no longer exist. You can see this in action if you use -emit-llvm in clang.
So variables aren't real, they're just labels. Use them as you want.
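For example, you can dump the IR for a file like this (file names made up) and observe that local variables have dissolved into data-flow edges:
clang -S -emit-llvm -O1 example.c -o example.ll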
It is better to declare the variable inside the loop, but the reason given is wrong.
There is a rule of thumb: declare variables in the smallest scope possible. Your code is more readable and less error prone this way.
As for the performance question: it doesn't matter at all, for any modern compiler, where exactly you declare your variables. For example, clang eliminates the variable entirely at -O1 in its own IR: https://godbolt.org/g/yjs4dA
One corner case, however: if you ever take the address of input, the variable can't be eliminated (easily), and in that case you should declare it inside the loop if you care about performance.
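A sketch of that corner case (log_sample is a made-up function):
for (int i = 0; i < nChannels; i++) {
    double input = inputs[i];
    log_sample(&input);  // taking &input forces input to exist in memory
                         // (at least around this call), so it can't be
                         // eliminated as easily
}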
Arrow dereferencing p->m is syntactic sugar for (*p).m, which looks like it might involve two separate memory lookup operations: one to find the object on the heap and a second to locate the member field.
This made me question whether there is any performance difference between these two code snippets. Assume classA has 30+ disparate fields of various types which need to be accessed in various orders (not necessarily consecutively or contiguously):
Version 1:
void func(classA* ptr)
{
    std::string s = ptr->field1;
    int i = ptr->field2;
    float f = ptr->field3;
    // etc...
}
Version 2:
void func(classA* ptr)
{
    classA &a = *ptr;
    std::string s = a.field1;
    int i = a.field2;
    float f = a.field3;
    // etc...
}
So my question is whether or not there is a difference in performance (even if very slight) between these two versions, or if the compiler is smart enough to make them equivalent (even if the different field accesses are interrupted by many lines of other code in between them, which I did not show here).
Arrow dereferencing p->m is syntactic sugar for (*p).m
That isn't generally true, but is true in the limited context in which you are asking.
which looks like it might involve two separate memory lookup operations: one to find the object on the heap and a second to locate the member field
Not at all. It is one access to read the parameter or local variable holding the pointer and a second to access the member. But any reasonable optimizer would keep the pointer in a register in the code you showed, so there is no extra access.
But your alternate version also has a local pointer, so no difference anyway (at least in the direction you're asking about):
classA &a = *ptr;
Assuming the whole function is not inlined, and the compiler doesn't otherwise know exactly where ptr points, the reference must be implemented as a pointer. So either the compiler can deduce that it is safe for a to be an alias of *ptr, in which case there is NO difference, or the compiler must make a an alias of *copy_of_ptr, in which case the version using the reference is slower (not faster, as you seem to have expected) by the cost of copying ptr.
even if the different field accesses are interrupted by many lines of other code in between them, which I did not show here
That moves you toward the interesting case. If that intervening code could change ptr, then obviously the two versions behave differently. But what if a human can see that the intervening code can't change ptr while the compiler can't? Then the two versions are semantically equal, but the compiler doesn't know it, and it may generate slower code for the version you tried to hand-optimize by creating a reference.
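A sketch of that difference, assuming a made-up function next_object that returns a different pointer:
void func(classA* ptr)
{
    classA &a = *ptr;       // a is bound once, to the original object
    std::string s = ptr->field1;
    ptr = next_object(ptr); // hypothetical: reseats ptr
    int i = ptr->field2;    // reads the new object
    int j = a.field2;       // still reads the original object
}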
Most (all?) compilers implement references as pointers under the hood, so I would expect no difference in the generated assembly (apart from a possible copy to initialize the reference, but I would expect the optimizer to eliminate even that).
In general, this sort of micro-optimization is not worth it. It is always preferable to concentrate on clear and correct code. It is certainly not worth this sort of optimization until you have measured where the bottleneck is.
Out of habit, when accessing values via . or ->, I assign them to variables any time the value is going to be used more than once. My understanding is that in scripting languages like ActionScript this is pretty important. However, in C/C++, I'm wondering if this is a meaningless chore; am I wasting effort that the compiler is going to handle for me, or am I exercising a good practice, and why?
struct Foo
{
public:
    Foo(int val) { m_intVal = val; }
    int GetInt() { return m_intVal; }
    int m_intVal; // not private for sake of last example
};
void Bar()
{
    Foo* foo = GetFooFromSomewhere();
    SomeFuncUsingIntValA(foo->GetInt()); // accessing via dereference then function
    SomeFuncUsingIntValB(foo->GetInt()); // accessing via dereference then function
    SomeFuncUsingIntValC(foo->GetInt()); // accessing via dereference then function
    // Is this better?
    int val = foo->GetInt();
    SomeFuncUsingIntValA(val);
    SomeFuncUsingIntValB(val);
    SomeFuncUsingIntValC(val);
    ///////////////////////////////////////////////
    // And likewise with . operator
    Foo fooDot(5);
    SomeFuncUsingIntValA(fooDot.GetInt()); // accessing via function
    SomeFuncUsingIntValB(fooDot.GetInt()); // accessing via function
    SomeFuncUsingIntValC(fooDot.GetInt()); // accessing via function
    // Is this better?
    int valDot = fooDot.GetInt();
    SomeFuncUsingIntValA(valDot);
    SomeFuncUsingIntValB(valDot);
    SomeFuncUsingIntValC(valDot);
    ///////////////////////////////////////////////
    // And lastly, a dot operator to a member, not a function
    SomeFuncUsingIntValA(fooDot.m_intVal); // accessing via member
    SomeFuncUsingIntValB(fooDot.m_intVal); // accessing via member
    SomeFuncUsingIntValC(fooDot.m_intVal); // accessing via member
    // Is this better?
    int valAsMember = fooDot.m_intVal;
    SomeFuncUsingIntValA(valAsMember);
    SomeFuncUsingIntValB(valAsMember);
    SomeFuncUsingIntValC(valAsMember);
}
Ok, so I'll try to give an answer here.
Short version: you definitely don't need to do this.
Long version: you might need to do this.
So here it goes: in interpreted languages like JavaScript these kinds of things can have a noticeable impact. In compiled languages like C++, not so much, to the point of not at all.
Most of the time you don't need to worry about these things, because an immense amount of resources has been poured into compiler optimization algorithms (and actual implementations), and the compiler will usually decide correctly what to do: allocate an extra register and save the result in order to reuse it, or recompute it every time and save the register space, etc.
There are instances where the compiler can't do this: when it can't prove that multiple calls produce the same result. Then it has no choice but to make all the calls.
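For instance, with a getter whose body the compiler cannot see, it has to assume every call may return a different value (a minimal sketch; GetIntFromSomewhere is a made-up name, the callees are from the question):
int GetIntFromSomewhere();  // defined in another translation unit

void Bar()
{
    // The compiler can't prove these three calls return the same value,
    // so it must emit all three calls:
    SomeFuncUsingIntValA(GetIntFromSomewhere());
    SomeFuncUsingIntValB(GetIntFromSomewhere());
    SomeFuncUsingIntValC(GetIntFromSomewhere());
}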
Now let's assume the compiler makes the wrong choice, and as a precaution you make the effort to micro-optimize by hand. Say you squeeze out a 10% performance increase (already an overly optimistic figure for this kind of optimization) on that portion of code. But what do you know: your code spends only 1% of its time in that portion. The rest of the time is most likely spent in some hot loops and waiting for data fetches. So you spend a non-negligible amount of effort to get a 0.1% improvement in total running time, which won't even be observable given the external factors that vary execution time by far more than that.
So don’t spend time with micro-optimizations in C++.
However there are cases where you might need to do this and even crazier things. But this is only after properly profiling your code and this is another discussion.
So worry about readability, don't worry about micro-optimizations.
The question is not really related to -> and . operators, but rather about repetitive expressions in general. Yes, it is true that most modern compilers are smart enough to optimize the code that evaluates the same expression repeatedly (assuming it has no observable side-effects).
However, using an explicit intermediate variable typically makes the program much more readable, since it explicitly exposes the fact that the same value is supposed to be used in all contexts. It exposes the fact that it was your intent to use the same value in all contexts.
If you repeat the same expression to generate that value again and again, this fact becomes much less obvious. Firstly, it is difficult to say at first sight whether the expressions are really identical (especially when they are long). Secondly, it is not obvious whether sequential evaluations of seemingly the same expression produce identical results.
Finally, slicing long expressions into smaller ones by using intermediate variables can significantly simplify debugging the code in a step-by-step debugger, since it gives the user a much greater degree of control through "step in" and "step over" commands.
It's certainly better in terms of readability and maintainability to have such a temporary variable.
In terms of performance, you shouldn't worry about such micro-optimization at this stage (premature optimization); modern C++ compilers can optimize it away anyway.
Assume that your first objective is execution speed, then code cleanliness and finally usage of resources.
If at a certain point of an algorithm a variable (for instance a double) is not going to change any more (but you are still going to read it many times), would you copy it into a constant value?
If you want to make your code clearer, by all means copy your value: const double const_value = calculated_value;. But compilers are very good at tracking dependencies, and it's highly unlikely (assuming you are using a modern, reasonably competent compiler) that the code will be any faster or otherwise "better" because you do this. There is a small chance that the compiler takes your word for the fact that you want a second variable, makes a copy, and thereby makes the code slower.
As always, if performance is important to your application, make a "before & after" comparative benchmark for your particular code, as reading a page on the internet or asking on SO is not the same as benchmarking your code.
Just copying a non-constant variable into a constant one does not make the code cleaner, because instead of one variable you now have two. Much more interesting is to move the non-constant one out of scope. That way, only the constant version of the variable is visible, and the compiler prevents us from changing its value by mistake.
Herb Sutter describes how to do this using C++11 lambdas: Complex initialization for a const variable.
const int i = [&] {
    int i = some_default_value;
    if (someConditionIsTrue)
    {
        // do some operations and calculate the value of i
        i = some_calculated_value;
    }
    return i;
}();
(I don't cover the execution-speed objective, since Mats Petersson has already done so.)
For better code readability, you can create a const reference to the variable at the point where it isn't changed anymore, and use the const reference from that point on.
double value_init;
// Some code that generates value_init...
const double& value = value_init;
// Use value from now on in your algorithm
Say you see a loop like this one:
for (int i = 0;
     i < thing.getParent().getObjectModel().getElements(SOME_TYPE).count();
     ++i)
{
    thing.getData().insert(
        thing.getData().count(),
        thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName()
    );
}
if this was Java I'd probably not think twice. But in performance-critical sections of C++, it makes me want to tinker with it... however I don't know if the compiler is smart enough to make it futile.
This is a made-up example, but all it's doing is inserting strings into a container. Please don't assume any of these are STL types; think in general terms about the following:
Is having a messy condition in the for loop going to get evaluated each time, or only once?
If those get methods are simply returning references to member variables on the objects, will they be inlined away?
Would you expect custom [] operators to get optimized at all?
In other words is it worth the time (in performance only, not readability) to convert it to something like:
ElementContainer &source =
    thing.getParent().getObjectModel().getElements(SOME_TYPE);
int num = source.count();
Store &destination = thing.getData();
for (int i = 0; i < num; ++i)
{
    destination.insert(destination.count(), source[i].getName());
}
Remember, this is a tight loop, called millions of times a second. What I wonder is if all this will shave a couple of cycles per loop or something more substantial?
Yes I know the quote about "premature optimisation". And I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.
The general way to answer such questions is to look at the produced assembly. With gcc, this involves replacing the -c flag with -S.
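For instance (file names made up):
g++ -O2 -S loop.cpp -o loop.s   # writes assembly to loop.s instead of compiling to an object file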
My own rule is not to fight the compiler. If something should be inlined, I make sure that the compiler has all the information needed to perform the inlining, and (possibly) I try to urge it to do so with an explicit inline keyword.
Also, inlining saves a few opcodes but makes the code grow, which, as far as L1 cache is concerned, can be very bad for performance.
All the questions you are asking are compiler-specific, so the only sensible answer is "it depends". If it is important to you, you should (as always) look at the code the compiler is emitting and do some timing experiments. Make sure your code is compiled with all optimisations turned on - this can make a big difference for things like operator[](), which is often implemented as an inline function, but which won't be inlined (in GCC at least) unless you turn on optimisation.
If the loop is that critical, I can only suggest that you look at the generated code. If the compiler is allowed to aggressively optimize the calls away, then perhaps it will not be an issue. Sorry to say this, but modern compilers can optimize incredibly well, and I really would suggest profiling to find the best solution in your particular case.
If the methods are small and can and will be inlined, then the compiler may do the same optimizations that you have done. So, look at the generated code and compare.
Edit: It is also important to mark const methods as const, e.g. in your example count() and getName() should be const to let the compiler know that these methods do not alter the contents of the given object.
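A sketch of that const-qualification, using the made-up types from the question:
#include <string>

class Element
{
public:
    std::string getName() const;  // const: promises not to modify *this
};

class ElementContainer
{
public:
    int count() const;                       // likewise
    const Element& operator[](int i) const;  // likewise
};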
As a rule, you should not have all that garbage in your "for condition" unless the result can actually change during loop execution.
Use another variable, set outside the loop. This will eliminate the WTF when reading the code, it will not negatively impact performance, and it sidesteps the question of how well the functions get optimized. If those calls are not optimized away, it will also result in a performance increase.
I think in this case you are asking the compiler to do more than it legitimately can, given the compile-time information it has access to. So, in particular cases the messy condition may be optimized away, but really, the compiler has no particularly good way to know what kind of side effects that long chain of function calls might have. I would assume that breaking out the test is faster unless I have benchmarks (or disassembly) that show otherwise.
This is one of the cases where a JIT compiler has a big advantage over a C++ compiler. It can, in principle, optimize for the most common case seen at runtime and emit optimized machine code for that (plus checks to make sure that one actually falls into that case). This sort of thing is used all the time for polymorphic method calls that turn out not to be used polymorphically; whether it could catch something as complex as your example, though, I'm not certain.
For what it's worth, if speed really mattered, I'd split it up in Java too.