I wrote this basic code for a DSP/audio application I'm making:
double input = 0.0;
for (int i = 0; i < nChannels; i++) {
    input = inputs[i];
    // ...
}
and a DSP engineering expert told me: "You should not declare it outside the loop; otherwise it creates a dependency and the compiler can't deal with it as efficiently as possible."
I think he's talking about the variable input. Why is that? Isn't it better to declare it once and overwrite it?
Maybe it has something to do with the memory location used, i.e. a register instead of the stack?
Good old K&R C compilers in the early eighties used to produce code as close as possible to what the programmer wrote, and programmers did their best to write optimized source code. Modern optimizing compilers can rework things provided the resulting code has the same observable effects as the original code. So here, assuming the input variable is not used outside the loop, an optimizing compiler could optimize out the line double input = 0.0; because it has no observable effect before the next assignment: input = inputs[i];. In the same way, it could move the variable's declaration out of the loop (whether or not it is inside the loop in the C++ source file) for the same reason.
Long story short: unless you want to produce code for one specific compiler with one specific set of options, in which case you should thoroughly examine the generated assembly code, you should never worry about these low-level optimizations. Some people say the compiler is smarter than you; others say the compiler will produce its own code whichever way you write yours.
What matters is just readability and variable scoping. Here input is functionally local to the loop, so it should be declared inside the loop. Full stop. Any other optimization consideration is just useless, unless you do have special requirements for low-level optimization (for example, profiling shows that these lines need special treatment).
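To make the scoping point concrete, here is a minimal sketch with the variable declared in both places. The function names and the std::vector<double> buffer are assumptions for illustration; the question only shows inputs and nChannels. Both versions compute the same result, and an optimizing compiler is free to treat them identically; the only difference that matters is that the second keeps input confined to the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 'input' declared outside the loop, as in the question.
double sumOutside(const std::vector<double>& inputs) {
    double sum = 0.0;
    double input = 0.0;  // this initial value is never read
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        input = inputs[i];
        sum += input;
    }
    return sum;
}

// Hypothetical helper: 'input' declared inside the loop, in the smallest scope.
double sumInside(const std::vector<double>& inputs) {
    double sum = 0.0;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        const double input = inputs[i];  // cannot leak out of the loop body
        sum += input;
    }
    return sum;
}

// Small self-check: both spellings must agree on the result.
void checkSameResult() {
    const std::vector<double> samples{0.5, 0.25, 0.125};
    assert(sumOutside(samples) == sumInside(samples));
}
```

Calling checkSameResult() from main() is enough to confirm the two versions agree; comparing their generated assembly at -O1 or higher is the way to check what a particular compiler does with them.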
Many people think that declaring a variable allocates some memory for you to use. It does not work like that. It does not allocate a register either.
It only creates for you a name (and an associated type) that you can use to link consumers of values with their producers.
On a 50 year old compiler (or one written by students in their 3rd year Compiler Construction course), that may indeed be implemented by allocating some memory for the variable on the stack, and using that every time the variable is referenced. It's simple, it works, and it's horribly inefficient. A good step up is putting local variables in registers when possible, but that uses registers inefficiently and it's not where we are today (and hasn't been for some time).
Linking consumers with producers creates a data flow graph. In most modern compilers, it's the edges in that graph that receive registers. This is completely removed from any variables as you declared them. They no longer exist. You can see this in action if you compile with clang -S -emit-llvm and read the resulting IR.
So variables aren't real, they're just labels. Use them as you want.
It is better to declare the variable inside the loop, but the reason given is wrong.
There is a rule of thumb: declare variables in the smallest scope possible. Your code is more readable and less error prone this way.
As for the performance question, it doesn't matter at all to any modern compiler where exactly you declare your variables. For example, clang eliminates the variable entirely from its own IR at -O1: https://godbolt.org/g/yjs4dA
One corner case, however: if you ever take the address of input, the variable can't be eliminated (easily), and you should declare it inside the loop if you care about performance.
I've been reading an excellent book written by Bjarne Stroustrup, and he recommends that you declare variables as late as possible, preferably just before you use them. However, it fails to mention any benefits of declaring variables late rather than at the start of the function body.
So what is the benefit of declaring variables late like this:
#include <iostream>

int main()
{
    /* some
       code
       here
    */
    int MyVariable1;
    int MyVariable2;
    std::cin >> MyVariable1 >> MyVariable2;
    return 0;
}
instead of at the start of a function body like this:
#include <iostream>

int main()
{
    int MyVariable1;
    int MyVariable2;
    /* some
       code
       here
    */
    std::cin >> MyVariable1 >> MyVariable2;
    return 0;
}
It makes the code easier to follow. In general, you declare variables when you need them, e.g. near a loop when you want to find a minimum of something via that loop. In this way, when someone reads your code, (s)he doesn't have to try to decipher what 25 variables mean at the start of a function, but the variables will "explain" themselves when going through the code. After all, it's not important to know what variables mean, but to understand what the code does.
Remember that most of the time you use that local variable in a very small portion of your code, so it makes sense to define it in that small portion where you need it.
A few points that come to mind:
Not all objects are default-constructible, so declaring the object at the beginning of the function is often not an option; it can only be declared where it is initialized (e.g. auto myObj = creationalfunction();).
Your function has fewer lines and is hence more readable. Declaring every variable at the beginning of the function makes it a few lines longer.
If your function throws, it's not economical to build a list of objects just to destroy them during stack unwinding.
Declaring variables on the same line where they are assigned lets you use auto, which makes the code more flexible.
It's the common convention for C++ these days, and that is pretty important.
Creating an object and assigning to it later may be slower than directly initializing the object with its values.
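The first and fourth points above can be sketched as follows. Channel and makeChannel are hypothetical names invented for this illustration; they stand in for any type without a default constructor and any creation function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type with no default constructor: it cannot be declared
// "empty" at the top of a function and filled in later.
class Channel {
public:
    Channel(std::string name, double gain)
        : name_(std::move(name)), gain_(gain) {}
    const std::string& name() const { return name_; }
    double gain() const { return gain_; }
private:
    std::string name_;
    double gain_;
};

// Hypothetical factory standing in for any creation function.
Channel makeChannel(const std::string& name) {
    return Channel(name, 1.0);
}

double describe() {
    // Channel c;                  // would not compile: no default constructor
    auto c = makeChannel("left");  // declared and initialized in one step
    assert(c.name() == "left");
    return c.gain();
}
```

Because the declaration sits on the line that produces the value, auto can deduce the type, and changing the return type of makeChannel later would not require touching describe().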
If "other code" is a page of code then you can't actually see the declaration on the screen when you read the values. If you thought that you were reading two doubles, you can't see on the screen that you are wrong. If you declare the variable on one line and use it on the next, any mistake would be obvious.
Suppose that you deal with some objects whose construction is an expensive operation. In such a situation there are a few reasons why it is better to define variables just before they are used:
1) First of all, it is sometimes faster to create an object using the appropriate constructor instead of default-constructing it and then assigning to it. So this:
T obj(/* some arguments here */);
may be faster than this:
T obj;
/* some code here*/
obj = T(/* some arguments here */);
Note that in the first example only a single constructor is invoked. In the second example the default constructor, the constructor for the temporary, and the assignment operator are all invoked.
2) If an exception is thrown somewhere between the object's definition and its first use, you do unnecessary work creating and destroying the object without ever using it. The same applies when the function returns between the object's definition and its first use.
3) Yes, readability is also worth mentioning here :)
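Point 1) can be checked with a small instrumented type. Tracked and its counters are hypothetical, written only for this sketch; a real class would not count its own calls, and how much the extra calls cost depends entirely on what the constructors and the assignment operator do.

```cpp
#include <cassert>

// Hypothetical instrumented type that counts which special members run.
struct Tracked {
    static int constructions;
    static int assignments;
    int value;

    Tracked() : value(0) { ++constructions; }
    explicit Tracked(int v) : value(v) { ++constructions; }
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};
int Tracked::constructions = 0;
int Tracked::assignments = 0;

// Resets the counters so each scenario is measured on its own.
void resetCounters() {
    Tracked::constructions = 0;
    Tracked::assignments = 0;
}

// Scenario 1: define the object where its value is known.
int directInit() {
    resetCounters();
    Tracked obj(42);
    assert(obj.value == 42);
    return Tracked::constructions + Tracked::assignments;
}

// Scenario 2: default-construct early, assign later.
int declareThenAssign() {
    resetCounters();
    Tracked obj;
    obj = Tracked(42);  // builds a temporary, then assigns from it
    assert(obj.value == 42);
    return Tracked::constructions + Tracked::assignments;
}
```

directInit() reports one call in total, while declareThenAssign() reports three: two constructions and one assignment.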
When starting to get good at programming you will usually end up holding the entire program in your head at the same time. Later, you will learn how to reduce this to one function in your head.
Both of these limit how large/complex a program or function you can work with. You can help this problem by simplifying what is going on so you no longer have to think about it: reduce your working memory needs. Also you can trade one kind of complexity for another: fancy variable value dancing for some complex higher-level algorithm, or for certainty of code correctness.
There are many ways to do this. You can work with chunkable patterns, and think in those patterns instead of in lower level primitives (this is basically what you did when you graduated from whole program state to single function state). You can also do this by making your state simpler.
Every variable carries state. It modifies what that line of code means, and what every previous line of code means up to the point of its declaration. A variable that exists on a line could be modified by the line or read by the line. To understand what the reading of a variable means, you have to audit every line between its declaration and its use for the possibility it is edited.
Now, this may not happen, but checking it takes both time and working memory. If you have 10 variables, having to remember which of them were modified "above" and which were not, and what their values mean, can burn a lot of headspace.
On the other hand, a variable created, used, and either falling out of scope or never used again is not going to cause this cognitive load. You do not have to check for hidden state or meaning. What's more, you are not tempted (indeed not able) to use it prior to that line. You are definitely not going to overwrite important state that later code relies on when you set it, and you are not going to have it modified to something surprising between initialization and use.
In short, you reduce the "state space" of the lines of code that use the variable, and even of the lines that don't.
Sometimes this is difficult to achieve, and sometimes impractical or impossible. But quite often it is easy, improves code quality, and makes the code easier to read or understand. The most important audience of code is humans; there is a reason we don't check in the object file output of a compiler (or some intermediate representation).
Such "low state" code is also way easier to modify after the fact. In the limit, it becomes pure functional code.
First off, let me get off my chest the fact that I'm a greenhorn trying to do things the right way, which means that every now and then I run into a contradiction about what the right way is.
I am modifying a driver for a peripheral which contains a function; let's call it Send(). In the function I have a timestamp variable so that the function loops for a specified amount of time.
So, should I declare the variable as a global (that way it is always in memory and no time is lost declaring it each time the function runs) or do I leave the variable local to the function (and avoid the bad design pattern of global variables)?
Please bear in mind that the function can be called multiple times per millisecond.
Speed of execution shouldn't be significantly different for a local vs. a global variable. The only real difference is where the variable lives. Local variables are allocated on the stack, global variables are in a different memory segment. It is true that local variables are allocated every time you enter a routine, but allocating memory is a single instruction to move the stack pointer.
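A minimal sketch of the local-variable version follows. The question does not show the driver's code, so sendWithTimeout, its parameter, and the use of std::chrono are all assumptions; a real driver would read whatever hardware timer it already uses.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the driver's Send(): polls until the time budget
// is used up and reports how many polling iterations ran.
int sendWithTimeout(std::chrono::microseconds budget) {
    using Clock = std::chrono::steady_clock;
    assert(budget.count() >= 0);

    // Local timestamp: it exists only for this call, so every caller
    // (and every thread) gets its own copy.
    const Clock::time_point start = Clock::now();

    int iterations = 0;
    do {
        ++iterations;  // a real driver would poll the peripheral here
    } while (Clock::now() - start < budget);
    return iterations;
}
```

Nothing here is "declared" at run time in any costly sense: start is just a slot in the stack frame (or a register), which is why calling the function many times per millisecond does not change the picture.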
There are much more important considerations when deciding if a variable should be global or local.
When implementing a driver, try to avoid global variables as much as possible, because:
They are thread-unsafe, and you have no idea about the scheduling scheme of the user application (in fact, even without threads, using multiple instances of the same driver is a potential problem).
They force a data section to be created in the executable image of any application that links to your driver (which is something that the application programmer might want to avoid).
Did you profile a fully-optimized, release build of your code and identify the bottleneck to be small allocations in this function?
The change you are proposing is a micro-optimization: a change to a small part of your code with the intent to make it more efficient. If the answer to the above question is "no", as I'd expect, you shouldn't even be thinking of such things.
Select the correct algorithm for your code. Write your code using idiomatic techniques. Do not write in micro-optimizations. You might be surprised how good your compiler is at optimizing your code for you. It will often be able to optimize away these small allocations, but even if it can't you still don't know if the performance penalty imposed by them is even noticeable or significant.
For drivers, which are usually position-independent, global variables are accessed indirectly through the GOT (global offset table) unless IP-relative addressing is available (e.g. x86_64, ARM, etc.).
In the GOT case, you can think of it as an extra pointer indirection.
However, even with an extra pointer it won't make any observable difference if the function is "only" called at millisecond frequency.
It baffles me why C++ compilers do not initialise every integer declaration to 0, be it a local, a global or a member. Why do uninitialised sections exist in the memory model?
I know it's a dull answer, but your question begs for it exactly:
Because the C++ standard says so.
Why does it say so? Because C++ is built on a principle:
Don't pay for what you don't use.
Setting memory to a certain value costs CPU time and memory bandwidth. If you want to do it, do it explicitly. A variable declaration should not incur this cost.
C++ is based on C, and in C a primary design concern was code efficiency. In most cases you want to initialize a new variable to a specific value after declaring it. If the compiler wrote 0 to that memory address just for another value to be written to it shortly afterwards, it would waste a CPU cycle.
Sure, a smart compiler could detect that a variable isn't read before it gets a value assigned and could optimize the initialization to 0 away. But when C was developed, compilers weren't that smart yet.
The C language and its standard library generally follow the principle that they don't do things automatically when doing so might be unnecessary in some circumstances.
It might make life easier for you if it did, but C++ errs on the side of avoiding overheads, e.g. setting values which you might then reset to something else.
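If you do want zeros, C++ lets you ask for them explicitly. The sketch below shows the usual spellings; the names globalCounter, Sample, and initializationExamples are made up for this example.

```cpp
#include <cassert>

int globalCounter;  // static storage duration: zero-initialized before main runs

struct Sample {
    int left;
    int right;
};

int initializationExamples() {
    // int garbage;      // indeterminate value; reading it is undefined behaviour
    int zero{};          // value-initialization: explicitly asks for 0
    int buffer[4] = {};  // every element set to 0
    Sample s{};          // both members set to 0

    assert(zero == 0 && s.left == 0 && s.right == 0);
    return globalCounter + zero + buffer[3] + s.left + s.right;
}
```

Only the commented-out local is left uninitialized; every other object pays for its zeroing because the code asked for it (or, for the global, because the language guarantees it for static storage).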
I'm writing something performance-critical and wanted to know if it could make a difference if I use:
int test( int a, int b, int c )
{
// Do millions of calculations with a, b, c
}
or
class myStorage
{
public:
int a, b, c;
};
int test( myStorage values )
{
// Do millions of calculations with values.a, values.b, values.c
}
Does this basically result in similar code? Is there an extra overhead of accessing the class members?
I'm sure that this is clear to an expert in C++ so I won't try and write an unrealistic benchmark for it right now
The compiler will probably equalize them. If it has any brains at all, it will copy values.a, values.b, and values.c into local variables or registers, which is also what happens in the simple case.
The relevant maxims:
Premature optimization is the root of much evil.
Write it so you can read it at 1am six months from now and still understand what you were trying to do.
Most of the time significant optimization comes from restructuring your algorithm, not small changes in how variables are accessed. Yes, I know there are exceptions, but this probably isn't one of them.
This sounds like premature optimization.
That being said, there are some differences and opportunities but they will affect multiple calls to the function rather than performance in the function.
First of all, in the second option you may want to pass myStorage as a constant reference.
As a result of that, your compiled code will likely be pushing a single value onto the stack (to allow you to access the container), rather than pushing three separate values. If you have additional fields (in addition to a-c), sending myStorage not as a reference might actually cost you more because you will be invoking a copy constructor and essentially copying all the additional fields. All of this would be a cost per call, not within the function.
If you are doing tons of calculations with a, b and c within the function, then it really doesn't matter how you transfer or access them. If you passed by reference, the initial cost might be slightly more (since your object, if passed by reference, could be on the heap rather than the stack), but once accessed for the first time, caching and registers on your machine will probably mean low-cost access. If you have passed your object by value, then it really doesn't matter, since even initially, the values will be nearby on the stack.
For the code you provided, if these are the only fields, there will likely not be a difference. The "values.variable" is merely interpreted as an offset in the stack, not as "look up one object, then access another address".
Of course, if you don't buy these arguments, just define local variables as the first step in your function, copy the values from the object, and then use these variables. If you really use them multiple times, the initial cost of this copy wouldn't matter :)
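Here is a minimal sketch of the two calling styles side by side, with the aggregate passed by const reference as suggested above. The loop body is an invented placeholder for the "millions of calculations"; accumulateArgs and accumulateStruct are hypothetical names.

```cpp
#include <cassert>

// Same members as the class in the question, written as a struct.
struct myStorage {
    int a, b, c;
};

// Version 1: three separate parameters.
long long accumulateArgs(int a, int b, int c, int rounds) {
    assert(rounds >= 0);
    long long total = 0;
    for (int i = 0; i < rounds; ++i) {
        total += static_cast<long long>(a) * i + b - c;
    }
    return total;
}

// Version 2: one aggregate passed by const reference.
long long accumulateStruct(const myStorage& values, int rounds) {
    assert(rounds >= 0);
    long long total = 0;
    for (int i = 0; i < rounds; ++i) {
        total += static_cast<long long>(values.a) * i + values.b - values.c;
    }
    return total;
}
```

Both functions return the same value for the same inputs; whether they also compile to the same machine code is something to confirm by looking at the optimized assembly for your compiler.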
No, your cpu would cache the variables you use over and over again.
I think there is some overhead, but it may not be much. If the object lives on the heap, its address is stored on the stack, and you follow that pointer before you can access the instance variable.
If you store the int variable on the stack, it would be faster, because the value is already on the stack and the machine just reads it from there to do the calculation :).
It also depends on whether you copy the class's instance variables onto the stack or not. If inside test() you do something like:
int a = objA.a;
int b = objA.b;
int c = objA.c;
I think the performance would be almost the same.
If you're really writing performance-critical code and you think one version should be faster than the other, write both versions and time them (with the code compiled with the right optimization switches). You may even want to look at the generated assembly code. A lot of quite subtle things can affect the speed of a code snippet, such as register spilling.
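A very small timing harness for such an experiment might look like the sketch below. Everything in it (TimedResult, workload, timeWorkload) is hypothetical, and a single timed call is only a rough indication; a serious comparison would repeat the measurement many times and use a proper benchmarking setup.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Result of one timed run: the computed value plus the elapsed time.
struct TimedResult {
    std::int64_t value;
    std::chrono::nanoseconds elapsed;
};

// Hypothetical workload standing in for "millions of calculations".
std::int64_t workload(int a, int b, int c, int rounds) {
    std::int64_t total = 0;
    for (int i = 0; i < rounds; ++i) {
        total += static_cast<std::int64_t>(a) * i + b - c;
    }
    return total;
}

// Times a single call with a monotonic clock. Returning the value makes it
// harder for the optimizer to discard the work as unused.
TimedResult timeWorkload(int a, int b, int c, int rounds) {
    assert(rounds >= 0);
    const auto start = std::chrono::steady_clock::now();
    const std::int64_t value = workload(a, b, c, rounds);
    const auto stop = std::chrono::steady_clock::now();
    return {value,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

To compare two variants, time each one with the same arguments, check that the returned values match, and only then compare the elapsed times.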
You can also start your function with
int & a = values.a;
int & b = values.b;
although the compiler should be smart enough to do that for you behind the scenes. In general I prefer to pass around structures or classes; this often makes it clearer what the function is meant to do, plus you don't have to change the signatures every time you want to take another parameter into account.
As with your previous, similar question: it depends on the compiler and platform. If there is any difference at all, it will be very small.
Both values on the stack and values in an object are commonly accessed using a pointer (the stack pointer, or the this pointer) and some offset (the location in the function's stack frame, or the location inside the class).
Here are some cases where it might make a difference:
Depending on your platform, the stack pointer might be held in a CPU register, whereas the this pointer might not. If this is the case, accessing this (which is presumably on the stack) would require an extra memory lookup.
Memory locality might be different. If the object in memory is larger than one cache line, the fields are spread out over multiple cache lines. Bringing only the relevant values together in a stack frame might improve cache efficiency.
Do note, however, how often I used the word "might" here. The only way to be sure is to measure it.
If you can't profile the program, print out the assembly language for the code fragments.
In general, less assembly code means fewer instructions to execute, which usually (though not always) means faster code. This is a technique for getting a rough estimate of performance when a profiler is not available.
An assembly language listing will allow you to see differences, if any, between implementations.
Say you see a loop like this one:
for (int i = 0;
     i < thing.getParent().getObjectModel().getElements(SOME_TYPE).count();
     ++i)
{
    thing.getData().insert(
        thing.getData().Count(),
        thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName()
    );
}
If this were Java I'd probably not think twice. But in performance-critical sections of C++, it makes me want to tinker with it... however, I don't know if the compiler is smart enough to make that futile.
This is a made up example but all it's doing is inserting strings into a container. Please don't assume any of these are STL types, think in general terms about the following:
Is the messy condition in the for loop going to be evaluated each time, or only once?
If those get methods are simply returning references to member variables on the objects, will they be inlined away?
Would you expect custom [] operators to get optimized at all?
In other words is it worth the time (in performance only, not readability) to convert it to something like:
ElementContainer &source =
    thing.getParent().getObjectModel().getElements(SOME_TYPE);
int num = source.count();
Store &destination = thing.getData();
for (int i = 0; i < num; ++i)
{
    destination.insert(thing.getData().Count(), source[i].getName());
}
Remember, this is a tight loop, called millions of times a second. What I wonder is if all this will shave a couple of cycles per loop or something more substantial?
Yes I know the quote about "premature optimisation". And I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.
The general way to answer such questions is to look at the produced assembly. With gcc, this involves replacing the -c flag with -S.
My own rule is not to fight the compiler. If something is to be inlined, then I make sure that the compiler has all the information needed to perform such an inline, and (possibly) I try to urge it to do so with an explicit inline keyword.
Also, inlining saves a few opcodes but makes the code grow, which, as far as L1 cache is concerned, can be very bad for performance.
All the questions you are asking are compiler-specific, so the only sensible answer is "it depends". If it is important to you, you should (as always) look at the code the compiler is emitting and do some timing experiments. Make sure your code is compiled with all optimisations turned on - this can make a big difference for things like operator[](), which is often implemented as an inline function, but which won't be inlined (in GCC at least) unless you turn on optimisation.
If the loop is that critical, I can only suggest that you look at the code generated. If the compiler is allowed to aggressively optimise the calls away then perhaps it will not be an issue. Sorry to say this, but modern compilers can optimise incredibly well, and I really would suggest profiling to find the best solution in your particular case.
If the methods are small and can and will be inlined, then the compiler may do the same optimizations that you have done. So, look at the generated code and compare.
Edit: It is also important to mark const methods as const, e.g. in your example count() and getName() should be const to let the compiler know that these methods do not alter the contents of the given object.
As a rule, you should not have all that garbage in your "for condition" unless the result is going to be changing during your loop execution.
Use another variable set outside the loop. This will eliminate the WTF when reading the code, it will not negatively impact performance, and it will sidestep the question of how well the functions get optimized. If those calls are not optimized this will also result in performance increase.
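The effect of hoisting can be made visible with an instrumented container. ElementList and its call counter are hypothetical, written only for this sketch. The counter is a deliberate side effect, which is exactly the kind of thing that stops a compiler from assuming count() returns the same value every time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container whose count() records how often it is called.
class ElementList {
public:
    explicit ElementList(std::vector<std::string> names)
        : names_(std::move(names)) {}
    std::size_t count() const { ++countCalls_; return names_.size(); }
    const std::string& operator[](std::size_t i) const { return names_[i]; }
    int countCalls() const { return countCalls_; }
private:
    std::vector<std::string> names_;
    mutable int countCalls_ = 0;
};

// count() sits in the loop condition, so it runs before every iteration.
std::vector<std::string> copyNaive(const ElementList& source) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < source.count(); ++i) {
        out.push_back(source[i]);
    }
    return out;
}

// count() is read once into a local; valid only because the loop body
// does not change the size of 'source'.
std::vector<std::string> copyHoisted(const ElementList& source) {
    std::vector<std::string> out;
    const std::size_t num = source.count();
    out.reserve(num);
    for (std::size_t i = 0; i < num; ++i) {
        out.push_back(source[i]);
    }
    assert(out.size() == num);
    return out;
}
```

With three elements, copyNaive calls count() four times (three passing checks plus the final failing one), while copyHoisted calls it once; both produce the same output.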
I think in this case you are asking the compiler to do more than it legitimately can given the scope of compile-time information it has access to. So, in particular cases the messy condition may be optimized away, but really, the compiler has no particularly good way to know what kind of side effects you might have from that long chain of function calls. I would assume that breaking out the test would be faster unless I have benchmarking (or disassembly) that shows otherwise.
This is one of the cases where the JIT compiler has a big advantage over a C++ compiler. It can in principle optimize for the most common case seen at runtime and provide optimized bytecode for that (plus checks to make sure that one falls into that case). This sort of thing is used all the time in polymorphic method calls that turn out not to actually be used polymorphically; whether it could catch something as complex as your example, though, I'm not certain.
For what it's worth, if speed really mattered, I'd split it up in Java too.