I'm discussing with some colleagues the efficiency of if statements and which form is cheapest in terms of memory and CPU use; at this stage, the language used doesn't matter.
The two conditionals are the following:
If value is not present then
    skip
If value = "1234" then
    execute
So, the first checks whether the value is null and, in that case, exits (skips) the routine; the second statement compares the variable to a specific value.
What I'm thinking is that the first uses more CPU and the second more RAM. What do you think about it?
Do I have to use both, so that if the value is null the second statement is skipped? Or is it better to use only the second, which compares the two values?
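In C++-like terms, what I have in mind is something like this (a minimal sketch; the function name, the pointer type, and the string are illustrative assumptions, since the question is language-agnostic):
#include <string>

// && short-circuits, so the comparison never runs when value is null.
bool shouldExecute(const std::string* value) {
    return value != nullptr && *value == "1234";
}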
Thank you
Can you elaborate on why the second uses more RAM? "1234" will be placed in memory only once, as a constant value, and the code which does the comparison is also compiled and generated only once. In fact, the second if might be more CPU-consuming because it compares strings, but I don't think you can do much about that. So I'm not really sure how you reached your conclusions. Am I missing something?
I'm wondering which is more efficient.
Let's say I have this code:
while (i < 10000) {
    if (cond1)
        doSomething1();
    else if (cond2)
        doSomething2();
    else
        doSomething3();
    ++i;  // i assumed to start at 0
}
Now cond1 and cond2 are "constant" conditions: if one of them holds for some i, it holds for every i, and only one of them can be true.
Most of the time, the last doSomething3() is executed.
Now what happens if I write something like this:
if (cond1) {
    while (i < 10000) {
        doSomething1();
        ++i;
    }
}
else if (cond2) {
    while (i < 10000) {
        doSomething2();
        ++i;
    }
}
else {
    while (i < 10000) {
        doSomething3();
        ++i;
    }
}
Is it more efficient because I'm checking cond1 and cond2 just once?
What you need to be asking is which is more cache-efficient.
In many cases, the compiler can figure this out for you and will rearrange your code to get the best results. Its ability to do so will depend largely on what doSomethingN does and what condN are. If condN can change over time, then it may well need to be physically checked each time, in which case your code cannot be substantially re-arranged and checking it on every loop iteration may indeed be far slower. But, if condN is constant, the repeated checks can be optimised away, essentially resulting in your first code being converted into the second.
Ultimately, the only way to be sure is to measure it (and study the resulting assembly, if you can understand it).
At first look, the second one seems more efficient: after all, one check is obviously cheaper than 10000 checks. More efficient performance-wise, that is; slightly more code is the price it comes at.
But then again, the performance overhead of the first one may very well be negligible. You should really benchmark the two, because, you know, you don't have a performance problem until you prove you do.
At any rate, any production-grade compiler will likely be able to optimize things for you in such a fairly static and easy-to-analyze context.
It could be a different scenario, though, with many more options, which might not be predictable at compile time. For example, you could have a hash map from conditions to function pointers, so you look up the function pointer for a condition and use it to invoke the functionality. Naturally, this will be far less efficient, because it uses dynamic dispatch, which also means the calls cannot be inlined. But that is the price you pay for flexibility: with this approach you can register different functions and conditions at runtime, change the functionality for a given condition, and so on.
For example, consider a scenario where you need to perform 10000 actions, but each action depends on the result of the previous one:
while (i < 10000) cond = map[cond]();
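A minimal sketch of that idea (the int condition codes, the handler names, and the map are all assumptions for illustration):
#include <functional>
#include <unordered_map>

int doSomething1() { return 2; }  // each handler returns the next condition
int doSomething2() { return 1; }

int main() {
    // Handlers can be registered or replaced at runtime, at the cost of
    // dynamic dispatch: these calls cannot be inlined.
    std::unordered_map<int, std::function<int()>> map{
        {1, doSomething1},
        {2, doSomething2},
    };
    int cond = 1;
    for (int i = 0; i < 10000; ++i)
        cond = map[cond]();  // look up the handler and invoke it
}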
I think you've answered your own question. Yes, it's more efficient if you write more efficient code.
EDIT: @NeilKirk has a point. Since your compiler knows what it's doing, manually checking outside the loop is only at least as efficient, not necessarily more efficient, provided the compiler can detect that the condition won't change during your loop.
Why do std::atomic_compare_exchange and all its brothers and sisters update the passed expected value?
I am wondering if there are any reasons besides the simplicity it gives in loops, e.g. is there an intrinsic which can do that in one operation, to improve performance?
The processor has to load the current value, in order to do the "compare" part of the operation. When the comparison fails the caller needs to know the new value, to retry the compare-exchange (you almost always use it in a loop), so if it wasn't returned (e.g. by modifying the expected value that is passed by reference) then the caller would need to do another atomic load to get the new value. That's wasteful, because the processor has already loaded the value. You should only be messing about with low-level atomic operations when extreme performance is the only option, so in that case you do not want to perform two operations when one will do.
is there an intrinsic function which can do that in one operation to improve performance
That can do what, specifically? The instruction has to load the current value to do the comparison, so on a mismatch yielding the current value costs nothing and is pretty much guaranteed to be useful.
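To make that concrete, here is the canonical retry loop (a minimal sketch; the increment stands in for any read-modify-write update):
#include <atomic>

// Atomically adds 1 to counter using compare-exchange.
void increment(std::atomic<int>& counter) {
    int expected = counter.load();
    // On failure, compare_exchange_weak writes the freshly observed value
    // into 'expected', so no extra atomic load is needed before retrying.
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // 'expected' now holds the current value; just go around again.
    }
}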
How much time does saving a value cost me, processor-wise? Say I have a calculated value x that I will use 2 times, 5 times, or 20 times. At what point does it become better to save the calculated value instead of recalculating it each time I use it?
example:
int a = 0, b = -5;
for (int i = 0; i < k; ++i)
    a += abs(b);
or
int a = 0, b = -5;
int x = abs(b);
for (int i = 0; i < k; ++i)
    a += x;
At what value of k does the second scenario start producing better results? Also, how much does this depend on RAM?
Since the value of abs(b) doesn't change inside the for loop, a compiler will most likely optimize both snippets to the same result i.e. evaluating the value of abs(b) just once.
It is almost impossible to provide an answer other than: measure it in a real scenario. When you cache the data in the code, it may be stored in a register (in the code you provide it most probably will be), or it might be flushed to the L1 cache, or the L2 cache... depending on what the loop is doing (how much data is it using?). If the value is cached in a register the cost is 0; the farther out it is pushed, the higher the cost of retrieving the value.
In general, write code that is easy to read and maintain, then measure the performance of the application, and if that is not good, profile. Find the hotspots, find why they are hotspots and then work from there on. I doubt that caching vs. calculating abs(x) for something as above would ever be a hotspot in a real application. So don't sweat it.
I would suggest (this is without testing, mind you) that the example with
int x = abs(b);
outside the loop will be faster, simply because you're avoiding setting up a stack frame on each iteration in order to call abs().
That being said, if the compiler is smart enough, it may figure out what you're doing and produce the same (or similar) instructions for both.
As a rule of thumb it doesn't cost you much, if anything, to store that value outside the loop, since the compiler will most likely keep the result of abs(b) in a register anyway. In fact, when the compiler optimizes this code (assuming you have optimizations turned on), one of the first things it will do is pull that abs(b) out of the loop.
You can further help the compiler generate good code by qualifying your declaration of "x" with the "register" hint. This asks the compiler to keep x in a register if possible.
If you want to see what the compiler actually does with your code, one thing to do is to tell it to compile but not assemble (in gcc, the option is -S) and look at the resulting assembly code. In many cases, the compiler will generate better code than you can optimize by hand. However, there's also no reason to NOT do these easy optimizations yourself.
Addendum:
Compiling the above code with optimizations turned on in GCC will result in code equivalent to:
a = abs(b) * k;
Try it and see.
In many cases it produces better performance from k=2 onwards. The example you gave is not one of them: most compilers attempt this kind of hoisting whenever even low levels of optimization are enabled. The value is stored, at worst, on the local stack, and so is likely to stay fairly cache-warm, negating your memory concerns.
But potentially it will be held in a register.
The original has to perform an additional branch, repeat the calculation and return the value. abs() is one example of a function the compiler may be able to recognize as constexpr and hoist.
While developing your own classes, this is one of the reasons you should try to mark members and functions as constexpr whenever possible.
What is the most efficient way to code "print all the elements of a vector to standard out" in C++,
for (std::vector<int>::iterator it = intVect.begin(); it != intVect.end(); ++it)
    std::cout << *it;
or
std::copy(intVect.begin(), intVect.end(), std::ostream_iterator<int>(std::cout));
and why?
You can use
http://louisdx.github.com/cxx-prettyprint/
and rely on the work of other people who made sure it is as optimal as possible.
If you are asking which of the methods you've posted will be faster, the only valid answer is:
There is no way to know for sure, because they are equivalent. You must profile them both and see for yourself.
This is because the two methods are effectively the same. They do the same thing, but they use different mechanisms to do it. By the time your compiler's optimizer has finished with the code, it may have found different opportunities to increase execution speed, or it may have found opportunities in each that result in identical machine code being executed.
For example, consider:
for (std::vector<int>::iterator it = intVect.begin(); it != intVect.end(); ++it)
At first blush, it might seem like this has a built-in inefficiency, because intVect.end() is evaluated on each iteration of the loop. That would make this method slower than
std::copy(intVect.begin(), intVect.end(), std::ostream_iterator<int>(std::cout));
...where it is only evaluated once.
However, depending on the surrounding code and your compiler's settings, it might be rewritten so that end() is only evaluated once, at the beginning of the for loop (credit: @SteveJessop). Or it might not be hoisted at all, yet evaluating it may be no different from examining a pre-computed value: it's possible that either way, the emitted code just loads a pointer value from (stack pointer) + (small offset known at compile time). The only way to know for sure is to compile them both and examine the resulting assembly code.
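For illustration, hoisting the call by hand would look like this (a sketch; whether it actually helps is exactly what you would have to measure):
#include <iostream>
#include <vector>

void printAll(std::vector<int>& intVect) {
    // end() is evaluated once, before the loop, rather than per iteration.
    for (std::vector<int>::iterator it = intVect.begin(), end = intVect.end();
         it != end; ++it)
        std::cout << *it;
}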
Beyond all of this however is a more fundamental issue. You are asking which method of doing something is faster, when the core thing you're trying to do is potentially very slow to begin with, relative to the means by which you do it. If you are writing to stdout using streams, it is going to have negligible effect on the overall execution time whether you use a for loop or std::copy even if one is marginally faster than the other. If your concern is overall execution time, you're possibly barking up the wrong tree.
These two lines will end up doing essentially the same thing (almost certainly) once the compiler gets through with them. Either way you will end up with the same code, looping with iterators over the range [begin, end) and using the same stream.
This is a micro-optimization that will not help you significantly, though I'm sure you can compile it with a big data set and see for yourself easily on your platform.
Here is a fragment getting data from a buffered source and sending it along to be processed. If the queue is empty, get() returns a null, and the process method is happy to take a null and do nothing.
What is the optimal way to code this?
something a; // any legal C++ return type...
aQueueOfSomethings g;
while (true) {
    a = g.get();
    process(a);
}
There is no way to predict the values arriving via get(), they are what they are, and they need to be dequeued and passed on to process() as quickly as possible.
I don't see a lot of wasted effort here. If I skip the explicit local variable named 'a' and make the loop a one-liner:
process(g.get());
the implicit return value of g.get() will still have space allocated, might involve a constructor call, and so on.
If the thing returned has any size or complexity, it would be better to have a pointer to it rather than a copy of it, and to pass that pointer rather than a copy by value... So I'd prefer to have
something* a;
g.get(a);    // assuming a get() overload that fills in the pointer
process(a);
rather than
something a;
a = g.get();
process(a);
I wrote a test case in C++ trying the two-line and one-line versions, looping 100,000,000 times.
If a is an object with 4 integers and 2 floating-point numbers, and the process() method touches them all, the two-line solution is actually faster! If the a object is a single int, the one-line version is faster. If the object is complex but the process() method touches just one value, the one-line version is faster.
Most interesting to me: using the g++ compiler on Mac OS X 10.5.8, the -O first-level optimization switch results in identical, much faster, operation with both the 1- and 2-line versions.
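The harness I used was shaped roughly like this (a reconstruction with assumed stand-in types, not the exact code I ran):
#include <chrono>
#include <iostream>

struct something { int a, b, c, d; double x, y; };

something get() { return something{1, 2, 3, 4, 5.0, 6.0}; }  // stand-in for g.get()

long sink = 0;                      // keeps the result observable so the
void process(const something& s) {  // optimizer cannot delete the loop
    sink += s.a;
}

int main() {
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < 100000000; ++i) {
        something a = get();  // two-line version; replace these two lines
        process(a);           // with process(get()) for the one-liner
    }
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::cout << ms << " ms, sink = " << sink << "\n";
}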
Other than letting the compiler optimize, using a single line for both calls with no explicit intermediate variable, and passing by reference to avoid making copies, is there anything that would generally make this run faster? I feel like I'm missing something obvious.
I think this is a supreme case of useless optimization
(you are taking something that buffers and you want to bit-optimize it?).
Also, the compiler will compile both ways to exactly the same code, and (in most circumstances) is completely entitled to perform return value optimization and tail call optimization.
Combined with the probable inlining of queue_class::get(), your issue seems to be completely MOOT.
I believe you are trying to beat the compiler at its own job.
Have you experienced performance issues? If not, you might focus on producing readable code (which you seem to have) that you can maintain, rather than resorting to what could be premature optimization and cluttering the code with weird optimizations.
The issue with this code is not in what you've done, but in that it has to spin - wasting CPU cycles that some other task on your computer might have used - even when there's no work to do.
If there are many programs that take this attitude (that they're king of the computer and will hog entire CPUs) then everything slows to an absolute crawl. It's a very drastic decision to let your code work like this.
If possible, change the entire model so that you get a callback/signal/event of some kind when there's more data available.
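As a sketch of what that model can look like in standard C++ (all names here are illustrative, not the poster's API), a blocking get() built on a condition variable lets the consumer sleep until data arrives:
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BlockingQueue {
public:
    void put(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        cv_.notify_one();  // wake one sleeping consumer
    }

    T get() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Sleeps instead of spinning; wakes only when there is work.
        cv_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> queue_;
};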
You're right that you should let the compiler optimise, but if you know that it is safe to do this:
while (true) {
    a = g.get();
    b = g.get();
    c = g.get();
    d = g.get();
    process(a);
    process(b);
    process(c);
    process(d);
}
then it might make things faster.
Or, even more extreme: get a whole array of the return type (or of pointers to it) ready, then loop over it, processing the elements. If process() and get() both use a lot of code, doing this could mean all the code can stay in the immediate cache, instead of being fetched from a farther cache each time the function is called.
The compiler can't make this optimisation because it probably doesn't know that it's safe to re-order function calls.
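A minimal sketch of the batching idea (the batch size and the stand-in types are arbitrary assumptions):
#include <vector>

struct something { int payload; };        // stand-in for the real type
something get() { return something{0}; }  // stand-in for g.get()
void process(const something&) {}         // stand-in for process()

void run() {
    std::vector<something> batch;
    batch.reserve(64);                     // arbitrary batch size
    while (true) {
        batch.clear();
        for (int n = 0; n < 64; ++n)       // first run all the get() code...
            batch.push_back(get());
        for (const something& a : batch)   // ...then all the process() code,
            process(a);                    // so each stays hot in the cache
    }
}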