Performance of operator '==' with boolean variable? - c++

I guess every programmer has come across a situation where they can use the comparison operator '=='. In my case, the situation in a C++ program is like this.
Code 1 (used in all files except the constructor):
if(a==10)
{
//do something;
}
But I can get the same result another way:
I set a bool variable to true in the constructor itself when variable a becomes 10, i.e.
constructor_name()
{
bool variable_name = true; // when a == 10
}
Then I use the following code in all my files instead of code 1.
Code 2:
if(variable_name)
{
//do same as first code
}
Which is better for performance, code 1 or code 2? I hope I have illustrated my situation well enough that you can understand it. Please help me. Thanks in advance.

You shouldn't micro-optimize. You will hardly notice any difference between your two versions in performance (maybe you will save one CPU cycle), and it is not worth the time and effort, especially because today's CPUs are really fast.
Only optimize if you profile and find a bottleneck in your code.
Look at it this way, if you store the boolean variable in the class, it uses memory (1 byte) for maybe saving 1 CPU cycle. Depending on how often you create the class, that can scale up (even though the amount would still be ridiculously small). You maybe saved 1 cycle, but you lost 1 byte.
If you wrote this in production code, I am sure that others would find it confusing (I would), and wonder why you put an isTen boolean in the class instead of just comparing the value using operator==.
Also, there is a potential bug: if you change a to 10 outside of the constructor, isTen would still be false even though a is 10!
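To illustrate that last point, here's a minimal sketch (the class and member names are made up; only the cached isTen idea comes from the question):
class Foo {
public:
    Foo(int value) : a(value), isTen(a == 10) {}
    void setA(int value) { a = value; }           // forgets to update isTen
    bool cachedCheck() const { return isTen; }    // can now disagree with (a == 10)
    bool directCheck() const { return a == 10; }  // always correct
private:
    int a;
    bool isTen;
};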

I thought the following would make a difference:
Consider that variable a is an int and takes 4 bytes (assuming a 4-byte int); the compiler then has to compare 4 bytes of memory, whereas a bool variable takes 1 byte. I guess this makes a difference in performance.

Related

Large performance difference between comparing a variable to a fixed value and reading or writing from mapped memory address

I'm developing a software that runs on a DE10 board, in an ARM Cortex-A9 processor.
This software has to access physical memory addresses in order to communicate with the FPGA in the DE10, and this is done by mapping /dev/mem; the method is described here.
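For context, that mapping typically looks roughly like this (just a sketch; the base address and span below are placeholders, not necessarily the actual DE10 bridge values):
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

int main() {
    const off_t base = 0xFF200000;   // placeholder physical base address
    const size_t span = 0x1000;      // placeholder mapping size

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    void* virt = mmap(nullptr, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    if (virt == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    volatile uint8_t* address = static_cast<volatile uint8_t*>(virt);
    *address = 42;                   // write goes out to the mapped hardware address

    munmap(virt, span);
    close(fd);
    return 0;
}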
I have a situation where I have to select which of 4 addresses to send some values to, and this could be done in one of two ways (both are sketched below):
Using an if statement, checking an integer variable (which is always 0 or 1 at that part of the loop), and only writing if it's 1.
Multiplying the values that should be sent by the aforementioned variable and writing to all addresses without any conditional, because writing zero doesn't have any effect on my system.
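Roughly, the two options look like this (a sketch only; the function names, enable, value, and address are placeholders):
#include <cstdint>

// Option 1: branch on the flag and only touch the bus when it is set
void sendWithBranch(int enable, uint8_t value, volatile uint8_t* address) {
    if (enable == 1)
        *address = value;
}

// Option 2: branch-free, always write the value multiplied by the flag
// (a zero write is harmless on this system)
void sendBranchless(int enable, uint8_t value, volatile uint8_t* address) {
    *address = value * enable;
}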
I was curious about which would be faster, so I tried this:
First, I made this loop:
int test=0;
for(int i=0;i<1000000;i++)
{
if(test==9)
{
test=15;
}
test++;
if(test==9)
{
test=0;
}
}
The first if statement should never be satisfied, so its only contribution to the time taken in the loop is from its comparison itself.
The increment and the second if statement are just things I added in an attempt to prevent the compiler from just "optimizing out" the first if statement.
This loop is run once without being benchmarked (just in case there's any frequency-scaling ramp, although I'm pretty sure there is none) and then run once again while being benchmarked; it takes around 18350 μs to complete.
Without the first if statement, it takes around 17260 μs.
Now, if I replace that first if statement with a line that sets the value of a memory-mapped address to the value of the integer test, like this:
for(int i=0;i<1000000;i++)
{
*(uint8_t*)address=test;
test++;
if(test==9)
{
test=0;
}
}
This loop takes around 253600 μs to complete, almost 14x slower.
Reading that address instead of writing on it barely changes anything.
Is this what it really is, or is there some kind of compiler optimization possibly frustrating my benchmarking?
Should I expect this difference in performance (and thus favoring the comparison method) in the actual software?

Optimize simple comparison with zero for performance

I have a bottleneck (about 20% CPU time) in my code which is in following if statement:
if (a == 0) { // here
...
}
where a is a uint8_t, so a number from 0 to 255.
Are there any low level optimizations to make it faster?
I thought about using bitwise NOR (~(a | 0)), but that would only work if a were 1 bit wide, right?
Just in case: I don't care about code readability in this particular case.
Unless your compiler is garbage, there is nothing you can do to speed up integer comparison.
However, it is possible that the bottleneck you observe is not really the comparison itself, but rather the result of unlucky branch prediction.
There are two ways of getting around this:
If "to branch or not to branch" follows a pattern, move this last second decision further up in your program logic where you can use the pattern, just don't branch in your hot function. This might require serious thinking. A hacky way to find out whether you have patterns: Print 1 if you branch and 0 else for enough calls, Zip is up and see whether the resulting archive gets much smaller (in bits) than the number of values you printed. (Of course there are also smart formulas for that if you like it more theoretical.)
If you choose one branch over the other most of the time, you can tell the compiler which branch is the likely one. With gcc, checkout __builtin_expect, for other compilers, read the manual.
Important for both solutions: You will need to measure whether that actually helped. Especially the second one will not be magically be better, it might even make things much worse.
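For reference, a minimal sketch of the __builtin_expect hint (GCC/Clang specific; the function name and bodies are made up, only the a == 0 test comes from the question):
#include <cstdint>

// The hint tells GCC/Clang that (a == 0) is expected to be false most of the
// time, so the common path is laid out as the fall-through.
int process(uint8_t a) {
    if (__builtin_expect(a == 0, 0)) {
        return -1;   // rare path
    }
    return a * 2;    // common path
}
Since C++20 the standard [[likely]]/[[unlikely]] attributes serve the same purpose.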

performance of passing arguments by value

In the process of refactoring some code, I want to change a function like this:
bool A::function() {
return this->a == this->b || this->c == this->d || this->e == this->f || this->g == this->h ;
}
to something like this
bool A::function(int a, int b, int c, int d, int e, int g) {
return a == b || c == d || e == this->f || g == this->h ;
}
This function is supposed to be called on each iteration of a main loop which would have at most 10M elements.
The people I'm working with are reluctant to use the second version because of the performance cost of passing 6 ints.
I'm pretty sure that this is negligible, considering that each iteration of the loop goes through a LOT of code, and it takes roughly ~1 minute to process the 10M elements.
Is the cost of passing 6 ints by value all the time really so high? If not, how can I make them change their mind?
Edit:
About inlining, I told them that the penalty would be 0 if the function was inlined, but their answer was basically "we can't know for sure if it will be inlined", which I seem to recall is true (it's up to the compiler).
I suspect that you won't see any big difference between these two variants in reasonably optimised code. However, the proof of that would be to actually change the code and compare the timings. (More so given that if 10M entries are being processed in a minute, that's 6 microseconds per item, so around 30000-200000 instructions on a modern processor - adding 6 argument passes won't budge it one way or the other, I'd say - unless this function is called many times in the loop, of course.)
And yes, if the function is inlined, the result would be identical code for the two alternatives - but as your colleagues say, you can't know for sure whether it is inlined or not; the only way to really determine that is to have a look at the generated machine code (-S, or use objdump or similar).
In terms of performance, I would suggest you profile your code, and see if there is a difference that matters. Passing ints around is usually very cheap and open to automatic optimization, so I doubt you would see a measurable performance hit.
Also worth pointing out that the two functions are different. The second doesn't necessarily use the member variables and the first does. If you're always comparing member variables, why pass them as parameters? Extra unnecessary parameters means more source code and a greater scope for bugs.
Write the code and, as Shane says, profile it; or, as I prefer, grab a few stack samples, because then you can see exactly what's going on.
If you find the program counter in the instructions that pass those int arguments, on more than one sample, then they are costing a significant fraction of time, and you should do something about it.
On the other hand, the samples might tell you something else is the main time-taker, and maybe you should fix that first.
Then the program will be faster, and if you do the whole process again, it might come back to your original question.
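If it helps settle the argument, here is a small self-contained sketch of both variants (the member names follow the question; the function names are made up) that can be compiled with something like g++ -O2 -S so everyone can inspect the generated code for a real call site instead of guessing:
class A {
public:
    // Variant 1: compares member against member
    bool byMembers() const {
        return a == b || c == d || e == f || g == h;
    }
    // Variant 2: takes six ints by value, still uses f and h as members
    bool byArguments(int a_, int b_, int c_, int d_, int e_, int g_) const {
        return a_ == b_ || c_ == d_ || e_ == f || g_ == h;
    }
private:
    int a = 0, b = 0, c = 0, d = 0, e = 0, f = 0, g = 0, h = 0;
};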

Use of Literals, yay/nay in C++

I've recently heard that in some cases, programmers believe that you should never use literals in your code. I understand that in some cases, assigning a variable name to a given number can be helpful (especially in terms of maintenance if that number is used elsewhere). However, consider the following case studies:
Case Study 1: Use of Literals for "special" byte codes.
Say you have an if statement that checks for a specific value stored in (for the sake of argument) a uint16_t. Here are the two code samples:
Version 1:
// Descriptive comment as to why I'm using 0xBEEF goes here
if (my_var == 0xBEEF) {
//do something
}
Version 2:
const uint16_t kSuperDescriptiveVarName = 0xBEEF;
if (my_var == kSuperDescriptiveVarName) {
// do something
}
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once. Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
Case Study 2: Use of sizeof
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns. Take the two code examples into account. The scenario is that you are computing the offset into a packet buffer (an array of uint8_t) where the first part of the packet is stored as my_packet_header, which let's say is a uint32_t.
Version 1:
const int offset = sizeof(my_packet_header);
Version 2:
const int offset = 4; // good comment telling reader where 4 came from
Clearly, version 1 is preferred, but what about for cases where you have multiple data fields to skip over? What if you have the following instead:
Version 1:
const int offset = sizeof(my_packet_header) + sizeof(data_field1) + sizeof(data_field2) + ... + sizeof(data_fieldn);
Version 2:
const int offset = 47;
Which is preferred in this case? Does it still make sense to show all the steps involved in computing the offset, or does the literal usage make sense here?
Thanks for the help in advance as I attempt to better my code practices.
Which is the "preferred" method in terms of good coding practice? I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
Sounds like you understand the main point... factoring values (and their comments) that are used in multiple places. Further, it can sometimes help to have a group of constants in one place - so their values can be inspected, verified, modified etc. without concern for where they're used in the code. Other times, there are many constants used in proximity and the comments needed to properly explain them would obfuscate the code in which they're used.
Countering that, having a const variable means all the programmers studying the code will wonder whether it's used anywhere else, keeping it in mind as they inspect the rest of the scope in which it's declared, etc. - the fewer unnecessary things there are to remember, the surer the understanding of the important parts of the code will be.
Like so many things in programming, it's "an art" balancing the pros and cons of each approach, and best guided by experience and knowledge of the way the code's likely to be studied, maintained, and evolved.
Also, does the compiler do any optimizations to make both versions effectively the same executable code? That is, are there any performance implications here?
There are no performance implications in optimised code.
I fully understand that using sizeof versus a raw literal is preferred for portability and also readability concerns.
And other reasons too. A big factor in good programming is reducing the points of maintenance when changes are done. If you can modify the type of a variable and know that all the places using that variable will adjust accordingly, that's great - saves time and potential errors. Using sizeof helps with that.
Which is preferred [for calculating offsets in a struct]? Does it still make sense to show all the steps involved in computing the offset, or does the literal usage make sense here?
The offsetof macro (#include <cstddef>) is better for this... again reducing maintenance burden. With the this + that approach you illustrate, if the compiler decides to use any padding your offset will be wrong, and further you have to fix it every time you add or remove a field.
Ignoring the offsetof issues and just considering your this + that example as an illustration of a more complex value to assign, again it's a balancing act. You'd definitely want some explanation/comment/documentation re intent here (are you working out the binary size of earlier fields? calculating the offset of the next field?, deliberately missing some fields that might not be needed for the intended use or was that accidental?...). Still, a named constant might be enough documentation, so it's likely unimportant which way you lean....
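For what it's worth, a sketch of the offsetof approach (the Packet layout below is made up, not the asker's actual format):
#include <cstddef>
#include <cstdint>

struct Packet {
    uint32_t header;      // my_packet_header
    uint16_t data_field1;
    uint8_t  data_field2;
    uint8_t  payload[32];
};

// offsetof accounts for any padding the compiler inserts, and it stays
// correct if fields are added, removed, or change type.
const std::size_t offset = offsetof(Packet, payload);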
In every example you list, I would go with the name.
In your first example, you almost certainly used that special 0xBEEF number at least twice - once to write it and once to do your comparison. If you didn't write it, that number is still part of a contract with someone else (perhaps a file format definition).
In the last example, it is especially useful to show the computation that yielded the value. That way, if you encounter trouble down the line, you can easily see either that the number is trustworthy, or what you missed and fix it.
There are some cases where I prefer literals over named constants though. These are always cases where a name is no more meaningful than the number. For example, you have a game program that plays a dice game (perhaps Yahtzee), where there are specific rules for specific die rolls. You could define constants for One = 1, Two = 2, etc. But why bother?
Generally it is better to use a name instead of a value. After all, if you need to change it later, you can find it more easily. Also it is not always clear why this particular number is used, when you read the code, so having a meaningful name assigned to it, makes this immediately clear to a programmer.
Performance-wise there is no difference, because the optimizers should take care of it. And it is rather unlikely, even if there would be an extra instruction generated, that this would cause you troubles. If your code would be that tight, you probably shouldn't rely on an optimizer effect anyway.
I can fully understand why you would prefer version 2 if kSuperDescriptiveVarName is used more than once.
I think kSuperDescriptiveVarName will definitely be used more than once: once for the check and at least once for the assignment, maybe in different parts of your program.
There will be no difference in performance, since an optimization called Constant Propagation exists in almost all compilers. Just enable optimization for your compiler.
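As a quick illustration (a sketch; the two wrapper functions are made-up names), with optimisation enabled both forms should compile down to the same compare-against-immediate:
#include <cstdint>

const uint16_t kSuperDescriptiveVarName = 0xBEEF;

bool check_named(uint16_t my_var)   { return my_var == kSuperDescriptiveVarName; }
bool check_literal(uint16_t my_var) { return my_var == 0xBEEF; }

// Building with e.g. g++ -O2 -S and comparing the two functions should show
// identical code: the named constant is propagated into the comparison.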

Performance of comparisons in C++ ( foo >= 0 vs. foo != 0 )

I've been working on a piece of code recently where performance is very important, and essentially I have the following situation:
int len = some_very_big_number;
int counter = some_rather_small_number;
for( int i = len; i >= 0; --i ){
while( counter > 0 && costly other stuff here ){
/* do stuff */
--counter;
}
/* do more stuff */
}
So here I have a loop that runs very often, and for a certain number of runs the while block will be executed as well, until the variable counter is reduced to zero; after that the while body will no longer be entered because the first expression will be false.
The question is now, if there is a difference in performance between using
counter > 0 and counter != 0?
I suspect there would be; does anyone know the specifics?
To measure is to know.
Do you really think that this is what will solve your problem? :D
if(x >= 0)
00CA1011 cmp dword ptr [esp],0
00CA1015 jl main+2Ch (0CA102Ch) <----
...
if(x != 0)
00CA1026 cmp dword ptr [esp],0
00CA102A je main+3Bh (0CA103Bh) <----
In programming, the following statement is the sign designating the road to Hell:
I've been working on a piece of code recently where performance is very important
Write your code in the cleanest, most easy to understand way. Period.
Once that is done, you can measure its runtime. If it takes too long, measure the bottlenecks, and speed up the biggest ones. Keep doing that until it is fast enough.
The list of projects that failed or suffered catastrophic loss due to a misguided emphasis on blind optimization is large and tragic. Don't join them.
I think you're spending time optimizing the wrong thing. "costly other stuff here", "do stuff" and "do more stuff" are more important to look at. That is where you'll make the big performance improvements I bet.
There will be a huge difference if the counter starts with a negative number. Otherwise, on every platform I'm familiar with, there won't be a difference.
Is there a difference between counter > 0 and counter != 0? It depends on the platform.
A very common type of CPU is the Intel family found in our PCs. Both comparisons will map to a single instruction on that CPU, and I assume they will execute at the same speed. However, to be certain you will have to perform your own benchmark.
As Jim said, when in doubt see for yourself :
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
using namespace boost::posix_time;
using namespace std;
int main()
{
ptime Before = microsec_clock::universal_time(); // UTC now
// do stuff here
ptime After = microsec_clock::universal_time(); // UTC now
time_duration delta_t = After - Before; // how much time has passed?
cout << delta_t.total_seconds() << endl; // whole seconds elapsed
cout << delta_t.fractional_seconds() << endl; // fractional part, in microseconds here
return 0;
}
Here's a pretty nifty way of measuring time. Hope that helps.
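If you'd rather avoid the Boost dependency, the same measurement can be sketched with std::chrono (C++11 and later):
#include <chrono>
#include <iostream>

int main()
{
    auto before = std::chrono::steady_clock::now();
    // do stuff here
    auto after = std::chrono::steady_clock::now();
    auto delta = std::chrono::duration_cast<std::chrono::microseconds>(after - before);
    std::cout << delta.count() << " microseconds" << std::endl; // elapsed wall-clock time
    return 0;
}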
OK, you can measure this, sure. However, these sorts of comparisons are so fast that you are probably going to see more variation from processor swapping and scheduling than from this single line of code.
This smells of unnecessary, and premature, optimization. Write your program, optimize what you see. If you need more, profile, and then go from there.
I would add that the overwhelming performance aspect of this code on modern CPUs will be dominated not by the comparison instruction but by whether the comparison is well predicted, since any mispredict wastes many more cycles than any integral operation.
As such, loop unrolling will probably be the biggest winner, but measure, measure, measure.
Thinking that the type of comparison is going to make a difference, without knowing it, is the definition of guessing.
Don't guess.
In general, they should be equivalent (both are usually implemented in single-cycle instructions/micro-ops). Your compiler may do some strange special-case optimization that is difficult to reason about from the source level, which may make either one slightly faster. Also, equality testing is more energy-efficient than inequality testing (>), though the system-level effect is so small as to not merit discussion.
There may be no difference. You could try examining the assembly output for each.
That being said, the only way to tell if any difference is significant is to try it both ways and measure. I'd bet that the change makes no difference whatsoever with optimizations on.
Assuming you are developing for the x86 architecture, when you look at the assembly output it will come down to jns vs jne. jns will check the sign flag and jne will check the zero flag. Both operations, should as far as I know, be equally costly.
Clearly the solution is to use the correct data type.
Make counter an unsigned int. Then it can't be less than zero. Your compiler will obviously know this and be forced to choose the optimal solution.
Or you could just measure it.
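A sketch of that suggestion (the function is made up; only the unsigned counter idea comes from this answer):
// With an unsigned counter, (counter > 0) and (counter != 0) test exactly the
// same condition, so the compiler can pick whichever encoding it prefers.
void countdown(unsigned int counter) {
    while (counter != 0) {
        /* do stuff */
        --counter;
    }
}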
You could also think about how it would be implemented...(here we go on a tangent)...
less than zero: the sign bit would be set, so you only need to check 1 bit
equal to zero: the whole value would be zero, so you need to check all the bits
Of course, computers are funny things, and it may take longer to check a single bit than the whole value (however many bytes it is on your platform).
You could just measure it...
And you might find that one is more optimal than the other (under the conditions you measured). But your program will still run like a dog, because you spent all your time optimising the wrong part of your code.
The best solution is to use what many large software companies do - blame the hardware for not running fast enough and encourage your customers to upgrade their equipment (which is clearly inferior, since your product doesn't run fast enough).
</rant>
I stumbled across this question just now, 3 years after it was asked, so I am not sure how useful the answer will still be... Still, I am surprised not to see it clearly stated that answering your question requires knowing two and only two things:
which processor you target
which compiler you work with
To the first point, each processor has different instructions for tests. On a given processor, two similar comparisons may turn out to take a different number of cycles. For example, you may have a 1-cycle instruction to do a gt (>), eq (==), or le (<=), but no 1-cycle instruction for other comparisons like ge (>=). Following a test, you may decide to execute conditional instructions or, more often, as in your code example, take a jump. There again, conditional jumps take a variable number of cycles on most high-end processors, depending on whether the jump is taken or not taken, predicted or not predicted. When you write code in assembly and your code is time critical, you can actually spend quite a bit of time figuring out how best to arrange your code to minimize the overall cycle count, and you may end up with a solution that has to be tuned based on the number of times a given comparison returns true or false.
Which leads me to the second point: compilers, like human coders, try to arrange the code taking into account the instructions available and their latencies. Their job is harder because some assumptions an assembly coder could rely on, like "counter is small", are hard (though not impossible) for them to establish. For trivial cases like a loop counter, most modern compilers can at least recognize that the counter will always be positive, so that != behaves the same as >, and generate the best code accordingly. But, as many mentioned in the posts, you will only know for sure if you either run measurements or inspect your assembly code and convince yourself it is the best you could do in assembly. And when you upgrade to a new compiler, you may then get a different answer.