Must I initialize floats using 0.f? - c++

When I initialize float variables in my program, I commonly have vectors like:
Vector forward(0.f,0.f,-1.f),right(1.f,0.f,0.f),up(0.f,1.f,0.f)
(Vectors are just 3 floats like struct Vector{ float x,y,z; };)
This looks much easier to read as:
Vector forward(0,0,-1),right(1,0,0),up(0,1,0)
Must I initialize my float variables using floats? Am I losing anything or incurring some kind of penalty when I use integers (or doubles) to initialize a float?

There's no semantic difference between the two. With some compilers, though, it is possible for extra code to be generated. See also these SO questions on the same topic.
I can confirm that gcc generates the same code for all variants of
int main()
{
float a = 0.0f; /* or 0 or 0.0 */
return 0;
}
and that this code is
.file "1.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0x00000000, %eax
movl %eax, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
The relevant line is
movl $0x00000000, %eax
Changing a to 0.1 (or 0.1f) changes the line to
movl $0x3dcccccd, %eax
It seems that gcc is able to deduce the correct constant and doesn't generate extra code.
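If you want to verify such a constant yourself, a minimal sketch (assuming C++20 for std::bit_cast and a 32-bit unsigned) is to inspect the bit pattern directly:
#include <bit>
#include <cstdio>
int main()
{
    // 0.1f is stored as the nearest representable float; its bit pattern
    // matches the 0x3dcccccd immediate seen in the assembly above.
    unsigned bits = std::bit_cast<unsigned>(0.1f);
    std::printf("0x%08x\n", bits); // prints 0x3dcccccd
}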

For a single literal constant, it shouldn't matter. In the context of an initializer, a constant of any numeric type will be implicitly converted to the type of the object being initialized. This is guaranteed by the language standard. So all of these:
float x = 0;
float x = 0.0;
float x = 0.0f;
float x = 0.0L; // converted from long double to float
are equally valid and result in the same value being stored in x.
A literal constant in a more complex expression can have surprising results, though.
In most cases, each expression is evaluated by itself, regardless of the context in which it appears. Any implicit conversion is applied after the subexpression has been evaluated.
So if you write:
float x = 1 / 2;
the expression 1 / 2 will be evaluated as an int, yielding 0, which is then converted to float. It will set x to 0.0f, not to 0.5f.
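A minimal sketch of the pitfall and the fix:
float x = 1 / 2;    // integer division: 1 / 2 == 0, then converted to 0.0f
float y = 1.0f / 2; // one operand is float, so the division happens in float: 0.5f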
I think you should be safe using unsuffixed floating-point constants (which are of type double).
Incidentally, you might consider using double rather than float in your program. double, as I mentioned, is the type of an unsuffixed floating-point constant, and can be thought of in some sense as the "default" floating-point type. It usually has more range and precision than float, and there's typically not much difference in performance.
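To illustrate the difference in precision, here is a small sketch (the printed digits assume IEEE-754 single and double precision):
#include <cstdio>
int main()
{
    float  f = 0.1f; // nearest float to 0.1
    double d = 0.1;  // nearest double to 0.1
    std::printf("%.17g\n", (double)f); // 0.10000000149011612
    std::printf("%.17g\n", d);         // 0.10000000000000001
}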

It could be good programming practice to always write 0.f, 1.f etc., even if gcc can often figure out what the programmer means by 1.0 et al.
The problematic cases are not so much trivial float variable initializations, but numeric constants in somewhat more complex formulae, where a combination of operators, float variables and said constants can easily lead to unintended double-valued calculations and costly float-double-float conversions.
Spotting these conversions without specifically checking the compiled code for them becomes very hard if the intended type for numeric values is mostly omitted in the code and instead only included when it's absolutely required. Hence I for one would choose the approach of just typing in the f's and getting used to having them around.
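A sketch of the kind of formula where this bites (area is a hypothetical example function); the unsuffixed constant silently promotes the whole expression to double:
float area(float r)
{
    return 3.14159f * r * r; // stays in float throughout
}
float area_slow(float r)
{
    return 3.14159 * r * r;  // 3.14159 is a double: r is converted up,
                             // multiplied in double, converted back to float
}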

Related

Does gcc automatically perform mathematical operations on const values during compilation

Let's say we perform
malloc(4 * sizeof(int))
Now, the number 4 is a constant, and from my understanding sizeof is actually evaluated at compile time (unless you have a variable-length array inside it).
In this case (considering x86) sizeof(int) would also be 4. My question is: will gcc's optimization perform the calculation itself, or will the computation be generated in the asm?
This is called "constant-folding" and yes, it will happen before assembly. Assembly in itself is usually not optimized at all.
Consider the minimal program
#include <stdlib.h>
int main(void)
{
malloc(4 * sizeof(int));
}
We can compile it into assembly with gcc -S. On my computer, the resulting assembly says:
main:
pushq %rbp
movq %rsp, %rbp
movl $16, %edi
call malloc@PLT
movl $0, %eax
popq %rbp
ret
I.e. the only constants you see in there are 16 (4 * sizeof(int)), and 0 (the implicit return value from main()).
Note that in C there is a class of expressions called "integer constant expressions" that must be evaluated at compile time. You can use 4 * sizeof(int) as the size of an array, or even within a _Static_assert clause; naturally it must then be evaluated during compilation. But in the general case, such as here, the C standard does not require one or the other.
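For example, in a context that requires a constant expression the folding is mandatory (a sketch in C++, where static_assert plays the role of C's _Static_assert):
// Array bounds and static_assert conditions must be constant expressions,
// so 4 * sizeof(int) is necessarily evaluated by the compiler here.
char buffer[4 * sizeof(int)];
static_assert(sizeof(buffer) == 4 * sizeof(int), "folded at compile time");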

Which is the faster operation? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have two variables a and b. I have to write an if condition on variables a and b:
This is First Approach:
if(a > 0 || b >0){
//do some things
}
This is second Approach:
if((a+b) > 0){
//do some thing
}
Update: consider a and b to be unsigned. Which will take less execution time, the logical OR (||) or the arithmetic (+) operator?
This condition will be evaluated around one million times.
Any help on this will be appreciated.
Your second condition is wrong. If a=1, b=-1000, it will evaluate to false, whereas your first condition will evaluate to true. In general you shouldn't worry about speed in these kinds of tests; the compiler optimizes the condition a lot, so a logical OR is super fast. People generally make bigger mistakes than leaving such conditions unoptimized, so don't try to optimize unless you really know what is going on; the compiler is in general doing a much better job than any of us.
In principle, in the first expression you have 2 CMPs and one OR, whereas in the second you have only one CMP and one ADD, so the second should be faster (even though the compiler does some short-circuiting in the first case, this cannot happen 100% of the time). However, in your case the expressions are not equivalent (well, they are for positive numbers...).
I decided to check this for C, but identical arguments apply to C++, and similar arguments apply to Java (except that Java allows signed overflow). The following code was tested (for C++, replace _Bool with bool).
_Bool approach1(int a, int b) {
return a > 0 || b > 0;
}
_Bool approach2(int a, int b) {
return (a + b) > 0;
}
And this was the resulting disassembly.
.file "faster.c"
.text
.p2align 4,,15
.globl approach1
.type approach1, @function
approach1:
.LFB0:
.cfi_startproc
testl %edi, %edi
setg %al
testl %esi, %esi
setg %dl
orl %edx, %eax
ret
.cfi_endproc
.LFE0:
.size approach1, .-approach1
.p2align 4,,15
.globl approach2
.type approach2, @function
approach2:
.LFB1:
.cfi_startproc
addl %esi, %edi
testl %edi, %edi
setg %al
ret
.cfi_endproc
.LFE1:
.size approach2, .-approach2
.ident "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
.section .note.GNU-stack,"",@progbits
The two pieces of code are quite different, even considering how clever compilers are these days. Why is that so? Well, the reason is quite simple: they aren't identical. If a is -42 and b is 2, the first approach will return true, and the second will return false.
Surely, you may think that a and b should be unsigned.
.file "faster.c"
.text
.p2align 4,,15
.globl approach1
.type approach1, @function
approach1:
.LFB0:
.cfi_startproc
orl %esi, %edi
setne %al
ret
.cfi_endproc
.LFE0:
.size approach1, .-approach1
.p2align 4,,15
.globl approach2
.type approach2, @function
approach2:
.LFB1:
.cfi_startproc
addl %esi, %edi
testl %edi, %edi
setne %al
ret
.cfi_endproc
.LFE1:
.size approach2, .-approach2
.ident "GCC: (SUSE Linux) 4.8.1 20130909 [gcc-4_8-branch revision 202388]"
.section .note.GNU-stack,"",@progbits
It's quite easy to notice that approach1 is better here, because it doesn't do the pointless addition, which is in fact quite wrong. It even gets optimized to (a | b) != 0, which is a correct optimization.
In C, unsigned overflow is defined, so the compiler has to handle the case when the integers are very large (try UINT_MAX and 1 for approach2; the sum wraps to 0). Even assuming you know the numbers won't overflow, it's easy to notice approach1 is faster, because it simply tests whether both variables are 0.
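A minimal sketch of that wraparound case (assuming a typical 32-bit unsigned int):
#include <climits>
#include <cstdio>
int main()
{
    unsigned a = UINT_MAX, b = 1;
    // a + b wraps around to 0 (unsigned overflow is well defined),
    // so the two conditions disagree:
    std::printf("%d\n", a > 0 || b > 0); // 1
    std::printf("%d\n", (a + b) > 0);    // 0
}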
Trust your compiler; it will optimize better than you, and without the small bugs that you could accidentally write. Write code instead of asking yourself whether i++ or ++i is faster, or whether x >> 1 or x / 2 is faster (by the way, x >> 1 doesn't do the same thing as x / 2 for signed numbers, because of rounding behavior).
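For instance, the shift and the division really do differ for negative operands (a sketch; right-shifting a negative value is arithmetic on typical two's-complement targets, though formally implementation-defined before C++20):
#include <cstdio>
int main()
{
    int x = -3;
    std::printf("%d\n", x / 2);  // -1: integer division truncates toward zero
    std::printf("%d\n", x >> 1); // -2: arithmetic shift rounds toward negative infinity
}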
If you want to optimize something, optimize the algorithms you use. Instead of using a worst-case O(N^4) sorting algorithm, use a worst-case O(N log N) algorithm. This will actually make the program faster, especially if you sort reasonably big arrays.
The real answer for this is always to do both and actually test which one runs faster. That's the only way to know for sure.
I would guess the second one would run faster, because an add is a quick operation, but a missed branch causes pipeline clears and all sorts of nasty things. It would be data-dependent, though. And it isn't exactly the same test: if a or b is allowed to be negative or big enough to overflow, the two aren't equivalent.
Well, I wrote some quick code and disassembled:
public boolean method1(final int a, final int b) {
if (a > 0 || b > 0) {
return true;
}
return false;
}
public boolean method2(final int a, final int b) {
if ((a + b) > 0) {
return true;
}
return false;
}
These produce:
public boolean method1(int, int);
Code:
0: iload_1
1: ifgt 8
4: iload_2
5: ifle 10
8: iconst_1
9: ireturn
10: iconst_0
11: ireturn
public boolean method2(int, int);
Code:
0: iload_1
1: iload_2
2: iadd
3: ifle 8
6: iconst_1
7: ireturn
8: iconst_0
9: ireturn
So as you can see, they're pretty similar; the only difference is performing a > 0 test vs a + b; looks like the || got optimized away. What the JIT compiler optimizes these to, I have no clue.
If you wanted to get really picky:
Option 1: Always 1 load and 1 comparison, possible 2 loads and 2 comparisons
Option 2: Always 2 loads, 1 addition, 1 comparison
So really, which one performs better depends on what your data looks like and whether there is a pattern the branch predictor can use. If so, I could imagine the first method running faster because the processor basically "skips" the checks, and in the best case only has to perform half the operations the second option will. To be honest, though, this really seems like premature optimization, and I'm willing to bet that you're much more likely to get more improvement elsewhere in your code. I don't find basic operations to be bottlenecks most of the time.
Two things:
(a|b) > 0 is strictly better than (a+b) > 0, so replace it.
The above two only work correctly if the numbers are both unsigned.
If a and b have the potential to be negative numbers, the two choices are not equivalent, as has been pointed out by the answer by @vsoftco.
If both a and b are guaranteed to be non-negative integers, I would use
if ( (a|b) > 0 )
instead of
if ( (a+b) > 0 )
I think bitwise | is faster than integer addition.
Update
Use bitwise | instead of +.
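A quick sketch of why | is safe where + is not (unsigned operands assumed):
#include <climits>
#include <cstdio>
int main()
{
    unsigned a = UINT_MAX, b = 1;
    std::printf("%d\n", (a | b) > 0); // 1: OR-ing can never wrap around
    std::printf("%d\n", (a + b) > 0); // 0: the addition wraps to 0
}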

Boolean multiplication in c++?

Consider the following:
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
The syntax of f2 is more compact, but does the standard guarantee that f1 and f2 are strictly equivalent?
Furthermore, if I want the compiler to optimize this expression if b and i are known at compile-time, which version should I prefer ?
Well, yes, both are equivalent. bool is an integral type and true is guaranteed to convert to 1 in integer context, while false is guaranteed to convert to 0.
(The reverse is also true, i.e. non-zero integer values are guaranteed to convert to true in boolean context, while zero integer values are guaranteed to convert to false in boolean context.)
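These guarantees can be checked at compile time (a minimal sketch):
static_assert(true == 1 && false == 0, "bool converts to 1/0 in integer context");
static_assert(static_cast<bool>(42) == true, "nonzero converts to true");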
Since you are working with unsigned types, one can easily come up with other, possibly bit-hack-based yet perfectly portable implementations of the same thing, like
i & -(unsigned) b
although a decent compiler should be able to choose the best implementation by itself for any of your versions.
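For the record, here is a sketch of that bit-hack variant next to the others (f3 is a hypothetical name used for illustration):
inline unsigned int f3(const unsigned int i, const bool b)
{
    // -(unsigned)b is 0 when b is false and all-ones when b is true,
    // so the AND either clears i or passes it through unchanged
    return i & -(unsigned)b;
}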
P.S. Although, to my great surprise, GCC 4.1.2 compiled all three variants virtually literally, i.e. it used the machine multiplication instruction in the multiplication-based variant. It was smart enough to use the cmovne instruction on the ?: variant to make it branchless, which quite possibly made it the most efficient implementation.
Yes. It's safe to assume true is 1 and false is 0 when used in expressions as you do, and this is guaranteed:
C++11, Integral Promotions, 4.5:
An rvalue of type bool can be converted to an rvalue of type int, with
false becoming zero and true becoming one.
The compiler will use implicit conversion to make an unsigned int from b, so, yes, this should work. You're skipping the condition checking by simple multiplication. Which one is more effective/faster? Don't know. A good compiler would most likely optimize both versions I'd assume.
FWIW, the following code
inline unsigned int f1(const unsigned int i, const bool b) {return b ? i : 0;}
inline unsigned int f2(const unsigned int i, const bool b) {return b*i;}
int main()
{
volatile unsigned int i = f1(42, true);
volatile unsigned int j = f2(42, true);
}
compiled with gcc -O2 produces this assembly:
.file "test.cpp"
.def ___main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 2,,3
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB2:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $42, 8(%esp) // i
movl $42, 12(%esp) // j
xorl %eax, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE2:
There's not much left of either f1 or f2, as you can see.
As far as the C++ standard is concerned, the compiler is allowed to do anything with regard to optimization, as long as it doesn't change the observable behaviour (the as-if rule).

Should we generally use float literals for floats instead of the simpler double literals?

In C++ (or maybe only our compilers VC8 and VC10) 3.14 is a double literal and 3.14f is a float literal.
Now I have a colleague that stated:
We should use float literals for float calculations and double literals for double calculations, as this could have an impact on the precision of a calculation when constants are used in a calculation.
Specifically, I think he meant:
double d1, d2;
float f1, f2;
... init and stuff ...
f1 = 3.1415 * f2;
f1 = 3.1415f * f2; // any difference?
d1 = 3.1415 * d2;
d1 = 3.1415f * d2; // any difference?
Or, added by me, even:
d1 = 42 * d2;
d1 = 42.0f * d2; // any difference?
d1 = 42.0 * d2; // any difference?
More generally, the only point I can see for using 2.71828183f is to make sure that the constant I'm trying to specify will actually fit into a float (compiler error/warning otherwise).
Can someone shed some light on this? Do you specify the f postfix? Why?
To quote from one of the answers, here is what I implicitly took for granted:
If you're working with a float variable and a double literal the whole
operation will be done as double and then converted back to float.
Could there possibly be any harm in this? (Other than a very, very theoretical performance impact?)
Further edit: It would be nice if answers containing technical details (appreciated!) could also include how these differences affect general purpose code. (Yes, if you're number crunching, you probably like to make sure your big-n floating point ops are as efficient (and correct) as possible -- but does it matter for general purpose code that's called a few times? Isn't it cleaner if the code just uses 0.0 and skips the -- hard to maintain! -- float suffix?)
Yes, you should use the f suffix. Reasons include:
Performance. When you write float foo(float x) { return x*3.14; }, you force the compiler to emit code that converts x to double, then does the multiplication, then converts the result back to single. If you add the f suffix, both conversions are eliminated. On many platforms, each of those conversions is about as expensive as the multiplication itself.
Performance (continued). There are platforms (most cellphones, for example) on which double-precision arithmetic is dramatically slower than single-precision. Even ignoring the conversion overhead (covered in 1.), every time you force a computation to be evaluated in double, you slow your program down. This is not just a "theoretical" issue.
Reduce your exposure to bugs. Consider the example:
float x = 1.2;
if (x == 1.2)
    // something
Is something executed? No, it is not, because x holds 1.2 rounded to a float, but is being compared to the double-precision value 1.2. The two are not equal.
I suspect something like this: If you're working with a float variable and a double literal the whole operation will be done as double and then converted back to float.
If you use a float literal, notionally speaking the computation will be done at float precision even though some hardware will convert it to double anyway to do the calculation.
I did a test.
I compiled this code:
float f1(float x) { return x*3.14; }
float f2(float x) { return x*3.14F; }
Using gcc 4.5.1 for i686 with optimization -O2.
This was the assembly code generated for f1:
pushl %ebp
movl %esp, %ebp
subl $4, %esp # Allocate 4 bytes on the stack
fldl .LC0 # Load a double-precision floating point constant
fmuls 8(%ebp) # Multiply by parameter
fstps -4(%ebp) # Store single-precision result on the stack
flds -4(%ebp) # Load single-precision result from the stack
leave
ret
And this is the assembly code generated for f2:
pushl %ebp
flds .LC2 # Load a single-precision floating point constant
movl %esp, %ebp
fmuls 8(%ebp) # Multiply by parameter
popl %ebp
ret
So the interesting thing is that for f1, the compiler stored the value and re-loaded it just to make sure that the result was truncated to single-precision.
If we use the -ffast-math option, then this difference is significantly reduced:
pushl %ebp
fldl .LC0 # Load double-precision constant
movl %esp, %ebp
fmuls 8(%ebp) # multiply by parameter
popl %ebp
ret
pushl %ebp
flds .LC2 # Load single-precision constant
movl %esp, %ebp
fmuls 8(%ebp) # multiply by parameter
popl %ebp
ret
But there is still the difference between loading a single or double precision constant.
Update for 64-bit
These are the results with gcc 5.2.1 for x86-64 with optimization -O2:
f1:
cvtss2sd %xmm0, %xmm0 # Convert arg to double precision
mulsd .LC0(%rip), %xmm0 # Double-precision multiply
cvtsd2ss %xmm0, %xmm0 # Convert to single-precision
ret
f2:
mulss .LC2(%rip), %xmm0 # Single-precision multiply
ret
With -ffast-math, the results are the same.
Typically, I don't think it will make any difference, but it is worth
pointing out that 3.1415f and 3.1415 are (typically) not equal. On
the other hand, you don't normally do any calculations in float
anyway, at least on the usual platforms. (double is just as fast, if
not faster.) About the only time you should see float is when there
are large arrays, and even then, all of the calculations will typically
be done in double.
There is a difference: If you use a double constant and multiply it with a float variable, the variable is converted into double first, the calculation is done in double, and then the result is converted into float. While precision isn't really a problem here, this might lead to confusion.
I personally tend to use the f postfix notation as a matter of principle and to make it as obvious as I can that this is a float type rather than a double.
My two cents
From the C++ Standard (Working Draft), section 5 on binary operators:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
— If either operand is of scoped enumeration type (7.2), no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
— If either operand is of type long double, the other shall be converted to long double.
— Otherwise, if either operand is double, the other shall be converted to double.
— Otherwise, if either operand is float, the other shall be converted to float.
And also section 4.8
A prvalue of floating point type can be converted to a prvalue of
another floating point type. If the source value can be exactly
represented in the destination type, the result of the conversion is
that exact representation. If the source value is between two adjacent
destination values, the result of the conversion is an
implementation-defined choice of either of those values. Otherwise, the
behavior is undefined
The upshot of this is that you can avoid unnecessary conversions by specifying your constants in the precision dictated by the destination type, provided that you will not lose precision in the calculation by doing so (i.e., your operands are exactly representable in the precision of the destination type).
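Putting the quoted rules together, a small sketch (scale is a hypothetical example function) of a constant that is exactly representable in float, so the suffix costs nothing and saves the round trip:
float scale(float x)
{
    return x * 2.5f; // 2.5 is exactly representable in float: no precision
                     // lost by the f suffix, and no float->double->float trip
}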

Which is faster : if (bool) or if(int)?

Which value is better to use? Boolean true or Integer 1?
The above topic made me do some experiments with bool and int in an if condition. So just out of curiosity I wrote this program:
int f(int i)
{
if ( i ) return 99; //if(int)
else return -99;
}
int g(bool b)
{
if ( b ) return 99; //if(bool)
else return -99;
}
int main(){}
g++ intbool.cpp -S generates asm code for each function as follows:
asm code for f(int)
__Z1fi:
LFB0:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
cmpl $0, 8(%ebp)
je L2
movl $99, %eax
jmp L3
L2:
movl $-99, %eax
L3:
leave
LCFI2:
ret
asm code for g(bool)
__Z1gb:
LFB1:
pushl %ebp
LCFI3:
movl %esp, %ebp
LCFI4:
subl $4, %esp
LCFI5:
movl 8(%ebp), %eax
movb %al, -4(%ebp)
cmpb $0, -4(%ebp)
je L5
movl $99, %eax
jmp L6
L5:
movl $-99, %eax
L6:
leave
LCFI6:
ret
Surprisingly, g(bool) generates more asm instructions! Does it mean that if(bool) is a little slower than if(int)? I used to think bool was especially designed to be used in conditional statements such as if, so I was expecting g(bool) to generate fewer asm instructions, thereby making g(bool) more efficient and fast.
EDIT:
I'm not using any optimization flag as of now. But even in its absence, why g(bool) generates more asm is the question for which I'm looking for a reasonable answer. I should also tell you that the -O2 optimization flag generates exactly the same asm. But that isn't the question. The question is what I've asked.
Makes sense to me. Your compiler apparently defines a bool as an 8-bit value, and your system ABI requires it to "promote" small (< 32-bit) integer arguments to 32-bit when pushing them onto the call stack. So to compare a bool, the compiler generates code to isolate the least significant byte of the 32-bit argument that g receives, and compares it with cmpb. In the first example, the int argument uses the full 32 bits that were pushed onto the stack, so it simply compares against the whole thing with cmpl.
Compiling with -O3 gives the following for me:
f:
pushl %ebp
movl %esp, %ebp
cmpl $1, 8(%ebp)
popl %ebp
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
g:
pushl %ebp
movl %esp, %ebp
cmpb $1, 8(%ebp)
popl %ebp
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.. so it compiles to essentially the same code, except for cmpl vs cmpb.
This means that the difference, if there is any, doesn't matter. Judging by unoptimized code is not fair.
Edit to clarify my point. Unoptimized code is for simple debugging, not for speed. Comparing the speed of unoptimized code is senseless.
When I compile this with a sane set of options (specifically -O3), here's what I get:
For f():
.type _Z1fi, @function
_Z1fi:
.LFB0:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpl $1, %edi
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.cfi_endproc
For g():
.type _Z1gb, @function
_Z1gb:
.LFB1:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
cmpb $1, %dil
sbbl %eax, %eax
andb $58, %al
addl $99, %eax
ret
.cfi_endproc
They still use different instructions for the comparison (cmpb for boolean vs. cmpl for int), but otherwise the bodies are identical. A quick look at the Intel manuals tells me: ... not much of anything. There's no such thing as cmpb or cmpl in the Intel manuals. They're all cmp and I can't find the timing tables at the moment. I'm guessing, however, that there's no clock difference between comparing a byte immediate vs. comparing a long immediate, so for all practical purposes the code is identical.
edited to add the following based on your addition
The reason the code is different in the unoptimized case is that it is unoptimized. (Yes, it's circular, I know.) When the compiler walks the AST and generates code directly, it doesn't "know" anything except what's at the immediate point of the AST it's in. At that point it lacks all contextual information needed to know that at this specific point it can treat the declared type bool as an int. A boolean is obviously by default treated as a byte and when manipulating bytes in the Intel world you have to do things like sign-extend to bring it to certain widths to put it on the stack, etc. (You can't push a byte.)
When the optimizer views the AST and does its magic, however, it looks at surrounding context and "knows" when it can replace code with something more efficient without changing semantics. So it "knows" it can use an integer in the parameter and thereby lose the unnecessary conversions and widening.
With GCC 4.5 on Linux and Windows at least, sizeof(bool) == 1. On x86 and x86_64, you can't pass less than a general-purpose register's worth to a function (whether via the stack or a register, depending on the calling convention etc...).
So the code for bool, when un-optimized, actually goes to some length to extract that bool value from the argument stack (using another stack slot to save that byte). It's more complicated than just pulling a native register-sized variable.
Yeah, the discussion's fun. But just test it:
Test code:
#include <stdio.h>
#include <string.h>
int testi(int);
int testb(bool);
int main (int argc, char* argv[]){
bool valb;
int vali;
int loops;
if( argc < 2 ){
return 2;
}
valb = (0 != (strcmp(argv[1], "0")));
vali = strcmp(argv[1], "0");
printf("Arg1: %s\n", argv[1]);
printf("BArg1: %i\n", valb ? 1 : 0);
printf("IArg1: %i\n", vali);
for(loops=30000000; loops>0; loops--){
//printf("%i: %i\n", loops, testb(valb=!valb));
printf("%i: %i\n", loops, testi(vali=!vali));
}
return valb;
}
int testi(int val){
if( val ){
return 1;
}
return 0;
}
int testb(bool val){
if( val ){
return 1;
}
return 0;
}
Compiled on a 64-bit Ubuntu 10.10 laptop with:
g++ -O3 -o /tmp/test_i /tmp/test_i.cpp
Integer-based comparison:
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.203s
user 0m8.170s
sys 0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.056s
user 0m8.020s
sys 0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.116s
user 0m8.100s
sys 0m0.000s
Boolean test / print uncommented (and integer commented):
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.254s
user 0m8.240s
sys 0m0.000s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m8.028s
user 0m8.000s
sys 0m0.010s
sauer@trogdor:/tmp$ time /tmp/test_i 1 > /dev/null
real 0m7.981s
user 0m7.900s
sys 0m0.050s
They're the same with 1 assignment and 2 comparisons each loop over 30 million loops. Find something else to optimize. For example, don't use strcmp unnecessarily. ;)
At the machine level there is no such thing as bool
Very few instruction set architectures define any sort of boolean operand type, although there are often instructions that trigger an action on non-zero values. To the CPU, usually, everything is one of the scalar types or a string of them.
A given compiler and a given ABI will need to choose specific sizes for int and bool and when, like in your case, these are different sizes they may generate slightly different code, and at some levels of optimization one may be slightly faster.
Why is bool one byte on many systems?
It's safer to choose a char type for bool because someone might make a really large array of them.
Update: by "safer", I mean: for the compiler and library implementors. I'm not saying people need to reimplement the system type.
It will mostly depend on the compiler and the optimization. There's an interesting discussion (language agnostic) here:
Does "if ([bool] == true)" require one more step than "if ([bool])"?
Also, take a look at this post: http://www.linuxquestions.org/questions/programming-9/c-compiler-handling-of-boolean-variables-290996/
Approaching your question in two different ways:
If you are specifically talking about C++ or any programming language that will produce assembly code for that matter, we are bound to what code the compiler will generate in ASM. We are also bound to the representation of true and false in c++. An integer will have to be stored in 32 bits, and I could simply use a byte to store the boolean expression. Asm snippets for conditional statements:
For the integer:
mov eax,dword ptr[esp] ;Store integer
cmp eax,0 ;Compare to 0
je false ;If int is 0, its false
;Do what has to be done when true
false:
;Do what has to be done when false
For the bool:
mov al,1 ;Anything that is not 0 is true
test al,1 ;See if first bit is flipped
jz false ;Not flipped, so it's false
;Do what has to be done when true
false:
;Do what has to be done when false
So, that's why the speed comparison is so compiler-dependent. In the case above, the bool would be slightly faster, since cmp would imply a subtraction for setting the flags. It also contradicts what your compiler generated.
Another approach, a much simpler one, is to look at the logic of the expression on its own and try not to worry about how the compiler will translate your code; I think this is a much healthier way of thinking. I still believe, ultimately, that the code being generated by the compiler is actually trying to give a truthful resolution. What I mean is that, maybe if you increase the test cases in the if statement and stick with boolean on one side and integer on the other, the compiler will make it so the code generated executes faster with boolean expressions at the machine level.
I'm considering this a conceptual question, so I'll give a conceptual answer. This discussion reminds me of discussions I commonly have about whether or not code efficiency translates to fewer lines of code in assembly. It seems that this concept is generally accepted as being true. Considering that keeping track of how fast the ALU handles each statement is not viable, the second option would be to focus on jumps and compares in assembly. When that is the case, the distinction between boolean statements and integers in the code you presented becomes rather representative. The result of an expression in C++ will return a value that will then be given a representation. In assembly, on the other hand, the jumps and comparisons will be based on numeric values regardless of what type of expression was being evaluated back in your C++ if statement. It is important in these questions to remember that purely logical statements such as these end up with a huge computational overhead, even though a single bit would be capable of the same thing.