If I divide a float by an integer, there are two possibilities for how to cast:
var f : Float
val i : Int = 2
f = 1f / i
Here I "cast" the first parameter by saying it is a float. Then if I correctly understand i is casted implicitly.
var f : Float
val i : Int = 2
f = 1 / i.toFloat()
Here I cast the second operand to float. Then, if I understand correctly, the 1 is cast to float, too.
My question is whether the first of the two possibilities is faster, or whether the performance is the same in both cases.
Here's the decompiled bytecode of the two snippets, with comments explaining the parts that are different between the two.
First version:
L1
LINENUMBER 16 L1
ICONST_2
ISTORE 1
L2
LINENUMBER 17 L2
FCONST_1 // create constant float value 1f
ILOAD 1 // load value of i
I2F // cast i from int to float
FDIV // perform division
FSTORE 0 // write result to f
Second version:
L1
LINENUMBER 22 L1
ICONST_2
ISTORE 1
L2
LINENUMBER 23 L2
ICONST_1 // create constant int value 1
I2F // cast this value from int to float
ILOAD 1 // load value of i
I2F // cast i from int to float
FDIV // perform division
FSTORE 0 // write result to f
Basically, you're saving an int to float cast in the first version, because you're creating a float constant 1f immediately.
In reality though, this could easily be optimized by the JVM anyway, and it is not worth worrying about at the syntax level - just use whichever code is more expressive for what you're doing. If you're somehow writing code that would benefit from performance improvements this tiny, profile it to see whether there's actually a difference.
The only difference between your two snippets is syntax. There is no CPU out there that could natively perform a computation on one FP argument and one integer argument. Integer arguments are always first converted to FP.
If you're worried about the .toFloat() part, that's again just syntax. There is no actual method call going on, it just changes the type of the Int expression to Float.
Related
Somebody told me that type casting in C only changes how the system interprets the information (for example, casting the char 'A' to int returns 65 when printed with cout, since in memory it stays as 01000001).
However, I noticed that, when casting floating-point numbers into integers of the same width, the numeric value is preserved rather than changed as it would be if only the interpretation were changed.
For example, let X be a double precision floating point number:
double X = 3.14159;
As far as I know, when inspecting &X we will find (converted by a decimal-to-binary converter):
01000000 00001001 00100001 11111001 11110000 00011011 10000110 01101110
But, as some of you would already know, when doing:
long long Y = (long long)X;
Y will be 3, the truncated version of X, instead of 4614256650576692846, the value it would get by reading the binary contents at &X as a long long.
So, I think it is clear that they were wrong, but then how does casting work at a low level? Is there any detection of whether the value would be changed or not? How would you code it to get Y = 4614256650576692846 instead of Y = 3?
Casting will try to preserve the value as precisely as possible.
You can use memcpy() to copy bit patterns.
#include <iostream>
#include <cstring>

int main() {
    double X = 3.14159;
    long long Y;
    memcpy(&Y, &X, sizeof(Y));
    std::cout << Y << '\n';
    return 0;
}
Casting lets the compiler decide how to change the data so that it is as useful as possible while respecting the requested data type.
The int to char conversion just changes the interpretation from, let us say, 65 to 'A'.
However, when we have a value we may want to conserve, the compiler will use special instructions for its conversion.
For example, when casting from double to long long, the processor will use the CVTTSD2SI instruction, which converts and truncates an FP register's value into a general-purpose one:
double a = 3.14159;
long long b = (long long)a;
will have a disassembly of (I got rid of the stack pointers for ease of understanding):
movsd xmm0, QWORD PTR [a]
cvttsd2si rax, xmm0
mov QWORD PTR [b], rax
So, the ways to use the original bits would be as mentioned in the selected answer: dereferencing a pointer to the double reinterpreted as a long long pointer and placing the result into the long long variable or, as others stated, using memcpy().
If you want to get Y = 4614256650576692846, you can use:
double X = 3.14159;
long long Y = *( (long long*)(&X) );
This will cast the double pointer to a long long pointer, so the compiler then treats (long long*)(&X) as the address of a long long.
But I don't advise you to do so, because the result depends on how double is stored on your machine, and it is not guaranteed to be 4614256650576692846.
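For completeness: on a C++20 compiler, a safer way to get the raw bits than the pointer cast above is std::bit_cast (or the memcpy() shown earlier). This is just a sketch, and it assumes long long and double are both 8 bytes; the printed value still depends on how your platform stores a double:
#include <bit>       // std::bit_cast (C++20)
#include <iostream>

int main() {
    double X = 3.14159;
    // Reinterpret the 64 bits of X as a long long, without going through pointers.
    long long Y = std::bit_cast<long long>(X);
    std::cout << Y << '\n';  // 4614256650576692846 on a typical IEEE-754 machine
    return 0;
}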
I'm a complete newbie here. For my school homework, I was asked to write a program that displays -
s = 1 + 1/2 + 1/3 + 1/4 + ... + 1/n
Here's what I did -
#include<iostream.h>
#include<conio.h>
void main()
{
clrscr();
int a;
float s=0, n;
cin>>a;
for(n=1;n<=a;n++)
{
s+=1/n;
}
cout<<s;
getch();
}
It displays exactly what it should. However, in the past I have only written programs which use the int data type. To my understanding, the int data type does not hold any decimal places whereas float does, so I don't know much about float yet. Later that night, I was watching a video on YouTube in which someone was writing the exact same program, but in a slightly different way. The video was in a foreign language so I couldn't understand it. What he did was declare 'n' as an integer:
int a, n;
float s=0;
instead of
int a;
float s=0, n;
But this was not displaying the desired result. So he went ahead and showed two ways to correct it. He made changes in the for loop body -
s+=1.0f/n;
and
s+=1/(float)n;
To my understanding, he declared 'n' as a float data type later in the program (am I right?). So, my question is: both display the same result, but is there any difference between the two? As we are declaring 'n' a float, why has he written 1.0f instead of n.f or f.n? I tried it but it gives an error. And in the second method, why can't we write 1(float)/n instead of 1/(float)n? As in the first method we have added the float suffix to 1. Also, is there a difference between 1.f and 1.0f?
I tried to google my question but couldn't find any answer. Also, another confusion that came to my mind after a few hours is: why are we even declaring 'n' a float? As per the program, the sum should come out as a real number, so shouldn't we declare only 's' a float? The more I think, the more I confuse myself. Please help!
Thank You.
The reason is that integer division behaves differently from floating-point division.
4 / 3 gives you the integer 1. 10 / 3 gives you the integer 3.
However, 4.0f / 3 gives you the float 1.3333..., 10.0f / 3 gives you the float 3.3333...
So if you have:
float f = 4 / 3;
4 / 3 will give you the integer 1, which will then be stored into the float f as 1.0f.
You instead have to make sure either the divisor or the dividend is a float:
float f = 4.0f / 3;
float f = 4 / 3.0f;
If you have two integer variables, then you have to convert one of them to a float first:
int a = ..., b = ...;
float f = (float)a / b;
float f = a / (float)b;
The first is equivalent to something like:
float tmp = a;
float f = tmp / b;
Since n will only ever have an integer value, it makes sense to define it as an int. However, doing so means that this won't work as you might expect:
s+=1/n;
In the division operation both operands are integer types, so it performs integer division which means it takes the integer part of the result and throws away any fractional component. So 1/2 would evaluate to 0 because dividing 1 by 2 results in 0.5, and throwing away the fraction results in 0.
This is in contrast to floating-point division, which keeps the fractional component. C will perform floating-point division if either operand is a floating-point type.
In the case of the above expression, we can force floating point division by performing a typecast on either operand:
s += (float)1/n
Or:
s += 1/(float)n
You can also specify the constant 1 as a floating point constant by giving a decimal component:
s += 1.0/n
Or appending the f suffix:
s += 1.0f/n
The f suffix (as well as the U, L, and LL suffixes) can only be applied to numerical constants, not variables.
What he is doing is something called casting. I'm sure your school will mention it in upcoming lectures. Basically, n is declared as an integer for the entire program. But since integer and floating-point types are both numbers, C/C++ allows you to use one where the other is expected, as long as you tell the compiler what you want it treated as. You do this by writing the data type in parentheses, i.e.
(float) n
he declared 'n' a float data type later in the program(Am I right?)
No, he defined (thereby also declared) n as an int and later he explicitly converted (cast) it to a float. The two are very different.
both display the same result but is there any difference between the two?
Nope. They're the same in this context. When an arithmetic operator has int and float operands, the former is implicitly converted into the latter, and thereby the result will also be a float. He's just shown you two ways to do it. When both operands are integers, you'd get an integer value as the result, which may be incorrect when proper mathematical division would give you a non-integer quotient. To avoid this, usually one of the operands is made into a floating-point number so that the actual result is closer to the expected result.
why he has written 1.0f instead of n.f or f.n. I tried it but it gives error. [...] Also, is there a difference between 1.f and 1.0f?
This is because the language syntax is defined that way. A floating-point literal needs a decimal point (or an exponent), and the f suffix makes it a float. So 5 is an int, while 5.0f or 5.f is a float; there's no difference when you omit trailing 0s. However, n.f is a syntax error, since n is an identifier (variable) name and not a numeric literal.
And in the second method, why we can't write 1(float)/n instead of 1/(float)n?
(float)n is a valid, C-style cast of the int variable n, while 1(float) is just a syntax error.
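To make those syntax rules concrete, here is a small sketch (the names are made up for illustration) showing which of the forms discussed above compile and which don't:
#include <iostream>

int main() {
    int n = 3;
    float a = 1.0f / n;       // float literal; 1.0f and 1.f mean the same thing
    float b = 1.f / n;
    float c = 1 / (float)n;   // C-style cast applied to the variable
    // float d = n.f;         // error: n is an identifier, not a numeric literal
    // float e = 1(float)/n;  // error: a cast must be written before its operand
    std::cout << a << ' ' << b << ' ' << c << '\n';  // prints 0.333333 three times
    return 0;
}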
s+=1.0f/n;
and
s+=1/(float)n;
... So, my question is, both display the same result but is there any difference between the two?
Yes.
In both C and C++, when a calculation involves expressions of different types, one or more of those expressions will be "promoted" to the type with greater precision or range. So if you have an expression with signed and unsigned operands, the signed operand will be "promoted" to unsigned. If you have an expression with float and double operands, the float operand will be promoted to double.
Remember that division with two integer operands gives an integer result - 1/2 yields 0, not 0.5. To get a floating point result, at least one of the operands must have a floating point type.
In the case of 1.0f/n, the expression 1.0f has type float (see note 1 below), so the n will be "promoted" from type int to type float.
In the case of 1/(float) n, the expression n is being explicitly cast to type float, so the expression 1 is promoted from type int to float.
Nitpicks:
Unless your compiler documentation explicitly lists void main() as a legal signature for the main function, use int main() instead. From the online C++ standard:
3.6.1 Main function
...
2 An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a declared return type of type int, but otherwise its type is implementation-defined...
Secondly, please format your code - it makes it easier for others to read and debug. Whitespace and indentation are your friends - use them.
1. The constant expression 1.0 with no suffix has type double. The f suffix tells the compiler to treat it as float. 1.0/n would result in a value of type double.
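Putting the answers together, a cleaned-up sketch of the original program (assuming a standard-conforming compiler, so <iostream> and int main() instead of the non-standard headers) could look like this:
#include <iostream>

int main() {
    int a;
    std::cin >> a;

    float s = 0.0f;
    for (int n = 1; n <= a; ++n) {
        s += 1.0f / n;  // 1.0f forces floating-point division even though n is an int
    }

    std::cout << s << '\n';
    return 0;
}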
I am doing something like this
int a = 3;
int b = 4;
float c = a/b; // This returns 0 while it's supposed to return 0.75
I wanted to know why the above code doesn't work. I realize that 3 is an int and 4 is an int too. However, the result is a float, which is being assigned to a float. Yet I am getting 0 here. Any suggestions on what I might be doing wrong?
The division is evaluated first, and because it has two integer operands, it evaluates to an integer... which then only gets assigned to the float.
This is due to a predefined set of conversion rules. To force the result to be (at least) of a particular type, at least one of the operands needs to be of that type (via a static_cast<>).
Thus:
float c = a / static_cast<float>(b);
float c = a/b ;
a and b are integers, so it is integer division.
From the C++ standard:
5.6 Multiplicative operators [expr.mul]
For integral operands the / operator yields the algebraic quotient with any fractional part discarded.
Instead, try this:
float c = a / static_cast<float>(b);
(As #TrevorHickey suggested, static_cast<float> is better than old-style (float) cast.)
You can't divide two ints and receive a float. You either have to cast to a float or have the operands be floats.
float a = 3;
float b = 4;
float c = a/b;
or
float c = (float)a/(float)b;
HINT: the result of integer division is an integer. That result is then assigned to the float, i.e. a/b produces an int; cast that however you want afterwards, but you aren't going to get 0.75 out of it.
If you are working in C++, you should prefer static_cast over the C-style cast.
This will ensure that the type can be safely cast at compile time.
float c = a/static_cast<float>(b);
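For completeness, a minimal self-contained example contrasting the two divisions (the printed values assume the default float formatting):
#include <iostream>

int main() {
    int a = 3;
    int b = 4;

    float c1 = a / b;                      // integer division gives 0, stored as 0.0f
    float c2 = static_cast<float>(a) / b;  // b is converted to float, so this is 0.75f

    std::cout << c1 << ' ' << c2 << '\n';  // prints: 0 0.75
    return 0;
}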
In removing conditional branches from high-performance code, converting a boolean into an unsigned long with all bits set (i.e. -1) can be useful.
I came up with a way to obtain this integer-mask boolean from an int b (or bool b) taking the value either 1 or 0:
unsigned long boolean_mask = -(!b);
To get the opposite value:
unsigned long boolean_mask = -b;
Has anybody seen this construction before? Am I on to something? When an int value of -1 (which I assume -b or -(!b) produces) is converted to a bigger unsigned int type, is it guaranteed to set all the bits?
Here's the context:
uint64_t ffz_flipped = ~i&~(~i-1); // least sig bit unset
// only set our least unset bit if we are not pow2-1
i |= (ffz_flipped < i) ? ffz_flipped : 0;
I will inspect the generated asm before asking questions like this next time. It sounds very likely that the compiler will not burden the CPU with a branch here.
The question you should be asking yourself is this: If you write:
int it_was_true = b > c;
then it_was_true will be either 1 or 0. But where did that 1 come from?
The machine's instruction set doesn't contain an instruction of the form:
Compare R1 with R2 and store either 1 or 0 in R3
or, indeed, anything like that. (I put a note on SSE at the end of this answer, illustrating that the former statement is not quite true.) The machine has an internal condition register, consisting of several condition bits, and the compare instruction -- and a number of other arithmetic operations -- cause those condition bits to be modified in specific ways. Subsequently, you can do a conditional branch, based on some condition bits, or a conditional load, and sometimes other conditional operations.
So actually, it could be a lot less efficient to store that 1 in a variable than it would have been to have directly done some conditional operation. Could have been, but maybe not, because the compiler (or at least, the guys who wrote the compiler) may well be cleverer than you, and it might just remember that it should have put a 1 into it_was_true so that when you actually get around to checking the value, the compiler can emit an appropriate branch or whatever.
So, speaking of clever compilers, you should take a careful look at the assembly code produced by:
uint64_t ffz_flipped = ~i&~(~i-1);
Looking at that expression, I can count five operations: three bitwise negations, one bitwise conjunction (and), and one subtract. But you won't find five operations in the assembly output (at least, if you use gcc -O3). You'll find three.
Before we look at the assembly output, let's do some basic algebra. Here's the most important identity:
-X == ~X + 1
Can you see why that's true? -X, in 2's complement, is just another way of saying 2^n - X, where n is the number of bits in the word. In fact, that's why it's called "2's complement". What about ~X? Well, we can think of that as the result of subtracting every bit in X from the corresponding power of 2. For example, if we have four bits in our word, and X is 0101 (which is 5, or 2^2 + 2^0), then ~X is 1010, which we can think of as 2^3×(1-0) + 2^2×(1-1) + 2^1×(1-0) + 2^0×(1-1), which is exactly the same as 1111 − 0101. Or, in other words:
−X == 2^n − X
~X == (2^n − 1) − X
which means that
~X == (−X) − 1
Remember that we had
ffz_flipped = ~i&~(~i-1);
But we now know that we can change ~(~i−1) into minus operations:
~(~i−1)
== −(~i−1) − 1
== −(−i - 1 - 1) − 1
== (i + 2) - 1
== i + 1
How cool is that! We could have just written:
ffz_flipped = ~i & (i + 1);
which is only three operations, instead of five.
Now, I don't know if you followed that, and it took me a bit of time to get it right, but now let's look at gcc's output:
leaq 1(%rdi), %rdx # rdx = rdi + 1
movq %rdi, %rax # rax = rdi
notq %rax # rax = ~rax
andq %rax, %rdx # rdx &= rax
So gcc just went and figured all that out on its own.
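(If you want to double-check that simplification empirically rather than algebraically, here is a tiny sketch, with arbitrarily chosen sample values, that asserts the five-operation form and the three-operation form agree:)
#include <cassert>
#include <cstdint>

int main() {
    const std::uint64_t samples[] = {0, 1, 2, 3, 7, 8, 0xF0F0F0F0ULL, ~0ULL};
    for (std::uint64_t i : samples) {
        // Both forms isolate the lowest unset bit of i (0 if there is none).
        assert((~i & ~(~i - 1)) == (~i & (i + 1)));
    }
    return 0;
}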
The promised note about SSE: It turns out that SSE can do parallel comparisons, even to the point of doing 16 byte-wise comparisons at a time between two 16-byte registers. Condition registers weren't designed for that, and anyway no-one wants to branch when they don't have to. So the CPU does actually change one of the SSE registers (a vector of 16 bytes, or 8 "words" or 4 "double words", whatever the operation says) into a vector of boolean indicators. But it doesn't use 1 for true; instead, it uses a mask of all 1s. Why? Because it's likely that the next thing the programmer is going to do with that comparison result is use it to mask out values, which I think is just exactly what you were planning to do with your -(!B) trick, except in the parallel streaming version.
So, rest assured, it's been covered.
Has anybody seen this construction before? Am I on to something?
Many people have seen it. It's old as rocks. It's not unusual but you should encapsulate it in an inline function to avoid obfuscating your code.
And, verify that you compiler is actually producing branches on the old code, and that it is configured properly, and that this micro-optimization actually improves performance. (And it's a good idea to keep notes on how much time each optimization strategy cuts.)
On the plus side, it is perfectly standard-compliant.
When a int value of -1 (which I assume -b or -(!b) does produce) is promoted to a bigger unsigned int type is it guaranteed to set all the bits?
Just be careful that b is not already unsigned. Since unsigned numbers are always non-negative, negating an unsigned b just gives a large positive unsigned value (e.g. 0xFFFFFFFFu), which will be zero-extended rather than filled with more ones when converted to a bigger unsigned type.
If you have different sizes and want to be anal, try this:
template< typename uint >
uint mask_cast( bool f )
{ return static_cast< uint >( - ! f ); }
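Usage might look like this; the helper is repeated here so the snippet stands alone:
#include <cstdint>

template< typename uint >
uint mask_cast( bool f )
{ return static_cast< uint >( - ! f ); }

int main() {
    // true maps to 0 and false to all bits set, matching the -(!b) form from the question.
    std::uint64_t m1 = mask_cast<std::uint64_t>(true);   // 0
    std::uint64_t m2 = mask_cast<std::uint64_t>(false);  // 0xFFFFFFFFFFFFFFFF
    (void)m1; (void)m2;  // silence unused-variable warnings
    return 0;
}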
#include <type_traits>

struct full_mask {
    bool b;
    full_mask(bool b_) : b(b_) {}

    template<
        typename int_type,
        typename = typename std::enable_if<std::is_unsigned<int_type>::value>::type
    >
    operator int_type() const {
        return -b;  // bool promotes to int, so this is 0 or -1; -1 converts to all bits set in an unsigned type
    }
};
use:
unsigned long long_mask = full_mask(b);
unsigned char char_mask = full_mask(b);
char char_mask2 = full_mask(b); // does not compile
Basically, I use the class full_mask to deduce the type we are casting to and automatically generate a bit-filled unsigned value of that type. I tossed in some SFINAE code to check that the type being converted to is an unsigned integer type.
You can convert 1 / 0 to 0 / -1 just by decrementing. That inverts the boolean condition, but if you can generate the inverse of the boolean in the first place, or use the inverse of the mask, then it's only a single operation instead of two.
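A sketch of that decrement trick (assuming b holds 0 or 1): subtracting 1 wraps around to all ones for b == 0 and gives 0 for b == 1, so it produces the inverse of the -b mask in a single operation:
#include <cstdint>

int main() {
    unsigned b = 0;  // or 1
    // b - 1 wraps to all bits set when b == 0, and is 0 when b == 1.
    std::uint64_t inverse_mask = static_cast<std::uint64_t>(b) - 1;
    (void)inverse_mask;
    return 0;
}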
I've noticed there's a lot of discussion on the topic of floating-point computation errors, which require you to use comparisons more complex than ==. However, all those articles seem to assume the value is manipulated (or computed twice) somehow, while I didn't see an example covering very simple constant copying.
Please consider the following:
const double magical_value = -10;

class Test
{
    double _val;
public:
    Test()
        : _val(magical_value)
    {
    }

    bool is_special()
    {
        return _val == magical_value;
    }
};
As far as I understand this, magical_value should be set at compile time, so that all rounding occurs at that point. Afterwards, the value should just be copied to the class, and compared with the original one. Is such a comparison guaranteed to be safe? Or can either copying or comparing introduce errors here?
Please do not suggest alternative comparison or magical value use methods, that's another topic. I'm just curious about this assumption.
Edit: just to note, I am a little afraid that on some architectures, the optimizations could result in copying the value to a differently-sized floating-point registers, thus introducing differences in the exact values. Is there a risk of something like that?
Is such a comparison guaranteed to be safe? Or can either copying or comparing introduce errors here?
Yes, safe (this is a requirement of the copy operation as implied by =). There are no conversions/promotions that you need to worry about as long as the source and destination types are same.
However, note that magical_value may not contain the written value (-10) exactly but an approximation. This approximation will get copied over to _val.
Given the const qualifier, chances are that magical_value will probably be optimized away (should you turn on optimizations) or used as-is (i.e. no memory will probably be used up).
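As a quick sanity check of that, here is a self-contained sketch reusing the question's class: the copied value compares equal to the constant it was copied from, regardless of whether the constant is an exact representation of the written literal:
#include <cassert>

const double magical_value = -10;

class Test
{
    double _val;
public:
    Test() : _val(magical_value) {}
    bool is_special() { return _val == magical_value; }
};

int main() {
    Test t;
    assert(t.is_special());  // holds: copying does not change the stored bits
    return 0;
}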
Apart from possibly different-sized registers, you have denormalized floating point (and flush-to-zero) to worry about (see "Why does changing 0.1f to 0 slow down performance by 10x?").
Just to give an idea of the weirdness this could lead to, try this bit of code:
float a = 0.000000000000000000000000000000000000000047683384;
const float b = 0.000000000000000000000000000000000000000047683384;
float aa = a, bb = b;
#define SUPPORT_DENORMALIZATION ({volatile double t=DBL_MIN/2.0;t!=0.0;})
printf("support denormals: %d\n",SUPPORT_DENORMALIZATION);
printf("a = %.48f, aa = %.48f\na==aa %d, a==0.0f %d, aa==0.0f %d\n",a,aa,a==aa,a==0.0f,aa==0.0f);
printf("b = %.48f, bb = %.48f\nb==bb %d, b==0.0f %d, bb==0.0f %d\n",b,bb,b==bb,b==0.0f,bb==0.0f);
which gives either: (compiled without flush-to-zero)
support denormals: 1
a = 0.000000000000000000000000000000000000000047683384, aa = 0.000000000000000000000000000000000000000047683384
a==aa 1, a==0.0f 0, aa==0.0f 0
b = 0.000000000000000000000000000000000000000047683384, bb = 0.000000000000000000000000000000000000000047683384
b==bb 1, b==0.0f 0, bb==0.0f 0
or: (compiled with gcc -ffast-math)
support denormals: 0
a = 0.000000000000000000000000000000000000000000000000, aa = 0.000000000000000000000000000000000000000000000000
a==aa 1, a==0.0f 1, aa==0.0f 1
b = 0.000000000000000000000000000000000000000047683384, bb = 0.000000000000000000000000000000000000000000000000
b==bb 1, b==0.0f 0, bb==0.0f 1
Where that last line is of course the odd one out: b==bb && b!=0.0f && bb==0.0f would be true.
So if you're still thinking about comparing floating point values, at least stay away from small values.
Update: to address some comments about this being due to the use of float instead of double: the same thing happens for double, but you would need to set the constant to somewhere below DBL_MIN, e.g. 1e-309.
Update 2: a code sample relating to some comments made below. This shows that the problem exists for doubles as well, and that comparisons can become inconsistent (when flush-to-zero is enabled):
double a;
const double b = 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001225;
const double c = 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002225;
printf("b==c %d\n",b==c);
a = b;
printf("assigned a=b: a==b %d\n",a==b);
a = c;
printf("assigned a=c: a==b %d\n",a==b);
output:
b==c 0
assigned a=b: a==b 1
assigned a=c: a==b 1
The issue shows in the last line, where you would naively expect that a==b would become false after assigning a=c with c!=b.