something wrong with my float operation - c++

I must be missing something. What's wrong with this?
float controlFrameRate = 1/60;
It should be assigning something like 0.0166666667, but it's coming out as 0.00000 etc. Is Visual Studio just lying to me?

That is because 1/60 is an integer, which is 0 because integer division truncates. This is used to initialize the float, giving 0. You can fix it by making the RHS expression a float in the first place:
float controlFrameRate = 1.0f/60;
or
float controlFrameRate = 1/60.0f;
In C++, literals such as 1, 42 etc. are int, 1.0, 3.1416 are double, and the f in 1.0f makes the literal a float. Note that the f could have been omitted in the examples above. However, assigning a double to a float could be problematic if the double's value goes beyond the range of a float.

Dividing an integer by another integer yields an integer, and the operation truncates: the fractional part is discarded, so the result is rounded toward zero.
Make at least one of the constants floating-point to fix it:
float controlFrameRate = 1.0 / 60;
float controlFrameRate = 1 / 60.0;
float controlFrameRate = 1.0 / 60.0;

Do
float controlFrameRate = 1.f/60;
or
float controlFrameRate = 1/60.f;
or
float controlFrameRate = 0.1f/6;
;-)

You should use 1/60.0; otherwise you will not get a floating-point result.

Related

Loss of precision when casting float to double

I guess I'm hitting a precision issue with my c++ program. And I don't understand why I'm getting different results in my values.
res equals to 1321.0000001192093 if I write:
float sy = -0.207010582f;
double res = -1512.*((double)sy - (2. / 3.));
but res2 equals to 1320.9999999839999 if I write:
double res2 = -1512.*(-0.207010582 - (2. / 3.));
Why is even syd different from syd2 when I write this:
double syd = -0.207010582f;
double syd2 = -0.207010582000000000;
Can somebody give me a hand, to cast my float into a double properly and to understand what's going on ?
-0.207010582f is a decimal floating-point literal. But your computer doesn't use decimal floating point, it uses binary floating point. So the value of that literal will be rounded to float precision.
Similarly, -0.207010582 is rounded to double precision. While that's closer, it still is not equal to -0.207010582 decimal.
Since double has more precision than float, you will not lose precision by casting from float to double. Any rounding will have happened earlier.
Single-Precision
As others have said, float sy = -0.207010582f; initializes a single-precision (32-bit) floating point variable from a single-precision floating point literal.
This will be treated (in storage and calculations) as the nearest representable number in that format. This number is -0.20701058208942413330078125
Your code is effectively then float sy = -0.20701058208942413330078125;
You can confirm that this is the nearest representable value by looking at the adjacent single-precision floating point numbers.
-0.20701059699058532714843750 // std::nextafter( sy, std::numeric_limits<float>::lowest() )
-0.20701058208942413330078125 // sy
-0.20701056718826293945312500 // std::nextafter( sy, std::numeric_limits<float>::max() )
Double-Precision
Exactly the same occurs with double-precision floating point numbers; it's just that their increased resolution means the differences are smaller.
E.g. double dy = -0.207010582; actually represents the value -0.20701058199999999853702092877938412129878997802734375
Similarly, the adjacent values that can be represented are:
-0.2070105820000000262925965444082976318895816802978515625 // std::nextafter( dy, std::numeric_limits<double>::lowest() )
-0.2070105819999999985370209287793841212987899780273437500 // dy
-0.2070105819999999707814453131504706107079982757568359375 // std::nextafter( dy, std::numeric_limits<double>::max() )
Single to Double Conversion
All single precision floating point values are exactly representable in double-precision. Hence, nothing is lost in conversions from single to double precision.
All the above assumes IEEE754 floating-point representation.

Multi line math results different than single line

I was having an issue with some floating point math and I've found that if I do my math on one line, I get -0 passed to tan(), and if I do it across two lines, I get 0 passed to tan(). Have a look:
float theta = PI / 2.f;
float p = (PI / 2.f) - theta;
float result = tan(p);
The above, p = -0, result = -4.37...
float theta = PI / 2.f;
float p = PI / 2.f;
p -= theta;
float result = tan(p);
The above, p = 0, result = 0.
Can anyone explain the difference? I assume the -0 is causing that result from tan(), although I can't find anything on google that explains why. Why does the exact same calculation spread across different lines result in a different answer?
Thanks
It is probably because of the type of PI.
If PI is a double, the expression is evaluated in double and only then converted to float, which gives the results you describe.
If PI is a float, both snippets produce the same result.
What @Naor says is probably correct, but I'd like to add something.
You are probably not getting -4.37xx but -4.37xxxe-xx, which is a very small negative number.
Since floating-point math always carries some rounding error, I'd say there is no need to change your code. Both snippets are correct.
So this is what, in my opinion, is happening:
In both examples PI is a define, probably defined like this:
#define PI 3.14 // and some more digits
In C++, a literal like this has type double.
After preprocessing, this expression:
PI / 2.0f
will be treated as a double-typed prvalue. This means that this line hides one more operation:
float theta = PI / 2.f;
namely a double-to-float conversion, which definitely loses some precision in this case.
In the first example this also happens here:
float p = (PI / 2.f) - theta;
but only after the whole expression has been evaluated. Note that during this evaluation (PI / 2.f) is still a double, while theta is a float value converted back to double, which explains why the result differs slightly from 0.0.
In your last example you first convert (PI / 2.f) to float:
float p = PI / 2.f;
and subtract the float-typed theta from it on the next line. That must result in 0.0, which the compiler has probably optimized out anyway ; ).

A small number is rounded to zero

I have the following values:
i->fitness = 160
sum_fitness = 826135
I do the operation:
i->roulette = (int)(((i->fitness / sum_fitness)*100000) + 0.5);
But I keep getting 0 in i->roulette.
I also tried to save i->fitness / sum_fitness in a double variable and only then applying the other operations, but also this gets a 0.
I'm thinking that's because 160/826135 is such a small number, then it rounds it down to 0.
How can I overcome this?
Thank you
edit:
Thanks everyone, I eventually did this:
double temp = (double)(i->fitness);
i->roulette = (int)(((temp / sum_fitness)*100000) + 0.5);
And it worked.
All the answers are similar so it's hard to choose one.
Your line
i->roulette = (int)(((i->fitness / sum_fitness)*100000) + 0.5);
casts the result to int, so any fractional part is truncated.
Try
i->roulette = (((i->fitness / sum_fitness)*100000) + 0.5);
and make sure that either 'sum_fitness' or 'i->fitness' is of a float or double type so that the division is a floating-point division. If they are not, you will need to cast one of them before dividing, like this:
i->roulette = (((i->fitness / (double)sum_fitness)*100000) + 0.5);
If you want to keep this as an integer calculation, you could also change the order of the division and multiplication, like
i->roulette = ( i->fitness *100000) / sum_fitness;
which works as long as the multiplication does not overflow; with a 32-bit int that happens once fitness exceeds 21474 (INT_MAX / 100000).
I'm thinking that's because 160/826135 is such a small number, then it rounds it down to 0.
It is integer division, and it is truncated to the integral part. So yes, it is 0, but there is no rounding. 99/100 would also be 0.
You could fix it by casting the numerator or the denominator to double:
i->roulette = ((i->fitness / static_cast<double>(sum_fitness))*100000) + 0.5;

How to calculate double variable with 10 decimal precision in C++

I'm trying to do calculations on double values with 10 decimal places of precision, and of course I want the result to have the same 10 decimal places. However, it doesn't work: only 6 decimal places seem to be possible in VS2008. Is there any reason for this, or any idea how to fix it?
double dMag = 10;
double dPixelSize = 4.4;
double PixelWidth = 1024/2;
double PixelHeight = 768/2;
double umtomm = 1000;
double origin_PosX = 813.227696;
double origin_PosY = 676.195748;
double PosX = (origin_PosX - PixelWidth) * dPixelSize / dMag / umtomm;
double PosY = (origin_PosY - PixelHeight ) * dPixelSize / dMag / umtomm;
If I check PosX and PosY, the results are "0.132540, 0.128566" respectively. I expect the results to be "0.13254018624000002, 0.12856612912" respectively.
thanks
From what I gather you are confusing a few things...
The precision of double is more than just 6 decimal places. I strongly recommend reading this wikipedia article which explains how the underlying double datatype stores your number and explains its precision.
The precision that is printed is limited to 6 decimal places by default. In order to fix that you need to learn how to use the format string.
In your comments you mentioned using
tmp1.Format("%f,%f\r\n", PosX, PosY); file.Write(tmp1,lstrlen(tmp1));
to print the strings.
Try changing this line to
tmp1.Format("%.10f,%.10f\r\n", PosX, PosY); file.Write(tmp1,lstrlen(tmp1));
and you will see that it will print 10 digits after the decimal point.

C/C++ - How to convert from a signed 32bit integer to a float and back

I need to be able to convert a C SInt32 integer to a float in the range [-1, 1] and back. I've seen discussions of this question regarding 24 bit integers:
C/C++ - Convert 24-bit signed integer to float
And I've tried something similar:
// Convert int - float
SInt32 integer = 1;
Float32 factor = 1;
Float32 f = integer / (0x7FFFFFF + 0.5);
// Perform some processing on the float
Process(f);
// Scale the float
f = f * factor;
// Convert float - int
integer = f * (0x7FFFFFF + 0.5);
However this doesn't work. I know it doesn't work because the work I'm doing involves audio programming and the conversion causes a hissing sound.
I'm pretty sure it is a conversion problem because when I make the float smaller by setting the factor to 0.0001 the crackling disappears. Maybe the back conversion is putting the int out of its limits and is causing it to be truncated.
Any advice would be greatly appreciated.
Read up on IEEE floating point formats. The IEEE 32-bit float only supports 24 significant bits, so if you convert a signed 32-bit integer of large magnitude you can lose up to the low 7 bits.
const float recip = 1.0 / (32768.0*65536.0); // 1 / 2^31
// The compiler will hopefully evaluate this constant at compile time.
// Writing the scale as 32768.0*65536.0 also makes it easy to spot
// where the value comes from.
float value = int_value * recip;
int value2 = value * (32768.0*65536.0);
The process is not reversible: one can lose up to 7 bits of accuracy.