Summing up float number loses precision with type conversion - c++

I tried to add two numbers with different weights. Here is my code:
void onTimeStepOp::updatePointsType1_2(boost::tuples::tuple<float,int,int,int> &_prev,
boost::tuples::tuple<float,int,int,int> &_result,
boost::tuples::tuple<float,float> weights)
{
_result.get<0>() = _result.get<0>() * weights.get<0>() + _prev.get<0>() * weights.get<1>();
std::cout<<"deb:"<<(float)_result.get<2>() * weights.get<0>()<<" "<<(float)_prev.get<2>() * weights.get<1>()<<std::endl;
_result.get<2>() = (int)((float)(_result.get<2>()) * weights.get<0>() + (float)(_prev.get<2>()) * weights.get<1>());
std::cout<<"deb2:"<<(float)_result.get<3>() * weights.get<0>() <<" "<< (float)_prev.get<3>() * weights.get<1>()<<std::endl;
_result.get<3>() = (int)((float)(_result.get<3>()) * weights.get<0>() + (float)(_prev.get<3>()) * weights.get<1>());
}
Here weights.get<0>() = 0.3 and weights.get<1>() = 0.7.
The output I get looks like this:
resultBefore=36.8055 4 69 91 previousPPos=41.192 4 69 91
deb:20.7 48.3
deb2:27.3 63.7
resultAfter=39.8761 4 **68** 91
The third number should be 69 (69 * 0.3 + 69 * 0.7). However, it is 68 instead. What's the problem with the type conversion expression?

Conversion to int truncates, so the slightest rounding error could cause you to be one off. Rather than converting directly to int, you might want to use the function round.
I might add that weights.get<0> is certainly not 0.3, and weights.get<1> is certainly not 0.7, since neither 0.3 nor 0.7 are representable in machine floating point (at least not on any machine you're likely to be using).

You should use round() instead of just casting to int. Casting discards everything after the decimal point, and because of rounding error the computed value may be something like 68.99999999991 (just an example, but it gives the idea).

Casting to int keeps only the part before the decimal point, so everything from 68.1 to 68.9 becomes 68, as noted above.
Another solution, though not as nice, is to add 0.5 to the float value before casting. Then 68.1 becomes 68.6, which still casts to 68, while 68.5 becomes 69.0, which casts to 69. (This trick only works for non-negative values.)
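The difference between the two conversions can be sketched as below. The helper names `blend_truncated` and `blend_rounded` are made up for illustration and are not part of the question's class; `std::lround` comes from `<cmath>`.

```cpp
#include <cmath>

// Hypothetical helper: blends two integer values with float weights and
// truncates the result, as the plain cast in the question does.
int blend_truncated(int current, int previous, float w_current, float w_previous) {
    return static_cast<int>(current * w_current + previous * w_previous);
}

// Same blend, but rounded to the nearest integer with std::lround
// (halfway cases are rounded away from zero).
int blend_rounded(int current, int previous, float w_current, float w_previous) {
    return static_cast<int>(std::lround(current * w_current + previous * w_previous));
}
```

With weights 0.3f and 0.7f, `blend_rounded(69, 69, 0.3f, 0.7f)` yields 69 whether the float sum lands slightly below or slightly above 69, whereas the truncating version depends on which side of 69 the sum falls.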

Related

Faulty Function in C++

I have a function in an application that takes a float parameter and should perform a simple multiplication and division on it. The value is cast to float before it is passed in, because the rest of the application handles the numerical data as ints. When I pass in 0.0, the function should produce 1.0, but it outputs 0.0 instead. Why does the calculation fail to produce the correct output? The program compiles, and the calculation is correct as far as I'm aware.
Here is the code:
void CarPositionClass::centre(float inputPos)
{
if ((inputPos <= 0) && (inputPos >= -125))
{
membershipC = ((inputPos + 125)*(1 / 125));
}
}
It should also be noted that membershipC is a float variable that is a member of the CarPositionClass.
Change 1 / 125 to, say, 1.0 / 125. 1 / 125 uses integer division, so the result is 0.
Or change this expression
((inputPos + 125)*(1 / 125))
to
(inputPos + 125) / 125
Since inputPos is floating point, so is inputPos + 125, and then dividing a float by an integer is a float.
P.S. This is surely a duplicate question. I expect the C++ gurus to lower the dup hammer any second now. :)
The division between two integers results in an integer. At least one operand has to be a floating point type for it not to truncate the result:
membershipC = ((inputPos + 125)*(1.0 / 125));
// ^^^
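A minimal sketch of the corrected calculation, written as a free function so it can be tried in isolation. The name `centre_membership` and the choice to return 0 outside the range are assumptions; the original member function leaves `membershipC` untouched in that case.

```cpp
// Integer division truncates: this holds at compile time.
static_assert(1 / 125 == 0, "1 / 125 is evaluated with integer division");

// Hypothetical free-function version of the membership calculation.
// Dividing by the float literal 125.0f keeps the whole expression in
// floating point.
float centre_membership(float inputPos) {
    if (inputPos <= 0.0f && inputPos >= -125.0f) {
        return (inputPos + 125.0f) / 125.0f;
    }
    return 0.0f;  // assumption: membership is 0 outside [-125, 0]
}
```

Here `centre_membership(0.0f)` gives 1.0f, which is the value the question expected.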

Arithmetic operation gives incorrect result

I might be missing something very basic here. But I don't know how to figure out that basic thing. When I set T to 10 and dt to 0.1, I should get the result 101 but I am getting the result as 100. Why is it so?
n_sim_steps = (int)(T/dt) + 1
Furthermore, if I execute this as a watch in eclipse, it returns 101, but in code it results in 100.
It should be
n_sim_steps = (int)(T/dt + 0.5) + 1
You are a victim of precision loss.
10 / 0.1 may come out as 99.999999999999 because of this loss, and casting that back to int gives 99. Adding 0.5 before the cast makes sure the result is rounded.
You might be better off using the ceil function.
Function signature:
double ceil (double x);
For example, ceil(2.3) returns 3.
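As an alternative to adding 0.5 by hand, the rounding can be left to `std::lround`. This is only a sketch: the helper name `count_sim_steps` is hypothetical, and the question does not show the types of T and dt, so double parameters are an assumption.

```cpp
#include <cmath>

// Hypothetical helper: number of simulation steps for total time T and
// step size dt, counting both end points. std::lround rounds the quotient
// to the nearest integer instead of truncating it.
long count_sim_steps(double T, double dt) {
    return std::lround(T / dt) + 1;
}
```

One way the symptom can arise: if dt is stored as the float 0.1f and the division is carried out in double, the quotient is slightly below 100, so `(int)(T/dt) + 1` gives 100 while `count_sim_steps(10.0, 0.1f)` gives 101.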

appending anything but a string to std::stringstream returns 0

Code in question:
std::stringstream cd;
int f = int((15 / allyCounter) * 100);
cd << f;
allyCounter is equal to 45. The idea is to get what percentage 15 is of allyCounter, where allyCounter is a dynamic int that is constantly changing. I don't have much experience with C++, so I'm sure what I'm missing is something fundamental.
The problem here is almost certainly with integer vs. floating point math.
15/45 (when done in integer math) is 0.
Try
std::stringstream cd;
int f = int((15.0 / allyCounter) * 100);
cd << f;
...and see if things aren't better. 15.0 is a double precision floating point constant, so that'll force the math to be done in floating point instead of integers, so you'll get a percentage.
Another possibility would be to do the multiplication ahead of the division:
int f = 1500 / allyCounter;
If the numerator were a variable, this could lead to a problem from the numerator overflowing, but in this case we know it's a value that can't overflow.
In C++, 15 / 45 is 0. (It's called "integer division": the result of dividing two ints in C++ is also an int, and thus the real answer is truncated, and 15 / 45 is 0.)
If this is your issue, just make it a double before doing the division:
int f = static_cast<double>(15) / allyCounter * 100;
or even:
int f = 15. / allyCounter * 100;
(The . in 15. causes that to be a double.)
You are using integer division:
std::stringstream cd;
int f = int((15.0 / allyCounter) * 100);
cd << f;
The compiler sees 15/allyCounter as a division of two integers (you passed it two integers, right?) and produces an integer result. Integer division discards the fractional part, so 15/45 == 0. In the code above, the compiler sees 15.0 as a double and performs the division in floating point.
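The two fixes suggested in the answers (multiply before dividing, or force a floating-point division) can be sketched as small helpers. The function names are hypothetical, and both assume `total` is not zero.

```cpp
// Hypothetical helper: percentage of `part` in `total`, with the fraction
// discarded. Multiplying first keeps the work in integer arithmetic without
// losing the result to an early truncating division.
// Assumes total != 0 and that part * 100 does not overflow int.
int percent_of(int part, int total) {
    return part * 100 / total;
}

// Floating-point variant: converting one operand to double forces a
// floating-point division, so the fractional part is kept.
double percent_of_exact(int part, int total) {
    return static_cast<double>(part) / total * 100.0;
}
```

For the values in the question, `percent_of(15, 45)` returns 33, while the original `(15 / allyCounter) * 100` evaluates to 0.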

How does floating-point arithmetic work when one is added to a big number?

If we run this code:
#include <iostream>
int main ()
{
using namespace std;
float a = 2.34E+22f;
float b = a+1.0f;
cout<<"a="<<a<<endl;
cout<<"b-a"<<b-a<<endl;
return 0;
}
The result will be 0, because a float has only about 6 significant decimal digits, while 1.0 is being added to a number with 23 digits. How does the program work out that there is no room for the 1? What is the algorithm?
Step by step:
IEEE-754 32-bit binary floating-point format:
sign 1 bit
significand 23 bits
exponent 8 bits
I) float a = 23400000000.f;
Convert 23400000000.f to float:
23,400,000,000 = 101 0111 0010 1011 1111 1010 1010 0000 0000₂
= 1.0101110010101111111010101000000000₂ × 2^34.
But the significand can store only 23 bits after the point. So we must round:
1.01011100101011111110101 01000000000₂ × 2^34
≈ 1.01011100101011111110101₂ × 2^34
So, after:
float a = 23400000000.f;
a is equal to 23,399,999,488.
II) float b = a + 1;
a = 10101110010101111111010100000000000₂.
b = 10101110010101111111010100000000001₂
= 1.0101110010101111111010100000000001₂ × 2^34.
But, again, the significand can store only 23 binary digits after the point. So we must round:
1.01011100101011111110101 00000000001₂ × 2^34
≈ 1.01011100101011111110101₂ × 2^34
So, after:
float b = a + 1;
b is equal to 23,399,999,488.
III) float c = b - a;
10101110010101111111010100000000000₂ - 10101110010101111111010100000000000₂ = 0
This value can be stored in a float without rounding.
So, after:
float c = b - a;
c is equal to 0.
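The three steps above can be checked with a short sketch. The helper names `gap_above` and `add_one` are made up for illustration; `std::nextafter` and `INFINITY` come from `<cmath>`. The `volatile` store is there to force the sum into a real 32-bit float, on the assumption that the compiler might otherwise keep extra precision in a register.

```cpp
#include <cmath>

// Hypothetical helper: distance from x to the next representable float
// above it, computed in double so the difference itself is exact.
double gap_above(float x) {
    float next = std::nextafter(x, INFINITY);
    return static_cast<double>(next) - static_cast<double>(x);
}

// Hypothetical helper: adds 1.0f and stores the sum in a volatile float,
// so any extra intermediate precision is discarded.
float add_one(float x) {
    volatile float sum = x + 1.0f;
    return sum;
}
```

For `a = 23400000000.f`, `gap_above(a)` is 2048 (2^11, the weight of the last significand bit when the exponent is 34). Since 1 is less than half of that gap, `add_one(a)` returns `a` unchanged.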
The basic principle is that the two numbers are aligned so that the decimal point is in the same place. I'm using a smaller number (exponent 10) to make it a little easier to read:
a = 1.234E+10f;
b = a+1.0f;
When calculating a + 1.0f, the decimal points need to be lined up:
1.234E+10f becomes 12340000000.0
1.0f becomes 1.0
+
= 12340000001.0
But since it's a float, the 1 on the right is outside the valid range, so the number stored will be 1.234000E+10. Any digits beyond that are lost, because there are just not enough digits.
[Note that if you do this on an optimizing compiler, it may still show 1.0 as the difference, because the floating point unit uses a 64- or 80-bit internal representation. That can happen if the calculation is done without storing the intermediate results in a variable (and a decent compiler can certainly achieve that here). With 2.34E+22f, the sum is guaranteed not to fit in a 64-bit float, and probably not in an 80-bit one either.]
When adding two FP numbers, they're first converted to the same exponent. In decimal:
2.34000E+22 + 1.00000E+0 = 2.34000E+22 + 0.00000E+22. In this step, the 1.0 is lost to rounding.
Binary floating point works pretty much the same, except that E+22 is replaced by 2^74.
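The same-exponent idea can be turned into a rough check for when a small addend disappears. This is a sketch, not how the hardware is implemented: `float_spacing` and `is_absorbed` are hypothetical names, and the check assumes positive, finite, normalized values and the default round-to-nearest mode.

```cpp
#include <cfloat>
#include <cmath>

// Hypothetical helper: spacing between neighbouring floats at the magnitude
// of x, derived from its binary exponent. Assumes x is finite, nonzero and
// normalized.
double float_spacing(float x) {
    int exponent = std::ilogb(x);  // floor(log2(|x|))
    return std::ldexp(1.0, exponent - (FLT_MANT_DIG - 1));
}

// True when adding `small` to `big` cannot change `big` under
// round-to-nearest: the addend is below half the spacing at `big`.
// Assumes both values are positive.
bool is_absorbed(float big, float small) {
    return small < float_spacing(big) / 2.0;
}
```

For 2.34E+22f the spacing comes out at 2^51 (about 2.25e15), so `is_absorbed(2.34E+22f, 1.0f)` is true: the 1.0 is far too small to reach the last stored bit.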

Could anyone tell me why float can't hold 3153600000?

I know this is stupid, but I'm quite a noob in the programming world. Here is my code.
This one works perfectly:
#include <stdio.h>
int main() {
float x = 3153600000 ;
printf("%f", x);
return 0;
}
But this one has a problem:
#include <stdio.h>
int main() {
float x = 60 * 60 * 24 * 365 * 100 ;
printf("%f", x);
return 0;
}
So 60 * 60 * 24 * 365 * 100 is 3153600000, right? If so, why does it produce different results? The second one overflowed and printed "-1141367296.000000". Could anyone tell me why?
You're multiplying integers, then putting the result in a float. By that time, it has already overflowed.
Try float x = 60.0f * 60.0f * 24.0f * 365.0f * 100.0f;. You should get the result you want.
60 is an integer, as are 24, 365, and 100. Therefore, the entire expression 60 * 60 * 24 * 365 * 100 is carried out using integer arithmetic (the compiler evaluates the expression before it sees what type of variable you're assigning it into).
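On a typical platform where int is 32 bits, a signed integer can only hold values up to 2,147,483,647. So the value gets truncated to 32 bits before it is assigned to your float variable. (Strictly speaking, signed integer overflow is undefined behavior; wrapping around is just what usually happens.)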
If you tell the compiler to use floating-point arithmetic, e.g. by writing the first value as a float literal such as 60.0f, then you'll get the expected result. (A float times an int is a float, so the float propagates to the entire expression.) E.g.:
float x = 60.0f * 60 * 24 * 365 * 100;
Doesn't your compiler spit this warning? Mine does:
warning: integer overflow in expression
The overflow occurs before the all-integer expression is converted to a float and stored in x. Add .0f to all the numbers in the expression to make them floats.
If you multiply two integers, the result will be an integer too.
60 * 60 * 24 * 365 * 100 is an integer.
Since 32-bit integers can only go up to 2^31-1 (2147483647), this value overflows and becomes -1141367296, which is only then converted to float.
Try multiplying float numbers, instead of integral ones.
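Both ways of avoiding the overflow can be sketched as follows. The helper names are hypothetical. The first variant is an addition to what the answers suggest: it keeps integer arithmetic but widens it to 64 bits.

```cpp
#include <cstdint>

// Hypothetical helper: seconds in `years` years of 365 days.
// Starting the product with a 64-bit operand makes every multiplication
// happen in 64-bit arithmetic, so the result does not overflow a 32-bit int.
std::int64_t seconds_in_years(int years) {
    return std::int64_t{60} * 60 * 24 * 365 * years;
}

// Floating-point variant, as suggested in the answers: the leading float
// operand converts each following int operand to float.
float seconds_in_years_float(int years) {
    return 60.0f * 60 * 24 * 365 * years;
}
```

Both `seconds_in_years(100)` and `seconds_in_years_float(100)` give 3153600000; the float result is exact here only because this particular value happens to fit in a float's 24-bit significand.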