int conversion -- gives me weird numbers - c++

I'm new to the programming world and C++ is already losing me. I had these lines in my program
pennies=(amount-nickels*.05)/.01;
amount is a double, while pennies and nickels are ints.
The program returns the right value when pennies is a double but things are a little off (off by 1 most of the time) whenever pennies is an int.
Why is this happening?

This is happening because the value is truncated when it is converted to int, which discards the fractional part.
For integers:
int i;
i = 1.1; // stores 1 in i
i = 1.2; // stores 1 in i
i = 1.3; // stores 1 in i
i = 1.4; // stores 1 in i
i = 1.5; // stores 1 in i
i = 1.6; // stores 1 in i
i = 1.7; // stores 1 in i
i = 1.8; // stores 1 in i
i = 1.9; // stores 1 in i
i = 1.999999; // stores 1 in i
I would suggest changing your expression to:
pennies=amount*100-nickels*5;

As others have pointed out, the problem is that 0.01 and
0.05 don't have exact representations in machine floating
point. The simplest solution here is to round the
results, instead of truncating:
pennies = round( (amount - nickels * 0.05) / 0.01 );
A better solution in this particular case (where you are dealing
with coins) is simply to use integers everywhere, and do
everything in terms of cents, rather than dollars. This stops
working (or requires some very complex programming) very quickly
however: even things like calculating VAT or sales tax cause
problems. For such cases, there are two possible solutions:
1. Continue using double, rounding as necessary. This is the
simplest and most general solution. Be aware, however, that
when using it, rounding is done on binary values, and it may
in some cases differ from what you would get with rounding
decimal values. The difference should be small enough that it
won't matter for everyday use, but most countries have laws
specifying how such values must be rounded, and those rounding
rules are always specified in decimal. Which means that this
solution cannot be used in cases where legal requirements hold
(e.g. calculating sales tax, corporate bookkeeping, etc.).
2. Use some sort of decimal class. There are a number available
on the network. This will result in slower execution times.
(Typically, who cares. You'll get the results in 100
microseconds, rather than 10. But there are exceptions; e.g.
when doing Monte Carlo simulations.)
Also, you must be aware when large values are involved. For
your personal finances, it's almost certainly irrelevant, but
int does have a maximum value, and there is a point where
double can no longer represent integer values correctly.
Of course, if this is just an exercise to help in learning C++,
just use round or integers, and don't worry too much about the
rest.
And as for your comment "C++ is already losing me": this has
nothing to do with C++ per se. You'd encounter exactly the same
issues in (almost) any programming language.
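A minimal sketch of the "do everything in cents" suggestion above. The names to_cents and make_change are hypothetical, and rounding once at the input with std::lround assumes the dollar amount arrives as a non-negative double; after that single conversion, all arithmetic is exact integer arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Coin counts for an amount expressed in whole cents.
struct Change {
    long nickels;
    long pennies;
};

// Convert a non-negative dollar amount to whole cents, rounding to the
// nearest cent once so that later arithmetic stays in integers.
inline long to_cents(double dollars) {
    assert(dollars >= 0.0);
    return std::lround(dollars * 100.0);
}

// Split a cent total into nickels and leftover pennies using only integers.
inline Change make_change(long cents) {
    assert(cents >= 0);
    return Change{cents / 5, cents % 5};
}
```

For example, make_change(to_cents(0.37)) yields 7 nickels and 2 pennies, with no dependence on how 0.05 or 0.01 are represented in binary.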

You compute a floating-point value on the right-hand side of the expression, and then you store that double in an int. When you do this, the fractional part is lost.

pennies=(amount-nickels*.05)/.01;
In the above expression, the arithmetic in both the numerator and the denominator is done in double because the variable amount is a double, so the int operand is implicitly converted. At the end, if pennies is a double, the computed result is stored in pennies unchanged; if it is an int, the fractional part of the computed result is lost. The intermediate computation always happens in double and does not depend on the type of pennies.
Example
#include <stdio.h>
int main()
{
    int a;
    double c, d = 10.55555;
    /* the right-hand side is evaluated in double; c keeps the fraction, a truncates it */
    a = c = (d + 10 * .5) / 0.5;
    printf("%f : %d\n", c, a);
    return 0;
}
The output is:
31.111100 : 31
Here 10 is first converted to double (because .5 is a double literal), then 10*.5 is evaluated. The addition with d and the division by 0.5 also have double operands, so the whole computation is carried out in double; only the final assignment to a truncates.
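The conversion rules described above can be checked at compile time. This is a small sketch, not part of the original answer: the static_assert lines inspect the type of each mixed expression, and truncate_to_int is a hypothetical helper that performs the same truncating conversion as assigning a double to an int.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// The type of a mixed expression is fixed by its operands, never by the
// variable that later receives the result.
static_assert(std::is_same<decltype(10 * .5), double>::value,
              "int * double literal is evaluated as double");
static_assert(std::is_same<decltype(10 * .5f), float>::value,
              "int * float literal is evaluated as float");
static_assert(std::is_same<decltype((10.55555 + 10 * .5) / 0.5), double>::value,
              "the whole expression from the example is a double");

// Converting a double to int discards the fractional part (truncation
// toward zero). The value must fit in an int for the result to be defined.
inline int truncate_to_int(double value) {
    assert(value > static_cast<double>(std::numeric_limits<int>::min()) - 1.0);
    assert(value < static_cast<double>(std::numeric_limits<int>::max()) + 1.0);
    return static_cast<int>(value);
}
```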

Related

Float point numbers and incorrect result due to rounding behavior

I need to output floating-point numbers with two digits after the decimal point, rounded correctly. However, sometimes I don't get the results I need. Below is an example.
#include <iomanip>
#include <iostream>
using namespace std;
int main(){
cout << setprecision(2);
cout << fixed;
cout<<(1.7/20)<<endl;
cout<<(1.1/20)<<endl;
}
The results are:
0.08
0.06
Since 1.7/20=0.085 and 1.1/20=0.055, in theory I should get 0.09 and 0.06. I know it has something to do with the binary representation of floating-point numbers. My question is: how can I get the right results when fixing the number of digits after the decimal point with rounding?
Edit: This is not a duplicate of another question. Using fesetround(FE_UPWARD) will not solve the problem. fesetround(FE_UPWARD) will round (1.0/30) to 0.04 while the correct result should be 0.03. In addition, fesetround(FE_TONEAREST) doesn't help either. (1.7/20) still rounds to 0.08.
Edit: Now I understand that this behavior might be due to the half-to-even rounding. But how can I avoid this? Namely, if the result is exact half, it should round up.
Yes, you're right - it has to do with the representation in base 2, and the fact that sometimes the base 2 value will be higher than the base 10 number and sometimes it will be lower. But never by much!
If you want something that matches expectations more often, you can do two stage rounding. A double is generally accurate to at least 15 digits (total, including those to the left of the decimal point). Your first rounding will leave you with a number that has more stability for the second phase of rounding. No rounding is going to match the results you would get in decimal 100%, but it's possible to get very close.
double round_2digits(double d)
{
double intermediate = floor(d * 100000000000000.0 + 0.5); // round to 14 digits
return floor(intermediate / 1000000000000.0 + 0.5) / 100.0;
}
For a totally different approach, you can simply ensure that the base 2 number that you start with is always larger than the desired decimal, instead of being larger half the time and smaller half the time. Simply increment the least significant bit of the number with nextafter before rounding.
double round_2digits(double d)
{
    return floor(100.0 * std::nextafter(d, std::numeric_limits<double>::max()) + 0.5) / 100.0; // + 0.5 so that floor rounds to nearest
}
You can define your own round_with_precision() function, which scales the value by a power of ten, calls round() from <cmath>, and then divides the result by the same factor.
#include <cmath>
#include <cstddef>
#include <iostream>
using namespace std;
double round_with_precision(double d, size_t prec)
{
    const double factor = pow(10.0, static_cast<double>(prec)); // 10^prec
    return round(d * factor) / factor;
}
int main(){
    const size_t prec = 2;
    cout << round_with_precision(1.7/20, prec) << endl; //prints 0.09
    cout << round_with_precision(1.1/20, prec) << endl; //prints 0.06
}
The issue is due to binary floating-point representation and floating-point constants in C. The fact is that 1.7 and 1.1 are not exactly representable in binary. The ISO C standard says (I suppose that this is similar in C++): "Floating constants are converted to internal format as if at translation-time." This means that the active rounding mode (set by fesetround) will not have any influence at all for the constant (it may have an influence for the roundings that occur at run time).
The division by 20 will introduce another rounding error. Depending on the full code and compiler options, it may or may not be done at compile time, so that the active rounding mode may be ignored. In any case, if you expect 0.085 and 0.055 exactly, this is not possible because these values are not representable exactly in binary.
So, even if you have perfect code that rounds double values to 2 decimal digits, this may not work as you want, because of the rounding errors that occurred before, and it is too late to recover the information in a way that works in all cases.
If you want to be able to handle "midpoint" values such as 0.085 exactly, you need to use a number system that can represent them exactly, such as decimal arithmetic (but you may still get rounding errors in other kinds of operations). You may also want to use integers scaled by a power of 10. There is no general answer because this really depends on the application, as any workaround will have drawbacks.
For more information, see all the general articles on floating point and Goldberg's article (PDF version).
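As a rough sketch of the "integers scaled by a power of 10" idea, the division and the round-half-up step can both be done in integer arithmetic, so a midpoint is detected exactly. The function name divide_round_half_up is hypothetical, and the sketch only handles non-negative values.

```cpp
#include <cassert>

// Divide two non-negative integers and round the quotient to the nearest
// integer, with exact halves rounded up. No floating point is involved,
// so a midpoint such as 8.5 is detected exactly.
inline long long divide_round_half_up(long long numerator, long long denominator) {
    assert(numerator >= 0 && denominator > 0);
    return (2 * numerator + denominator) / (2 * denominator);
}
```

With 1.7 stored as 170 hundredths, divide_round_half_up(170, 20) gives 9 hundredths (0.09); 110 hundredths divided by 20 gives 6 (0.06), and 100 hundredths divided by 30 gives 3 (0.03). The drawback is that the scale has to be chosen up front and tracked by the caller.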

c++ incorrect floating point arithmetic

For the following program:
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
for (float a = 1.0; a < 10; a++)
cout << std::setprecision(30) << 1.0/a << endl;
return 0;
}
I receive the following output:
1
0.5
0.333333333333333314829616256247
0.25
0.200000000000000011102230246252
0.166666666666666657414808128124
0.142857142857142849212692681249
0.125
0.111111111111111104943205418749
This is definitely not right for the lower-place digits, particularly with respect to 1/3, 1/5, 1/7, and 1/9. Things start going wrong around 10^-16. I would expect to see output more like:
1
0.5
0.333333333333333333333333333333
0.25
0.2
0.166666666666666666666666666666
0.142857142857142857142857142857
0.125
0.111111111111111111111111111111
Is this an inherent flaw in the float class? Is there a way to overcome this and have proper division? Is there a special datatype for doing precise decimal operations? Am I just doing something stupid or wrong in my example?
There are a lot of numbers that computers cannot represent, even if you use float or double-precision float. 1/3, or .3 repeating, is one of those numbers. So it just does the best it can, which is the result you get.
See http://floating-point-gui.de/, or google float precision, there's a ton of info out there (including many SO questions) on this subject.
To answer your questions -- yes, this is an inherent limitation in both the float class and the double class. Some mathematical programs (MathCAD, probably Mathematica) can do "symbolic" math, which allows calculation of the "correct" answers. In many cases, the round-off error can be managed, even over really complex computations, such that the top 6-8 decimal places are correct. However, the opposite is true as well -- naive computations can be constructed that return wildly incorrect answers.
For small problems like division of whole numbers, you'll get a decent number of accurate digits: roughly 6-7 significant digits with float. If you use double-precision floats, that goes up to roughly 15-16. If you need more... well, I'd start questioning why you want that many decimal places.
First of all, since your code does 1.0/a, it gives you a double (1.0 is a double value, 1.0f is a float), as the rules of C++ (and C) always convert the smaller type to the larger one if the operands of an operation have different types (so int + char converts the char to an int before adding the values, long + int converts the int to long, and so on).
Second, floating-point values have a set number of bits for the "number". In float, that is 23 bits (+1 'hidden' bit), and in double it's 52 bits (+1). You need approximately 3.3 bits per decimal digit (exactly: log2(10), if we use decimal number representation), so a 24-bit number gives approximately 7 digits and a 53-bit number approximately 16 digits. The remainder is just "noise" caused by the last few bits of the number not evening out when converting to a decimal number.
To have infinite precision, we would have to either store the value as a fraction, or have an infinite number of bits. And of course, we could have some other finite precision, such as 100 bits, but I'm sure you'd complain about that too, because it would just have another 15 or so digits before it "goes wrong".
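The bit counts and the bits-to-digits conversion above can be read off std::numeric_limits. This is an illustrative sketch that assumes IEEE 754 float and double, as on mainstream platforms; the helper names are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Approximate number of decimal digits carried by a binary significand of
// the given width: each bit is worth log10(2) (about 0.301) decimal digits.
inline double decimal_digits_for_bits(int significand_bits) {
    assert(significand_bits > 0);
    return significand_bits * std::log10(2.0);
}

// Significand widths, including the hidden bit, as reported by the
// implementation (24 and 53 for IEEE 754 float and double).
inline int float_significand_bits()  { return std::numeric_limits<float>::digits; }
inline int double_significand_bits() { return std::numeric_limits<double>::digits; }
```

Under that assumption, 53 bits correspond to just under 16 decimal digits, which is where the question's output starts to diverge from the exact expansion.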
Floats only have so much precision (23 bits worth to be precise). If you REALLY want to see "0.333333333333333333333333333333" output, you could create a custom "Fraction" class which stores the numerator and denominator separately. Then you could calculate the digit at any given point with complete accuracy.
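A minimal sketch of the fraction idea: keep the numerator and denominator as integers and produce decimal digits on demand by long division. The helper decimal_expansion is hypothetical (the answer does not give an implementation); digits are truncated rather than rounded, and the denominator is assumed small enough that remainder * 10 cannot overflow.

```cpp
#include <cassert>
#include <string>

// Decimal expansion of numerator/denominator to a fixed number of
// fractional digits, produced by schoolbook long division.
inline std::string decimal_expansion(unsigned long long numerator,
                                     unsigned long long denominator,
                                     int fractional_digits) {
    assert(denominator != 0 && fractional_digits >= 0);
    std::string text = std::to_string(numerator / denominator);
    if (fractional_digits > 0) text += '.';
    unsigned long long remainder = numerator % denominator;
    for (int i = 0; i < fractional_digits; ++i) {
        remainder *= 10;  // bring down the next decimal place
        text += static_cast<char>('0' + remainder / denominator);
        remainder %= denominator;
    }
    return text;
}
```

Calling decimal_expansion(1, 3, 30) produces the thirty 3s the question expected, because no binary floating-point value is ever formed.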

C++ float to int

Maybe it's a very simple question, but I couldn't find the answer. I've been searching for quite a while (now Google thinks that I'm sending automated queries http://twitter.com/michaelsync/status/17177278608 ) ..
int n = 4.35 *100;
cout << n;
Why does the output become "434" instead of "435"? 4.35 * 100 = 435, which is an integer value, and this should be assignable to the integer variable "n", right?
Or does the C++ compiler cast 4.35 to an integer before multiplying? I think it won't. Why does the compiler automatically change 4.35 to 4.34, which is still a float?
Thanks.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
That's really just a starting point, sadly, as then languages introduce their own foibles as to when they do type conversions, etc. In this case you've merely created a situation where the constant 4.35 can't be represented precisely, and thus 4.35*100 is more like 434.9999999999, and the cast to int does trunc, not round.
If you run this statement:
cout << setprecision(17) << 4.35
you get something like 4.3499999999999996, because 4.35 isn't exactly representable in binary floating point. (With the default precision of 6 digits the output is rounded back to 4.35.)
When the compiler casts a float to an int it truncates.
To get the behavior you expect, try:
int n = floor((4.35 * 100.0) + 0.5);
(The trick with floor was needed because C++ had no standard round() function before C++11; since C++11 you can use std::round or std::lround from <cmath>.)
The internal representation of 4.35 ends up being 4.349999999 or similar. Multiplying by 100 shifts the decimal, and the .9999 is dropped off (truncated) when converting to int.
Edit: Was looking for the link Nick posted. :)
Floating point numbers don't work that way. Many (most, technically an infinite number of...) values cannot be stored or manipulated precisely as floating point. 4.35 would seem to be one of them. It's getting stored as something that's actually below 4.35, hence your result.
When a float is converted to an int the fractional part is truncated, the conversion doesn't take the nearest int to the float in value.
4.35 can't be exactly represented as a float; the nearest representable number is (we can deduce) very slightly less than 4.35, i.e. 4.34999..., so when multiplied by 100 you get 434.999...
If you want to convert a positive float to the nearest int you should add 0.5 before converting to int.
E.g.
int n = (4.35 * 100) + 0.5;
cout << n;
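The three conversions discussed in these answers can be put side by side. This is a sketch with hypothetical function names; std::lround is standard since C++11 and, unlike the + 0.5 trick, also handles negative values.

```cpp
#include <cassert>
#include <cmath>

// Truncating conversion: what a plain assignment to int does.
inline int to_int_truncating(double value) {
    return static_cast<int>(value);
}

// The "+ 0.5" trick from the answers; only valid for non-negative input.
inline int to_int_add_half(double value) {
    assert(value >= 0.0);
    return static_cast<int>(value + 0.5);
}

// std::lround rounds to the nearest integer, with halfway cases rounded
// away from zero, for both positive and negative values.
inline long to_int_rounding(double value) {
    return std::lround(value);
}
```

Both to_int_add_half(4.35 * 100) and to_int_rounding(4.35 * 100) give 435, while truncating a value just below 435 gives 434. For a negative value such as -2.7, adding 0.5 and truncating would give -2, whereas to_int_rounding(-2.7) gives -3.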

C++ integer floor function

I want to implement greatest integer function. [The "greatest integer function" is a quite standard name for what is also known as the floor function.]
int x = 5/3;
My question is with greater numbers could there be a loss of precision as 5/3 would produce a double?
EDIT: Greatest integer function is integer less than or equal to X.
Example:
4.5 = 4
4 = 4
3.2 = 3
3 = 3
What I want to know is 5/3 going to produce a double? Because if so I will have loss of precision when converting to int.
Hope this makes sense.
You will lose the fractional portion of the quotient. So yes, with greater numbers you will have more relative precision, such as compared with 5000/3000.
However, 5 / 3 will return an integer, not a double. To force it to divide as double, typecast the dividend as static_cast<double>(5) / 3.
Integer division gives integer results, so 5 / 3 is 1 and 5 % 3 is 2 (the remainder operator). However, this doesn't necessarily hold with negative numbers. In the original C++ standard, -5 / 3 could be either -1 (rounding towards zero) or -2 (the floor), but -1 was recommended. In the latest C++0x draft (which became C++11), it is -1, so finding the floor with negative numbers is more involved.
5/3 will always produce 1 (an integer), if you do 5.0/3 or 5/3.0 the result will be a double.
As far as I know, there is no predefined function for this purpose.
It might be necessary to use such a function, if for some reason floating-point calculations are out of question (e.g. int64_t has a higher precision than double can represent without error)
We could define this function as follows:
#include <cstdlib>
inline long
floordiv (long num, long den)
{
    if (0 < (num^den))   // operands have the same sign: truncation is already the floor
        return num/den;
    else
    {
        std::ldiv_t res = std::ldiv(num,den);
        return (res.rem)? res.quot-1
                        : res.quot;
    }
}
The idea is to use the normal integer division, but adjust for negative results to match the behaviour of the double floor(double) function. The point is to truncate always towards the next lower integer, irrespective of the position of the zero point. This can be very important if the intention is to create even sized intervals.
Timing measurements show that this function here only creates a small overhead compared with the built-in / operator, but of course the floating point based floor function is significantly faster....
Since in C and C++, as others have said, / between integers is integer division, it will return an int. For non-negative operands that is the floor of the exact quotient (C and C++ truncate toward zero), so 5/3 is exactly what you want.
It gets a little weird with negatives: -5/3 => -1, not the floor -2, which may or may not be what you want...
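To make the difference between truncation and floor concrete, here is a small sketch that uses only the built-in / and % operators. It assumes C++11 or later, where integer division truncates toward zero; floor_div is a hypothetical name.

```cpp
#include <cassert>

// Floor division for signed integers. The built-in / truncates toward
// zero, so the quotient is adjusted down by one when the division is
// inexact and the remainder and divisor have opposite signs.
inline long floor_div(long numerator, long denominator) {
    assert(denominator != 0);
    long quotient = numerator / denominator;
    long remainder = numerator % denominator;
    if (remainder != 0 && ((remainder < 0) != (denominator < 0))) {
        --quotient;
    }
    return quotient;
}
```

With this, -5 / 3 is -1 while floor_div(-5, 3) is -2; for operands of the same sign the two agree.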

Unexpected loss of precision when dividing doubles

I have a function getSlope which takes 4 doubles as parameters and returns another double calculated from these parameters in the following way:
double QSweep::getSlope(double a, double b, double c, double d){
double slope;
slope=(d-b)/(c-a);
return slope;
}
The problem is that when calling this function with arguments for example:
getSlope(2.71156, -1.64161, 2.70413, -1.72219);
the returned result is:
10.8557
and this is not a good result for my computations.
I have calculated the slope using Mathematica and the result for the slope for the same parameters is:
10.8452
or with more digits for precision:
10.845222072678331.
The result returned by my program is not good in my further computations.
Moreover, I do not understand how the program arrives at 10.8557 starting from 10.845222072678331 (supposing that this is the approximate result of the division).
How can I get the good result for my division?
thank you in advance,
madalina
I print the result using the command line:
std::cout<<slope<<endl;
It may be that my parameters are not good, as I read them from another program (which computes a graph; after reading these parameters from this graph I just displayed them to see their values, but maybe the displayed vectors do not have the same internal precision as the calculated values. I do not know, it is really strange. Some numerical errors appear.)
When the graph from which I am reading my parameters is computed, some numerical libraries written in C++ (with templates) are used. No OpenGL is used for this computation.
thank you,
madalina
I've tried with float instead of double and I get 10.845110 as a result. It still looks better than madalina's result.
EDIT:
I think I know why you get these results. If you get the a, b, c and d parameters from somewhere else and print them, you see rounded values. If you then put those rounded values into Mathematica (or calc ;) ) it will give you a different result.
I tried changing a little bit one of your parameters. When I did:
double c = 2.7041304;
I get 10.845806. I only added 0.0000004 to c!
So I think your "errors" aren't errors. Print a, b, c and d with better precision and then put them to Mathematica.
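A short sketch of the "print with better precision" advice. The helper names format_default and format_round_trip are hypothetical, and slope repeats the formula from the question. std::numeric_limits<double>::max_digits10 (C++11) is the number of significant digits needed for a double to survive a round trip through text.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Format a double the way `std::cout << value` does by default
// (6 significant digits).
inline std::string format_default(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Format a double with max_digits10 significant digits, so that reading
// the text back recovers the same value.
inline std::string format_round_trip(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}

// Same formula as getSlope in the question.
inline double slope(double a, double b, double c, double d) {
    assert(c != a);
    return (d - b) / (c - a);
}
```

With default formatting, 2.7041304 is displayed as 2.70413, so the digits that move the slope from about 10.8452 to about 10.8458 are invisible in the printed parameters.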
The following code:
#include <iostream>
using namespace std;
double getSlope(double a, double b, double c, double d){
double slope;
slope=(d-b)/(c-a);
return slope;
}
int main( ) {
double s = getSlope(2.71156, -1.64161, 2.70413, -1.72219);
cout << s << endl;
}
gives a result of 10.8452 with g++. How are you printing out the result in your code?
Could it be that you use DirectX or OpenGL in your project? If so they can turn off double precision and you will get strange results.
You can check your precision settings with
std::sqrt(x) * std::sqrt(x)
The result has to be pretty close to x.
I met this problem a long time ago and spent a month checking all the formulas. But then I found
D3DCREATE_FPU_PRESERVE
The problem here is that (c-a) is small, so the rounding errors inherent in floating-point operations are magnified in this example. A general solution is to rework your equation so that you're not dividing by a small number; I'm not sure how you would do it here, though.
EDIT:
Neil is right in his comment to this question. I computed the answer in VB using Doubles and got the same answer as Mathematica.
The results you are getting are consistent with 32bit arithmetic. Without knowing more about your environment, it's not possible to advise what to do.
Assuming the code shown is what's running, ie you're not converting anything to strings or floats, then there isn't a fix within C++. It's outside of the code you've shown, and depends on the environment.
As Patrick McDonald and Treb both brought up the accuracy of your inputs and the error on a-c, I thought I'd take a look at that. One technique to look at rounding errors is interval arithmetic, which makes explicit the upper and lower bounds that a value represents (they are implicit in floating point numbers, and are fixed to the precision of the representation). By treating each value as an upper and lower bound, and by extending the bounds by the error in the representation (approx x * 2 ^ -53 for a double value x), you get a result which gives the lower and upper bounds on the accuracy of a value, taking into account worst case precision errors.
For example, if you have a value in the range [1.0, 2.0] and subtract from it a value in the range [0.0, 1.0], then the result must lie in the range [below(0.0),above(2.0)] as the minimum result is 1.0-1.0 and the maximum is 2.0-0.0. below and above are equivalent to floor and ceiling, but for the next representable value rather than for integers.
Using intervals which represent worst-case double rounding:
getSlope(
a = [2.7115599999999995262:2.7115600000000004144],
b = [-1.6416099999999997916:-1.6416100000000002357],
c = [2.7041299999999997006:2.7041300000000005888],
d = [-1.7221899999999998876:-1.7221900000000003317])
(d-b) = [-0.080580000000000526206:-0.080579999999999665783]
(c-a) = [-0.0074300000000007129439:-0.0074299999999989383218]
to double precision [10.845222072677243474:10.845222072679954195]
So although c-a is small compared to c or a, it is still large compared to double rounding, so if you were using the worst imaginable double precision rounding, then you could trust that value to be precise to 12 figures - 10.8452220727. You've lost a few figures off double precision, but you're still working to more than your input's significance.
But if the inputs were only accurate to the number of significant figures shown, then rather than being the double value 2.71156 +/- eps, the input range would be [2.711555,2.711565], so you get the result:
getSlope(
a = [2.711555:2.711565],
b = [-1.641615:-1.641605],
c = [2.704125:2.704135],
d = [-1.722195:-1.722185])
(d-b) = [-0.08059:-0.08057]
(c-a) = [-0.00744:-0.00742]
to specified accuracy [10.82930108:10.86118598]
which is a much wider range.
But you would have to go out of your way to track the accuracy in the calculations, and the rounding errors inherent in floating point are not significant in this example - it's precise to 12 figures with the worst case double precision rounding.
On the other hand, if your inputs are only known to 6 figures, it doesn't actually matter whether you get 10.8557 or 10.8452. Both are within [10.82930108:10.86118598].
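The interval calculation above can be sketched in a few lines. This is a simplified illustration, not the code used to produce the figures in the answer: the type and function names are made up, and it omits the outward rounding (the below/above step) that a rigorous interval library would apply to each bound.

```cpp
#include <algorithm>
#include <cassert>

// A closed interval [lo, hi] of possible values.
struct Interval {
    double lo;
    double hi;
};

// x - y is smallest when x is smallest and y is largest, and vice versa.
inline Interval subtract(Interval x, Interval y) {
    return Interval{x.lo - y.hi, x.hi - y.lo};
}

// Quotient of two intervals; the divisor must not contain zero.
inline Interval divide(Interval x, Interval y) {
    assert(y.lo > 0.0 || y.hi < 0.0);
    const double q[] = {x.lo / y.lo, x.lo / y.hi, x.hi / y.lo, x.hi / y.hi};
    return Interval{*std::min_element(q, q + 4), *std::max_element(q, q + 4)};
}

// Interval version of (d - b) / (c - a).
inline Interval slope_bounds(Interval a, Interval b, Interval c, Interval d) {
    return divide(subtract(d, b), subtract(c, a));
}
```

Feeding in the six-figure input ranges listed above gives bounds of roughly 10.829 to 10.861, which contain both 10.8452 and 10.8557.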
Better print out the arguments, too. When you are, as I guess, transferring parameters in decimal notation, you will lose precision for each and every one of them. The problem is that 1/5 is an infinite series in binary, so e.g. 0.2 becomes 0.001100110011.... Also, decimals are chopped when converting a binary float to a textual representation in decimal.
Next to that, sometimes the compiler chooses speed over precision. This should be a documented compiler switch.
Patrick seems to be right about (c-a) being the main cause:
d-b = -1.72219 - (-1.64161) = -0.08058
c-a = 2.70413 - 2.71156 = -0.00743
S = (d-b)/(c-a) = -0.08058 / -0.00743 = 10.845222
You start out with six digits of precision; through the subtraction you get a reduction to four and three digits. My best guess is that you lose additional precision because the number -0.00743 cannot be represented exactly in a double. Try using intermediate variables with a bigger precision, like this:
double QSweep::getSlope(double a, double b, double c, double d)
{
double slope;
long double temp1, temp2;
temp1 = (d-b);
temp2 = (c-a);
slope = temp1/temp2;
return slope;
}
While the academic discussion going on is great for learning about the limitations of programming languages, you may find the simplest solution to the problem is a data structure for arbitrary precision arithmetic.
This will have some overhead, but you should be able to find something with fairly guaranteeable accuracy.
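As a rough illustration of exact arithmetic for this particular calculation, the five-decimal inputs can be passed as integers in units of 1e-5 and the slope kept as an exact fraction. This is a fixed-width stand-in, not a true arbitrary-precision type, and all names here are hypothetical.

```cpp
#include <cassert>

// An exact fraction of two 64-bit integers, kept in lowest terms with a
// positive denominator.
struct Rational {
    long long num;
    long long den;
};

// Greatest common divisor by Euclid's algorithm (result is non-negative).
inline long long gcd_of(long long x, long long y) {
    if (x < 0) x = -x;
    if (y < 0) y = -y;
    while (y != 0) {
        long long t = x % y;
        x = y;
        y = t;
    }
    return x;
}

// Build a normalized fraction num/den.
inline Rational make_rational(long long num, long long den) {
    assert(den != 0);
    if (den < 0) { num = -num; den = -den; }
    long long g = gcd_of(num, den);
    return Rational{num / g, den / g};
}

// Slope (d - b) / (c - a) for inputs given in units of 1e-5, e.g. 2.71156
// is passed as 271156. The common scale cancels in the quotient.
inline Rational exact_slope(long long a, long long b, long long c, long long d) {
    return make_rational(d - b, c - a);
}
```

For the arguments in the question, exact_slope(271156, -164161, 270413, -172219) is exactly 8058/743, which is about 10.845222 when finally converted to a double.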