I have a program in C++ where I divide two numbers, and I need to know if the answer is an integer or not. What I am using is:
if(fmod(answer,1) == 0)
I also tried this:
if(floor(answer)==answer)
The problem is that answer usually is a 5 digit number, but with many decimals. For example, answer can be: 58696.000000000000000025658 and the program considers that an integer.
Is there any way I can make this work?
I am dividing double a/double b= double answer
(sometimes there are more than 30 decimals)
Thanks!
EDIT:
a and b are numbers in the thousands (about 100,000) which are then raised to powers of 2 and 3, added together and divided (according to a complicated formula). So I am plugging in various a and b values and looking at the answer. I will only keep the a and b values that make the answer an integer. An example of what I got for one of the answers was: 218624 which my program above considered to be an integer, but it really was: 218624.00000000000000000056982 So I need a code that can distinguish integers with more than 20-30 decimals.
You can use std::modf in cmath.h:
double integral;
if(std::modf(answer, &integral) == 0.0)
The integral part of answer is stored in fraction and the return value of std::modf is the fractional part of answer with the same sign as answer.
The usual solution is to check if the number is within a very short distance of an integer, like this:
bool isInteger(double a){
double b=round(a),epsilon=1e-9; //some small range of error
return (a<=b+epsilon && a>=b-epsilon);
}
This is needed because floating point numbers have limited precision, and numbers that indeed are integers may not be represented perfectly. For example, the following would fail if we do a direct comparison:
double d=sqrt(2); //square root of 2
double answer=2.0/(d*d); //2 divided by 2
Here, answer actually holds the value 0.99999..., so we cannot compare that to an integer, and we cannot check if the fractional part is close to 0.
In general, since the floating point representation of a number can be either a bit smaller or a bit bigger than the actual number, it is not good to check if the fractional part is close to 0. It may be a number like 0.99999999 or 0.000001 (or even their negatives), these are all possible results of a precision loss. That's also why I'm checking both sides (+epsilon and -epsilon). You should adjust that epsilon variable to fit your needs.
Also, keep in mind that the precision of a double is close to 15 digits. You may also use a long double, which may give you some extra digits of precision (or not, it is up to the compiler), but even that only gets you around 18 digits. If you need more precision than that, you will need to use an external library, like GMP.
Floating point numbers are stored in memory using a very different bit format than integers. Because of this, comparing them for equality is not likely to work effectively. Instead, you need to test if the difference is smaller than some epsilon:
const double EPSILON = 0.00000000000000000001; // adjust for whatever precision is useful for you
double remainder = std::fmod(numer, denom);
if(std::fabs(0.0 - remainder) < EPSILON)
{
//...
}
Alternatively, if you want to include values that are close to integers (based on your desired precision), you can modify the if condition slightly (since the remainder returned by std::fmod will be in the range [0, 1)):
if (std::fabs(std::round(d) - d) < EPSILON)
{
// ...
}
You can see the test for this here.
Floating point numbers are generally somewhat precise to about 12-15 digits (as a double), but as they are stored as a mantissa (fraction) and a exponent, rational numbers (integers or common fractions) are not likely to be stored as such. For example,
double d = 2.0; // d might actually be 1.99999999999999995
Because of this, you need to compare the difference of what you expect to some very small number that encompasses the precision you desire (we will call this value, epsilon):
double d = 2.0;
bool test = std::fabs(2 - d) < epsilon; // will return true
So when you are trying to compare the remainder from std::fmod, you need to check it against the difference from 0.0 (not for actual equality to 0.0), which is what is done above.
Also, the std::fabs call prevents you from having to do 2 checks by asserting that the value will always be positive.
If you desire a precision that is greater than 15-18 decimal places, you cannot use double or long double; you will need to use a high precision floating point library.
Related
#include <iostream>
using namespace std;
int main() {
int steps=1000000000;
float s = 0;
for (int i=1;i<(steps+1);i++){
s += (i/2.0) ;
}
cout << s << endl;
}
Declaring s as float: 9.0072e+15
Declaring s as double: 2.5e+17 (same result as implementing it in Julia)
I understand double has double precision than float, but float should still handle numbers up to 10^38.
I did read similar topics where results where not the same, but in that cases the differences were very small, here the difference is 25x.
I also add that using long double instead gives me the same result as double. If the matter is the precision, I would have expected to have something a bit different.
The problem is the lack of precision: https://en.wikipedia.org/wiki/Floating_point
After 100 million numbers you are adding 1e8 to 1e16 (or at least numbers of that magnitude), but single precision numbers are only accurate to 7 digits - so it is the same as adding 0 to 1e16; that's why your result is considerably lower for float.
Prefer double over float in most cases.
Problem with floating point precision! Infinite real numbers cannot possibly be represented by the finite memory of a computer. Float, in general, are just approximations of the number they are meant to represent.
For more details, please check the following documentation:
https://softwareengineering.stackexchange.com/questions/101163/what-causes-floating-point-rounding-errors
You didn't mention what type of floating point numbers you are using, but I'm going to assume that you use IEEE 754, or similar.
I understand double has double precision
To be more precise with the terminology, double uses twice as many bits. That's not double the number of reprensentable values, it's 4294967296 times as many representable values, despite being named "double precision".
but float should still handle numbers up to 10^38.
Float can handle a few numbers up to that magnitude. But that does't mean that float values in that range are precise. For example, 3,4028235E+38 can be represented as a single precision float. How much would you imagine is the difference between the previous value representable by float? Is it the machine epsilon? Perhaps 0.1? Maybe 1? No. The difference is about 2E+31.
Now, your numbers aren't quite in that range. But, they're outside the continuous range of whole integers that can be precisely represented by float. The highest value in that range happens to be 16777217, or about 1.7E+7, which is way less than 2.5E+17. So, every addition beyond that range adds some error to the result. You perform a billion calculations so those errors add up.
Conclusions:
Understand that single precision is way less precise than double precision.
Avoid long sequences of calculations where precision errors can accumulate.
Is there an algorithm in C++ that will allow me to, given a floating-point value V of type T (e.g. double or float), returns the closest value to V in a given direction (up or down) that can be represented exactly in less than or equal to a specified number of decimal places D ?
For example, given
T = double
V = 670000.08267799998
D = 6
For direction = towards +inf I would like the result to be 670000.082678, and for direction = towards -inf I would like the result to be 670000.082677
This is somewhat similar to std::nexttoward(), but with the restriction that the 'next' value needs to be exactly representable using at most D decimal places.
I've considered a naive solution involving separating out the fractional portion and scaling it by 10^D, truncating it, and scaling it again by 10^-D and tacking it back onto the whole number portion, but I don't believe that guarantees that the resulting value will be exactly representable in the underlying type.
I'm hopeful that there's a way to do this properly, but so far I've been unable to find one.
Edit: I think my original explanation didn't properly convey my requirements. At the suggestion of #patricia-shanahan I'll try to describing my higher-level goal and then reformulate the problem a little differently in that context.
At the highest level, the reason I need this routine is due to some business logic wherein I must take in a double value K and a percentage P, split it into two double components V1 and V2 where V1 ~= P percent of K and V1 + V2 ~= K. The catch is that V1 is used in further calculations before being sent to a 3rd party over a wire protocol that accepts floating-point values in string format with a max of D decimal places. Because the value sent to the 3rd party (in string format) needs to be reconcilable with the results of the calculations made using V1 (in double format) , I need to "adjust" V1 using some function F() so that it is as close as possible to being P percent of K while still being exactly representable in string format using at most D decimal places. V2 has none of the restrictions of V1, and can be calculated as V2 = K - F(V1) (it is understood and acceptable that this may result in V2 such that V1 + V2 is very close to but not exactly equal to K).
At the lower level, I'm looking to write that routine to 'adjust' V1 as something with the following signature:
double F(double V, unsigned int D, bool roundUpIfTrueElseDown);
where the output is computed by taking V and (if necessary, and in the direction specified by the bool param) rounding it to the Dth decimal place.
My expectation would be that when V is serialized out as follows
const auto maxD = std::numeric_limits<double>::digits10;
assert(D <= maxD); // D will be less than maxD... e.g. typically 1-6, definitely <= 13
std::cout << std::fixed
<< std::setprecision(maxD)
<< F(V, D, true);
then the output contains only zeros beyond the Dth decimal place.
It's important to note that, for performance reasons, I am looking for an implementation of F() that does not involve conversion back and forth between double and string format. Though the output may eventually be converted to a string format, in many cases the logic will early-out before this is necessary and I would like to avoid the overhead in that case.
This is a sketch of a program that does what is requested. It is presented mainly to find out whether that is really what is wanted. I wrote it in Java, because that language has some guarantees about floating point arithmetic on which I wanted to depend. I only use BigDecimal to get exact display of doubles, to show that the answers are exactly representable with no more than D digits after the decimal point.
Specifically, I depended on double behaving according to IEEE 754 64-bit binary arithmetic. That is likely, but not guaranteed by the standard, for C++. I also depended on Math.pow being exact for simple exact cases, on exactness of division by a power of two, and on being able to get exact output using BigDecimal.
I have not handled edge cases. The big missing piece is dealing with large magnitude numbers with large D. I am assuming that the bracketing binary fractions are exactly representable as doubles. If they have more than 53 significant bits that will not be the case. It also needs code to deal with infinities and NaNs. The assumption of exactness of division by a power of two is incorrect for subnormal numbers. If you need your code to handle them, you will have to put in corrections.
It is based on the concept that a number that is both exactly representable as a decimal with no more than D digits after the decimal point and is exactly representable as a binary fraction must be representable as a fraction with denominator 2 raised to the D power. If it needs a higher power of 2 in the denominator, it will need more than D digits after the decimal point in its decimal form. If it cannot be represented at all as a fraction with a power-of-two denominator, it cannot be represented exactly as a double.
Although I ran some other cases for illustration, the key output is:
670000.082678 to 6 digits Up: 670000.09375 Down: 670000.078125
Here is the program:
import java.math.BigDecimal;
public class Test {
public static void main(String args[]) {
testIt(2, 0.000001);
testIt(10, 0.000001);
testIt(6, 670000.08267799998);
}
private static void testIt(int d, double in) {
System.out.print(in + " to " + d + " digits");
System.out.print(" Up: " + new BigDecimal(roundUpExact(d, in)).toString());
System.out.println(" Down: "
+ new BigDecimal(roundDownExact(d, in)).toString());
}
public static double roundUpExact(int d, double in) {
double factor = Math.pow(2, d);
double roundee = factor * in;
roundee = Math.ceil(roundee);
return roundee / factor;
}
public static double roundDownExact(int d, double in) {
double factor = Math.pow(2, d);
double roundee = factor * in;
roundee = Math.floor(roundee);
return roundee / factor;
}
}
In general, decimal fractions are not precisely representable as binary fractions. There are some exceptions, like 0.5 (½) and 16.375 (16⅜), because all binary fractions are precisely representable as decimal fractions. (That's because 2 is a factor of 10, but 10 is not a factor of 2, or any power of two.) But if a number is not a multiple of some power of 2, its binary representation will be an infinitely-long cyclic sequence, like the representation of ⅓ in decimal (.333....).
The standard C library provides the macro DBL_DIG (normally 15); any decimal number with that many decimal digits of precision can be converted to a double (for example, with scanf) and then converted back to a decimal representation (for example, with printf). To go in the opposite direction without losing information -- start with a double, convert it to decimal and then convert it back -- you need 17 decimal digits (DBL_DECIMAL_DIG). (The values I quote are based on IEEE-754 64-bit doubles).
One way to provide something close to the question would be to consider a decimal number with no more than DBL_DIG digits of precision to be an "exact-but-not-really-exact" representation of a floating point number if that floating point number is the floating point number which comes closest to the value of the decimal number. One way to find that floating point number would be to use scanf or strtod to convert the decimal number to a floating point number, and then try the floating point numbers in the vicinity (using nextafter to explore) to find which ones convert to the same representation with DBL_DIG digits of precision.
If you trust the standard library implementation to not be too far off, you could convert your double to a decimal number using sprintf, increment the decimal string at the desired digit position (which is just a string operation), and then convert it back to a double with strtod.
Total re-write.
Based on OP's new requirement and using power-of-2 as suggested by #Patricia Shanahan, simple C solution:
double roundedV = ldexp(round(ldexp(V, D)),-D); // for nearest
double roundedV = ldexp(ceil (ldexp(V, D)),-D); // at or just greater
double roundedV = ldexp(floor(ldexp(V, D)),-D); // at or just less
The only thing added here beyond #Patricia Shanahan fine solution is C code to match OP's tag.
In C++ integers must be represented in binary, but floating point types can have a decimal representation.
If FLT_RADIX from <limits.h> is 10, or some multiple of 10, then your goal of exact representation of a decimal values is attainable.
Otherwise, in general, it's not attainable.
So, as a first step, try to find a C++ implementation where FLT_RADIX is 10.
I wouldn't worry about algorithm or efficiency thereof until the C++ implementation is installed and proved to be working on your system. But as a hint, your goal seems to be suspiciously similar to the operation known as “rounding”. I think, after obtaining my decimal floating point C++ implementation, I’d start by investigating techniques for rounding, e.g., googling that, maybe Wikipedia, …
I want to truncate floor number to be 3 digit decimal number. Example:
input : x = 0.363954;
output: 0.364
i used
double myCeil(float v, int p)
{
return int(v * pow(float(10),p))/pow(float(10),p );
}
but the output was 0.3630001 .
I tried to use trunc from <cmath> but it doesn't exist.
Floating-point math typically uses a binary representation; as a result, there are decimal values that cannot be exactly represented as floating-point values. Trying to fiddle with internal precisions runs into exactly this problem. But mostly when someone is trying to do this they're really trying to display a value using a particular precision, and that's simple:
double x = 0.363954;
std::cout.precision(3);
std::cout << x << '\n';
The function your looking for is the std::ceil, not std::trunc
double myCeil(double v, int p)
{
return std::ceil(v * std::pow(10, p)) / std::pow(10, p);
}
substitue in std::floor or std::round for a myFloor or myRound as desired. (Note that std::round appears in C++11, which you will have to enable if it isn't already done).
It is just impossible to get 0.364 exactly. There is no way you can store the number 0.364 (364/1000) exactly as a float, in the same way you would need an infinite number of decimals to write 1/3 as 0.3333333333...
You did it correctly, except for that you probably want to use std::round(), to round to the closest number, instead of int(), which truncates.
Comparing floating point numbers is tricky business. Typically the best you can do is check that the numbers are sufficiently close to each other.
Are you doing your rounding for comparison purposes? In such case, it seems you are happy with 3 decimals (this depends on each problem in question...), in such case why not just
bool are_equal_to_three_decimals(double a, double b)
{
return std::abs(a-b) < 0.001;
}
Note that the results obtained via comparing the rounded numbers and the function I suggested are not equivalent!
This is an old post, but what you are asking for is decimal precision with binary mathematics. The conversion between the two is giving you an apparent distinction.
The main point, I think, which you are making is to do with identity, so that you can use equality/inequality comparisons between two numbers.
Because of the fact that there is a discrepancy between what we humans use (decimal) and what computers use (binary), we have three choices.
We use a decimal library. This is computationally costly, because we are using maths which are different to how computers work. There are several, and one day they may be adopted into std. See eg "ISO/IEC JTC1 SC22 WG21 N2849"
We learn to do our maths in binary. This is mentally costly, because it's not how we do our maths normally.
We change our algorithm to include an identity test.
We change our algorithm to use a difference test.
With option 3, it is where we make a decision as to just how close one number needs to be to another number to be considered 'the same number'.
One simple way of doing this is (as given by #SirGuy above) where we use ceiling or floor as a test - this is good, because it allows us to choose the significant number of digits we are interested in. It is domain specific, and the solution that he gives might be a bit more optimal if using a power of 2 rather than of 10.
You definitely would only want to do the calculation when using equality/inequality tests.
So now, our equality test would be (for 10 binary places (nearly 3dp))
// Normal identity test for floats.
// Quick but fails eg 1.0000023 == 1.0000024
return (a == b);
Becomes (with 2^10 = 1024).
// Modified identity test for floats.
// Works with 1.0000023 == 1.0000024
return (std::floor(a * 1024) == std::floor(b * 1024));
But this isn't great
I would go for option 4.
Say you consider any difference less than 0.001 to be insignificant, such that 1.00012 = 1.00011.
This does an additional subtraction and a sign removal, which is far cheaper (and more reliable) than bit shifts.
// Modified equality test for floats.
// Returns true if the ∂ is less than 1/10000.
// Works with 1.0000023 == 1.0000024
return abs(a - b) < 0.0001;
This boils down to your comment about calculating circularity, I am suggesting that you calculate the delta (difference) between two circles, rather than testing for equivalence. But that isn't exactly what you asked in the question...
I need to represent numbers using the following structure. The purpose of this structure is not to lose the precision.
struct PreciseNumber
{
long significand;
int exponent;
}
Using this structure actual double value can be represented as value = significand * 10e^exponent.
Now I need to write utility function which can covert double into PreciseNumber.
Can you please let me know how to extract the exponent and significand from the double?
The prelude is somewhat flawed.
Firstly, barring any restrictions on storage space, conversion from a double to a base 10 significand-exponent form won't alter the precision in any form. To understand that, consider the following: any binary terminating fraction (like the one that forms the mantissa on a typical IEEE-754 float) can be written as a sum of negative powers of two. Each negative power of two is a terminating fraction itself, and hence it follows that their sum must be terminating as well.
However, the converse isn't necessarily true. For instance, 0.3 base 10 is equivalent to the non-terminating 0.01 0011 0011 0011 ... in base 2. Fitting this into a fixed size mantissa would blow some precision out of it (which is why 0.3 is actually stored as something that translates back to 0.29999999999999999.)
By this, we may assume that any precision that is intended by storing the numbers in decimal significand-exponent form is either lost, or isn't simply gained at all.
Of course, you might think of the apparent loss of accuracy generated by storing a decimal number as a float as loss in precision, in which case the Decimal32 and Decimal64 floating point formats may be of some interest -- check out http://en.wikipedia.org/wiki/Decimal64_floating-point_format.
This is a very difficult problem. You might want to see how much code it takes to implement a double-to-string conversion (for printf, e.g.). You might steal the code from gnu's implementation of gcc.
You cannot convert an "imprecise" double into a "precise" decimal number, because the required "precision" simply isn't there to begin with (otherwise why would you even want to convert?).
This is what happens if you try something like it in Java:
BigDecimal x = new BigDecimal(0.1);
System.out.println(x);
The output of the program is:
0.1000000000000000055511151231257827021181583404541015625
Well you're at less precision than a typical double. Your significand is a long giving you a range from -2 billion to +2 billion which is more than 9 but fewer than 10 digits of precision.
Here's an untested starting point on what you'd want to do for some simple math on PreciseNumbers
PreciseNumber Multiply(PreciseNumber lhs, PreciseNumber rhs)
{
PreciseNumber ret;
ret.s=lhs.s;
ret.e=lhs.e;
ret.s*=rhs.s;
ret.e+=lhs.e;
return ret;
}
PreciseNumber Add(PreciseNumber lhs, PreciseNumber rhs)
{
PreciseNumber ret;
ret.s=lhs.s;
ret.e=lhs.e;
ret.s+=(rhs.s*pow(10,rhs.e-lhs.e));
}
I didn't take care of any renormalization, but in both cases there are places where you have to worry about over/under flows and loss of precision. Just because you're doing it yourself rather than letting the computer take care of it in a double, doesn't meat the same pitfalls aren't there. The only way to not lose precision is to keep track of all of the digits.
Here's a very rough algorithm. I'll try to fill in some details later.
Take the log10 of the number to get the exponent. Multiply the double by 10^x if positive, or divide by 10^-x if negative.
Start with a significand of zero. Repeat the following 15 times, since a double contains 15 digits of significance:
Multiply the previous significand by 10.
Take the integer portion of the double, add it to the significand, and subtract it from the double.
Subtract 1 from the exponent.
Multiply the double by 10.
When finished, take the remaining double value and use it for rounding: if it's >= 5, add one to the significand.
I was wondering whether it is possible to limit the number of characters we enter in a float.
I couldn't seem to find any method. I have to read in data from an external interface which sends float data of the form xx.xx. As of now I am using conversion to char and vice-versa, which is a messy work-around. Can someone suggest inputs to improve the solution?
If you always have/want only 2 decimal places for your numbers, and absolute size is not such a big issue, why not work internally with integers instead, but having their meaning be "100th of the target unit". At the end you just need to convert them back to a float and divide by 100.0 and you're back to what you want.
This is a slight misunderstanding. You cannot think of a float or double as being a decimal number.
Most any attempt to use it as a fixed decimal number of precision, say, 2, will incur problems as some values will not be precisely equal to xxx.xx but only approximately so.
One solution that many apps use is to ensure that:
1) display of floating point numbers is well controlled using printf/sprintf to a certain number of significant digits,
2) one does not do exact comparison between floating point numbers, i.e. to compare to the 2nd decimal point of precision two numbers a, b : abs(a-b) <= epsilon should generally be used. Outright equality is dangerous as 0.01 might have multiple floating point values, e.g. 0.0101 and 0.0103 might result if you do arithmetic, but be indistinguishable to the user if values are truncated to 2 dp, and they may be logically equivalent to your application which is assuming 2dp precision.
Lastly, I would suggest you use double instead of float. These days there is no real overhead as we aren't doing floating point without a maths coprocessor any more! And a float under 32-bit architectures has 7 decimal points of precision, and a double has 15, and this is enough to be significant in many case.
Rounding a float (that is, binary floating-point number) to 2 decimal digits doesn't make much sense because you won't be able to round it exactly in some cases anyway, so you'll still get a small delta which will affect subsequent calculations. If you really need it to be precisely 2 places, then you need to use decimal arithmetic; for example, using IBM's decNumber++ library, which implements ISO C/C++ TR 24773 draft
You can limit the number of significant numbers to output:
http://www.cplusplus.com/reference/iostream/manipulators/setprecision/
but I don't think there is a function to actually lop off a certain number of digits. You could write a function using ftoa() (or stringstream), lop off a certain number of digits, and use atof() (or stringstream) and return that.
You should checks the string rather than the converted float. It will be easier to check the number of digits.
Why don't you just round the floats to the desired precision?
double round(double val, int decimalPlaces)
{
double power_of_10 = pow(10.0, static_cast<double>(decimalPlaces));
return floor(val * power_of_10 + 0.5) / power_of_10;
}
int main()
{
double d;
cin >> d;
// round d to 3 decimal places...
d = round(d, 3);
// do something with d
d *= 1.75;
cout << setprecision(3) << d; // now output to 3 decimal places
}
There exist no fixed point decimal datatype in C, but you can mimic pascal's decimal with a struct of two ints.
If the need is to take 5 digits [ including or excluding the decimal point ], you could simply write like below.
scanf( "%5f", &a );
where a is declared as float.
Fo eg:
If you enter 123.45, scanf will consider the first 5 characters i.e., 4 digits and the decimal point & will store 123.4
If entered 123456, the value of a will be 12345 [ ~ 12345.00 ]
With printf, we would be able to control how many characters can be printed after decimal as well.
printf( "%5.2f \n", a );
The value of 123.4 will be printed as 12.30 [ total 5, including the decimal & 2 digits after decimal ]
But this have a limitation, where if the digits in the value are more than 5, it will display the actual value.
eg: The value of 123456.7, will be displayed as 123456.70.
This [ specifying the no. of digits after the decimal, as mentioned for printf ] I heard can be used for scanf as well, I am not sure sure & the compiler I use doesn't support that format. Verify whether your compiler does.
Now, when it comes to taking data from an external interface, are you talking about serialization here, I mean transmission of data on netwrok.
Then, to my knowledge your approach is fine.
We generally tend to read in the form of char only, to make sure the application works for any format of data.
You can print a float use with printf("%.2f", float), or something similar.