I wish to round a floating point number to set precision and return the result from a function. For example, I currently have the following function:
inline bool R3Point::
operator==(const R3Point& point) const
{
    // Return whether point is equal
    return ((v[0] == point.v[0]) && (v[1] == point.v[1]) && (v[2] == point.v[2]));
}
What I wish to do is, instead of doing a direct v[i] == point.v[i] comparison, compare only up to a certain set precision, so that if v[i] = 0.33349999999999996 and point.v[i] = 0.33350000000000002, my equal comparison will result in TRUE.
I am aware that there's a C++ manipulator, setprecision(int n), and I've seen it used a lot when displaying output on screen using cout. However, I'm not sure whether it can be used inside a function like the one I described.
Thanks.
Generally, == should not be used to compare doubles; you should do something like:
if (v[0] - point.v[0] < 1e-9) { }
You should use abs or fabs if you are not sure of the sign (otherwise a large negative difference would pass the test), and change the precision 1e-9 as needed.
Comparing 2 floating point numbers (say a and b), is best done using the following: abs(a-b) < precision. Where abs(x) is the absolute value function, and precision is some small positive number. Often you want to set the precision as a function of the absolute value of the numbers being compared themselves.
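Applied to the original function, a minimal sketch could look like this (it reuses the R3Point fragment from the question; EPSILON is an arbitrary absolute tolerance chosen for illustration, and a relative tolerance scaled by the magnitudes is often the better choice):
#include <cmath>
inline bool R3Point::
operator==(const R3Point& point) const
{
    // Components are considered equal when they differ by less than the tolerance
    const double EPSILON = 1e-9;
    return (std::fabs(v[0] - point.v[0]) < EPSILON) &&
           (std::fabs(v[1] - point.v[1]) < EPSILON) &&
           (std::fabs(v[2] - point.v[2]) < EPSILON);
}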
Preamble
I am looking into a system developed to be used by people who don't understand floating point arithmetic. For this reason the implementation of comparison for floating point numbers is not exposed to the people using the system. Currently comparisons of floating point numbers occur like this (And this cannot change due to legacy reasons):
// If either number is not finite, do default comparison
if (!IsFinite(num1) || !IsFinite(num2)) {
    output = (num1 == num2);
} else {
    // Get exponents of both numbers to determine epsilon for comparison
    tmp = (OSINT32*)&num1 + 1;
    exp1 = (((*tmp) >> 20) & 0x07ff) - 1023;
    tmp = (OSINT32*)&num2 + 1;
    exp2 = (((*tmp) >> 20) & 0x07ff) - 1023;
    // Check if exponent is the same
    if (exp1 != exp2) {
        output = false;
    } else {
        // Calculate epsilon based on the magic number 47 (presumably calculated experimentally)?
        epsilon = pow(2.0, exp1 - 47);
        output = (fabs(num2 - num1) <= epsilon);
    }
}
The crux of it is, we calculate the epsilon based on the exponent of the number to stop users of the interface from making floating point comparison mistakes. A BIG NOTE: This is for people who are not software programmers, so when they do pow(sqrt(2), 2) == 2 they don't get a big surprise. Maybe this is not the best idea, but like I said, it cannot be changed.
The Problem
We are having trouble figuring out how to display numbers to the user. In the past they simply displayed the number to 15 significant digits. But this results in problems of the following type:
>> SHOW 4.1 MOD 1
>> 0.099999999999999996
>> SHOW (4.1 MOD 1) == 0.1
>> TRUE
The comparison calls this correct because of the generated epsilon. But the printing of the number is confusing for people: how is 0.099999999999999996 equal to 0.1? We need a way to show the number such that it uses the shortest number of significant digits for which a number compared to it would still be TRUE. So for 0.099999999999999996 this would be 0.1, and for 0.569999999992724327 it would be 0.569999999992725.
Is this possible?
You could calculate (num - pow(2.0, exp - 47)) and (num + pow(2.0, exp - 47)), convert both to strings, and search for the shortest decimal within that range.
The exact value of a double is mantissa * pow(2.0, exp - 52) with an integer mantissa (53 bits including the implicit leading one), so if you add/subtract pow(2.0, exp - 47) you change the mantissa by 2^5, which should be exactly representable without rounding errors (except in corner cases where the mantissa under/overflows, i.e. when it is within 2^5 of the bottom or top of its range; you might want to check for these*).
Then you have two strings; search for the first position where the digits differ and cut the number off there. There are a lot of rounding cases, though, especially when you want not just a correct number in the range but the number closest to the input number (which might not be needed). For example, if you get "1.23" and "1.24", you might even want to output "1.235".
This also shows that your example is wrong. epsilon for 0.569999999992724327 is (to maximal precision) 0.000000000000003552713678800500929355621337890625. The range is 0.569999999992720773889232077635824680328369140625 to 0.569999999992727879316589678637683391571044921875 and would be cut off at 0.569999999992725 (or 0.569999999992723 if you prefer that rounding).
An easier to implement sledgehammer method would be to output it to the maximal precision, cut one digit off, convert it back to double, check if it compares correctly. Then continue cutting, till the comparison fails. (could be improved with a binary search)
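A rough sketch of that sledgehammer approach (the names here are mine, and eq stands in for whatever epsilon comparison your system actually uses):
#include <cstdio>
#include <cstdlib>
#include <string>
// Print at maximal precision, then keep dropping significant digits while the
// round-tripped value still compares equal under the supplied comparison.
template <typename Equal>
std::string shortestEqualString(double num, Equal eq) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.17g", num);  // 17 significant digits round-trip any double
    std::string best = buf;
    for (int digits = 16; digits >= 1; --digits) {
        std::snprintf(buf, sizeof(buf), "%.*g", digits, num);
        if (eq(std::strtod(buf, nullptr), num))
            best = buf;   // still compares equal: keep the shorter form
        else
            break;        // comparison failed: stop cutting (a binary search would also work)
    }
    return best;
}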
* They should still be exactly representable, but your comparison method will behave very oddly. Consider num1 == 1 and num2 == 1 - pow(2.0, -53) = 0.99999999999999988897769753748434595763683319091796875. Their difference, 0.00000000000000011102230246251565404236316680908203125, is below your epsilon 0.000000000000003552713678800500929355621337890625, but the comparison will say they differ, because they have different exponents.
Yes, it's possible.
double a = std::fmod(4.1, 1.0);
std::cerr << std::setprecision(0) << a << "\n";
std::cerr << std::setprecision(10) << a << "\n";
std::cerr << std::setprecision(20) << a << "\n";
produces:
0.1
0.1
0.099999999999999644729
I think you just need to determine what level of display precision corresponds to your epsilon value.
We need a way to show the number such that it uses the shortest number of significant digits for which a number compared to it would still be TRUE.
Can't you just do it the brute-force-ish way?
#include <iomanip>
#include <iostream>
#include <sstream>

const int MAX_PRECISION = 17;  // upper bound on the number of decimal places to try

int main() {
    double num = 0.09999999;
    for (int precision = 0; precision < MAX_PRECISION; ++precision) {
        std::stringstream str;
        double tmp = 0;
        str << std::fixed << std::setprecision(precision) << num;
        str >> tmp;
        if (num == tmp) {  // in the system described, this would be the epsilon comparison
            std::cout << std::fixed << std::setprecision(precision) << num << "\n";
            break;
        }
    }
}
It is not possible to avoid confusing users given the constraints you've specified. For one thing, 0.0999999999999996447 compares equal to 0.1, and 0.1000000000000003664 compares equal to 0.1, but 0.0999999999999996447 does not compare equal to 0.1000000000000003664. For another, 2.00000000000001421 compares equal to 2.0, but 1.999999999999999778 does not compare equal to 2.0 even though it's much closer to 2.0 than 2.00000000000001421 is.
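To see that concretely, here is a small sketch of the comparison described in the question, approximated with frexp/ldexp instead of the raw bit manipulation (the function name and the constants fed to it are mine, taken from the numbers above):
#include <cmath>
#include <iostream>
// frexp's exponent is one larger than the IEEE exponent, hence "- 48" rather than "- 47".
bool legacyEqual(double num1, double num2) {
    if (!std::isfinite(num1) || !std::isfinite(num2)) return num1 == num2;
    int exp1, exp2;
    std::frexp(num1, &exp1);
    std::frexp(num2, &exp2);
    if (exp1 != exp2) return false;
    double epsilon = std::ldexp(1.0, exp1 - 48);  // 2^(ieeeExponent - 47)
    return std::fabs(num2 - num1) <= epsilon;
}
int main() {
    double a = 0.0999999999999996447, b = 0.1, c = 0.1000000000000003664;
    std::cout << legacyEqual(a, b) << " "    // 1
              << legacyEqual(b, c) << " "    // 1
              << legacyEqual(a, c) << "\n";  // 0 -- equality is not transitive
}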
Enjoy.
I am writing a program in which there are some operations being performed on a floating point number. After I debugged the program, I came to know that for a particular test case, the value of the variable equals -2.38418579e-07. Now I have cout precision set to 2 digits after decimal. So when I print it, it prints it as -0.00.
However, I would like the output to be 0.00 instead of -0.00. I have tried various if conditions on the variable's value, but they do not help. Can anyone suggest how to get rid of -0.00 in C++?
First, you should define a tolerance threshold: any floating point number whose absolute value is below this threshold is considered zero. For example, you could define this threshold as:
#define zero 1e-6
Then you could use the following construct to "filter" your floating point numbers:
#include <cmath>
#include <type_traits>

template<typename T, typename F = std::decay_t<T>>
std::enable_if_t<std::is_floating_point<F>::value, F> sanitize(T &&num) {
    // Values within the threshold of zero collapse to exactly zero (so no "-0.00")
    return std::abs(num) < zero ? F{} : num;
}
Notice that I use SFINAE so that the sanitize function accepts only floating point numbers as input.
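Hypothetical usage, relying on the zero threshold and the sanitize template above (the value is the one from the question):
#include <iomanip>
#include <iostream>
int main() {
    double variable = -2.38418579e-07;
    std::cout << std::fixed << std::setprecision(2)
              << sanitize(variable) << "\n";  // prints 0.00 instead of -0.00
}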
I would like the output to be 0.00 instead of -0.00
I like the other answers better. But in a crunch you can always use brute force ... (are you sure you can ignore the actual results?)
std::string rslt;
{
    std::stringstream ss;
    ss << variable; // use same formatting as in your example
    size_t minusSignIndx = ss.str().find("-0.00");
    if (minusSignIndx != std::string::npos)
        rslt = " 0.00"; // found the nasty, ignore it
    else
        rslt = ss.str(); // not nasty, use it
}
//... use rslt
The problem is that every floating point number in a certain interval [low, -0.0] will be printed as "-0.00".
Thus you have to find low:
such that print(predecessor(low)) => "-0.01"
such that print(low) => "-0.00"
Then you'll be able to write something like (NaN aside...):
double filter(double x) {
    double low = ... ;
    return (x < low)
        ? x
        : ((x > 0.0)
            ? x
            : 0.0);
}
If you have a correctly rounded printf, and manage your arithmetic to be strictly IEEE 754 conformant with appropriate compiler flags, the exact value of low is the nearest double to -1/200 that is greater than -1/200 (I write -1/200 rather than -0.005 because I'm speaking of the decimal value, not the double).
What we have with correctly rounded sscanf("-0.005", "%lf", &d): the double result is smaller than -1/200. I did check that with exact arithmetic, for example as found in the Pharo Smalltalk language:
[-0.005 < (-1/200) and: [-0.005 successor > (-1/200)]] assert.
Its successor is greater than -1/200 (necessarily; the above check is just foolproofing).
Thus you can write (notice the <= low):
double filter(double x) {
    double low = -0.005;
    return (x <= low)
        ? x
        : ((x > 0.0)
            ? x
            : 0.0);
}
Recently I decided to get into C++, and after going through the basics I decided to build a calculator using only iostream (just to challenge myself). After most of it was complete, I came across an issue with my loop for exponents. Whenever a multiple of pi was used as the exponent, it looped way too many times. I fixed it in a somewhat redundant way and now I'm hoping someone might be able to tell me what happened. My unfixed code snippet is below. Ignore everything above and just look at the last bit of fully functioning code. All I was wondering was why values of pi would throw off the loop so much. Thanks.
bool TestForDecimal(double Num) /* Checks if the number given is whole or not */ {
    if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) {
        return 0;
    }
    else {
        return 1;
    }
}
And then here's where it all goes wrong (Denominator is set to a value of 1):
if (TestForDecimal(Power) == 1) /* Checks if its decimal or not */ {
    while (TestForDecimal(Power) == 1) {
        Power = Power * 10;
        Denominator = Denominator * 10;
    }
}
If anyone could give me an explanation that would be great!
To clarify further, the while loop kept looping even after Power became a whole number (This only happened when Power was equal to a multiple of pi such as 3.1415 or 6.2830 etc.)
Here's a complete example you can try:
#include <climits>
#include <cstdlib>
#include <iostream>

bool TestForDecimal(double Num) /* Checks if the number given is whole or not */ {
    if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) {
        return 0;
    }
    else {
        return 1;
    }
}

void foo(double Power) {
    double x = Power;
    if (TestForDecimal(x) == 1) /* Checks if its decimal or not */ {
        while (TestForDecimal(x) == 1) {
            x = x * 10;
            std::cout << x << std::endl;
        }
    }
}

int main() {
    foo(3.145); // Substitute this with 3.1415 and it doesn't work (this was my problem)
    system("Pause");
    return 0;
}
What's wrong with doing something like this?
#include <cmath>   // std::fabs and std::round
#include <cfloat>  // DBL_EPSILON

bool TestForDecimal(double Num) {
    double diff = std::fabs(std::round(Num) - Num);
    // true if not a whole number
    return diff > DBL_EPSILON;
}
The loop is quite inefficient... what if Num is large?
A faster way could be something like
if (Num == static_cast<int>(Num))
or
if (Num == (int)Num)
if you prefer a C-style syntax.
Then a range check may be useful... it does not make sense to ask whether Num is an integer when it is larger than 2^32 (about 4 billion).
Finally, do not think of these numbers as decimals. They are stored as binary numbers, so instead of multiplying Power and Denominator by 10 you are better off multiplying them by 2.
Most decimal fractions can't be represented exactly in a binary floating-point format, so what you're trying to do can't work in general. For example, with a standard 64-bit double format, the closest representable value to 3.1415 is more like 3.1415000000000001812.
If you need to represent decimal fractions exactly, then you'll need a non-standard type. Boost.Multiprecision has some decimal types, and there's a proposal to add decimal types to the standard library; some implementations may have experimental support for this.
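A quick way to see this for yourself is to print the literal at full precision (the exact digits may vary slightly with the platform and standard library):
#include <iomanip>
#include <iostream>
int main() {
    std::cout << std::setprecision(20) << 3.1415 << "\n";
    // typically prints 3.1415000000000001812 -- the closest double to 3.1415
}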
Beware. A double is (generally, and I think you use a standard architecture) represented in IEEE 754 format, that is mantissa * 2^exponent. For a double, you have 52 explicitly stored mantissa bits (53 significant bits counting the implicit leading one), one for the sign and 11 for the exponent. When you multiply it by 10 it will grow, and it will become an integer value as soon as the exponent exceeds the number of mantissa bits.
Unfortunately, a 53-bit integer cannot be represented in a 32-bit int (and int is typically 32 bits even on 64-bit systems), so your test will fail again.
So you will never reach an integer value that passes the test. You will more likely reach an infinity representation and stay there...
The only use case where it could work would be if you started with a number that can be represented with a small number of negative powers of 2, for example 0.5 (1/2), 0.25 (1/4), 0.75 (1/2 + 1/4), leaving almost all of the mantissa bits zero.
After studying your "unfixed" function, from what I can tell, here's your basic algorithm:
double TestForDecimal(double Num) { ...
A function that accepts a double and returns a double. This would make sense if the returned value was the decimal value, but since that's not the case, perhaps you meant to use bool?
while (Num > 1) { make it less }
While there is nothing inherently wrong with this, it doesn't really address negative numbers with large magnitudes, so you'll run into problems there.
if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) { return 0; }
This means that if Num is within the signed integer range and its integer typecast is equal to itself, return a 0 typecast to a double. This means you don't care whether numbers outside the integer range are whole numbers or not. To fix this, change the condition to if (Num == (long)Num), since sizeof(long) == sizeof(double) on most 64-bit (LP64) platforms.
Perhaps the algorithm your function follows that I've just explained might shed some light on your problem.
I've run into a pretty weird problem with doubles. I have a list of floating point numbers (double) that are sorted in decreasing order. Later in my program I find, however, that they are not exactly sorted anymore. For example:
0.65801139819
0.6545651031 <-- a
0.65456513001 <-- b
0.64422968678
The two numbers in the middle are flipped. One might think that this problem lies in the representations of the numbers, and they are just printed out wrong. But I compare each number with the previous one using the same operator I use to sort them - there is no conversion to base 10 or similar going on:
double last_pt = 0;
for (int i = 0; i < npoints; i++) {
    if (last_pt && last_pt < track[i]->Pt()) {
        cout << "ERROR: new value " << track[i]->Pt()
             << " is higher than previous value " << last_pt << endl;
    }
    last_pt = track[i]->Pt();
}
The values are compared during sorting by
bool moreThan(const Track& a, const Track& b) {
    return a.Pt() > b.Pt();
}
and I made sure that they are always double, and not converted to float. Pt() returns a double. There are no NaNs in the list, and I don't touch the list after sorting.
Why is this, what's wrong with these numbers, and (how) can I sort the numbers so that they stay sorted?
Are you sure you're not converting double to float at some point? Let's take a look at the binary representations of these two numbers:
0 01111111110 0100111100100011001010000011110111010101101100010101
0 01111111110 0100111100100011001010010010010011111101011010001001
In a double we've got 1 bit of sign, 11 bits of exponent and 52 bits of stored mantissa (plus an implicit leading 1), while in a float there's 1 bit of sign, 8 bits of exponent and 23 bits of stored mantissa. Notice that the mantissas of both numbers are identical in their first 23 bits.
Depending on the rounding method the behaviour differs. If the bits beyond the 23rd are simply trimmed, these two numbers are identical as floats:
0 011111110 01001111001000110010100 (trim: 00011110111010101101100010101)
0 011111110 01001111001000110010100 (trim: 10010010011111101011010001001)
You're comparing the return value of a function. Floating point return values are returned in a floating point register, which has higher precision than a double. When comparing two such values (e.g. a.Pt() > b.Pt()), the compiler will call one of the functions, store the return value in an unnamed temporary of type double (thus rounding the result to double), then call the other function, and compare its result (still in the floating point register, and not rounded to double) with the stored value. This means that you can end up with cases where a.Pt() > b.Pt() and b.Pt() > a.Pt(), or a.Pt() > a.Pt(), which will cause sort to get more than a little confused. (Formally, if we're talking about std::sort here, this results in undefined behavior, and I've heard of cases where it did cause a core dump.)
On the other hand, you say that Pt() "just returns a double field". If Pt() does no calculation whatsoever, if it's only:
double Pt() const { return someDouble; }
then this shouldn't be an issue (provided someDouble has type double). The extended precision can represent all possible double values exactly.
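As an aside (my own sketch, not part of the original answer): if Pt() does compute its value, one way to keep the ordering consistent is to evaluate it once per element, store the result in a double, and sort on the stored keys, so no comparison ever sees a freshly computed extended-precision value.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>
// Hypothetical Track with a computed Pt(), standing in for the one in the question.
struct Track {
    double px, py;
    double Pt() const { return std::sqrt(px * px + py * py); }
};
void sortByPt(std::vector<Track>& tracks) {
    std::vector<std::pair<double, Track>> keyed;
    keyed.reserve(tracks.size());
    for (const Track& t : tracks)
        keyed.emplace_back(t.Pt(), t);  // Pt() is rounded to double exactly once here
    std::sort(keyed.begin(), keyed.end(),
              [](const std::pair<double, Track>& a, const std::pair<double, Track>& b) {
                  return a.first > b.first;
              });
    for (std::size_t i = 0; i < tracks.size(); ++i)
        tracks[i] = keyed[i].second;
}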
I want to test if a number double x is an integer power of 10. I could perhaps use cmath's log10 and then test if x == (int) x?
edit: Actually, my solution does not work because doubles can be very big, much bigger than int, and also very small, like fractions.
A lookup table will be by far the fastest and most precise way to do this; only about 600 powers of 10 are representable as doubles. You can use a hash table, or if the table is ordered from smallest to largest, you can rapidly search it with binary chop.
This has the advantage that you will get a "hit" if and only if your number is exactly the closest possible IEEE double to some power of 10. If this isn't what you want, you need to be more precise about exactly how you would like your solution to handle the fact that many powers of 10 can't be exactly represented as doubles.
The best way to construct the table is probably to use string -> float conversion; that way hopefully your library authors will already have solved the problem of how to do the conversion in a way that gives the most precise answer possible.
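A rough sketch of that idea (my own illustration, not from the original answer), restricted to the normal range 1e-308 to 1e308 and letting strtod do the correctly rounded string-to-double conversion:
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>
// Build the table of powers of ten once; it comes out already sorted from smallest to largest.
static std::vector<double> makePowersOfTen() {
    std::vector<double> table;
    char buf[16];
    for (int n = -308; n <= 308; ++n) {
        std::snprintf(buf, sizeof(buf), "1e%d", n);
        table.push_back(std::strtod(buf, nullptr));  // correctly rounded by the library
    }
    return table;
}
bool isPowerOfTen(double x) {
    static const std::vector<double> table = makePowersOfTen();
    // Hit if and only if x is exactly the closest double to some power of ten
    return std::binary_search(table.begin(), table.end(), x);
}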
Your solution sounds good but I would replace the exact comparison with a tolerance one.
double exponent = log10(value);
double rounded = floor(exponent + 0.5);
if (fabs(exponent - rounded) < some_tolerance) {
    // Power of ten
}
I am afraid you're in for a world of hurt. There is no way to cast a very large or very small floating point number down to a BigInt class, because precision has already been lost in the floating point number itself.
For example, float only has 6 digits of precision. So if you represent 10^9 as a float, chances are it will be converted back as 1 000 000 145 or something like that: nothing guarantees what the last digits will be, they are beyond the precision.
You can of course use a much more precise representation, like double, which has 15 digits of precision. So normally you should be able to represent integers from 0 to 10^14 faithfully.
Finally, some platforms may have a long long type with an even greater precision.
But anyway, as soon as your value exceeds the number of digits available to be converted back to an integer without loss... you can't test it for being a power of ten.
If you really need this precision, my suggestion is not to use a floating point number. There are mathematical libraries available with BigInt implementations or you can roll your own (though efficiency is difficult to achieve).
bool power_of_ten(double x) {
    if (x < 1.0 || x > 10E15) {
        warning("IEEE754 doubles can only precisely represent powers "
                "of ten between 1 and 10E15, answer will be approximate.");
    }
    double exponent;
    // power of ten if log10 of absolute value has no fractional part
    return !modf(log10(fabs(x)), &exponent);
}
Depending on the platform your code needs to run on, the log might be very expensive.
Since the number of values that are 10^n (where n is natural) is very small, it might be faster to just use a hardcoded lookup table.
(Ugly pseudo code follows:)
bool isPowerOfTen( int16 x )
{
    if( x == 10       // n=1
     || x == 100      // n=2
     || x == 1000     // n=3
     || x == 10000 )  // n=4
        return true;
    return false;
}
This covers the whole int16 range, and if that is all you need it might be a lot faster.
(Depending on the platform.)
How about code like this:
#include <stdio.h>
#define MAX 400  /* large enough for any double printed with %lf */
bool check_pow10(double num)
{
    char arr[MAX];
    sprintf(arr, "%lf", num);
    char* ptr = arr;
    bool isFirstOne = true;
    while (*ptr)
    {
        switch (*ptr++)
        {
        case '1':
            if (isFirstOne)
                isFirstOne = false;
            else
                return false;
            break;
        case '0':
            break;
        case '.':
            break;
        default:
            return false;
        }
    }
    return true;
}
int main()
{
    double number;
    scanf("%lf", &number);
    printf("isPower10: %s\n", check_pow10(number) ? "yes" : "no");
}
That would not work for negative powers of 10 though.
EDIT: works for negative powers also.
If you don't need it to be fast, use recursion. Pseudocode:
bool checkifpoweroften(double Candidate)
    if Candidate >= 10
        return checkifpoweroften(Candidate / 10)
    elsif Candidate <= 0.1
        return checkifpoweroften(Candidate * 10)
    elsif Candidate == 1
        return 1
    else
        return 0
You still need to choose between false positives and false negatives and add tolerances accordingly, as other answers pointed out. The tolerances should apply to all comparisons, or else, for example, 9.99999999 would fail the >= 10 comparison.
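A direct C++ rendering of that pseudocode, with an (arbitrarily chosen) tolerance folded into each comparison as suggested:
#include <cmath>
bool checkIfPowerOfTen(double candidate, double tolerance = 1e-9) {
    if (!std::isfinite(candidate) || candidate <= 0.0)
        return false;                          // rejects NaN, infinities and non-positive values
    if (candidate >= 10.0 - tolerance)
        return checkIfPowerOfTen(candidate / 10.0, tolerance);
    if (candidate <= 0.1 + tolerance)
        return checkIfPowerOfTen(candidate * 10.0, tolerance);
    return std::fabs(candidate - 1.0) <= tolerance;
}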
How about this:
bool isPow10(double number, double epsilon)
{
    if (number > 0)
    {
        for (int i = 1; i < 16; i++)
        {
            if ((number >= (pow((double)10, i) - epsilon)) &&
                (number <= (pow((double)10, i) + epsilon)))
            {
                return true;
            }
        }
    }
    return false;
}
I guess if performance is an issue the few values could be precomputed, with or without the epsilon according to the needs.
A variant of this one:
double log10_value= log10(value);
double integer_value;
double fractional_value= modf(log10_value, &integer_value);
return fractional_value==0.0;
Note that the comparison to 0.0 is exact rather than within a particular epsilon since you want to ensure that log10_value is an integer.
EDIT: Since this sparked a bit of controversy due to log10 possibly being imprecise and the generic understanding that you shouldn't compare doubles without an epsilon, here's a more precise way of determining if a double is a power of 10 using only properties of powers of 10 and IEEE 754 doubles.
First, a clarification: a double can represent powers of ten exactly only up to 1E22, as 1e22 has only 52 significant bits. Luckily, 5^22 also has only 52 significant bits, so we can determine whether a double is (2*5)^n for n = [0, 22]:
#include <cmath>

bool is_pow10(double value)
{
    int exponent;
    double mantissa = frexp(value, &exponent);
    int exponent_adjustment = exponent / 10;
    int possible_10_exponent = (exponent - exponent_adjustment) / 3;
    if (possible_10_exponent >= 0 &&
        possible_10_exponent <= 22)
    {
        mantissa *= pow(2.0, exponent - possible_10_exponent);
        return mantissa == pow(5.0, possible_10_exponent);
    }
    else
    {
        return false;
    }
}
Since 2^10==1024, that adds an extra bit of significance that we have to remove from the possible power of 5.
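A quick check of the function above (my own usage example; it reports true only for exact powers of ten in the [1, 1e22] range, so negative powers come out false):
#include <iostream>
int main() {
    std::cout << is_pow10(1.0) << " "    // 1
              << is_pow10(1e10) << " "   // 1
              << is_pow10(5e10) << " "   // 0
              << is_pow10(0.1) << "\n";  // 0 (out of the covered range)
}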