Split floating point number into fractional and integral parts - c++

I'm writing a template class designed to work with any floating point type. For some of the methods I need to split a number into its integral and fractional parts. With primitive floating point types I can just cast to an integer to truncate the fractional part, but this wouldn't work with a big number class. Ideally my class would only use the four basic arithmetic operations (addition, subtraction, multiplication, division) in its calculations.
The method below is the solution I came up with. All it does is subtract powers of ten until the original number is less than 1. It works well, but seems like a brute-force approach. Is there a more efficient way to do this?
template< typename T >
class Math
{
public:
    static T modf( T const & x, T & intpart )
    {
        T sub = 1;
        T ret = x;
        while( x >= sub )
        {
            sub *= 10;
        }
        sub /= 10;
        while( sub >= 1 )
        {
            while( ret >= sub )
            {
                ret -= sub;
            }
            sub /= 10;
        } // while sub >= 1
        intpart = x - ret;
        return ret;
    }
};
Note that I've removed the sign management code for brevity.

You could perhaps replace the subtraction loop with a binary search, although that's not an improvement in complexity class.
What you have requires a number of subtractions approximately equal to the sum of the decimal digits of x, whereas a binary search requires a number of addition-and-divide-by-two operations approximately equal to 3-and-a-bit times the number of decimal digits of x.
With what you're doing and with the binary search, there's no particular reason to use powers of 10 when looking for the upper bound, you could use any number. Some other number might be a bit quicker on average, although it probably depends on the type T.
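For illustration, the search can also be phrased as greedy subtraction of descending powers of 2, which subtracts each power at most once instead of up to nine times per decimal digit. This is a sketch of my own (not the poster's code); it assumes T supports the four basic operations and comparisons, can be constructed from a small integer, and that x is non-negative:

```cpp
// Split x into integral and fractional parts using only +, -, *, /
// and comparisons. Work is proportional to the bit count of the
// integer part rather than the sum of its decimal digits.
template <typename T>
T modf_binary(T const& x, T& intpart)
{
    T const two = 2;
    T p = 1;
    while (p * two <= x)      // find the largest power of 2 <= x
        p = p * two;
    T ret = x;
    intpart = 0;
    while (p >= 1) {          // one conditional subtraction per bit
        if (ret >= p) {
            ret -= p;
            intpart += p;
        }
        p = p / two;
    }
    return ret;               // fractional part
}
```

For x < 1 the first loop never runs and the function returns x with intpart set to 0, matching the original's behavior.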
Btw, I would also be tempted to make modf a function template within Math (or a free template function in a namespace), rather than Math a class template. That way you can specialize or overload one function at a time for particular types (especially the built-in types) without having to specialize the whole of Math.
Example:
namespace Math
{
    template <typename T>
    T modf( T const & x, T & intpart )
    { ... }
}
Call it like this:
float f = 1.5, fint;
std::cout << Math::modf(f, fint) << '\n';
double d = 2.5, dint;
std::cout << Math::modf(d, dint) << '\n';
mpf_class ff(3.5), ffint(0); // GNU multi-precision
std::cout << Math::modf(ff, ffint) << '\n';
Overload it like this:
namespace Math {
    double modf(double x, double &intpart) {
        return std::modf(x, &intpart);
    }
    mpf_class modf(const mpf_class &x, mpf_class &intpart) {
        intpart = floor(x);
        return x - intpart;
    }
}

Maybe using std::modf is better? For a custom type you can provide a Math class specialization.
#include <cmath>
#include <iostream>

template<typename T>
class Math
{
public:
    static T modf(const T& x, T& integral_part)
    {
        return std::modf(x, &integral_part);
    }
};

int main()
{
    double d_part = 0.;
    double res = Math<double>::modf(5.2123, d_part);
    std::cout << d_part << " " << res << std::endl;
}

I don't know how strict your "ideally use only mathematical operations" restraint is, but nonetheless for the fractional part, could you extract it to a string and convert back to a float?
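For what it's worth, the string round-trip might look something like this. It's a rough sketch under the assumption that the type streams like a built-in float; the helper name is mine, and sign and locale handling are glossed over:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: print x in fixed notation, then reparse
// everything after the decimal point as the fractional part.
double frac_via_string(double x)
{
    std::ostringstream os;
    os.precision(17);              // enough digits to round-trip a double
    os << std::fixed << x;
    std::string s = os.str();
    std::string::size_type dot = s.find('.');
    if (dot == std::string::npos)
        return 0.0;                // no fractional digits were printed
    return std::stod("0" + s.substr(dot));
}
```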

Related

C++ - dealing with infinitesimal numbers

I need to find some way to deal with infinitesimial double values.
For example:
exp(-0.00000000000000000000000000000100000000000000000003)= 0.99999999999999999999999999999899999999999999999997
But exp function produce result = 1.000000000000000000000000000000
So my first thought was to make my own exp function. Unfortunately I am getting same output.
double my_exp(double x)
{
    bool minus = x < 0;
    x = fabs(x);
    double exp = 1.0 + x;
    double temp = x;
    for (int i = 2; i < 100000; i++)
    {
        temp *= x / (double)i;
        exp = exp + temp;
    }
    return minus ? 1.0 / exp : exp;
}
I found that the issue is that numbers as small as 1.00000000000000000003e-030 are lost entirely in the arithmetic: whether we add or subtract such a small number from 1, the result is always exactly 1.
Do you have any idea how to handle this?
Try using std::expm1
Computes the e (Euler's number, 2.7182818) raised to the given power
arg, minus 1.0. This function is more accurate than the expression
std::exp(arg)-1.0 if arg is close to zero.
#include <iostream>
#include <cmath>

int main()
{
    std::cout << "expm1(-0.00000000000000000000000000000100000000000000000003) = "
              << std::expm1(-0.00000000000000000000000000000100000000000000000003) << '\n';
}
Run the example at the source link below, changing the argument to your very small numbers.
Source: https://en.cppreference.com/w/cpp/numeric/math/expm1
I think the best way of dealing with such small numbers is to use existing libraries. You could try GMP starting with their example to calculate billions of digits of pi. Another library, MPFR which is based on GMP, seems to be a good choice. I don't know when to choose one over the other.

How to get the coefficient from a std::decimal?

Background
I want to write an is_even( decimal::decimal64 d ) function that returns true if the least-significant digit is even.
Unfortunately, I can't seem to find any methods to extract the coefficient from a decimal64.
Code
#include <iostream>
#include <decimal/decimal>
using namespace std;

static bool is_even( decimal::decimal64 d )
{
    return true; // fix this - want to: return coefficient(d)%2==0;
}

int main()
{
    auto d1 = decimal::make_decimal64( 60817ull, -4 ); // not even
    auto d2 = decimal::make_decimal64( 60816ull, -4 ); // is even
    cout << decimal64_to_float( d1 ) << " " << is_even( d1 ) << endl;
    cout << decimal64_to_float( d2 ) << " " << is_even( d2 ) << endl;
    return 0;
}
It's a little odd that there's no provided function to recover the coefficient of a decimal; but you can just multiply by 10 raised to its negative exponent:
bool is_even(decimal::decimal64 d)
{
    auto q = quantexpd64(d);
    auto coeff = static_cast<long long>(d * decimal::make_decimal64(1, -q));
    return coeff % 2 == 0;
}

assert(!is_even(decimal::make_decimal64(60817ull, -4)));
assert(is_even(decimal::make_decimal64(60816ull, -4)));
I would use corresponding fmod function if possible.
static bool is_even( decimal::decimal64 d )
{
    auto e = quantexpd64(d);
    auto divisor = decimal::make_decimal64(2, e);
    return decimal::fmodd64(d, divisor) == decimal::make_decimal64(0, 0);
}
It constructs a divisor that is 2*10^e where e is exponent of the tested value. Then it performs fmod and checks whether it is equal to a decimal 0. (NOTE: operator== for decimal is said to be IEEE 754-2008 conformant so we don't need to take care of -0.0).
An alternative would be to multiply the number by 10^-e (to "normalize" it), cast it to an integer type, and check modulo traditionally. I think this is @ecatmur's proposal. Though the "normalization" might fail if it goes out of the chosen integer type's bounds.
I think fmod is better when it comes to overflows. You are guaranteed to hold 2*10^e given that is a proper d decimal (i.e. not a NaN, or an inf).
One caveat I see is the definition of least significant digit. The above methods assume that least significant digit is denoted by e, which sometimes might be counterintuitive. I.e. is decimal(21,2) even? Then is decimal(2100,0)?

How can I check whether two numbers are within "x" significant figures of the precision limits of the floating point type?

So suppose we have a float type XType in which we have two numbers:
XType const a = 1.2345;
XType const b = 1.2300;
Then I want a function IsClose(XType const f1,XType const f2,unsigned const truncated_figures) such that
// the numbers are equal if the last two figures are ignored (1.23 == 1.23)
IsClose<XType>(a,b,2) == true
// the numbers are not equal if only the last is ignored (1.234 != 1.230)
IsClose<XType>(a,b,1) == false
So far I have this ugly mess, but I'm yet to convince myself it's correct:
// check if two floating point numbers are close to within "figures_tolerance" figures of precision for the applicable type
template <typename FloatType>
bool const IsClose(FloatType const f1, FloatType const f2, unsigned const figures_tolerance)
{
    FloatType const tolerance_exponent = std::pow(10.0, figures_tolerance);
    FloatType const tolerance =
        std::pow(tolerance_exponent, std::log10(f1)) *
        std::numeric_limits<FloatType>::epsilon();
    return std::abs(f1 - f2) < tolerance;
}
My reasoning is that the tolerance should be the epsilon raised to the order of magnitude by which the number exceeds or falls below 1.0 (the significant figures for which the epsilon is based). Does this make sense? Is there a better, more reliable way?
EDIT: My solution using the template function is below (it is based on user763305's answer below)
// check if two floating point numbers are within the last n digits of precision for the
// largest of the two numbers being compared.
template <typename FloatType>
bool const IsWithinPrecision(FloatType const f1, FloatType const f2, unsigned const n = 1U)
{
    FloatType const f_ref = std::max(std::abs(f1), std::abs(f2));
    FloatType const distance = std::abs(f1 - f2);
    FloatType const e = std::numeric_limits<FloatType>::epsilon();
    return distance < std::pow((FloatType) 10.0, (FloatType) n) * e * f_ref;
}
To test whether two numbers are within n significant digits of each other, use the inequality
abs(a - b) < pow(0.1, n) * max(abs(a), abs(b))
However, I usually find it more useful to test if the number of significant digits is at least the maximum possible number of significant digits (given the precision of the floating point type) minus n. This can be done using the inequality
abs(a - b) < pow(10.0, n) * std::numeric_limits<...>::epsilon() * max(abs(a), abs(b))
In other words, n is the number of significant digits we have lost through rounding errors. Something like n = 2 or 3 usually works in practice.
The reason this works is that the distances between a floating point number a and the next representable floating point numbers below and above a lie between
0.5 * std::numeric_limits<...>::epsilon() * abs(a)
and
std::numeric_limits<...>::epsilon() * abs(a)
Also, the above inequality does not work if you are dealing with very small, or more precisely, denormal numbers. Then you should instead use the inequality
abs(a - b) < pow(10.0, n) * max(
    std::numeric_limits<...>::epsilon() * max(abs(a), abs(b)),
    std::numeric_limits<...>::denorm_min()
)
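Putting the denormal-safe version together as real code might look like this (a sketch of mine; the function name is not from the answer, and double is assumed):

```cpp
#include <cmath>
#include <limits>
#include <algorithm>

// True when a and b have lost at most n significant decimal digits
// to rounding; the floor on the scale makes zero and denormals work.
bool is_close(double a, double b, int n)
{
    double scale = std::max(
        std::numeric_limits<double>::epsilon() * std::max(std::fabs(a), std::fabs(b)),
        std::numeric_limits<double>::denorm_min());
    return std::fabs(a - b) < std::pow(10.0, n) * scale;
}
```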
Since this is just for debugging, it may be possible to be lax and use a simple test for relative error, such as:
if (fabs(f1 - f2) <= SomeNumber * fabs(f2)) ThingsAreGood else ThingsAreBad;
This supposes that f2 is the known good (or at least known-to-be-better) value and that the error from rounding in floating-point operations is proportional to f2. Note that computations can produce errors in complicated ways. For example, if various other values were added to and subtracted from f1 along the way, so that intermediate values had much larger magnitudes than the final result represented by f2, then the rounding errors may be proportional to those large intermediate values rather than to f2. In this case, you may need to compute an error threshold based on the intermediate calculations rather than on f2.
Taking care of the situation pointed out by Eric Postpischil as well, this function tells whether the two numbers are close enough or not according to the precision.
bool const IsClose(FloatType const f1, FloatType const f2, unsigned const figures_tolerance)
{
    FloatType res = f1 - f2;
    res = res * pow(10.0, figures_tolerance);
    return !bool(int(res));
}
The solution used in the edited answer did not work in my case for large-number comparison, so I wrote a function using string comparison with a given digit precision:
#include <iomanip>
#include <sstream>
#include <iostream>

/**
 * Compare two numbers with a given digit precision
 *
 * @tparam T - Number type
 *
 * @param n1 - First number to compare
 * @param n2 - Second number to compare
 * @param n  - The first n digits that must be equal between the two numbers
 *
 * @return True if the first n digits of the two numbers are equal, false otherwise
 */
template<typename T>
bool isEqual(T n1, T n2, int n)
{
    int index = 0;
    std::ostringstream a, b;
    a << std::setprecision(n);
    b << std::setprecision(n);
    std::cout << std::setprecision(n);
    a << std::fixed;
    b << std::fixed;
    std::cout << std::fixed;
    a << n1;
    b << n2;
    while (index < n && a.str()[index] == b.str()[index]) {
        index++;
    }
    if (index != n) {
        std::cout << "n1 != n2\n\nn1 = " << a.str() << "\nn2 = " << b.str()
                  << "\ndiffer at index " << index << std::endl;
    }
    return index == n;
}

Sudoku checker algorithm in C++

I'm currently working on an algorithm to find all numbers with 9 digits using numbers 1-9 without any repeats. I'm testing a theory I have that filtering numbers as such will make for a more efficient sudoku checker.
The code that I implemented does the following. It uses a for loop for places 1-9 in a number, such that (a)(b)(c)(d)(e)(f)(g)(h)(i) = #########.
My theory is that a number has no repeated digits if the sum of the digits (a-i) equals 45, the product of a through i equals 9!, and the sum of the inverses of a-i equals roughly 2.828968 (that is, 1 + 1/2 + 1/3 + ... + 1/9).
The issue is that after I filter the 9-digit numbers by the sum of the inverses of a-i, the count of possible 9-digit numbers predicted is less than 9! (the actual amount of possible numbers). I'm not sure why it's filtering so much, but the numbers that it does catch do not have any repeats (which is good).
My thoughts are that the way I am playing with doubles is messing up the algorithm.
Here is my code:
#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    int product;
    int sum;
    int count = 0;
    double inverseSum;
    double correctInverseSum = (1.0/1.0)+(1.0/2.0)+(1.0/3.0)+(1.0/4.0)+(1.0/5.0)+
                               (1.0/6.0)+(1.0/7.0)+(1.0/8.0)+(1.0/9.0);
    for(double a=1.0; a<10.0; a++)
    for(double b=1.0; b<10.0; b++)
    for(double c=1.0; c<10.0; c++)
    for(double d=1.0; d<10.0; d++)
    for(double e=1.0; e<10.0; e++)
    for(double f=1.0; f<10.0; f++)
    for(double g=1.0; g<10.0; g++)
    for(double h=1.0; h<10.0; h++)
    for(double i=1.0; i<10.0; i++)
    {
        product = a*b*c*d*e*f*g*h*i;
        sum = a+b+c+d+e+f+g+h+i;
        if(product == 9*8*7*6*5*4*3*2*1 && sum == 45)
        {
            inverseSum = (1.0/a)+(1.0/b)+(1.0/c)+(1.0/d)+
                         (1.0/e)+(1.0/f)+(1.0/g)+(1.0/h)+(1.0/i);
            if(inverseSum == correctInverseSum)
            {
                count++;
            }
        }
    }
    cout << "This is the count:" << count << endl;
    return 0;
}
Now that I washed my eyes after seeing so many for loops, I'd say a candidate is:
if(inverseSum==correctInverseSum)
Most decimal values aren't exactly representable as doubles, so you'll have to check for equality using a small epsilon. Something like:
if (fabs(inverseSum - correctInverseSum) < std::numeric_limits<double>::epsilon())
You'll need to #include <limits>.
You're going to need some error tolerance in your checking:
if (fabs(inverseSum - correctInverseSum) < 1e-6) count++;
Alternatively, multiply through by 9! and you get
b*c*d*e*f*g*h*i + a*c*d*e*f*g*h*i + ...
(one factor missing from each term of the sum). Then you can use integer arithmetic instead of floats.
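A sketch of that integer reformulation (the arithmetic is mine: multiplying the sum of 1/dᵢ by the product P = Πdᵢ = 9! gives Σ P/dᵢ, which is exact in integers, and Σ 9!/d for d = 1..9 works out to 1026576):

```cpp
// Integer version of the inverse-sum filter: no doubles, no epsilon.
bool inverse_sum_matches(const int digits[9])
{
    long long product = 1;
    for (int i = 0; i < 9; ++i)
        product *= digits[i];
    if (product != 362880)                  // must equal 9!
        return false;
    long long numerator = 0;
    for (int i = 0; i < 9; ++i)
        numerator += product / digits[i];   // exact: each digit 1..9 divides 9!
    return numerator == 1026576;            // 9! * (1 + 1/2 + ... + 1/9)
}
```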
Let's run a quick experiment: Let's try to compute the inverse sum from big to small and in reverse order:
#include <algorithm>
#include <numeric>
#include <iostream>
#include <iterator>
#include <vector>

struct generator
{
    generator(): d_value() {}
    double operator()() { return 1.0 / ++this->d_value; }
    double d_value;
};

int main()
{
    std::vector<double> values;
    std::generate_n(std::back_inserter(values), 9, generator());
    double ordered(std::accumulate(values.begin(), values.end(), 0.0));
    double reversed(std::accumulate(values.rbegin(), values.rend(), 0.0));
    std::cout << "ordered=" << ordered << " "
              << "reversed=" << reversed << " "
              << "difference=" << (reversed - ordered) << "\n";
}
If this were exact math, clearly both should yield the same sum. After all, they are the same set of values. Unfortunately, it turns out that the results are not exactly the same. Here is the output it shows for me:
ordered=2.82897 reversed=2.82897 difference=4.44089e-16
The problem is that the values are not exact and adding two of these non-exact values introduces some error. Often the error doesn't matter too much but trying to compare the results for identity won't work: depending on the order of the operations different operands with different rounded results are involved.
An old adage, but please: Don't repeat yourself.
Keep it DRY.
When you find yourself writing this kind of code you should ask yourself why you need to repeat yourself in this way. There are plenty of other options:
1 - Recursion: get yourself comfortable with the concept.
2 - The mod operator: for i = 0 to 100, r = i % 10, c = i / 10.
3 - Re-evaluating the problem: you are trying to solve a problem that is harder than necessary.
Haven't you heard about std::bitset? You only need nine bits to verify, which is probably within your budget.
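The core of the bitset check, stripped of any template machinery, is just this (a minimal sketch of my own):

```cpp
#include <bitset>

// Mark each digit's bit; all nine bits set means no repeats and no
// digit outside 1..9.
bool all_nine_digits(const int d[9])
{
    std::bitset<9> seen;
    for (int i = 0; i < 9; ++i) {
        if (d[i] < 1 || d[i] > 9)
            return false;
        seen.set(d[i] - 1);
    }
    return seen.count() == 9;
}
```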
I've been meaning to get some practice with variadic templates, so I wrote this for you: (c++11 experts, feel free to rip it to pieces.)
#include <bitset>
#include <iostream>

template<unsigned long i>
bool test_helper(std::bitset<i> seen) {
    return seen.count() == i;
}

template<unsigned long i, typename T, typename... Args>
bool test_helper(std::bitset<i> seen, T arg1, Args... args) {
    return test_helper(seen.set(arg1 - 1), args...);
}

template<typename... Args>
bool test(Args... args) {
    return test_helper(std::bitset<sizeof...(Args)>(), args...);
}

template<unsigned long size, bool done = false>
struct Counter {
    template<typename... Args>
    unsigned long operator()(Args... args) {
        unsigned long count = 0;
        for (int a = 1; a < 10; ++a)
            count += Counter<size, size == sizeof...(Args) + 1>()(a, args...);
        return count;
    }
};

template<unsigned long i>
struct Counter<i, true> {
    template<typename... Args>
    unsigned long operator()(Args... args) {
        return test(args...);
    }
};

int main(int argc, char** argv) {
    std::cout << Counter<9>()() << std::endl;
    return 0;
}
If you really insist on using complicated heuristics, you could also get some experience with rational arithmetic to compute your inverse sum. It should be clear that the sum of 1/ai is Σj (Πi ai)/aj all divided by Πi ai; you're already computing the denominator, so it only remains to compute the numerator. But, still, the bitset solution seems a lot simpler to me.

How to correctly and standardly compare floats?

Every time I start a new project and when I need to compare some float or double variables I write the code like this one:
if (fabs(prev.min[i] - cur->min[i]) < 0.000001 &&
fabs(prev.max[i] - cur->max[i]) < 0.000001) {
continue;
}
Then I want to get rid of these magic variables 0.000001(and 0.00000000001 for double) and fabs, so I write an inline function and some defines:
#define FLOAT_TOL 0.000001
So I wonder if there is any standard way of doing this? May be some standard header file?
It would be also nice to have float and double limits(min and max values)
From The Floating-Point Guide:
This is a bad way to do it because a fixed epsilon chosen because it “looks small” could actually be way too large when the numbers being compared are very small as well. The comparison would return “true” for numbers that are quite different. And when the numbers are very large, the epsilon could end up being smaller than the smallest rounding error, so that the comparison always returns “false”.
The problem with the "magic number" here is not that it's hardcoded but that it's "magic": you didn't really have a reason for choosing 0.000001 over 0.000005 or 0.0000000000001, did you? Note that float can approximately represent the latter and still smaller values - it's just about 7 decimals of precision after the first nonzero digit!
If you're going to use a fixed epsilon, you should really choose it according to the requirements of the particular piece of code where you use it. The alternative is to use a relative error margin (see link at the top for details) or, even better, to compare the floats as integers.
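The "compare the floats as integers" idea can be sketched like this (names are mine; it relies on the IEEE-754 property that adjacent finite doubles have adjacent bit patterns once the sign-magnitude layout is mapped onto a monotonic integer line):

```cpp
#include <cstdint>
#include <cstring>
#include <cmath>

int64_t to_ordered(double x)
{
    int64_t i;
    std::memcpy(&i, &x, sizeof x);   // well-defined type pun
    // Negative floats sort in reverse as raw bits; flip them onto the line.
    return i < 0 ? static_cast<int64_t>(0x8000000000000000ULL) - i : i;
}

bool nearly_equal_ulps(double a, double b, int64_t max_ulps)
{
    if (std::isnan(a) || std::isnan(b))
        return false;
    int64_t d = to_ordered(a) - to_ordered(b);
    if (d < 0) d = -d;
    return d <= max_ulps;            // distance in units in the last place
}
```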
The Standard provides an epsilon value. It's in <limits> and you can access the value with std::numeric_limits<float>::epsilon() and std::numeric_limits<double>::epsilon(). There are other values in there as well, but I didn't check what exactly they are.
You can use std::nextafter for testing two double with the smallest epsilon on a value (or a factor of the smallest epsilon).
bool nearly_equal(double a, double b)
{
    return std::nextafter(a, std::numeric_limits<double>::lowest()) <= b
        && std::nextafter(a, std::numeric_limits<double>::max()) >= b;
}

bool nearly_equal(double a, double b, int factor /* a factor of epsilon */)
{
    double min_a = a - (a - std::nextafter(a, std::numeric_limits<double>::lowest())) * factor;
    double max_a = a + (std::nextafter(a, std::numeric_limits<double>::max()) - a) * factor;
    return min_a <= b && max_a >= b;
}
Thanks for your answers, they helped me a lot. I've read these materials: first and second.
The answer is to use my own function for relative comparison:
bool areEqualRel(float a, float b, float epsilon) {
    return (fabs(a - b) <= epsilon * std::max(fabs(a), fabs(b)));
}
This is the most suitable solution for my needs. However, I've written some tests and other comparison methods. I hope this will be useful for somebody. areEqualRel passes these tests; the others don't.
#include <iostream>
#include <limits>
#include <algorithm>
#include <cfloat>
#include <cmath>
using std::cout;
using std::max;

bool areEqualAbs(float a, float b, float epsilon) {
    return (fabs(a - b) <= epsilon);
}

bool areEqual(float a, float b, float epsilon) {
    return (fabs(a - b) <= epsilon * std::max(1.0f, std::max(a, b)));
}

bool areEqualRel(float a, float b, float epsilon) {
    return (fabs(a - b) <= epsilon * std::max(fabs(a), fabs(b)));
}

int main(int argc, char *argv[])
{
    cout << "minimum: " << FLT_MIN << "\n";
    cout << "maximum: " << FLT_MAX << "\n";
    cout << "epsilon: " << FLT_EPSILON << "\n";

    float a = 0.0000001f;
    float b = 0.0000002f;
    if (areEqualRel(a, b, FLT_EPSILON)) {
        cout << "are equal a: " << a << " b: " << b << "\n";
    }
    a = 1000001.f;
    b = 1000002.f;
    if (areEqualRel(a, b, FLT_EPSILON)) {
        cout << "are equal a: " << a << " b: " << b << "\n";
    }
}
Here is a C++11 implementation of @geotavros's solution. It makes use of the new std::numeric_limits<T>::epsilon() function and the fact that std::fabs() and std::fmax() now have overloads for float, double and long double.
template<typename T>
static bool AreEqual(T f1, T f2) {
    return std::fabs(f1 - f2) <= std::numeric_limits<T>::epsilon()
           * std::fmax(std::fabs(f1), std::fabs(f2));
}
You should use the standard define in float.h:
#define DBL_EPSILON 2.2204460492503131e-016 /* smallest float value such that 1.0+DBL_EPSILON != 1.0 */
or the numeric_limits class:
// excerpt
template<>
class numeric_limits<float> : public _Num_float_base
{
public:
    typedef float T;
    // return minimum value
    static T (min)() throw();
    // return smallest effective increment from 1.0
    static T epsilon() throw();
    // return largest rounding error
    static T round_error() throw();
    // return minimum denormalized value
    static T denorm_min() throw();
};
[EDIT: Made it just a little bit more readable.]
But in addition, it depends on what you're after.
You should be aware that if you are comparing two floats for equality, you are intrinsically doing the wrong thing. Adding a slop factor to the comparison is not good enough.
This post has a comprehensive explanation of how to compare floating point numbers:
http://www.altdevblogaday.com/2012/02/22/comparing-floating-point-numbers-2012-edition/
Excerpt:
If you are comparing against zero, then relative epsilons and ULPs based comparisons are usually meaningless. You’ll need to use an absolute epsilon, whose value might be some small multiple of FLT_EPSILON and the inputs to your calculation. Maybe.
If you are comparing against a non-zero number then relative epsilons or ULPs based comparisons are probably what you want. You’ll probably want some small multiple of FLT_EPSILON for your relative epsilon, or some small number of ULPs. An absolute epsilon could be used if you knew exactly what number you were comparing against.