Converting a floating-point decimal value to a fraction - c++

Given a decimal floating-point value, how can you find its fractional equivalent/approximation? For example:
as_fraction(0.1) -> 1/10
as_fraction(0.333333) -> 1/3
as_fraction(514.0/37.0) -> 514/37
Is there a general algorithm that can convert a decimal number to fractional form? How can this be implemented simply and efficiently in C++?

First get the fractional part, scale it by a fixed precision, and then reduce the result by the greatest common divisor. The Euclidean algorithm computes the gcd: http://en.wikipedia.org/wiki/Euclidean_algorithm
#include <cmath>
#include <iostream>
long gcd(long a, long b); // declared here because foo uses it before its definition
void foo(double input)
{
    double integral = std::floor(input);
    double frac = input - integral;
    const long precision = 1000000000; // This is the accuracy.
    const long scaled = std::lround(frac * precision); // round once and reuse the value
    long gcd_ = gcd(scaled, precision);
    long denominator = precision / gcd_;
    long numerator = scaled / gcd_;
    std::cout << integral << " + ";
    std::cout << numerator << " / " << denominator << std::endl;
}
long gcd(long a, long b)
{
    if (a == 0)
        return b;
    else if (b == 0)
        return a;
    if (a < b)
        return gcd(a, b % a);
    else
        return gcd(b, a % b);
}
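The same idea can be written more compactly if C++17 is available. The following is only a sketch under that assumption: fraction_part is a hypothetical helper name, it returns the reduced fractional part instead of printing it, and it assumes a non-negative input.

```cpp
#include <cmath>
#include <numeric>
#include <utility>

// Hypothetical helper (not part of the answer above): returns the fractional
// part of `input` as a reduced {numerator, denominator} pair.
// Assumes input >= 0 and a fixed decimal resolution of `scale`.
std::pair<long long, long long> fraction_part(double input,
                                              long long scale = 1000000000LL)
{
    const double frac = input - std::floor(input);
    // Round once to the nearest multiple of 1/scale.
    const long long scaled = std::llround(frac * static_cast<double>(scale));
    // std::gcd(0, scale) == scale, so a zero fractional part reduces to 0/1.
    const long long g = std::gcd(scaled, scale);
    return { scaled / g, scale / g };
}
```

Note that this approach only finds fractions whose denominator divides the chosen scale: fraction_part(0.25) gives 1/4, but fraction_part(0.333333) gives 333333/1000000 rather than 1/3.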

#include <iostream>
#include <valarray>
using namespace std;
void as_fraction(double number, int cycles = 10, double precision = 5e-4){
int sign = number > 0 ? 1 : -1;
number = number * sign; //abs(number);
double new_number,whole_part;
double decimal_part = number - (int)number;
int counter = 0;
valarray<double> vec_1{double((int) number), 1}, vec_2{1,0}, temporary;
while(decimal_part > precision && counter < cycles){
new_number = 1 / decimal_part;
whole_part = (int) new_number;
temporary = vec_1;
vec_1 = whole_part * vec_1 + vec_2;
vec_2 = temporary;
decimal_part = new_number - whole_part;
counter += 1;
}
cout<<"x: "<< number <<"\tFraction: " << sign * vec_1[0]<<'/'<< vec_1[1]<<endl;
}
int main()
{
as_fraction(3.142857);
as_fraction(0.1);
as_fraction(0.333333);
as_fraction(514.0/37.0);
as_fraction(1.17171717);
as_fraction(-1.17);
}
x: 3.14286 Fraction: 22/7
x: 0.1 Fraction: 1/10
x: 0.333333 Fraction: 1/3
x: 13.8919 Fraction: 514/37
x: 1.17172 Fraction: 116/99
x: 1.17 Fraction: -117/100
Sometimes an approximation is enough and exact equivalence is not needed. For example, pi = 3.14159 is commonly approximated as 22/7 or 355/113. The cycles argument limits the number of continued-fraction steps and can be used to obtain these:
as_fraction(3.14159, 1);
as_fraction(3.14159, 2);
as_fraction(3.14159, 3);
x: 3.14159 Fraction: 22/7
x: 3.14159 Fraction: 333/106
x: 3.14159 Fraction: 355/113
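For readers who want the result as values rather than printed text, here is a sketch of the same continued-fraction recurrence with the convergents accumulated in integers. The name approximate and its return type are assumptions for illustration, not part of the answer above; the stopping rule (a term limit and a tolerance on the remaining fractional part) mirrors the cycles and precision parameters.

```cpp
#include <cmath>
#include <utility>

// Hypothetical variant of as_fraction: returns {numerator, denominator}.
// Assumes |x| is small enough for its integer part to fit in long long.
std::pair<long long, long long> approximate(double x, int max_terms = 10,
                                            double tolerance = 5e-4)
{
    const bool negative = x < 0;
    x = std::fabs(x);
    long long p_prev = 1, q_prev = 0;                 // previous convergent
    long long p = static_cast<long long>(x), q = 1;   // first convergent: floor(x)/1
    double rest = x - static_cast<double>(p);         // remaining fractional part
    for (int i = 0; i < max_terms && rest > tolerance; ++i) {
        const double inv = 1.0 / rest;
        const long long a = static_cast<long long>(inv); // next coefficient
        const long long p_next = a * p + p_prev;
        const long long q_next = a * q + q_prev;
        p_prev = p; q_prev = q;
        p = p_next; q = q_next;
        rest = inv - static_cast<double>(a);
    }
    return { negative ? -p : p, q };
}
```

With this sketch, approximate(3.14159, 1) yields 22/7 and approximate(3.14159, 3) yields 355/113, matching the output shown above.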

(Too long for a comment.)
Some comments claim that this is not possible. I disagree: it is possible under the right interpretation, but it is easy to misstate the question or misunderstand the answer.
The question posed here is to find rational approximation(s) to a given floating point value.
This is certainly possible since floating point formats used in C++ can only store rational values, most often in the form of sign/mantissa/exponent. Taking IEEE-754 single precision format as an example (to keep the numbers simpler), 0.333 is stored as 1499698695241728 * 2^(-52). That is equivalent to the fraction 1499698695241728 / 2^52, which reduces to 5586813 / 2^24 = 5586813/16777216. Its convergents provide increasingly accurate approximations, all the way up to the original value: 1/3, 333/1000, 77590/233003, 5586813/16777216.
Two points of note here.
For a variable float x = 0.333; the best rational approximation is not necessarily 333 / 1000, since the stored value is not exactly 0.333 but rather 0.333000004291534423828125 because of the limited precision of the internal representation of floating points.
Once assigned, a floating point value has no memory of where it came from, or whether the source code had it defined as float x = 0.333; vs. float x = 0.333000004; because both of those values have the same internal representation. This is why the (related, but different) problem of separating a string representation of a number (for example, a user-entered value) into integer and fractional parts cannot be solved by first converting to floating point then running floating point calculations on the converted value.
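The second point can be checked directly by comparing bit patterns. This is a small sketch, assuming a 32-bit IEEE-754 float; float_bits is a hypothetical helper name introduced only for this illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw bit pattern of a float.
// Assumes float is 32 bits wide, as on IEEE-754 platforms.
std::uint32_t float_bits(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits); // well-defined way to read the bits
    return bits;
}
```

Under that assumption, float_bits(0.333f) and float_bits(0.333000004f) compare equal, while a literal that differs by more than the float spacing, such as 0.3330001f, produces a different pattern.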
[ EDIT ]   Following is the step-by-step detail of the 0.333f example.
The code to convert a float to an exact fraction.
#include <cfloat>
#include <cmath>
#include <limits>
#include <iostream>
#include <iomanip>
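void flo2frac(float val, unsigned long long* num, unsigned long long* den, int* pwr)
{
// 2^24 for IEEE-754 single precision; std::pow has a float overload.
float mul = std::pow(static_cast<float>(FLT_RADIX), static_cast<float>(FLT_MANT_DIG));
*den = (unsigned long long)mul;
// frexp returns a mantissa in [0.5, 1) and stores the binary exponent in *pwr,
// so val == (*num / *den) * FLT_RADIX^(*pwr) and the exponent needs no further adjustment.
*num = (unsigned long long)(std::frexp(val, pwr) * mul);
}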
void cout_flo2frac(float val)
{
unsigned long long num, den; int pwr;
flo2frac(val, &num, &den, &pwr);
std::cout.precision(std::numeric_limits<float>::max_digits10);
std::cout << val << " = " << num << " / " << den << " * " << FLT_RADIX << "^(" << pwr << ")" << std::endl;
}
int main()
{
cout_flo2frac(0.333f);
}
Output.
0.333000004 = 11173626 / 16777216 * 2^(-1)
This gives the rational representation of float val = 0.333f; as 5586813/16777216.
What remains to be done is to determine the convergents of the exact fraction, which can be done using integer calculations only. The end result is (courtesy of Wolfram Alpha):
0, 1/3, 333/1000, 77590/233003, 5586813/16777216
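The convergents can also be computed without an external tool. Below is a minimal sketch of the standard continued-fraction recurrence in integer arithmetic; the names Fraction and convergents are assumptions made for this illustration, and the denominator is assumed to be positive.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Fraction = std::pair<std::uint64_t, std::uint64_t>; // {numerator, denominator}

// Lists the continued-fraction convergents of num/den (den > 0),
// using integer arithmetic only.
std::vector<Fraction> convergents(std::uint64_t num, std::uint64_t den)
{
    std::vector<Fraction> result;
    std::uint64_t h1 = 1, k1 = 0; // convergent n-1 (seed value)
    std::uint64_t h2 = 0, k2 = 1; // convergent n-2 (seed value)
    while (den != 0) {
        const std::uint64_t a = num / den;   // next coefficient (Euclid step)
        const std::uint64_t h = a * h1 + h2; // h_n = a_n * h_{n-1} + h_{n-2}
        const std::uint64_t k = a * k1 + k2; // k_n = a_n * k_{n-1} + k_{n-2}
        result.push_back({h, k});
        h2 = h1; k2 = k1;
        h1 = h;  k1 = k;
        const std::uint64_t rem = num % den;
        num = den;
        den = rem;
    }
    return result;
}
```

Calling convergents(5586813, 16777216) walks through the coefficients [0; 3, 333, 233, 72] and returns 0/1, 1/3, 333/1000, 77590/233003, 5586813/16777216, the same list as above.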

I came up with an algorithm for this problem, but I think it is too lengthy and could be accomplished with fewer lines of code. Sorry about the poor indentation; it is hard to align everything on Stack Overflow.
#include <iostream>
#include <string>
using namespace std;
// Converts the digits after the decimal point of the input string into a numerator
// and a power-of-ten denominator, for example "0.25" -> 25/100.
void converting(const string& decimalNumber, long long& numerator, long long& denominator)
{
    numerator = 0;
    denominator = 1;
    string::size_type point = decimalNumber.find('.'); // position of the decimal point
    if (point == string::npos)
        return; // no fractional part, so the result is 0/1
    // Each digit after the point extends the numerator and multiplies the denominator by ten.
    // More than 18 digits would overflow long long.
    for (string::size_type i = point + 1; i < decimalNumber.length(); ++i) {
        if (decimalNumber[i] < '0' || decimalNumber[i] > '9')
            break; // stop at the first non-digit
        numerator = numerator * 10 + (decimalNumber[i] - '0');
        denominator *= 10;
    }
}
// Simplifies the numerator and denominator to lowest terms for an easier-to-read output.
void simplifying(long long& numerator, long long& denominator)
{
    // If the numerator divides evenly by the denominator, the result is a whole number.
    if (numerator % denominator == 0) {
        numerator /= denominator;
        denominator = 1;
        return;
    }
    // Loop downwards from the numerator; the first common divisor found is the greatest one.
    for (long long divisor = numerator; divisor > 1; --divisor) {
        bool isDivisible = (numerator % divisor == 0) && (denominator % divisor == 0);
        if (isDivisible) {
            numerator /= divisor;   // for example 25/25 = 1
            denominator /= divisor; // for example 100/25 = 4
            return;
        }
    }
}

I agree completely with dxiv's solution but I needed a more general function (I threw in the signed stuff for fun because my use cases only included positive values):
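#include <cmath>
#include <concepts>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <tuple>
#include <type_traits>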
/**
* \brief Multiply two numbers together checking for overflow.
* \tparam T The unsigned integral type to check for multiplicative overflow.
* \param a The multiplier.
* \param b The multiplicand.
* \return The result and a value indicating whether the multiplication
* overflowed.
*/
template<std::unsigned_integral T>
auto mul_overflow(T a, T b) -> std::tuple<T, bool>
{
size_t constexpr shift{ std::numeric_limits<T>::digits / 2 };
T constexpr mask{ (T{ 1 } << shift) - T{ 1 } };
T const a_high = a >> shift;
T const a_low = a & mask;
T const b_high = b >> shift;
T const b_low = b & mask;
T const low_low{ a_low * b_low };
if (!(a_high || b_high))
{
return { low_low, false };
}
bool overflowed = a_high && b_high;
T const low_high{ a_low * b_high };
T const high_low{ a_high * b_low };
T const ret{ low_low + ((low_high + high_low) << shift) };
return
{
ret,
overflowed
|| ret < low_low
|| (low_high >> shift) != 0
|| (high_low >> shift) != 0
};
}
/**
* \brief Converts a floating point value to a numerator and
* denominator pair.
*
* If the floating point value is larger than the maximum that the Tout
* type can hold, the results are silly.
*
* \tparam Tout The integral output type.
* \tparam Tin The floating point input type.
* \param f The value to convert to a numerator and denominator.
* \return The numerator and denominator.
*/
template <std::integral Tout, std::floating_point Tin>
auto to_fraction(Tin f) -> std::tuple<Tout, Tout>
{
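const Tin multiplier
{
// std::pow on integer arguments returns double; the cast avoids a narrowing error when Tin is float.
static_cast<Tin>(std::pow(std::numeric_limits<Tin>::radix,
std::numeric_limits<Tin>::digits))
};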
uint64_t denominator{ static_cast<uint64_t>(multiplier) };
int power;
Tout num_fix{ 1 };
if constexpr (std::is_signed_v<Tout>)
{
num_fix = f < static_cast<Tin>(0) ? -1 : 1;
f = std::abs(f);
}
uint64_t numerator
{
static_cast<uint64_t>(std::frexp(f, &power) * multiplier)
};
uint64_t const factor
{
static_cast<uint64_t>(std::pow(
std::numeric_limits<Tin>::radix, std::abs(power)))
};
if (power > 0)
{
while(true)
{
auto const [res, overflow]{ mul_overflow(numerator, factor) };
if (!overflow)
{
numerator = res;
break;
}
numerator >>= 1;
denominator >>= 1;
}
}
else
{
while (true)
{
auto const [res, overflow]{ mul_overflow(denominator, factor) };
if (!overflow)
{
denominator = res;
break;
}
numerator >>= 1;
denominator >>= 1;
}
}
// get the results into the requested sized integrals.
while ((numerator > std::numeric_limits<Tout>::max()
|| denominator > std::numeric_limits<Tout>::max())
&& denominator > 1)
{
numerator >>= 1;
denominator >>= 1;
}
return
{
num_fix * static_cast<Tout>(numerator),
static_cast<Tout>(denominator)
};
}
You can call this like:
auto [n, d] { to_fraction<int8_t>(-124.777f) };
And you get n=-124, d=1;
auto [n, d] { to_fraction<uint64_t>(.33333333333333) };
gives n=6004799503160601, d=18014398509481984

#include<iostream>
#include<cmath>
#include<algorithm>
#include<functional>
#include<iomanip>
#include<string>
#include<vector>
#include <exception>
#include <sstream>
// Note: requires C++11 or later.
// header section
#ifndef rational_H
#define rational_H
struct invalid : std::exception {
const char* what() const noexcept { return "not a number\n"; }};
struct Fraction {
public:
long long value{0};
long long numerator{0};
long long denominator{0};
}; Fraction F;
class fraction : public Fraction{
public:
fraction() {}
void ctf(double &);
void get_fraction(std::string& w, std::string& d, long double& n) {
F.value = (long long )n;
set_whole_part(w);
set_fraction_part(d);
make_fraction();
}
long long set_whole_part(std::string& w) {
return whole = std::stoll(w);
}
long long set_fraction_part(std::string& d) {
return decimal = std::stoll(d);
}
void make_fraction();
bool cmpf(long long&, long long&, const long double& epsilon);
int Euclids_method(long long&, long long&);
long long get_f_part() { return decimal; };
void convert(std::vector<long long>&);
bool is_negative{ false };
friend std::ostream& operator<<(std::ostream& os, fraction& ff);
struct get_sub_length;
private:
long long whole{ 0 };
long long decimal{ 0 };
};
#endif // rational_H
// definitions/source
struct get_sub_length {
size_t sub_len{};
size_t set_decimal_length(size_t& n) {
sub_len = n;
return sub_len;
}
size_t get_decimal_length() { return sub_len; }
}; get_sub_length slen;
struct coefficient {
std::vector<long long>coef;
}; coefficient C;
//Compares the value returned by convert with the original
//decimal value entered.
//If it is within the tolerance of the epsilon, consider it the best
//approximation you can get.
//Feel free to experiment with the epsilon
//for better results.
bool fraction::cmpf(long long& n1, long long& d1, const long double& epsilon = 0.0000005)
{
long double ex = pow(10, slen.get_decimal_length());
long long d = get_f_part(); // the original fractional part to use for comparison.
long double a = (long double)d / ex;
long double b = ((long double)d1 / (long double)n1);
if ((fabs(a - b) <= epsilon)) { return true; }
return false;
}
//Euclid's algorithm returns the coefficients of a continued fraction through recursive division,
//for example: 0.375 -> 1/(1000/375) (note: for the fractional portion only).
// 1000/375 -> quotient 2.6666..., integer part 2, and 1000-(2*375)=250,
// 375/250 -> quotient 1.5, integer part 1, and 375-(1*250)=125,
// 250/125 -> quotient 2.0, integer part 2, and 250-(2*125)=0.
//The coefficients of the continued fraction are the integer parts 2,1,2.
// These are generally written [0;2,1,2] or [0;2,1,1,1], where 0 is the whole number value.
int fraction::Euclids_method(long long& n_dec, long long& exp)
{
long long quotient = 0;
if ((exp >= 1) && (n_dec != 0)) {
quotient = exp / n_dec;
C.coef.push_back(quotient);
long long divisor = n_dec;
long long dividend = exp - (quotient * n_dec);
Euclids_method(dividend, divisor); // recursive division
}
return 0;
}
// Convert is adding the elements stored in coef as a simple continued fraction
// which should result in a good approximation of the original decimal number.
void fraction::convert(std::vector<long long>& coef)
{
long long n1 = 0;
long long n2 = 1;
long long d1 = 1;
long long d2 = 0;
// Apply the convergent recurrence for each coefficient a until cmpf accepts the approximation.
std::for_each(C.coef.begin(), C.coef.end(), [&](long long a) {
if (cmpf(n1, d1) == false) {
F.numerator = (n1 * a) + n2;
n2 = n1;
n1 = F.numerator;
F.denominator = (d1 * a) + d2;
d2 = d1;
d1 = F.denominator;
}
});
//flip the fraction back over to format the correct output.
F.numerator = d1;
F.denominator = n1;
}
// Creates a fraction from the decimal component.
// Ensures it is in its abs form to ease calculations.
void fraction::make_fraction() {
size_t count = slen.get_decimal_length();
long long n_dec = decimal;
long long exp = (long long)pow(10, count);
Euclids_method(n_dec, exp);
convert(C.coef);
}
std::string get_w(const std::string& s)
{
std::string::size_type pos = s.find(".");
// A leading "." (for example ".5") has no whole part, so treat it as "0".
if (pos == 0) {
return "0";
}
// Without a "." substr returns the whole string.
return s.substr(0, pos);
}
std::string get_d(const std::string& s)
{
std::string st = "0";
std::string::size_type pos;
pos = s.find(".");
if (pos == std::string::npos) {
st = "0";
return st;
}
std::string sub = s.substr(pos + 1);
st = sub;
size_t sub_len = sub.length();
slen.set_decimal_length(sub_len);
return st;
}
void fraction::ctf(double& nn)
{
// Use a string stream to format the value as a fixed-point string.
std::ostringstream os;
os << std::fixed << std::setprecision(14) << nn;
std::string s = os.str();
is_negative = false; //reset for loops
C.coef.erase(C.coef.begin(), C.coef.end()); //reset for loops
long double n = 0.0;
int m = 0;
//The whole number part will be separated from the decimal part leaving a pure fraction.
//In such cases using Euclid's algorithm would take the reciprocal 1/(n/exp) or exp/n.
//For pure continued fractions the cf must start with 0 + 1/(n+1/(n+...etc
//So the vector is initialized with zero as its first element.
C.coef.push_back(m);
std::cout << '\n';
if (s == "q") { // for loop structures
exit(0);
}
if (s.front() == '-') { // flag negative values.
is_negative = true; // represent negative in output
s.erase(remove(s.begin(), s.end(), '-'), s.end()); // using abs
}
// w, d, separate the string components
std::string w = get_w(s);
std::string d = get_d(s);
try
{
if (!(n = std::stold(s))) {throw invalid(); } // string_to_double()
get_fraction(w, d, n);
}
catch (std::exception& e) {
std::cout << e.what();
std::cout <<'\n'<< std::endl;
}
}
// The ostream formats and displays the various outputs
std::ostream& operator<<(std::ostream& os, fraction& f)
{
std::cout << '\n';
if (f.is_negative == true) {
os << "The coefficients are [" << '-' << f.whole << ";";
for (size_t i = 1; i < C.coef.size(); ++i) {
os << C.coef[i] << ',';
}
std::cout << "]" << '\n';
os << "The cf is: " << '-' << f.whole;
for (size_t i = 1; i < C.coef.size(); ++i) {
os << "+1/(" << C.coef[i];
}
for (size_t i = 1; i < C.coef.size(); ++i) {
os << ')';
}
std::cout << '\n';
if (F.value >= 1 && F.numerator == 0 && F.denominator == 1) {
F.numerator = abs(f.whole);
os << '-' << F.numerator << '/' << F.denominator << '\n';
return os;
}
else if (F.value == 0 && F.numerator == 0 && F.denominator == 1) {
os << F.numerator << '/' << F.denominator << '\n';
return os;
}
else if (F.value == 0 && F.numerator != 0 && F.denominator != 0) {
os << '-' << abs(F.numerator) << '/' << abs(F.denominator) << '\n';
return os;
}
else if (F.numerator == 0 && F.denominator == 0) {
os << '-' << f.whole << '\n';
return os;
}
else
os << '-' << (abs(f.whole) * abs(F.denominator) + abs(F.numerator)) << '/' << abs(F.denominator) << '\n';
}
if (f.is_negative == false) {
os << "The coefficients are [" << f.whole << ";";
for (size_t i = 1; i < C.coef.size(); ++i) {
os << C.coef[i] << ',';
}
std::cout << "]" << '\n';
os << "The cf is: " << f.whole;
for (size_t i = 1; i < C.coef.size(); ++i) {
os << "+1/(" << C.coef[i];
}
for (size_t i = 1; i < C.coef.size(); ++i) {
os << ')';
}
std::cout << '\n';
if (F.value >= 1 && F.numerator == 0 && F.denominator == 1) {
F.numerator = abs(f.whole);
os << F.numerator << '/' << F.denominator << '\n';
return os;
}
else if (F.value == 0 && F.numerator != 0 && F.denominator != 0) {
os << abs(F.numerator) << '/' << abs(F.denominator) << '\n';
return os;
}
else if (F.numerator == 0 && F.denominator == 0) {
os << f.whole << '\n';
return os;
}
else
os << (abs(f.whole) * abs(F.denominator) + abs(F.numerator)) << '/' << abs(F.denominator) << '\n';
os << f.whole << ' ' << F.numerator << '/' << F.denominator << '\n';
}
return os;
}
int main()
{
fraction f;
double s = 0;
std::cout << "Enter a number to convert to a fraction\n";
std::cout << "Enter a \"q\" to quit\n";
// Loop until extraction fails (for example when "q" is entered).
while (std::cin >> s) {
f.ctf(s);
std::cout << f << std::endl;
}
// For a single conversion, replace the loop with these lines:
//std::cin >> s;
//f.ctf(s);
//std::cout << f << std::endl;
}
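For comparison, the coefficient extraction described in the comments of Euclids_method can be sketched iteratively and without global state. This is only an illustration: cf_coefficients is a hypothetical name, and the denominator is assumed to be positive.

```cpp
#include <vector>

// Continued-fraction coefficients of num/den (den > 0),
// computed iteratively with Euclid's algorithm.
std::vector<long long> cf_coefficients(long long num, long long den)
{
    std::vector<long long> coef;
    while (den != 0) {
        coef.push_back(num / den);        // integer part of the current quotient
        const long long rem = num % den;  // remainder feeds the next division
        num = den;
        den = rem;
    }
    return coef;
}
```

For 0.375 written as 375/1000, this returns the coefficients 0, 2, 1, 2, that is [0;2,1,2].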

Related

C++ - How to count the length of the digits after the dot in a double / float?

How to count the length of the digits after the dot in a double / float?
Without std::string.
So the length after the dot of 1234.5678 is 4.
And how should you correctly use epsilon in such a situation?
I have something like this, one version with epsilon and another without; neither works fully.
// Version 1.
template <typename T> inline constexpr
unsigned int get_length(T x, const bool& include_dot = false) {
unsigned int len = 0;
x = abs(x);
auto c = x - floor(x);
T factor = 10;
T eps = epsilon() * c;
while (c > eps && c < 1 - eps) {
c = x * factor;
c = c - floor(c);
factor *= 10;
eps = epsilon() * x * factor;
++len;
}
if (include_dot && len > 0) { ++len; }
return len;
}
// Version 2.
template <typename T> inline constexpr
unsigned int get_length(const T& x, const bool& include_dot = false, const unsigned int& max_consequtive_zeros = 6) {
unsigned int len = 0;
unsigned int zero_count = 0;
short digit;
auto decimals = get_decimals(x);
while (decimals != 0 && len < 14) {
decimals *= 10;
digit = decimals;
if (digit == 0) {
if (zero_count >= max_consequtive_zeros) { break; }
++zero_count;
}
else { zero_count = 0; }
decimals -= digit;
++len;
// std::cout << len << ": " << decimals << " zeros: " << zero_count << "\n";
}
if (include_dot && len > 0) { ++len; }
return len - zero_count;
}
A floating point number doesn't store a value in decimal format. It stores it in binary.
For example, if you try to store 234.56 in a float, the closest value that can actually be stored is:
234.55999755859375
Likewise, 1234.5678 is stored as 1234.5677490234375.
(go play with https://www.h-schmidt.net/FloatConverter/IEEE754.html to see for yourself)
As such, the length (in decimal digits) of a number in a float is ill-defined. To keep your sanity, please define an epsilon based on the magnitude, not on the number of digits.
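The stored values can also be inspected locally by printing a float with more digits than it nominally carries. A small sketch, assuming an IEEE-754 float and a standard library that prints exact decimal digits (stored_value is a hypothetical helper name):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats the value actually stored in a float
// using fixed notation with `digits` decimal places.
std::string stored_value(float x, int digits)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(digits) << x;
    return out.str();
}
```

Under those assumptions, stored_value(234.56f, 14) gives 234.55999755859375 and stored_value(1234.5678f, 13) gives 1234.5677490234375, which is why counting "the digits after the dot" of the stored number does not give 2 or 4.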

Convert decimal to a fraction [duplicate]

Given a decimal floating-point value, how can you find its fractional equivalent/approximation? For example:
as_fraction(0.1) -> 1/10
as_fraction(0.333333) -> 1/3
as_fraction(514.0/37.0) -> 514/37
Is there a general algorithm that can convert a decimal number to fractional form? How can this be implemented simply and efficiently in C++?
First get the fractional part and then take the gcd. Use the Euclidean algorithm http://en.wikipedia.org/wiki/Euclidean_algorithm
void foo(double input)
{
double integral = std::floor(input);
double frac = input - integral;
const long precision = 1000000000; // This is the accuracy.
long gcd_ = gcd(round(frac * precision), precision);
long denominator = precision / gcd_;
long numerator = round(frac * precision) / gcd_;
std::cout << integral << " + ";
std::cout << numerator << " / " << denominator << std::endl;
}
long gcd(long a, long b)
{
if (a == 0)
return b;
else if (b == 0)
return a;
if (a < b)
return gcd(a, b % a);
else
return gcd(b, a % b);
}
#include <iostream>
#include <valarray>
using namespace std;
void as_fraction(double number, int cycles = 10, double precision = 5e-4){
int sign = number > 0 ? 1 : -1;
number = number * sign; //abs(number);
double new_number,whole_part;
double decimal_part = number - (int)number;
int counter = 0;
valarray<double> vec_1{double((int) number), 1}, vec_2{1,0}, temporary;
while(decimal_part > precision & counter < cycles){
new_number = 1 / decimal_part;
whole_part = (int) new_number;
temporary = vec_1;
vec_1 = whole_part * vec_1 + vec_2;
vec_2 = temporary;
decimal_part = new_number - whole_part;
counter += 1;
}
cout<<"x: "<< number <<"\tFraction: " << sign * vec_1[0]<<'/'<< vec_1[1]<<endl;
}
int main()
{
as_fraction(3.142857);
as_fraction(0.1);
as_fraction(0.333333);
as_fraction(514.0/37.0);
as_fraction(1.17171717);
as_fraction(-1.17);
}
x: 3.14286 Fraction: 22/7
x: 0.1 Fraction: 1/10
x: 0.333333 Fraction: 1/3
x: 13.8919 Fraction: 514/37
x: 1.17172 Fraction: 116/99
x: 1.17 Fraction: -117/100
Sometimes you would want to approximate the decimal, without needing the equivalence. Eg pi=3.14159 is approximated as 22/7 or 355/113. We could use the cycles argument to obtain these:
as_fraction(3.14159, 1);
as_fraction(3.14159, 2);
as_fraction(3.14159, 3);
x: 3.14159 Fraction: 22/7
x: 3.14159 Fraction: 333/106
x: 3.14159 Fraction: 355/113
(Too long for a comment.)
Some comments claim that this is not possible. But I am of a contrary opinion.
I am of the opinion that it is possible in the right interpretation, but it is too easy to misstate the question or misunderstand the answer.
The question posed here is to find rational approximation(s) to a given floating point value.
This is certainly possible since floating point formats used in C++ can only store rational values, most often in the form of sign/mantissa/exponent. Taking IEEE-754 single precision format as an example (to keep the numbers simpler), 0.333 is stored as 1499698695241728 * 2^(-52). That is equivalent to the fraction 1499698695241728 / 2^52 whose convergents provide increasingly accurate approximations, all the way up to the original value: 1/3, 333/1000, 77590/233003, 5586813/16777216.
Two points of note here.
For a variable float x = 0.333; the best rational approximation is not necessarily 333 / 1000, since the stored value is not exactly 0.333 but rather 0.333000004291534423828125 because of the limited precision of the internal representation of floating points.
Once assigned, a floating point value has no memory of where it came from, or whether the source code had it defined as float x = 0.333; vs. float x = 0.333000004; because both of those values have the same internal representation. This is why the (related, but different) problem of separating a string representation of a number (for example, a user-entered value) into integer and fractional parts cannot be solved by first converting to floating point then running floating point calculations on the converted value.
[ EDIT ]   Following is the step-by-step detail of the 0.333f example.
The code to convert a float to an exact fraction.
#include <cfloat>
#include <cmath>
#include <limits>
#include <iostream>
#include <iomanip>
void flo2frac(float val, unsigned long long* num, unsigned long long* den, int* pwr)
{
float mul = std::powf(FLT_RADIX, FLT_MANT_DIG);
*den = (unsigned long long)mul;
*num = (unsigned long long)(std::frexp(val, pwr) * mul);
pwr -= FLT_MANT_DIG;
}
void cout_flo2frac(float val)
{
unsigned long long num, den; int pwr;
flo2frac(val, &num, &den, &pwr);
std::cout.precision(std::numeric_limits<float>::max_digits10);
std::cout << val << " = " << num << " / " << den << " * " << FLT_RADIX << "^(" << pwr << ")" << std::endl;
}
int main()
{
cout_flo2frac(0.333f);
}
Output.
0.333000004 = 11173626 / 16777216 * 2^(-1)
This gives the rational representation of float val = 0.333f; as 5586813/16777216.
What remains to be done is determine the convergents of the exact fraction, which can be done using integer calculations, only. The end result is (courtesy WA):
0, 1/3, 333/1000, 77590/233003, 5586813/16777216
I came up with an algorithm for this problem, but I think it is too lengthy and can be accomplished with less lines of code. Sorry about the poor indentation it is hard trying to align everything on overflow.
#include <iostream>
using namespace std;
// converts the string half of the inputed decimal number into numerical values void converting
(string decimalNumber, float&numerator, float& denominator )
{ float number; string valueAfterPoint =decimalNumber.substr(decimalNumber.find("." ((decimalNumber.length() -1) )); // store the value after the decimal into a valueAfterPoint
int length = valueAfterPoint.length(); //stores the length of the value after the decimal point into length
numerator = atof(valueAfterPoint.c_str()); // converts the string type decimal number into a float value and stores it into the numerator
// loop increases the decimal value of the numerator by multiples of ten as long as the length is above zero of the decimal
for (; length > 0; length--)
numerator *= 10;
do
denominator *=10;
while (denominator < numerator);
// simplifies the the converted values of the numerator and denominator into simpler values for an easier to read output
void simplifying (float& numerator, float& denominator) { int maximumNumber = 9; //Numbers in the tenths place can only range from zero to nine so the maximum number for a position in a position for the decimal number will be nine
bool isDivisble; // is used as a checker to verify whether the value of the numerator has the found the dividing number that will a value of zero
// Will check to see if the numerator divided denominator is will equal to zero
if(int(numerator) % int(denominator) == 0) {
numerator /= denominator;
denominator = 1;
return; }
//check to see if the maximum number is greater than the denominator to simplify to lowest form while (maximumNumber < denominator) { maximumNumber *=10; }
// the maximum number loops from nine to zero. This conditions stops if the function isDivisible is true
for(; maximumNumber > 0;maximumNumber --){
isDivisble = ((int(numerator) % maximumNumber == 0) && int(denominator)% maximumNumber == 0);
if(isDivisble)
{
numerator /= maximumNumber; // when is divisible true numerator be devided by the max number value for example 25/5 = numerator = 5
denominator /= maximumNumber; //// when is divisible true denominator be devided by themax number value for example 100/5 = denominator = 20
}
// stop value if numerator and denominator is lower than 17 than it is at the lowest value
int stop = numerator + denominator;
if (stop < 17)
{
return;
} } }
I agree completely with dxiv's solution but I needed a more general function (I threw in the signed stuff for fun because my use cases only included positive values):
#include <concepts>
/**
* \brief Multiply two numbers together checking for overflow.
* \tparam T The unsigned integral type to check for multiplicative overflow.
* \param a The multiplier.
* \param b The multicland.
* \return The result and a value indicating whether the multiplication
* overflowed.
*/
template<std::unsigned_integral T>
auto mul_overflow(T a, T b) -> std::tuple<T, bool>
{
size_t constexpr shift{ std::numeric_limits<T>::digits / 2 };
T constexpr mask{ (T{ 1 } << shift) - T{ 1 } };
T const a_high = a >> shift;
T const a_low = a & mask;
T const b_high = b >> shift;
T const b_low = b & mask;
T const low_low{ a_low * b_low };
if (!(a_high || b_high))
{
return { low_low, false };
}
bool overflowed = a_high && b_high;
T const low_high{ a_low * b_high };
T const high_low{ a_high * b_low };
T const ret{ low_low + ((low_high + high_low) << shift) };
return
{
ret,
overflowed
|| ret < low_low
|| (low_high >> shift) != 0
|| (high_low >> shift) != 0
};
}
/**
* \brief Converts a floating point value to a numerator and
* denominator pair.
*
* If the floating point value is larger than the maximum that the Tout
* type can hold, the results are silly.
*
* \tparam Tout The integral output type.
* \tparam Tin The floating point input type.
* \param f The value to convert to a numerator and denominator.
* \return The numerator and denominator.
*/
template <std::integral Tout, std::floating_point Tin>
auto to_fraction(Tin f) -> std::tuple<Tout, Tout>
{
const Tin multiplier
{
std::pow(std::numeric_limits<Tin>::radix,
std::numeric_limits<Tin>::digits)
};
uint64_t denominator{ static_cast<uint64_t>(multiplier) };
int power;
Tout num_fix{ 1 };
if constexpr (std::is_signed_v<Tout>)
{
num_fix = f < static_cast<Tin>(0) ? -1 : 1;
f = std::abs(f);
}
uint64_t numerator
{
static_cast<uint64_t>(std::frexp(f, &power) * multiplier)
};
uint64_t const factor
{
static_cast<uint64_t>(std::pow(
std::numeric_limits<Tin>::radix, std::abs(power)))
};
if (power > 0)
{
while(true)
{
auto const [res, overflow]{ mul_overflow(numerator, factor) };
if (!overflow)
{
numerator = res;
break;
}
numerator >>= 1;
denominator >>= 1;
}
}
else
{
while (true)
{
auto const [res, overflow]{ mul_overflow(denominator, factor) };
if (!overflow)
{
denominator = res;
break;
}
numerator >>= 1;
denominator >>= 1;
}
}
// get the results into the requested sized integrals.
while ((numerator > std::numeric_limits<Tout>::max()
|| denominator > std::numeric_limits<Tout>::max())
&& denominator > 1)
{
numerator >>= 1;
denominator >>= 1;
}
return
{
num_fix * static_cast<Tout>(numerator),
static_cast<Tout>(denominator)
};
}
You can call this like:
auto [n, d] { to_fraction<int8_t>(-124.777f) };
And you get n=-124, d=1;
auto [n, d] { to_fraction<uint64_t>(.33333333333333) };
gives n=6004799503160601, d=18014398509481984
#include<iostream>
#include<cmath>
#include<algorithm>
#include<functional>
#include<iomanip>
#include<string>
#include<vector>
#include <exception>
#include <sstream>
// note using std = c++11
// header section
#ifndef rational_H
#define rational_H
struct invalid : std::exception {
const char* what() const noexcept { return "not a number\n"; }};
struct Fraction {
public:
long long value{0};
long long numerator{0};
long long denominator{0};
}; Fraction F;
class fraction : public Fraction{
public:
fraction() {}
void ctf(double &);
void get_fraction(std::string& w, std::string& d, long double& n) {
F.value = (long long )n;
set_whole_part(w);
set_fraction_part(d);
make_fraction();
}
long long set_whole_part(std::string& w) {
return whole = std::stoll(w);
}
long long set_fraction_part(std::string& d) {
return decimal = std::stoll(d);
}
void make_fraction();
bool cmpf(long long&, long long&, const long double& epsilon);
int Euclids_method(long long&, long long&);
long long get_f_part() { return decimal; };
void convert(std::vector<long long>&);
bool is_negative{ false };
friend std::ostream& operator<<(std::ostream& os, fraction& ff);
struct get_sub_length;
private:
long long whole{ 0 };
long long decimal{ 0 };
};
#endif // rational_H
// definitions/source
struct get_sub_length {
size_t sub_len{};
size_t set_decimal_length(size_t& n) {
sub_len = n;
return sub_len;
}
size_t get_decimal_length() { return sub_len; }
}; get_sub_length slen;
struct coefficient {
std::vector<long long>coef;
}; coefficient C;
// Compares the value returned by convert with the original
// decimal value entered.
// If it is within the tolerance of the epsilon, consider it the best
// approximation you can get.
// Feel free to experiment with the epsilon
// for better results.
bool fraction::cmpf(long long& n1, long long& d1, const long double& epsilon = 0.0000005)
{
long double ex = pow(10, slen.get_decimal_length());
long long d = get_f_part(); // the original fractional part to use for comparison.
long double a = (long double)d / ex;
long double b = ((long double)d1 / (long double)n1);
if ((fabs(a - b) <= epsilon)) { return true; }
return false;
}
// Euclid's algorithm returns the coefficients of a continued fraction through recursive division,
// for example: 0.375 = 375/1000 = 1/(1000/375) (note: for the fractional portion only).
// 1000/375 -> quotient 2.6666..., and 1000-(2*375)=250, using only the integer part
// 375/250 -> quotient 1.5, and 375-(1*250)=125
// 250/125 -> quotient 2.0, and 250-(2*125)=0
// The coefficients of the continued fraction are the integer parts 2,1,2.
// These are generally written [0;2,1,2] or [0;2,1,1,1], where 0 is the whole-number value.
int fraction::Euclids_method(long long& n_dec, long long& exp)
{
long long quotient = 0;
if ((exp >= 1) && (n_dec != 0)) {
quotient = exp / n_dec;
C.coef.push_back(quotient);
long long divisor = n_dec;
long long dividend = exp - (quotient * n_dec);
Euclids_method(dividend, divisor); // recursive division
}
return 0;
}
// Convert is adding the elements stored in coef as a simple continued fraction
// which should result in a good approximation of the original decimal number.
void fraction::convert(std::vector<long long>& coef)
{
std::vector<long long>::iterator pos;
pos = C.coef.begin(), C.coef.end();
long long n1 = 0;
long long n2 = 1;
long long d1 = 1;
long long d2 = 0;
for_each(C.coef.begin(), C.coef.end(), [&](size_t pos) {
if (cmpf(n1, d1) == false) {
F.numerator = (n1 * pos) + n2;
n2 = n1;
n1 = F.numerator;
F.denominator = (d1 * pos) + d2;
d2 = d1;
d1 = F.denominator;
}
});
//flip the fraction back over to format the correct output.
F.numerator = d1;
F.denominator = n1;
}
// Creates a fraction from the decimal component and
// ensures it is in its absolute form to ease calculations.
void fraction::make_fraction() {
size_t count = slen.get_decimal_length();
long long n_dec = decimal;
long long exp = (long long)pow(10, count);
Euclids_method(n_dec, exp);
convert(C.coef);
}
std::string get_w(const std::string& s)
{
std::string st = "0";
std::string::size_type pos;
pos = s.find(".");
if (pos - 1 == std::string::npos) {
st = "0";
return st;
}
else { st = s.substr(0, pos);
return st;
}
if (!(s.find("."))){
st = "0";
return st;
}
return st;
}
std::string get_d(const std::string& s)
{
std::string st = "0";
std::string::size_type pos;
pos = s.find(".");
if (pos == std::string::npos) {
st = "0";
return st;
}
std::string sub = s.substr(pos + 1);
st = sub;
size_t sub_len = sub.length();
slen.set_decimal_length(sub_len);
return st;
}
void fraction::ctf(double& nn)
{
//using stringstream for conversion to string
std::istringstream is;
is >> nn;
std::ostringstream os;
os << std::fixed << std::setprecision(14) << nn;
std::string s = os.str();
is_negative = false; //reset for loops
C.coef.erase(C.coef.begin(), C.coef.end()); //reset for loops
long double n = 0.0;
int m = 0;
// The whole number part will be separated from the decimal part, leaving a pure fraction.
// In such cases Euclid's algorithm takes the reciprocal 1/(n/exp), i.e. exp/n.
// For pure continued fractions the cf must start with 0 + 1/(n+1/(n+... etc.
// So the vector is initialized with zero as its first element.
C.coef.push_back(m);
std::cout << '\n';
if (s == "q") { // for loop structures
exit(0);
}
if (s.front() == '-') { // flag negative values.
is_negative = true; // represent negative in output
s.erase(remove(s.begin(), s.end(), '-'), s.end()); // using abs
}
// w, d: separate the string components
std::string w = get_w(s);
std::string d = get_d(s);
try
{
if (!(n = std::stold(s))) {throw invalid(); } // string_to_double()
get_fraction(w, d, n);
}
catch (std::exception& e) {
std::cout << e.what();
std::cout <<'\n'<< std::endl;
}
}
// The ostream formats and displays the various outputs
std::ostream& operator<<(std::ostream& os, fraction& f)
{
std::cout << '\n';
if (f.is_negative == true) {
os << "The coefficients are [" << '-' << f.whole << ";";
for (size_t i = 1; i < C.coef.size(); ++i) {
os << C.coef[i] << ',';
}
std::cout << "]" << '\n';
os << "The cf is: " << '-' << f.whole;
for (size_t i = 1; i < C.coef.size(); ++i) {
os << "+1/(" << C.coef[i];
}
for (size_t i = 1; i < C.coef.size(); ++i) {
os << ')';
}
std::cout << '\n';
if (F.value >= 1 && F.numerator == 0 && F.denominator == 1) {
F.numerator = abs(f.whole);
os << '-' << F.numerator << '/' << F.denominator << '\n';
return os;
}
else if (F.value == 0 && F.numerator == 0 && F.denominator == 1) {
os << F.numerator << '/' << F.denominator << '\n';
return os;
}
else if (F.value == 0 && F.numerator != 0 && F.denominator != 0) {
os << '-' << abs(F.numerator) << '/' << abs(F.denominator) << '\n';
return os;
}
else if (F.numerator == 0 && F.denominator == 0) {
os << '-' << f.whole << '\n';
return os;
}
else
os << '-' << (abs(f.whole) * abs(F.denominator) + abs(F.numerator)) << '/' << abs(F.denominator) << '\n';
}
if (f.is_negative == false) {
os << "The coefficients are [" << f.whole << ";";
for (size_t i = 1; i < C.coef.size(); ++i) {
os << C.coef[i] << ',';
}
std::cout << "]" << '\n';
os << "The cf is: " << f.whole;
for (size_t i = 1; i < C.coef.size(); ++i) {
os << "+1/(" << C.coef[i];
}
for (size_t i = 1; i < C.coef.size(); ++i) {
os << ')';
}
std::cout << '\n';
if (F.value >= 1 && F.numerator == 0 && F.denominator == 1) {
F.numerator = abs(f.whole);
os << F.numerator << '/' << F.denominator << '\n';
return os;
}
else if (F.value == 0 && F.numerator != 0 && F.denominator != 0) {
os << abs(F.numerator) << '/' << abs(F.denominator) << '\n';
return os;
}
else if (F.numerator == 0 && F.denominator == 0) {
os << f.whole << '\n';
return os;
}
else
os << (abs(f.whole) * abs(F.denominator) + abs(F.numerator)) << '/' << abs(F.denominator) << '\n';
os << f.whole << ' ' << F.numerator << '/' << F.denominator << '\n';
}
return os;
}
int main()
{
fraction f;
double s = 0;
std::cout << "Enter a number to convert to a fraction\n";
std::cout << "Enter a \"q\" to quit\n";
// uncomment for a loop
while (std::cin >> s) {
f.ctf(s);
std::cout << f << std::endl;
}
// comment out these lines if you want the loop
//std::cin >> s;
//f.ctf(s);
//std::cout << f << std::endl;
}
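The continued-fraction approach in the program above can be condensed considerably. The following is a minimal sketch, not the author's code: cf_coefficients and cf_to_fraction are hypothetical names, and both assume non-negative inputs with a non-zero denominator.

```cpp
#include <utility>
#include <vector>

// Continued-fraction coefficients of num/den, obtained with Euclid's algorithm.
// Each step records the integer quotient and continues with the remainder.
std::vector<long long> cf_coefficients(long long num, long long den)
{
    std::vector<long long> coef;
    while (den != 0) {
        coef.push_back(num / den);
        long long remainder = num % den;
        num = den;
        den = remainder;
    }
    return coef;
}

// Folds the coefficients back into numerator/denominator using the
// convergent recurrence h_n = a_n*h_(n-1) + h_(n-2) (same for k_n).
std::pair<long long, long long> cf_to_fraction(const std::vector<long long>& coef)
{
    long long h1 = 1, h2 = 0; // previous two numerators
    long long k1 = 0, k2 = 1; // previous two denominators
    for (long long a : coef) {
        long long h = a * h1 + h2;
        long long k = a * k1 + k2;
        h2 = h1; h1 = h;
        k2 = k1; k1 = k;
    }
    return { h1, k1 };
}
```

For 0.375, cf_coefficients(375, 1000) gives 0, 2, 1, 2 and cf_to_fraction turns that back into 3/8. Stopping the fold early yields a coarser approximation with a smaller denominator.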

C++ - serialize double to binary file in little endian

I'm trying to implement a function that writes double to binary file in little endian byte order.
So far I have class BinaryWriter implementation:
void BinaryWriter::open_file_stream( const String& path )
{
// open output stream
m_fstream.open( path.c_str(), std::ios_base::out | std::ios_base::binary);
m_fstream.imbue(std::locale::classic());
}
void BinaryWriter::write( int v )
{
char data[4];
data[0] = static_cast<char>(v & 0xFF);
data[1] = static_cast<char>((v >> 8) & 0xFF);
data[2] = static_cast<char>((v >> 16) & 0xFF);
data[3] = static_cast<char>((v >> 24) & 0xFF);
m_fstream.write(data, 4);
}
void BinaryWriter::write( double v )
{
// TBD
}
void BinaryWriter::write( int v ) was implemented using Sven answer to What is the correct way to output hex data to a file? post.
Not sure how to implement void BinaryWriter::write( double v ).
I tried naively follow void BinaryWriter::write( int v ) implementation but it didn't work. I guess I don't fully understand the implementation.
Thank you guys
You didn't say so, but I'm assuming the machine you're running on is big endian; otherwise writing a double is the same as writing an int, only with 8 bytes.
const int one = 1; // names containing double underscores are reserved, so avoid __one__
const bool isCpuLittleEndian = 1 == *(const char*)(&one); // CPU endianness
const bool isFileLittleEndian = false; // output endianness - you choose :)
void BinaryWriter::write( double v )
{
if (isCpuLittleEndian ^ isFileLittleEndian) {
char data[8], *pDouble = (char*)(double*)(&v);
for (int i = 0; i < 8; ++i) {
data[i] = pDouble[7-i];
}
m_fstream.write(data, 8);
}
else
m_fstream.write((char*)(&v), 8);
}
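An alternative that mirrors the question's write( int v ) and needs no endianness detection: copy the double's bits into a 64-bit integer and emit the bytes by shifting. This is only a sketch. double_to_le_bytes is a hypothetical helper, and it assumes a 64-bit IEEE 754 double whose byte order in memory matches that of std::uint64_t, which holds on mainstream platforms.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the 8 bytes of v, least significant byte first.
std::array<char, 8> double_to_le_bytes(double v)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits); // well-defined way to read the bit pattern
    std::array<char, 8> out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<char>((bits >> (8 * i)) & 0xFF);
    return out;
}
```

Inside BinaryWriter::write( double v ) the result would presumably be passed on with m_fstream.write(bytes.data(), 8).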
But don't forget that an int is generally 4 octets and a double is 8 octets.
The other problem is static_cast, which converts the value instead of exposing its bytes. See this example:
double d = 6.1;
char c = static_cast<char>(d); // c == 6
The solution is to reinterpret the value through a pointer:
double d = 6.1;
char* c = reinterpret_cast<char*>(&d);
After that, you can use write( Int_64 *v ), which is an extension of write( Int_t v ).
You can use this method with:
double d = 45612.9874;
binary_writer.write64(reinterpret_cast<int_64*>(&d));
Don't forget that sizeof(double) depends on the system.
A little program converting doubles to an IEEE little endian representation.
Apart from the self-test in to_little_endian, it should work on any machine.
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <limits>
#include <sstream>
#include <random>
bool to_little_endian(double value) {
enum { zero_exponent = 0x3ff };
uint8_t sgn = 0; // 1 bit
uint16_t exponent = 0; // 11 bits
uint64_t fraction = 0; // 52 bits
double d = value;
if(std::signbit(d)) {
sgn = 1;
d = -d;
}
if(std::isinf(d)) {
exponent = 0x7ff;
}
else if(std::isnan(d)) {
exponent = 0x7ff;
fraction = 0x8000000000000;
}
else if(d) {
int e;
double f = frexp(d, &e);
// A leading one is implicit.
// Hence 1.0 has a zero fraction and the zero_exponent:
exponent = uint16_t(e + zero_exponent - 1);
unsigned bits = 0;
while(f) {
f *= 2;
fraction <<= 1;
if (1 <= f) {
fraction |= 1;
f -= 1;
}
++bits;
}
fraction = (fraction << (53 - bits)) & ((uint64_t(1) << 52) - 1);
}
// Little endian representation.
uint8_t data[sizeof(double)];
for(unsigned i = 0; i < 6; ++i) {
data[i] = fraction & 0xFF;
fraction >>= 8;
}
data[6] = (exponent << 4) | fraction;
data[7] = (sgn << 7) | (exponent >> 4);
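// This test works on a little endian machine, only.
// memcpy avoids the aliasing problem of casting the byte array to double*.
double result;
std::memcpy(&result, data, sizeof(result));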
if(result == value || (std::isnan(result) && std::isnan(value))) return true;
else {
struct DoubleLittleEndian {
uint64_t fraction : 52;
uint64_t exp : 11;
uint64_t sgn : 1;
};
DoubleLittleEndian little_endian;
std::memcpy(&little_endian, &data, sizeof(double));
std::cout << std::hex
<< " Result: " << result << '\n'
<< "Fraction: " << little_endian.fraction << '\n'
<< " Exp: " << little_endian.exp << '\n'
<< " Sgn: " << little_endian.sgn << '\n'
<< std::endl;
std::memcpy(&little_endian, &value, sizeof(value));
std::cout << std::hex
<< " Value: " << value << '\n'
<< "Fraction: " << little_endian.fraction << '\n'
<< " Exp: " << little_endian.exp << '\n'
<< " Sgn: " << little_endian.sgn
<< std::endl;
return false;
}
}
int main()
{
to_little_endian(+1.0);
to_little_endian(+0.0);
to_little_endian(-0.0);
to_little_endian(+std::numeric_limits<double>::infinity());
to_little_endian(-std::numeric_limits<double>::infinity());
to_little_endian(std::numeric_limits<double>::quiet_NaN());
std::uniform_real_distribution<double> distribute(-100, +100);
std::default_random_engine random;
for (unsigned loop = 0; loop < 10000; ++loop) {
double value = distribute(random);
to_little_endian(value);
}
return 0;
}

Different output with c++ pi approximation [duplicate]

Possible Duplicate:
Vastly different output C++ monte carlo approximation
On my 64-bit ubuntu computer, the following code works as expected, and returns a close approximation for pi with both algorithms. However, on the lab machine, where I must demo the code, a 32-bit rhel 3 machine, the second algorithm always returns 4, and I cannot figure out why. Any insight would be appreciated.
/*
* RandomNumber.h
*
*
*
*/
#ifndef RANDOMNUMBER_H_
#define RANDOMNUMBER_H_
#include <cmath> // pow
#include <ctime> // time
class RandomNumber {
public:
RandomNumber() {
x = time(NULL);
m = pow(2, 31); //some constant value
M = 65915 * 7915; //multiply of some simple numbers p and q
method = 1;
}
RandomNumber(int seed) {
x = ((seed > 0) ? seed : time(NULL));
m = pow(2, 31); //some constant value
method = 1; //method number
M = 6543 * 7915; //multiply of some simple numbers p and q
}
void setSeed(long int seed) {
x = seed; //set start value
}
void chooseMethod(int method) {
this->method = ((method > 0 && method <= 2) ? method : 1); //choose one of two method
}
long int linearCongruential() { //first generator, that uses linear congruential method
long int c = 0; // some constant
long int a = 69069; //some constant
x = (a * x + c) % m; //solution next value
return x;
}
long int BBS() { //algorithm Blum - Blum - Shub
x = (long int) (pow(x, 2)) % M;
return x;
}
double nextPoint() { //return random number in range (-1;1)
double point;
if (method == 1) //use first method
point = linearCongruential() / double(m);
else
point = BBS() / double(M);
return point;
}
private:
long int x; //current value
long int m; // some range for first method
long int M; //some range for second method
int method; //method number
};
#endif /* RANDOMNUMBER_H_ */
And the test class:
#include <iostream>
#include <stdlib.h>
#include <math.h>
#include <iomanip>
#include "RandomNumber.h"
using namespace std;
int main() {
cout.setf(ios::fixed);
cout.precision(6);
RandomNumber random;
srand((unsigned) time(NULL));
cout << "---------------------------------" << endl;
cout << " Monte Carlo Pi Approximation" << endl;
cout << "---------------------------------" << endl;
cout << " Enter number of points: ";
long int k1;
cin >> k1;
cout << "Select generator number: ";
int method;
cin >> method;
random.chooseMethod(method);
cout << "---------------------------------" << endl;
long int k2 = 0;
double sumX = 0;
double sumY = 0;
for (long int i = 0; i < k1; i++) {
double x = pow(-1, int(random.nextPoint() * 10) % 2)
* random.nextPoint();
double y = pow(-1, int(random.nextPoint() * 10) % 2)
* random.nextPoint();
sumX += x;
sumY += y;
if ((pow(x, 2) + pow(y, 2)) <= 1)
k2++;
}
double pi = 4 * (double(k2) / k1);
cout << "M(X) = " << setw(10) << sumX / k1 << endl; //mathematical expectation of x
cout << "M(Y) = " << setw(10) << sumY / k1 << endl; //mathematical expectation of y
cout << endl << "Pi = " << pi << endl << endl; //approximate Pi
return 0;
}
The problem is that pow returns a double, which loses precision at the low end. Converting to long int for the % operator always returns the same result, and so your RNG outputs constant -60614748.
x = time(0) 1354284781
pow(x, 2) 1.83409e+18 0x1.973fdc9dc7787p+60
(long int) pow(x, 2) -2147483648 0x80000000
(long int) pow(x, 2) % M -60614748
The fix is to change x = (long int) (pow(x, 2)) % M; to x = x * x % M, performing all arithmetic within long int. Note that this is still strictly speaking incorrect, as signed overflow is undefined; more correct is to use unsigned long.
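As a sketch of that fix (bbs_next is a hypothetical free function, not part of the RandomNumber class): doing the squaring in std::uint64_t keeps the arithmetic well defined and, assuming M is below 2^32, the product cannot wrap. On a machine where long is 32 bits, unsigned long alone would still wrap at 2^32, hence the fixed-width type here.

```cpp
#include <cassert>
#include <cstdint>

// One Blum-Blum-Shub style step: x -> x^2 mod M, in 64-bit unsigned arithmetic.
// Assumes 0 < M < 2^32 so that (x % M)^2 always fits in 64 bits.
std::uint64_t bbs_next(std::uint64_t x, std::uint64_t M)
{
    assert(M != 0 && M <= 0xFFFFFFFFull);
    x %= M;             // reduce first so the square stays below 2^64
    return (x * x) % M;
}
```

With the M values from the question (6543 * 7915 and 65915 * 7915) the precondition holds comfortably.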
The truncation to long in BBS() causes the same "random" number to be generated.
PS. The value returned by pow is too big to be represented in your machine's long type, so converting it to long is undefined behaviour. One possible effect of that undefined behaviour is that the conversion yields 0x80000000 or 0x7fffffff, so you end up with a sequence of identical numbers.
x = time(0) 1354284781
pow(x, 2) 1.83409e+18 0x1.973fdc9dc7787p+60
A 32-bit int holds values only up to 2^31-1, and the value of x^2 is greater than that.

self made pow() c++

I was reading through How can I write a power function myself? and the answer given by dan04 caught my attention mainly because I am not sure about the answer given by fortran, but I took that and implemented this:
#include <iostream>
using namespace std;
float pow(float base, float ex){
// power of 0
if (ex == 0){
return 1;
// negative exponent
}else if( ex < 0){
return 1 / pow(base, -ex);
// even exponent
}else if ((int)ex % 2 == 0){
float half_pow = pow(base, ex/2);
return half_pow * half_pow;
// odd exponent
}else{
return base * pow(base, ex - 1);
}
}
int main(){
for (int ii = 0; ii< 10; ii++){
cout << "pow(" << ii << ", .5) = " << pow(ii, .5) << endl;
cout << "pow(" << ii << ",2) = " << pow(ii, 2) << endl;
cout << "pow(" << ii << ",3) = " << pow(ii, 3) << endl;
}
}
though I am not sure if I translated this right because all of the calls giving .5 as the exponent return 0. In the answer it states that it might need a log2(x) based on a^b = 2^(b * log2(a)), but I am unsure about putting that in as I am unsure where to put it, or if I am even thinking about this right.
NOTE: I know that this might be defined in a math library, but I don't need all the added expense of an entire math library for a few functions.
EDIT: does anyone know a floating-point implementation for fractional exponents? (I have seen a double implementation, but that was using a trick with registers, and I need floating-point, and adding a library just to do a trick I would be better off just including the math library)
I have looked at this paper here which describes how to approximate the exponential function for double precision. After a little research on Wikipedia about single precision floating point representation I have worked out the equivalent algorithms. They only implemented the exp function, so I found an inverse function for the log and then simply did
POW(a, b) = EXP(LOG(a) * b).
Compiling this with gcc 4.6.2 yields a pow function almost 4 times faster than the standard library's implementation (compiled with -O2).
Note: the code for EXP is copied almost verbatim from the paper I read and the LOG function is copied from here.
Here is the relevant code:
#define EXP_A 184
#define EXP_C 16249
float EXP(float y)
{
union
{
float d;
struct
{
#ifdef LITTLE_ENDIAN
short j, i;
#else
short i, j;
#endif
} n;
} eco;
eco.n.i = EXP_A*(y) + (EXP_C);
eco.n.j = 0;
return eco.d;
}
float LOG(float y)
{
int * nTemp = (int*)&y;
y = (*nTemp) >> 16;
return (y - EXP_C) / EXP_A;
}
float POW(float b, float p)
{
return EXP(LOG(b) * p);
}
There is still some optimization you can do here, or perhaps that is good enough.
This is a rough approximation but if you would have been satisfied with the errors introduced using the double representation, I imagine this will be satisfactory.
I think the algorithm you're looking for could be 'nth root'. With an initial guess of 1 (for k == 0):
#include <iostream>
using namespace std;
float pow(float base, float ex);
float nth_root(float A, int n) {
const int K = 6;
float x[K] = {1};
for (int k = 0; k < K - 1; k++)
x[k + 1] = (1.0 / n) * ((n - 1) * x[k] + A / pow(x[k], n - 1));
return x[K-1];
}
float pow(float base, float ex){
if (base == 0)
return 0;
// power of 0
if (ex == 0){
return 1;
// negative exponent
}else if( ex < 0){
return 1 / pow(base, -ex);
// fractional exponent
}else if (ex > 0 && ex < 1){
return nth_root(base, 1/ex);
}else if ((int)ex % 2 == 0){
float half_pow = pow(base, ex/2);
return half_pow * half_pow;
// odd exponent
}else{
return base * pow(base, ex - 1);
}
}
int main_pow(int, char **){
for (int ii = 0; ii< 10; ii++){
cout << "pow(" << ii << ", .5) = " << pow(ii, .5) << endl;
cout << "pow(" << ii << ", 2) = " << pow(ii, 2) << endl;
cout << "pow(" << ii << ", 3) = " << pow(ii, 3) << endl;
}
return 0;
}
test:
pow(0, .5) = 0.03125
pow(0, 2) = 0
pow(0, 3) = 0
pow(1, .5) = 1
pow(1, 2) = 1
pow(1, 3) = 1
pow(2, .5) = 1.41421
pow(2, 2) = 4
pow(2, 3) = 8
pow(3, .5) = 1.73205
pow(3, 2) = 9
pow(3, 3) = 27
pow(4, .5) = 2
pow(4, 2) = 16
pow(4, 3) = 64
pow(5, .5) = 2.23607
pow(5, 2) = 25
pow(5, 3) = 125
pow(6, .5) = 2.44949
pow(6, 2) = 36
pow(6, 3) = 216
pow(7, .5) = 2.64575
pow(7, 2) = 49
pow(7, 3) = 343
pow(8, .5) = 2.82843
pow(8, 2) = 64
pow(8, 3) = 512
pow(9, .5) = 3
pow(9, 2) = 81
pow(9, 3) = 729
I think you could try to solve it using a Taylor series; see
http://en.wikipedia.org/wiki/Taylor_series
With a Taylor series you can approximate a hard calculation such as 3^3.8 by expanding around an already known result such as 3^4. In this case you have
3^4 = 81, so
3^3.8 = 81 + 81*ln(3)*(3.8 - 4) + 81*ln(3)^2*(3.8 - 4)^2/2! + ... and so on; the more terms n you take, the closer you get to the exact result.
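A sketch of expanding around a known power: since the derivative of a^x is a^x * ln(a), the series around x0 is a^x0 times the sum of (ln(a) * (x - x0))^k / k!. The function name pow_near and the choice to pass ln(a) in as a precomputed constant are assumptions of this sketch, made so that no math library call is needed.

```cpp
// Approximates a^(x0 + dx) from a known value a^x0 using the Taylor series
//   a^(x0 + dx) = a^x0 * sum_{k>=0} (ln(a) * dx)^k / k!
// ln_a must be supplied by the caller (for example as a precomputed constant).
double pow_near(double known_value, double ln_a, double dx, int terms)
{
    double t = ln_a * dx;    // the series variable
    double term = 1.0;       // current term t^k / k!, starting at k = 0
    double sum = 0.0;
    for (int k = 0; k < terms; ++k) {
        sum += term;
        term *= t / (k + 1); // next term: t^(k+1) / (k+1)!
    }
    return known_value * sum;
}
```

For the example above, pow_near(81.0, 1.0986122886681098, -0.2, 10) approximates 3^3.8, which is about 65.02. The series converges quickly only when ln(a) * dx is small, so the known power should be close to the target exponent.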
A friend and I faced a similar problem while working on an OpenGL project, where math.h didn't suffice in some cases. Our instructor had the same problem, and he told us to separate the power into integer and fractional parts. For example, to calculate x^11.5 you may compute the 10th root of x^115, which may give a more accurate result.
I reworked @capellic's answer so that nth_root works with bigger values as well,
without the limitation of an array that is allocated for no reason:
#include <iostream>
#include <limits>
float pow(float base, float ex);
inline float fabs(float a) {
return a > 0 ? a : -a;
}
float nth_root(float A, int n, unsigned max_iterations = 500, float epsilon = std::numeric_limits<float>::epsilon()) {
if (n < 0)
throw "Invalid value";
if (n == 1 || A == 0)
return A;
float old_value = 1;
float value;
for (unsigned k = 0; k < max_iterations; k++) {
value = (1.0 / n) * ((n - 1) * old_value + A / pow(old_value, n - 1));
if (fabs(old_value - value) < epsilon)
return value;
old_value = value;
}
return value;
}
float pow(float base, float ex) {
if (base == 0)
return 0;
if (ex == 0){
// power of 0
return 1;
} else if( ex < 0) {
// negative exponent
return 1 / pow(base, -ex);
} else if (ex > 0 && ex < 1) {
// fractional exponent
return nth_root(base, 1/ex);
} else if ((int)ex % 2 == 0) {
// even exponent
float half_pow = pow(base, ex/2);
return half_pow * half_pow;
} else {
// integer exponent
return base * pow(base, ex - 1);
}
}
int main () {
for (int i = 0; i <= 128; i++) {
std::cout << "pow(" << i << ", .5) = " << pow(i, .5) << std::endl;
std::cout << "pow(" << i << ", .3) = " << pow(i, .3) << std::endl;
std::cout << "pow(" << i << ", 2) = " << pow(i, 2) << std::endl;
std::cout << "pow(" << i << ", 3) = " << pow(i, 3) << std::endl;
}
std::cout << "pow(" << 74088 << ", .3) = " << pow(74088, .3) << std::endl;
return 0;
}
This solution of mine runs in O(n) time.
It is accepted for inputs less than 2^30 or 10^8.
It will not be accepted for inputs larger than these:
it will give a Time Limit Exceeded warning.
It is an easily understandable solution, though.
#include<bits/stdc++.h>
using namespace std;
double recursive(double x,int n)
{
// q accumulates the product across the recursive calls;
// it must be static so that it keeps its value between calls.
static double q=1; // important
// base case: hand back the accumulated product and reset q
// so that it has a fresh value for further test cases
if(n==0){ double ans = q; q=1; return ans;}
// multiply the stored value by x once per level
q *= x;
// return the result of the deeper call; the original discarded it
// and returned an uninitialized variable
return recursive(x,n-1);
}
class Solution {
public:
double myPow(double x, int n) {
// double q=x;double N=n;
// return pow(q,N);
// when both sides are double this function works
if(n==0)return 1;
x = recursive(x,abs(n));
if(n<0) return double(1/x);
// else
return x;
}
};
For more help you may try
LeetCode question number 50.
Now the second, more optimized code for pow(x,n):
The logic is that we have to solve it in O(log N), so we divide n by 2.
When we have an even power, say n=4, then 4/2 is 2, which means we just have to square the half result: (2*2)*(2*2).
When we have an odd power, say n=5, then 5/2 is 2 again, so we square the half result
and also multiply by the number itself: (2*2)*(2*2)*2 gives 2^5 = 32.
Hope you understand; for more you can visit the
Pow(x, n) question on LeetCode.
The optimized code is below; the code above was O(n) only.
#include<bits/stdc++.h>
using namespace std;
double recursive(double x,int n)
{
// base case: x^0 is 1, so there is nothing further to multiply
if(n==0){return 1;}
// compute x^(n/2) once and reuse it
double store;
store = recursive(x,n/2);
// even power: the result is the square of the half power
if(n%2==0)return store*store;
else
{
// odd power: square the half power and multiply by x once more
return store*store*x;
}
}
// main function or the function for indirect call to recursive function
double myPow(double x, int n) {
if(n==0)return 1;
x = recursive(x,abs(n));
// for negative powers
if(n<0) return double(1/x);
// else for positive powers
return x;
}
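The same O(log N) idea can also be written as a loop. This is a sketch under my own naming (fast_pow is hypothetical). It widens the exponent to long long before negating it, because abs(n) is undefined when n is INT_MIN.

```cpp
// Iterative exponentiation by squaring: O(log |n|) multiplications.
// The exponent is widened to long long so that negating INT_MIN is safe.
double fast_pow(double x, int n)
{
    long long e = n;
    if (e < 0) {          // x^(-e) == (1/x)^e
        x = 1.0 / x;
        e = -e;
    }
    double result = 1.0;
    while (e > 0) {
        if (e & 1)        // lowest bit set: this power of x contributes
            result *= x;
        x *= x;           // x, x^2, x^4, x^8, ...
        e >>= 1;
    }
    return result;
}
```

Each loop iteration consumes one bit of the exponent, so even the largest int exponent needs at most 32 iterations and no recursion depth.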