Efficient way of checking the length of a double in C++

Say I have a number, 100000. I can use some simple maths to check its size, i.e. log(100000) -> 5 (base-10 logarithm). There's also another way of doing this, which is quite slow: std::string num = std::to_string(100000); num.size(). Is there a way to mathematically determine the length of a number? (Not just 100000, but for things like 2313455, 123876132, etc.)

Why not use ceil? It rounds up to the nearest whole number - you can just wrap that around your log function, and add a check afterwards to catch the fact that a power of 10 would return 1 less than expected.
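A minimal sketch of that, assuming a positive input (the power-of-10 check relies on log10 returning an exactly integral result for powers of 10, which holds in practice for reasonable magnitudes but is not guaranteed):

#include <cmath>

// Digit count via ceil(log10): exact powers of 10 make log10 land on a
// whole number, so ceil alone under-counts by one; the floor check compensates.
int digitCount(double n) {
    double lg = std::log10(n);
    int digits = static_cast<int>(std::ceil(lg));
    if (lg == std::floor(lg)) ++digits; // n is an exact power of 10, e.g. 100000
    return digits;
}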

Here is an O(1) solution to the problem using single-precision floating-point numbers:
#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    float x = 500; // to be converted
    uint32_t f;
    std::memcpy(&f, &x, sizeof(uint32_t)); // reinterpret the float's bits as a manageable int
    uint8_t exp = (f & (0b11111111 << 23)) >> 23; // extract the exponent field
    exp -= 127;  // remove the floating-point bias
    exp /= 3.32; // this truncates, but for this case it should be fine (log2(10) ≈ 3.32)
    std::cout << std::to_string(exp) << std::endl;
}
For a number in scientific notation a*10^e this will return e (when 1 <= a < 10), so the length of the number (if its absolute value is larger than 1) will be exp + 1.
For double precision this works too, but you have to adapt it (the bias is 1023 I think, and the bit layout is different. Check this)
This only works for floating-point numbers, though, so it is probably not very useful in this case. The efficiency relative to the logarithm will also be determined by the speed of the int -> float conversion.
Edit:
I just realised the question was about double. The modified result is:
int16_t getLength(double a) {
    uint64_t bits;
    std::memcpy(&bits, &a, sizeof(uint64_t));
    int16_t exp = (bits >> 52) & 0b11111111111; // there is no 11-bit int type, so this has to do
    exp -= 1023;  // remove the double-precision bias
    exp /= 3.32;  // log2(10) ≈ 3.32; truncates
    return exp + 1;
}
There are some changes so that it behaves better (and also less shifting).
You can also use frexp() to get the exponent without bias.
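For example, a sketch of the same idea with std::frexp (a hypothetical helper; it inherits the same truncation error near powers of 10 as the bit-twiddling version):

#include <cmath>

// std::frexp returns the unbiased binary exponent portably, with no
// assumptions about the IEEE-754 bit layout.
int getLengthFrexp(double a) {
    int exp2;
    std::frexp(a, &exp2); // a = m * 2^exp2 with 0.5 <= |m| < 1
    return static_cast<int>((exp2 - 1) / 3.32) + 1;
}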

If the number is whole, keep dividing by 10, until you're at 0. You'd have to divide 100000 6 times, for example. For the fractional part, you need to keep multiplying by 10 until trunc(f) == f.
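A sketch of those loops, assuming the input is a double (repeated division by 10 in floating point is inexact, but the >= 1 test tolerates the drift; the fractional count is for the nearest binary value, which may not match the decimal literal):

#include <cmath>

// Count whole digits by repeated division, and fractional digits by
// repeated multiplication until nothing remains after the decimal point.
int wholeDigits(double x) {
    int count = 0;
    for (double v = std::fabs(x); v >= 1; v /= 10) ++count;
    return count;
}

int fractionDigits(double f) {
    int count = 0;
    while (std::trunc(f) != f) { f *= 10; ++count; }
    return count;
}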

Related

round double with N significant decimal digits in an overflow-safe manner

I want an overflow-safe function that rounds a double like std::round; in addition, it can handle the number of significant decimal digits.
e.g.
round(-17.747, 2) -> -17.75
round(-9.97729, 2) -> -9.98
round(-5.62448, 2) -> -5.62
round(std::numeric_limits<double>::max(), 10) ...
My first attempt was
double round(double value, int precision)
{
    double factor = pow(10.0, precision);
    return floor(value*factor + 0.5)/factor;
}
but this can easily overflow.
Assuming IEEE, it is possible to decrease the possibility of overflows, like this.
double round(double value, int precision)
{
    // assuming IEEE 754 with 64 bit representation
    // the number of significant digits varies between 15 and 17
    precision = std::min(17, precision);
    double factor = pow(10.0, precision);
    return floor(value*factor + 0.5)/factor;
}
But this still can overflow.
Even this performance disaster does not work.
double round(double value, int precision)
{
    std::stringstream ss;
    ss << std::setprecision(precision) << value;
    std::string::size_type sz;
    return std::stod(ss.str(), &sz);
}
round(std::numeric_limits<double>::max(), 2.0) // throws std::out_of_range
Note:
I'm aware of setprecision, but I need rounding not only for display purposes, so that is not a solution.
Unlike this post here How to round a number to n decimal places in Java, my question is specifically about overflow safety and in C++ (the answers in the topic above are Java-specific or do not handle overflows).
I haven't heavily tested this code:
#include <cmath>

/* expects x in (-1, 1) */
double round_precision2(double x, int precision2) {
    double iptr, factor = std::exp2(precision2);
    double y = (x < 0) ? -x : x;
    std::modf(y * factor + .5, &iptr);
    return iptr/factor * ((x < 0) ? -1 : 1);
}

double round_precision(double x, int precision) {
    int bits = precision * M_LN10 / M_LN2;
    /* std::log2(std::pow(10., precision)); */
    double iptr, frac = std::modf(x, &iptr);
    return iptr + round_precision2(frac, bits);
}
The idea is to avoid overflow by only operating on the fractional part of the number.
We compute the number of binary bits to achieve the desired precision. You should be able to put a bound on them with the limits you describe in your question.
Next, we extract the fractional and integer parts of the number.
Then we add the integer part back to the rounded fractional part.
To compute the rounded fractional part, we compute the binary factor, extract the integer part of the rounded number that results from multiplying the fractional part by the factor, and then recover the fraction by dividing that integral part by the factor.

Rounding error detection

I have two integers n and d. These can be exactly represented by double dn(n) and double dd(d). Is there a reliable way in C++ to check if
double result = dn/dd
contains a rounding error? If it were just an integer division, checking if (n/d) * d == n would work, but doing that with double-precision arithmetic could hide rounding errors.
Edit: Shortly after posting this it struck me that changing the rounding mode to round_down would make the (n/d)*d==n test work for double. But if there is a simpler solution, I'd still like to hear it.
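A sketch of that round-down idea (a hypothetical helper, assuming positive n and d; with rounding toward negative infinity, an inexact quotient can only land below the real value, so multiplying back can no longer reach dn by error cancellation):

#include <cfenv>
#pragma STDC FENV_ACCESS ON

// Returns true iff dn/dd is exact, for positive dn and dd.
bool divisionIsExact(double dn, double dd) {
    const int old = std::fegetround();
    std::fesetround(FE_DOWNWARD);
    volatile double q = dn / dd;    // volatile discourages folding/reordering
    volatile double back = q * dd;
    std::fesetround(old);
    return back == dn;
}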
If a hardware FMA is available, then, in most cases (cases where n is expected not to be small, per below), the fastest test may be:
#include <cmath>
…
double q = dn/dd;
if (std::fma(-q, dd, dn))
    std::cout << "Quotient was not exact.\n";
This can fail if dn − q•dd is so small it is rounded to zero, which occurs in round-to-nearest-ties-to-even mode if its magnitude is smaller than half the smallest representable positive value (commonly 2^−1074). That can happen only if dn itself is small. I expect I could calculate some bound on dn for that if desired, and, given that dn = n and n is an integer, that should not occur.
Ignoring the exponent bounds, a way to test the significands for divisibility is:
#include <cfloat>
#include <cmath>
…
int sink; // Needed for frexp argument but will be ignored.
double fn = std::ldexp(std::frexp(n, &sink), DBL_MANT_DIG);
double fd = std::frexp(d, &sink);
if (std::fmod(fn, fd))
    std::cout << "Quotient will not be exact.\n";
Given that n and d are integers that are exactly representable in the floating-point type, I think we could show their exponents cannot be such that the above test would fail. There are cases where n is a small integer and d is large (a value from 2^1023 to 2^1024 − 2^972, inclusive) that I need to think about.
If you ignore overflow and underflow (which you should be able to do unless the integer types representing d and n are very wide), then the (binary) floating-point division dn/dd is exact iff d is a divisor of n times a power of two.
An algorithm to check for this may look like:
assert(d != 0);
while ((d & 1) == 0) d >>= 1; // extract largest odd divisor of d
int exact = n % d == 0;
This is cheaper than changing the FPU rounding mode if you want the rounding mode to be “to nearest” the rest of the time, and there probably exist bit-twiddling tricks that can speed up the extraction of the largest odd divisor of d; one such trick is sketched below.
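For instance, with C++20's <bit> header the loop collapses to a single shift (a sketch, assuming unsigned operands):

#include <bit>
#include <cassert>

bool divisionIsExact(unsigned long long n, unsigned long long d) {
    assert(d != 0);
    d >>= std::countr_zero(d); // strip all trailing zero bits at once
    return n % d == 0;         // exact iff the odd part of d divides n
}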
Is there a reliable way in C++ to check if double result = dn/dd contains a rounding error?
Should your system allow access to the various FP flags, test for FE_INEXACT after the division.
If FP code is expensive, then at least this code can be used to check integer-only solutions.
A C solution follows (I do not have access to a compliant C++ compiler to test right now):
#include <fenv.h>

// Return 0: no rounding error
// Return 1: rounding error
// Return -1: uncertain
#pragma STDC FENV_ACCESS ON
int Rounding_error_detection(int n, int d) {
    double dn = n;
    double dd = d;
    if (feclearexcept(FE_INEXACT)) return -1;
    volatile double result = dn/dd;
    (void) result;
    int set_excepts = fetestexcept(FE_INEXACT);
    return set_excepts != 0;
}
Test code
void Rounding_error_detection_Test(int n, int d) {
    printf("Rounding_error_detection(%d, %d) --> %d\n",
            n, d, Rounding_error_detection(n, d));
}

int main(void) {
    Rounding_error_detection_Test(3, 6);
    Rounding_error_detection_Test(3, 7);
}
Output
Rounding_error_detection(3, 6) --> 0
Rounding_error_detection(3, 7) --> 1
If the quotient q = dn/dd is exact, it will divide dn exactly dd times.
Since dd is an integer, you could test exactness with integer division.
Instead of testing the quotient multiplied by dd with (dn/dd)*dd == dn, where round-off errors can compensate, you should rather test the remainder.
Indeed, std::remainder is always exact:
if (std::remainder(dn, dn/dd) != 0)
    std::cout << "Quotient was not exact." << std::endl;

Multiplication between big integers and doubles

I am managing some big (128~256 bit) integers with GMP. It has come to a point where I would like to multiply them by a double close to 1 (0.1 < double < 10), the result still being an approximated integer. A good example of the operation I need to do is the following:
int i = 1000000000000000000 * 1.23456789
I searched in the gmp documentation but I didn't find a function for this, so I ended up writing this code which seems to work well:
void mpz_mult_d(mpz_class & r, const mpz_class & i, double d, int prec=10) {
    if (prec > 15) prec = 15; // avoids overflows
    uint_fast64_t m = (uint_fast64_t) floor(d);
    r = i * m;
    uint_fast64_t pos = 1;
    for (uint_fast8_t j = 0; j < prec; j++) {
        const double posd = (double) pos;
        m = ((uint_fast64_t) floor(d * posd * 10.)) -
            ((uint_fast64_t) floor(d * posd)) * 10;
        pos *= 10;
        r += (i * m) / pos;
    }
}
Can you please tell me what you think? Do you have any suggestions to make it more robust or faster?
This is what you wanted:
#include <cmath>
#include <cstdint>

typedef uint8_t  BYTE;  // assuming the usual Windows-style typedefs
typedef uint32_t DWORD;

// BYTE lint[_N] ... lint[0]=MSB, lint[_N-1]=LSB
void mul(BYTE *c, BYTE *a, double b) // c[_N] = a[_N] * b
{
    int i; DWORD cc;
    double q[_N+1], aa, bb;
    for (q[0]=0.0, i=0; i<_N;) // mul, carry down
    {
        bb = double(a[i])*b; aa = floor(bb); bb -= aa;
        q[i] += aa; i++;
        q[i] = bb*256.0;
    }
    cc = 0; if (q[_N] > 127.0) cc = 1; // round
    for (i=_N-1; i>=0; i--) // carry up
    {
        cc += (DWORD)q[i];
        c[i] = cc & 255;
        cc >>= 8;
    }
}
_N is the number of bits/8 per large int, i.e. a large int is an array of _N BYTEs where the first BYTE is the MSB (most significant BYTE) and the last BYTE is the LSB (least significant BYTE).
The function does not handle the sign, but adding that is only one if and some xor/inc.
The trouble is that double has low precision even for your number 1.23456789!!! Due to precision loss the result is not exactly what it should be (1234387129122386944 instead of 1234567890000000000). I think my code is much quicker and even more precise than yours, because I do not need to mul/mod/div numbers by 10; instead I use bit shifting where possible, and not by 10-digits but by 256-digits (8 bit). If you need more precision then use long arithmetic.
My long arithmetic for precise astro computations usually uses fixed-point 256.256-bit numbers consisting of 2*8 DWORDs + sign, but of course it is much slower, and some trigonometric functions are really tricky to implement. But if you want just the basic functions, then coding your own long arithmetic is not that hard.
Also, if you often want the numbers in readable form, it is a good compromise between speed/size to consider not binary-coded numbers but BCD-coded numbers.
I am not familiar enough with either C++ or GMP to suggest source code without syntax errors, but what you are doing is more complicated than it should be, and can introduce unnecessary approximation.
Instead, I suggest you write the function mpz_mult_d() like this:
mpz_mult_d(mpz_class & r, const mpz_class & i, double d) {
    d = ldexp(d, 52); /* exact, no overflow because 1 <= d <= 10 */
    unsigned long long l = d; /* exact because d is an integer */
    p = l * i; /* exact, in GMP */
    (quotient, remainder) = p / 2^52; /* in GMP */
And now the next step depends on the kind of rounding you wish. If you wish the multiplication of d by i to give a result rounded toward -inf, just return quotient as result of the function. If you wish a result rounded to the nearest integer, you must look at remainder:
assert(0 <= remainder); /* proper Euclidean division */
assert(remainder < 2^52);
if (remainder < 2^51) return quotient;
if (remainder > 2^51) return quotient + 1; /* in GMP */
if (remainder == 2^51) return quotient + (quotient & 1); /* in GMP, round to “even” */
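A concrete (untested) sketch of this scheme with gmpxx, assuming 1 <= d < 10 and round-to-nearest, ties-to-even as the desired rounding; the function name mirrors the question's:

#include <gmpxx.h>
#include <cmath>

mpz_class mpz_mult_d(const mpz_class& i, double d) {
    mpz_class m(std::ldexp(d, 52));             // exact: d*2^52 is an integer < 2^56
    mpz_class p = i * m;                        // exact product in GMP
    mpz_class quotient  = p >> 52;              // floor division by 2^52
    mpz_class remainder = p - (quotient << 52); // in [0, 2^52)
    mpz_class half = mpz_class(1) << 51;
    if (remainder < half) return quotient;
    if (remainder > half) return quotient + 1;
    return quotient + (quotient & 1);           // tie: round to even
}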
PS: I found your question by random browsing but if you had tagged it “floating-point”, people more competent than me could have answered it quickly.
Try this strategy:
Convert integer value to big float: mpf_set_z(...)
Convert double value to big float: mpf_set_d(...)
Make product: mpf_mul(...)
Convert result to integer: mpz_set_f(...)
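Put together, the strategy might look like this (an untested sketch using the GMP C interface; the function name and the 320-bit working precision are assumptions sized to the question's 128~256-bit integers):

#include <gmp.h>

void mpz_mult_d_via_mpf(mpz_t r, const mpz_t i, double d) {
    mpf_t fi, fd;
    mpf_init2(fi, 320);   // working precision: room for a 256-bit integer plus the factor
    mpf_init2(fd, 64);
    mpf_set_z(fi, i);     // integer -> big float
    mpf_set_d(fd, d);     // double -> big float
    mpf_mul(fi, fi, fd);  // product
    mpz_set_f(r, fi);     // truncate back to integer
    mpf_clear(fi);
    mpf_clear(fd);
}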

How to write an std::floor function from scratch [duplicate]

This question already has answers here:
Write your own implementation of math's floor function, C
(5 answers)
Closed 1 year ago.
I would like to know how to write my own floor function to round a float down.
Is it possible to do this by setting the bits of a float that represent the digits after the decimal point to 0?
If yes, then how can I access and modify those bits?
Thanks.
You can do bit twiddling on floating point numbers, but getting it right depends on knowing exactly what the floating point binary representation is. For most machines these days it's IEEE-754, which is reasonably straightforward. For example, IEEE-754 32-bit floats have 1 sign bit, 8 exponent bits, and 23 mantissa bits, so you can use shifts and masks to extract those fields and do things with them. So doing trunc (round to integer towards 0) is pretty easy:
#include <cstdint>

float trunc(float x) {
    union {
        float f;
        uint32_t i;
    } val;
    val.f = x;
    int exponent = (val.i >> 23) & 0xff; // extract the exponent field
    int fractional_bits = 127 + 23 - exponent;
    if (fractional_bits > 23) // abs(x) < 1.0
        return 0.0;
    if (fractional_bits > 0)
        val.i &= ~((1U << fractional_bits) - 1);
    return val.f;
}
First, we extract the exponent field, and use that to calculate how many bits after the decimal point are present in the number. If there are more than the size of the mantissa, then we just return 0. Otherwise, if there's at least 1, we mask off (clear) that many low bits. Pretty simple. We're ignoring denormals, NaN, and infinity here, but that works out ok, as they have exponents of all 0s or all 1s, which means we end up converting denorms to 0 (they get caught in the first if, along with small normal numbers), and leaving NaN/Inf unchanged.
To do a floor, you'd also need to look at the sign, and round negative non-integers down towards negative infinity; a minimal sketch follows.
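A sketch of that sign handling on top of the trunc above (the helper name is made up):

// For negative non-integers, truncation lands one above the floor,
// so step down by one.
float floorFromTrunc(float x) {
    float t = trunc(x);
    return (x < 0 && t != x) ? t - 1.0f : t;
}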
Note that this is almost certainly slower than using dedicated floating point intructions, so this sort of thing is really only useful if you need to use floating point numbers on hardware that has no native floating point support. Or if you just want to play around and learn how these things work at a low level.
Define "from scratch". And no, setting the bits of your floating-point number representing the digits after the decimal point to 0 will not work. If you look at IEEE-754, you will see that you basically have all your floating-point numbers in the form:
0.xyzxyzxyz · 2^(abc)
So to implement flooring, you can take the xyzxyzxyz part and shift it left abc+1 times, dropping the rest. I suggest you read up on the binary representation of a floating-point number (link above); this should shed light on the solution I suggested.
NOTE: You also need to take care of the sign bit. And the exponent of your number is off (biased) by 127.
Here is an example. Let's say you have the number pi: 3.14..., and you want to get 3.
Pi is represented in binary as
0 10000000 10010010000111111011011
This translates to
sign = 0 ; e = 1 ; s = 110010010000111111011011
The above I get directly from Wikipedia. Since e is 1, you will want to shift s left by 1 + 1 = 2, so you get 11 => 3.
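A sketch of that recipe in code (assumes IEEE-754 binary32 and a positive input; the helper name is made up, and sign handling and exponents past the mantissa width are left out as noted above):

#include <cstdint>
#include <cstring>

unsigned floorBits(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    int e = ((bits >> 23) & 0xFF) - 127;         // unbiased exponent
    uint32_t s = (bits & 0x7FFFFF) | (1u << 23); // mantissa with implicit leading 1
    if (e < 0) return 0;                         // |x| < 1
    return s >> (23 - e);                        // keep only the integer bits (valid for e <= 23)
}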
#include <iostream>
#include <iomanip>

double round(double input, double roundto) {
    return int(input / roundto) * roundto;
}

int main() {
    double pi = 3.1415926353898;
    double almostpi = round(pi, 0.0001);
    std::cout << std::setprecision(14) << pi << '\n' << std::setprecision(14) << almostpi;
}
http://ideone.com/mdqFA
output:
3.1415926353898
3.1415
This will pretty much be faster than any bit twiddling you can come up with. And it works on all computers (with floats) instead of just one type.
Casting to unsigned while returning as a double does what you are seeking, but under the hood. This simple piece of code works for any POSITIVE number below 2^64.
double floor(const double& num) {
    return (unsigned long long) num;
}
This has been tested on tio.run (Try It Online) and onlinegdb.com. The function itself doesn't require any #include files, but to print out the answers, I have included stdio.h (in the tio.run and onlinegdb.com, not here). Here it is:
long double myFloor(long double x) /* Change this to your liking: long double
                                      might be float in your situation. */
{
    long double xcopy = x<0 ? x*-1 : x;
    unsigned int zeros = 0;
    long double n = 1;
    for (n=1; xcopy>n*10; n*=10, ++zeros);
    // subtract off one digit at a time; zeros wraps past 0 when done
    for (xcopy-=n; zeros!=-1; xcopy-=n)
        if (xcopy<0)
        {
            xcopy += n;
            n /= 10;
            --zeros;
        }
    xcopy += n;
    return x<0 ? (xcopy==0 ? x : x-(1-xcopy)) : (x-xcopy);
}
This function works everywhere (pretty sure) because it just removes all of the non-decimal parts instead of trying to work with the parts of floats.
The floor of a floating point number is the biggest integer less than or equal to it. Here are some examples:
floor(5.7) = 5
floor(3) = 3
floor(9.9) = 9
floor(7.0) = 7
floor(-7.9) = -8
floor(-5.0) = -5
floor(-3.3) = -4
floor(0) = 0
floor(-0.0) = -0
floor(-0) = -0
Note: this is almost an exact copy from my other answer which answered a question that was basically the same as this one.

Generating random floating-point values based on random bit stream

Given a random source (a generator of random bit stream), how do I generate a uniformly distributed random floating-point value in a given range?
Assume that my random source looks something like:
unsigned int GetRandomBits(char* pBuf, int nLen);
And I want to implement
double GetRandomVal(double fMin, double fMax);
Notes:
I don't want the result precision to be limited (for example only 5 digits).
Strict uniform distribution is a must
I'm not asking for a reference to an existing library. I want to know how to implement it from scratch.
For pseudo-code / code, C++ would be most appreciated
I don't think I'll ever be convinced that you actually need this, but it was fun to write.
#include <stdint.h>
#include <cmath>
#include <cstdio>

FILE* devurandom;

bool geometric(int x) {
    // returns true with probability min(2^-x, 1)
    if (x <= 0) return true;
    while (1) {
        uint8_t r;
        fread(&r, sizeof r, 1, devurandom);
        if (x < 8) {
            return (r & ((1 << x) - 1)) == 0;
        } else if (r != 0) {
            return false;
        }
        x -= 8;
    }
}

double uniform(double a, double b) {
    // requires IEEE doubles and 0.0 < a < b < inf and a normal
    // implicitly computes a uniform random real y in [a, b)
    // and returns the greatest double x such that x <= y
    union {
        double f;
        uint64_t u;
    } convert;
    convert.f = a;
    uint64_t a_bits = convert.u;
    convert.f = b;
    uint64_t b_bits = convert.u;
    uint64_t mask = b_bits - a_bits;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    int b_exp;
    frexp(b, &b_exp);
    while (1) {
        // sample uniform x_bits in [a_bits, b_bits)
        uint64_t x_bits;
        fread(&x_bits, sizeof x_bits, 1, devurandom);
        x_bits &= mask;
        x_bits += a_bits;
        if (x_bits >= b_bits) continue;
        double x;
        convert.u = x_bits;
        x = convert.f;
        // accept x with probability proportional to 2^x_exp
        int x_exp;
        frexp(x, &x_exp);
        if (geometric(b_exp - x_exp)) return x;
    }
}

int main() {
    devurandom = fopen("/dev/urandom", "r");
    for (int i = 0; i < 100000; ++i) {
        printf("%.17g\n", uniform(1.0 - 1e-15, 1.0 + 1e-15));
    }
}
Here is one way of doing it.
The IEEE Std 754 double format is as follows:
[s][ e ][ f ]
where s is the sign bit (1 bit), e is the biased exponent (11 bits) and f is the fraction (52 bits).
Beware that the layout in memory will be different on little-endian machines.
For 0 < e < 2047, the number represented is
(-1)**(s) * 2**(e – 1023) * (1.f)
By setting s to 0, e to 1023 and f to 52 random bits from your bit stream, you get a random double in the interval [1.0, 2.0). This interval is unique in that it contains 2 ** 52 doubles, and these doubles are equidistant. If you then subtract 1.0 from the constructed double, you get a random double in the interval [0.0, 1.0). Moreover, the property of being equidistant is preserved.
From there you should be able to scale and translate as needed.
I'm surprised that for a question this old, nobody had actual code for the best answer. User515430's answer got it right--you can take advantage of the IEEE-754 double format to directly put 52 bits into a double with no math at all. But he didn't give code. So here it is, from my public domain ojrandlib:
double ojr_next_double(ojr_generator *g) {
    uint64_t r = (OJR_NEXT64(g) & 0xFFFFFFFFFFFFFull) | 0x3FF0000000000000ull;
    return *(double *)(&r) - 1.0;
}
NEXT64() gets a 64-bit random number. If you have a more efficient way of getting only 52 bits, use that instead.
This is easy, as long as you have an integer type with as many bits of precision as a double. For instance, an IEEE double-precision number has 53 bits of precision, so a 64-bit integer type is enough:
#include <limits.h>

double GetRandomVal(double fMin, double fMax) {
    unsigned long long n;
    GetRandomBits((char*)&n, sizeof(n));
    return fMin + (n * (fMax - fMin)) / ULLONG_MAX;
}
This is probably not the answer you want, but the specification here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf
in sections [rand.util.canonical] and [rand.dist.uni.real], contains sufficient information to implement what you want, though with slightly different syntax. It isn't easy, but it is possible. I speak from personal experience. A year ago I knew nothing about random numbers, and I was able to do it. Though it took me a while... :-)
The question is ill-posed. What does uniform distribution over floats even mean?
Taking our cue from discrepancy, one way to operationalize your question is to define that you want the distribution that minimizes
sup over t in [fMin, fMax] of | Pr(x <= t) − (t − fMin) / (fMax − fMin) |
where x is the random variable you are sampling with your GetRandomVal(double fMin, double fMax) function, and Pr(x <= t) means the probability that a random x is smaller than or equal to t.
And now you can go on and try to evaluate e.g. dabbler's answer. (Hint: all the answers that fail to use the whole precision and stick to e.g. 52 bits will fail this minimization criterion.)
However, if you just want to be able to generate all float bit patterns that fall into your specified range with equal probability, even if that means that e.g. asking for GetRandomVal(0, 1000) will create more values between 0 and 1.5 than between 1.5 and 1000, that's easy: any interval of IEEE floating-point numbers, when interpreted as bit patterns, maps easily to a very small number of intervals of unsigned int64 (see e.g. this question), and generating equally distributed random values of unsigned int64 in any given interval is easy. A sketch of that bit-pattern mapping follows.
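A minimal sketch of that mapping (assuming IEEE-754 binary64; the standard trick is to flip all bits of negatives and set the sign bit of non-negatives, which makes the unsigned integer order match the floating-point order):

#include <cstdint>
#include <cstring>

uint64_t toOrderedBits(double x) {
    uint64_t b;
    std::memcpy(&b, &x, sizeof b);
    // negatives: flip all bits; non-negatives: set the sign bit
    return (b & (1ull << 63)) ? ~b : (b | (1ull << 63));
}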
I may be misunderstanding the question, but what stops you simply sampling the next n bits from the random bit stream and converting that to a number in the range 0 to 2^n − 1?
To get a random value in [0..1[ you could do something like:
double value = 0;
for (int i = 0; i < 53; i++)
    value = 0.5 * (value + random_bit()); // insert 1 random bit
    // or value = ldexp(value + random_bit(), -1);
    // or group several bits into one single ldexp
return value;