How do I convert double to string using only math.h? - c++

I am trying to convert a double to a string in a native NT application, i.e. an application that only depends on ntdll.dll. Unfortunately, ntdll's version of vsnprintf does not support %f et al., forcing me to implement the conversion on my own.
The aforementioned ntdll.dll exports only a few of the math.h functions (floor, ceil, log, pow, ...). However, I am reasonably sure that I can implement any of the unavailable math.h functions if necessary.
There is an implementation of floating point conversion in GNU's libc, but the code is extremely dense and difficult to comprehend (the GNU indentation style does not help here).
I've already implemented the conversion by normalizing the number (i.e. multiplying/dividing the number by 10 until it's in the interval [1, 10)) and then generating each digit by cutting the integral part off with modf and multiplying the fractional part by 10. This works, but there is a loss of precision (only the first 15 digits are correct). The loss of precision is, of course, inherent to the algorithm.
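For concreteness, here is a stripped-down sketch of that approach (naive_digits is an illustrative name; sign, zero and exponent formatting are omitted):
#include <cmath>

// Simplified sketch of the normalize-then-modf approach described above.
// Assumes x > 0. Precision erodes with every multiply/divide, which is why
// only the first ~15 digits come out right.
void naive_digits(double x, char* out, int ndigits, int* exp10)
{
    *exp10 = 0;
    while (x >= 10.0) { x /= 10.0; ++*exp10; }   // normalize into [1, 10)
    while (x <  1.0)  { x *= 10.0; --*exp10; }
    for (int i = 0; i < ndigits; ++i)
    {
        double intpart;
        x = std::modf(x, &intpart) * 10.0;       // peel off one digit, shift the rest up
        out[i] = (char)('0' + (int)intpart);
    }
    out[ndigits] = '\0';
}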
I'd settle with 17 digits, but an algorithm that would be able to generate an arbitrary number of digits correctly would be preferred.
Could you please suggest an algorithm or point me to a good resource?

Double-precision numbers do not have more than 15 significant (decimal) figures of precision. There is absolutely no way you can get "an arbitrary number of digits correctly"; doubles are not bignums.
Since you say you're happy with 17 significant figures, use long double; on Windows, I think, that will give you 19 significant figures.

I've thought about this a bit more. You lose precision because you normalize by multiplying by some power of 10 (you chose [1,10) rather than [0,1), but that's a minor detail). If you did so with a power of 2, you'd lose no precision, but then you'd get "decimal digits"*2^e; you could implement bcd arithmetic and compute the product yourself, but that doesn't sound like fun.
I'm pretty confident that you could split the double g=m*2^e into two parts: h=floor(g*10^k) and i=modf(g*10^k) for some k, and then separately convert to decimal digits and then stitch them together, but how about a simpler approach: use "long double" (80 bits, but I've heard that Visual C++ may not support it?) with your current approach and stop after 17 digits.
_gcvt should do it (edit - it's not in ntdll.dll, it's in some msvcrt*.dll?)
As for decimal digits of precision, IEEE binary64 has 53 bits of significand precision (52 stored explicitly). 53*log10(2) = 15.95... (edit: as you pointed out, to round trip, you need more than 16 digits)
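A quick illustration of the round-trip point (this uses printf's %g for brevity, which the asker's ntdll environment lacks; it is only meant to show the 15-vs-17 digit behaviour on a normal C runtime):
#include <cstdio>

int main()
{
    double x = 0.1 + 0.2;       // not exactly 0.3 in binary
    std::printf("%.15g\n", x);  // prints 0.3 (15 digits hide the error)
    std::printf("%.17g\n", x);  // prints 0.30000000000000004 (17 digits round-trip)
}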

After a lot of research, I found a paper titled Printing Floating-Point Numbers Quickly and Accurately. It uses exact rational arithmetic to avoid precision loss. It cites an older paper, How to Print Floating-Point Numbers Accurately, which, however, seems to require an ACM subscription to access.
Since the former paper was reprinted in 2006, I am inclined to believe that it is still current. The exact rational arithmetic (which requires dynamic allocation) seems to be a necessary evil.

A complete implementation of the C code for the fastest known (as of today) algorithm:
http://code.google.com/p/double-conversion/downloads/list
It even includes a test suite.
This is the C code behind the algorithm described in this PDF:
Printing Floating-Point Numbers Quickly and Accurately
http://www.cs.indiana.edu/~burger/FP-Printing-PLDI96.pdf

#include <cstdint>
#include <cmath>    // pow, log10, modf, round
#include <limits>   // std::numeric_limits
#include <float.h>  // _fpclass, _FPCLASS_*
#include <stdlib.h> // __max
#include <tchar.h>  // TCHAR, _T, _tcscpy_s
// --------------------------------------------------------------------------
// Return number of decimal-digits of a given unsigned-integer
// N is uint8_t/uint16_t/uint32_t/uint64_t
template <class N> inline uint8_t GetUnsignedDecDigits(const N n)
{
static_assert(std::numeric_limits<N>::is_integer && !std::numeric_limits<N>::is_signed,
"GetUnsignedDecDigits: unsigned integer type expected" );
const uint8_t anMaxDigits[]= {3, 5, 8, 10, 13, 15, 17, 20};
const uint8_t nMaxDigits = anMaxDigits[sizeof(N)-1];
uint8_t nDigits= 1;
N nRoof = 10;
while ((n >= nRoof) && (nDigits<nMaxDigits))
{
nDigits++;
nRoof*= 10;
}
return nDigits;
}
// --------------------------------------------------------------------------
// Convert floating-point value to NULL-terminated string representation
TCHAR* DoubleToStr(double f , // [i ]
TCHAR* pczStr , // [i/o] caller should allocate enough space
int nDigitsI, // [i ] digits of integer part including sign / <1: auto
int nDigitsF ) // [i ] digits of fractional part / <0: auto
{
switch (_fpclass(f))
{
case _FPCLASS_SNAN:
case _FPCLASS_QNAN: _tcscpy_s(pczStr, 5, _T("NaN" )); return pczStr;
case _FPCLASS_NINF: _tcscpy_s(pczStr, 5, _T("-INF")); return pczStr;
case _FPCLASS_PINF: _tcscpy_s(pczStr, 5, _T("+INF")); return pczStr;
}
if (nDigitsI> 18) nDigitsI= 18; if (nDigitsI< 1) nDigitsI= -1;
if (nDigitsF> 18) nDigitsF= 18; if (nDigitsF< 0) nDigitsF= -1;
bool bNeg= (f<0);
if (f<0)
f= -f;
int nE= 0; // exponent (displayed if != 0)
if ( ((-1 == nDigitsI) && (f >= 1e18 )) || // large value: switch to scientific representation
((-1 != nDigitsI) && (f >= pow(10., nDigitsI))) )
{
nE= (int)log10(f);
f/= (double)pow(10., nE);
if (-1 != nDigitsF)
nDigitsF= __max(nDigitsF, nDigitsI+nDigitsF-(bNeg?2:1)-4);
nDigitsI= (bNeg?2:1);
}
else if (f>0)
if ((-1 == nDigitsF) && (f <= 1e-10)) // small value: switch to scientific representation
{
nE= (int)log10(f)-1;
f/= (double)pow(10., nE);
if (-1 != nDigitsF)
nDigitsF= __max(nDigitsF, nDigitsI+nDigitsF-(bNeg?2:1)-4);
nDigitsI= (bNeg?2:1);
}
double fI;
double fF= modf(f, &fI); // fI: integer part, fF: fractional part
if (-1 == nDigitsF) // figure out the number of meaningful digits in fF
{
double fG, fGI, fGF;
do
{
nDigitsF++;
fG = fF*pow(10., nDigitsF);
fGF= modf(fG, &fGI);
}
while (fGF > 1e-10);
}
const double afPower10[20]= {1e0 , 1e1 , 1e2 , 1e3 , 1e4 , 1e5 , 1e6 , 1e7 , 1e8 , 1e9 ,
1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19 };
uint64_t uI= (uint64_t)round(fI );
uint64_t uF= (uint64_t)round(fF*afPower10[nDigitsF]);
if (uF)
if (GetUnsignedDecDigits(uF) > nDigitsF) // X.99999 was rounded to X+1
{
uF= 0;
uI++;
if (nE)
{
uI/= 10;
nE++;
}
}
uint8_t nRealDigitsI= GetUnsignedDecDigits(uI);
if (bNeg)
nRealDigitsI++;
int nPads= 0;
if (-1 != nDigitsI)
{
nPads= nDigitsI-nRealDigitsI;
for (int i= nPads-1; i>=0; i--) // leading spaces
pczStr[i]= _T(' ');
}
if (bNeg) // minus sign
{
pczStr[nPads]= _T('-');
nRealDigitsI--;
nPads++;
}
for (int j= nRealDigitsI-1; j>=0; j--) // digits of integer part
{
pczStr[nPads+j]= (uint8_t)(uI%10) + _T('0');
uI /= 10;
}
nPads+= nRealDigitsI;
if (nDigitsF)
{
pczStr[nPads++]= _T('.'); // decimal point
for (int k= nDigitsF-1; k>=0; k--) // digits of fractional part
{
pczStr[nPads+k]= (uint8_t)(uF%10)+ _T('0');
uF /= 10;
}
}
nPads+= nDigitsF;
if (nE)
{
pczStr[nPads++]= _T('e'); // exponent sign
if (nE<0)
{
pczStr[nPads++]= _T('-');
nE= -nE;
}
else
pczStr[nPads++]= _T('+');
for (int l= 2; l>=0; l--) // digits of exponent
{
pczStr[nPads+l]= (uint8_t)(nE%10) + _T('0');
nE /= 10;
}
pczStr[nPads+3]= 0;
}
else
pczStr[nPads]= 0;
return pczStr;
}
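Hypothetical usage of DoubleToStr (the buffer size is arbitrary; -1 for a width argument requests the automatic behaviour described in the parameter comments):
TCHAR buf[64];
DoubleToStr(3.14159265358979, buf, -1,  6);  // fixed notation, 6 fractional digits
DoubleToStr(1.5e21,           buf, -1, -1);  // large value: switches to scientific notation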

Does vsnprintf support %I64d?
double x = SOME_VAL; // allowed to be from -1.e18 to 1.e18
bool sign = (SOME_VAL < 0);
if ( sign ) x = -x;
__int64 i = static_cast<__int64>( x );
double xm = x - static_cast<double>( i );
__int64 w = static_cast<__int64>( xm*pow(10.0, DIGITS_VAL) ); // DIGITS_VAL indicates how many digits after the decimal point you want to get
char out[100];
_snprintf( out, sizeof out, "%s%I64d.%I64d", (sign?"-":""), i, w );
Another option is to try to find implementation of gcvt.

Have you looked at the uClibc implementation of printf?

Related

Do multiples of Pi to the thousandths have a value that may change how a loop executes?

Recently I decided to get into C++, and after going through the basics I decided to build a calculator using only iostream (just to challenge myself). After most of it was complete, I came across an issue with my loop for exponents. Whenever a multiple of Pi was used as the exponent, it looped way too many times. I fixed it in a somewhat redundant way and now I'm hoping someone might be able to tell me what happened. My unfixed code snippet is below. Ignore everything above it and just look at the last bit of fully functioning code. All I was wondering was why values of pi would throw off the loop so much. Thanks.
bool TestForDecimal(double Num) /* Checks if the number given is whole or not */ {
if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) {
return 0;
}
else {
return 1;
}
}
And then here's where it all goes wrong (Denominator is set to a value of 1)
if (TestForDecimal(Power) == 1) /* Checks if its decimal or not */ {
while (TestForDecimal(Power) == 1) {
Power = Power * 10;
Denominator = Denominator * 10;
}
}
If anyone could give me an explanation that would be great!
To clarify further, the while loop kept looping even after Power became a whole number (This only happened when Power was equal to a multiple of pi such as 3.1415 or 6.2830 etc.)
Here's a complete program you can try:
#include <iostream>
bool TestForDecimal(double Num) /* Checks if the number given is whole or not */ {
if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) {
return 0;
}
else {
return 1;
}
}
void foo(double Power) {
double x = Power;
if (TestForDecimal(x) == 1) /* Checks if its decimal or not */ {
while (TestForDecimal(x) == 1) {
x = x * 10;
std::cout << x << std::endl;
}
}
}
int main() {
foo(3.145); // Substitute this with 3.1415 and it doesn't work (this was my problem)
system("Pause");
return 0;
}
What's wrong with doing something like this?
#include <cmath> // abs and round
#include <cfloat> // DBL_EPSILON
bool TestForDecimal(double Num) {
double diff = std::abs(std::round(Num) - Num);
// true if not a whole number
return diff > DBL_EPSILON;
}
The loop is quite inefficient... what if Num is large?
A faster way could be something like
if (Num == static_cast<int>(Num))
or
if (Num == (int)Num)
if you prefer a C-style syntax.
Then a range check may be useful... it does not make sense to ask whether Num is an integer when it is larger than 2^32 (about 4 billion).
Finally, do not think of these numbers as decimals. They are stored as binary numbers, so instead of multiplying Power and Denominator by 10 you are better off multiplying them by 2.
Most decimal fractions can't be represented exactly in a binary floating-point format, so what you're trying to do can't work in general. For example, with a standard 64-bit double format, the closest representable value to 3.1415 is more like 3.1415000000000001812.
If you need to represent decimal fractions exactly, then you'll need a non-standard type. Boost.Multiprecision has some decimal types, and there's a proposal to add decimal types to the standard library; some implementations may have experimental support for this.
Beware. A double is (generally, and I think you use a standard architecture) represented in IEEE-754 format, that is, mantissa * 2^exponent. A double has 52 explicitly stored mantissa bits (53 bits of precision with the implicit leading 1), one sign bit and 11 exponent bits. When you multiply it by 10 it will grow, and it will be an integer value as soon as the exponent exceeds the number of mantissa bits.
Unfortunately, a 53-bit integer cannot be represented in a 32-bit int, so unless you have a 64-bit system your test will fail again.
So if you have a 32-bit system, you will never reach a value that passes the test. You will more likely reach an infinity representation and stay there ...
The only use case where it could work would be if you started with a number that can be represented with a small number of negative powers of 2, for example 0.5 (1/2), 0.25 (1/4), 0.75 (1/2 + 1/4), so that almost all digits of the mantissa part are 0.
After studying your "unfixed" function, from what I can tell, here's your basic algorithm:
double TestForDecimal(double Num) { ...
A function that accepts a double and returns a double. This would make sense if the returned value was the decimal value, but since that's not the case, perhaps you meant to use bool?
while (Num > 1) { make it less }
While there is nothing inherently wrong with this, it doesn't really address negative numbers with large magnitudes, so you'll run into problems there.
if (Num > -INT_MAX && Num < INT_MAX && Num == (int)Num) { return 0; }
This means that if Num is within the signed integer range and its integer typecast is equal to itself, return a 0 typecast to a double. This means you don't care whether numbers outside the integer range are whole numbers or not. To fix this, change the condition to if (Num == (long)Num), on platforms where sizeof(long) == sizeof(double).
Perhaps the algorithm your function follows that I've just explained might shed some light on your problem.

Multiplication between big integers and doubles

I am managing some big (128~256 bit) integers with GMP. It has come to a point where I would like to multiply them by a double close to 1 (0.1 < double < 10), the result still being an approximate integer. A good example of the operation I need to do is the following:
int i = 1000000000000000000 * 1.23456789
I searched in the gmp documentation but I didn't find a function for this, so I ended up writing this code which seems to work well:
void mpz_mult_d(mpz_class & r, const mpz_class & i, double d, int prec=10) {
if (prec > 15) prec=15; //avoids overflows
uint_fast64_t m = (uint_fast64_t) floor(d);
r = i * m;
uint_fast64_t pos=1;
for (uint_fast8_t j=0; j<prec; j++) {
const double posd = (double) pos;
m = ((uint_fast64_t) floor(d * posd * 10.)) -
((uint_fast64_t) floor(d * posd)) * 10;
pos*=10;
r += (i * m) /pos;
}
}
Can you please tell me what do you think? Do you have any suggestion to make it more robust or faster?
this is what you wanted:
// BYTE lint[_N] ... lint[0]=MSB, lint[_N-1]=LSB
void mul(BYTE *c,BYTE *a,double b) // c[_N]=a[_N]*b
{
int i; DWORD cc;
double q[_N+1],aa,bb;
for (q[0]=0.0,i=0;i<_N;) // mul,carry down
{
bb=double(a[i])*b; aa=floor(bb); bb-=aa;
q[i]+=aa; i++;
q[i]=bb*256.0;
}
cc=0; if (q[_N]>127.0) cc=1; // round
for (i=_N-1;i>=0;i--) // carry up
{
cc+=(DWORD)q[i];
c[i]=cc&255;
cc>>=8;
}
}
_N is the number of bytes (bits/8) per large int; a large int is an array of _N BYTEs where the first byte is the MSB (most significant BYTE) and the last BYTE is the LSB (least significant BYTE).
The function does not handle the sign, but that is only one if and some xor/inc to add.
The trouble is that a double has low precision even for your number 1.23456789! Due to precision loss the result is not exactly what it should be (1234387129122386944 instead of 1234567890000000000). I think my code is much quicker and even more precise than yours because I do not need to mul/mod/div numbers by 10; instead I use bit shifting where possible, and I work not with base-10 digits but with base-256 digits (8 bit). If you need more precision, then use long arithmetic. You can speed this code up by using larger digits (16, 32, ... bit).
My long arithmetic for precise astro computations usually uses fixed-point 256.256-bit numbers consisting of 2*8 DWORDs plus a sign, but of course it is much slower, and some trigonometric functions are really tricky to implement. Still, if you want just the basic operations, coding your own long arithmetic is not that hard.
Also, if you often want the numbers in readable form, it is a good compromise between speed and size to use BCD-coded numbers instead of binary-coded numbers.
I am not familiar enough with either C++ or GMP to suggest source code without syntax errors, but what you are doing is more complicated than it should be and can introduce unnecessary approximation.
Instead, I suggest you write function mpz_mult_d() like this:
mpz_mult_d(mpz_class & r, const mpz_class & i, double d) {
d = ldexp(d, 52); /* exact, no overflow because 1 <= d <= 10 */
unsigned long long l = d; /* exact because d is an integer */
p = l * i; /* exact, in GMP */
(quotient, remainder) = p / 2^52; /* in GMP */
And now the next step depends on the kind of rounding you wish. If you wish the multiplication of d by i to give a result rounded toward -inf, just return quotient as result of the function. If you wish a result rounded to the nearest integer, you must look at remainder:
assert(0 <= remainder); /* proper Euclidean division */
assert(remainder < 2^52);
if (remainder < 2^51) return quotient;
if (remainder > 2^51) return quotient + 1; /* in GMP */
if (remainder == 2^51) return quotient + (quotient & 1); /* in GMP, round to “even” */
PS: I found your question by random browsing but if you had tagged it “floating-point”, people more competent than me could have answered it quickly.
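For illustration, here is an untested sketch of the ldexp idea above in GMP's C++ interface, covering only the round-toward-minus-infinity case; it assumes 1 <= d < 10 (so that d * 2^52 is exactly representable and integral) and a platform whose unsigned long holds 64 bits. mpz_mult_d_floor is an illustrative name.
#include <gmpxx.h>
#include <cmath>

mpz_class mpz_mult_d_floor(const mpz_class& i, double d)
{
    // d * 2^52 is exact (a pure exponent shift) and integral for 1 <= d < 10
    unsigned long l = (unsigned long)std::ldexp(d, 52);
    mpz_class p = i * l;                                // exact product in GMP
    mpz_class q;
    mpz_fdiv_q_2exp(q.get_mpz_t(), p.get_mpz_t(), 52);  // q = floor(p / 2^52)
    return q;
}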
Try this strategy:
Convert integer value to big float
Convert double value to big float
Make product
Convert result to integer
mpf_set_z(...)
mpf_set_d(...)
mpf_mul(...)
mpz_set_f(...)
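A minimal sketch of that four-step strategy with the listed functions (the 320-bit working precision is an arbitrary choice, picked to be larger than the 256-bit inputs; mpz_set_f truncates toward zero):
#include <gmp.h>

void mpz_mult_d_via_mpf(mpz_t r, const mpz_t i, double d)
{
    mpf_t fi, fd;
    mpf_init2(fi, 320);    // working precision, a bit more than the integer width
    mpf_init2(fd, 64);
    mpf_set_z(fi, i);      // 1. convert integer value to big float
    mpf_set_d(fd, d);      // 2. convert double value to big float
    mpf_mul(fi, fi, fd);   // 3. make product
    mpz_set_f(r, fi);      // 4. convert result to integer (truncates)
    mpf_clear(fi);
    mpf_clear(fd);
}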

Generating random floating-point values based on random bit stream

Given a random source (a generator of random bit stream), how do I generate a uniformly distributed random floating-point value in a given range?
Assume that my random source looks something like:
unsigned int GetRandomBits(char* pBuf, int nLen);
And I want to implement
double GetRandomVal(double fMin, double fMax);
Notes:
I don't want the result precision to be limited (for example only 5 digits).
Strict uniform distribution is a must
I'm not asking for a reference to an existing library. I want to know how to implement it from scratch.
For pseudo-code / code, C++ would be most appreciated
I don't think I'll ever be convinced that you actually need this, but it was fun to write.
#include <stdint.h>
#include <cmath>
#include <cstdio>
FILE* devurandom;
bool geometric(int x) {
// returns true with probability min(2^-x, 1)
if (x <= 0) return true;
while (1) {
uint8_t r;
fread(&r, sizeof r, 1, devurandom);
if (x < 8) {
return (r & ((1 << x) - 1)) == 0;
} else if (r != 0) {
return false;
}
x -= 8;
}
}
double uniform(double a, double b) {
// requires IEEE doubles and 0.0 < a < b < inf and a normal
// implicitly computes a uniform random real y in [a, b)
// and returns the greatest double x such that x <= y
union {
double f;
uint64_t u;
} convert;
convert.f = a;
uint64_t a_bits = convert.u;
convert.f = b;
uint64_t b_bits = convert.u;
uint64_t mask = b_bits - a_bits;
mask |= mask >> 1;
mask |= mask >> 2;
mask |= mask >> 4;
mask |= mask >> 8;
mask |= mask >> 16;
mask |= mask >> 32;
int b_exp;
frexp(b, &b_exp);
while (1) {
// sample uniform x_bits in [a_bits, b_bits)
uint64_t x_bits;
fread(&x_bits, sizeof x_bits, 1, devurandom);
x_bits &= mask;
x_bits += a_bits;
if (x_bits >= b_bits) continue;
double x;
convert.u = x_bits;
x = convert.f;
// accept x with probability proportional to 2^x_exp
int x_exp;
frexp(x, &x_exp);
if (geometric(b_exp - x_exp)) return x;
}
}
int main() {
devurandom = fopen("/dev/urandom", "r");
for (int i = 0; i < 100000; ++i) {
printf("%.17g\n", uniform(1.0 - 1e-15, 1.0 + 1e-15));
}
}
Here is one way of doing it.
The IEEE Std 754 double format is as follows:
[s][ e ][ f ]
where s is the sign bit (1 bit), e is the biased exponent (11 bits) and f is the fraction (52 bits).
Beware that the layout in memory will be different on little-endian machines.
For 0 < e < 2047, the number represented is
(-1)**(s) * 2**(e - 1023) * (1.f)
By setting s to 0, e to 1023 and f to 52 random bits from your bit stream, you get a random double in the interval [1.0, 2.0). This interval is unique in that it contains 2 ** 52 doubles, and these doubles are equidistant. If you then subtract 1.0 from the constructed double, you get a random double in the interval [0.0, 1.0). Moreover, the property of being equidistant is preserved.
From there you should be able to scale and translate as needed.
I'm surprised that for question this old, nobody had actual code for the best answer. User515430's answer got it right--you can take advantage of IEEE-754 double format to directly put 52 bits into a double with no math at all. But he didn't give code. So here it is, from my public domain ojrandlib:
double ojr_next_double(ojr_generator *g) {
uint64_t r = (OJR_NEXT64(g) & 0xFFFFFFFFFFFFFull) | 0x3FF0000000000000ull;
return *(double *)(&r) - 1.0;
}
NEXT64() gets a 64-bit random number. If you have a more efficient way of getting only 52 bits, use that instead.
This is easy, as long as you have an integer type with as many bits of precision as a double. For instance, an IEEE double-precision number has 53 bits of precision, so a 64-bit integer type is enough:
#include <limits.h>
double GetRandomVal(double fMin, double fMax) {
unsigned long long n ;
GetRandomBits ((char*)&n, sizeof(n)) ;
return fMin + (n * (fMax - fMin))/ULLONG_MAX ;
}
This is probably not the answer you want, but the specification here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3225.pdf
in sections [rand.util.canonical] and [rand.dist.uni.real], contains sufficient information to implement what you want, though with slightly different syntax. It isn't easy, but it is possible. I speak from personal experience. A year ago I knew nothing about random numbers, and I was able to do it. Though it took me a while... :-)
The question is ill-posed. What does uniform distribution over floats even mean?
Taking our cue from discrepancy, one way to operationalize your question is to define that you want the distribution that minimizes the following value:
sup over fMin <= t <= fMax of | P(x <= t) - (t - fMin) / (fMax - fMin) |
where x is the random variable you are sampling with your GetRandomVal(double fMin, double fMax) function, and P(x <= t) means the probability that a random x is smaller than or equal to t.
And now you can go on and try to evaluate eg a dabbler's answer. (Hint all the answers that fail to use the whole precision and stick to eg 52 bits will fail this minimization criterion.)
However, if you just want to be able to generate all float bit patterns that fall into your specified range with equal possibility, even if that means that eg asking for GetRandomVal(0,1000) will create more values between 0 and 1.5 than between 1.5 and 1000, that's easy: any interval of IEEE floating point numbers when interpreted as bit patterns map easily to a very small number of intervals of unsigned int64. See eg this question. Generating equally distributed random values of unsigned int64 in any given interval is easy.
I may be misunderstanding the question, but what stops you simply sampling the next n bits from the random bit stream and converting that to a base-10 number in the range 0 to 2^n - 1?
To get a random value in [0..1[ you could do something like:
double value = 0;
for (int i=0;i<53;i++)
value = 0.5 * (value + random_bit()); // Insert 1 random bit
// or value = ldexp(value+random_bit(),-1);
// or group several bits into one single ldexp
return value;

Can I rely on this to judge a square number in C++?

Can I rely on
sqrt((float)a)*sqrt((float)a)==a
or
(int)sqrt((float)a)*(int)sqrt((float)a)==a
to check whether a number is a perfect square? Why or why not?
int a is the number to be judged. I'm using Visual Studio 2005.
Edit: Thanks for all these rapid answers. I see that I can't rely on float type comparison. (If I wrote as above, will the last a be cast to float implicitly?) If I do it like
(int)sqrt((float)a)*(int)sqrt((float)a) - a < e
How small should I take that e value?
Edit2: Hey, why don't we leave the comparison part aside, and decide whether the (int) is necessary? As I see, with it, the difference might be great for squares; but without it, the difference might be small for non-squares. Perhaps neither will do. :-(
Actually, this is not a C++ question but a math question.
With floating point numbers, you should never rely on equality. Where you would test a == b, just test against abs(a - b) < eps, where eps is a small number (e.g. 1E-6) that you would treat as a good enough approximation.
If the number you are testing is an integer, you might be interested in the Wikipedia article about Integer square root
EDIT:
As Krugar said, the article I linked does not answer anything. Sure, there is no direct answer to your question there, phoenie. I just thought that the underlying problem you have is floating point precision and maybe you wanted some math background to your problem.
For the impatient, there is a link in the article to a lengthy discussion about implementing isqrt. It boils down to the code karx11erx posted in his answer.
If you have integers which do not fit into an unsigned long, you can modify the algorithm yourself.
If you don't want to rely on float precision then you can use the following code that uses integer math.
The Isqrt is taken from here and is O(log n)
// Finds the integer square root of a positive number
static int Isqrt(int num)
{
if (0 == num) { return 0; } // Avoid zero divide
int n = (num / 2) + 1; // Initial estimate, never low
int n1 = (n + (num / n)) / 2;
while (n1 < n)
{
n = n1;
n1 = (n + (num / n)) / 2;
} // end while
return n;
} // end Isqrt()
static bool IsPerfectSquare(int num)
{
return Isqrt(num) * Isqrt(num) == num;
}
To avoid doing the same calculation twice, I would do it with a temporary variable:
int b = (int)sqrt((float)a);
if((b*b) == a)
{
//perfect square
}
edit:
dav made a good point. instead of relying on the cast you'll need to round off the float first
so it should be:
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a)
{
//perfect square
}
Your question has already been answered, but here is a working solution.
Your 'perfect squares' are implicitly integer values, so you could easily solve floating point format related accuracy problems by using some integer square root function to determine the integer square root of the value you want to test. That function will return the biggest number r for a value v where r * r <= v. Once you have r, you simply need to test whether r * r == v.
unsigned short isqrt (unsigned long a)
{
unsigned long rem = 0;
unsigned long root = 0;
for (int i = 16; i; i--) {
root <<= 1;
rem = ((rem << 2) + (a >> 30));
a <<= 2;
if (root < rem)
rem -= ++root;
}
return (unsigned short) (root >> 1);
}
bool PerfectSquare (unsigned long a)
{
unsigned short r = isqrt (a);
return (unsigned long) r * r == a;
}
I didn't follow the formula, I apologize.
But you can easily check whether a floating point number is an integer by casting it to an integer type and comparing the result against the floating point number. So,
bool isSquare(long val) {
double root = sqrt(val);
if (root == (long) root)
return true;
else return false;
}
Naturally this is only doable if you are working with values that you know will fit within the integer type range. But being that the case, you can solve the problem this way, saving you the inherent complexity of a mathematical formula.
As reinier says, you need to add 0.5 to make sure it rounds to the nearest integer, so you get
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a) /* perfect square */
For this to work, b has to be (exactly) equal to the square root of a if a is a perfect square. However, I don't think you can guarantee this. Suppose that int is 64 bits and float is 32 bits (I think that's allowed). Then a can be of the order 2^60, so its square root is of order 2^30. However, a float only stores 24 bits in the significand, so the rounding error is of order 2^(30-24) = 2^6. This is larger than 1, so b may contain the wrong integer. For instance, I think that the above code does not identify a = (2^30+1)^2 as a perfect square.
I would do.
// sqrt always returns positive value. So casting to int is equivalent to floor()
int down = static_cast<int>(sqrt(value));
int up = down+1; // This is the ceil(sqrt(value))
// Because of rounding problems I would test the floor() and ceil()
// of the value returned from sqrt().
if (((down*down) == value) || ((up*up) == value))
{
// We have a winner.
}
The more obvious, if slower -- O(sqrt(n)) -- way:
bool is_perfect_square(int i) {
int d = 1;
for (int x = 0; x <= i; x += d, d += 2) {
if (x == i) return true;
}
return false;
}
While others have noted that you should not test for equality with floats, I think you are missing out on chances to take advantage of the properties of perfect squares. First there is no point in re-squaring the calculated root. If a is a perfect square then sqrt(a) is an integer and you should check:
b = sqrt((float)a)
b - floor(b) < e
where e is set sufficiently small. There are also a number of integers that you can cross off as non-square before taking the square root. Checking Wikipedia you can see some necessary conditions for a to be square:
A square number can only end with
digits 00,1,4,6,9, or 25 in base 10
Another simple check would be to see that a % 4 == 1 or 0 before taking the root since:
Squares of even numbers are even,
since (2n)^2 = 4n^2.
Squares of odd
numbers are odd, since (2n + 1)^2 =
4(n^2 + n) + 1.
These would essentially eliminate half of the integers before taking any roots.
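A sketch combining those quick rejections with a rounded square root; the final check is done with exact integer multiplication instead of the b - floor(b) < e test, which avoids having to pick an e:
#include <cmath>

bool isPerfectSquare(unsigned int a)
{
    if (a % 4 == 2 || a % 4 == 3) return false;       // squares are 0 or 1 mod 4
    unsigned int last = a % 10;
    if (last == 2 || last == 3 || last == 7 || last == 8)
        return false;                                 // squares never end in 2, 3, 7 or 8
    unsigned int r = (unsigned int)(std::sqrt((double)a) + 0.5); // round to nearest
    return r * r == a;                                // confirm with exact integer math
}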
The cleanest solution is to use an integer sqrt routine, then do:
bool isSquare( unsigned int a ) {
unsigned int s = isqrt( a );
return s * s == a;
}
This will work in the full int range and with perfect precision. A few cases:
a = 0, s = 0, s * s = 0 (add an exception if you don't want to treat 0 as square)
a = 1, s = 1, s * s = 1
a = 2, s = 1, s * s = 1
a = 3, s = 1, s * s = 1
a = 4, s = 2, s * s = 4
a = 5, s = 2, s * s = 4
Won't fail either as you approach the maximum value for your int size. E.g. for 32-bit ints:
a = 0x40000000, s = 0x00008000, s * s = 0x40000000
a = 0xFFFFFFFF, s = 0x0000FFFF, s * s = 0xFFFE0001
Using floats you run into a number of issues. You may find that sqrt( 4 ) = 1.999999..., and similar problems, although you can round-to-nearest instead of using floor().
Worse though, a float has only 24 significant bits which means you can't cast any int larger than 2^24-1 to a float without losing precision, which introduces false positives/negatives. Using doubles for testing 32-bit ints, you should be fine, though.
But remember to cast the result of the floating-point sqrt back to an int and compare the result to the original int. Comparisons between floats are never a good idea; even for square values of x in a limited range, there is no guarantee that sqrt( x ) * sqrt( x ) == x, or that sqrt( x * x) = x.
basics first:
if you cast a number to (int) in a calculation it will remove everything after the decimal point. If I remember my C correctly, if you have an (int) in any calculation (+/-*) it will automatically presume int for all other numbers.
So in your case you want float on every number involved, otherwise you will lose data:
sqrt((float)a)*sqrt((float)a)==(float)a
is the way you want to go
Floating point math is inaccurate by nature.
So consider this code:
int a=35;
float conv = (float)a;
float sqrt_a = sqrt(conv);
if( sqrt_a*sqrt_a == conv )
printf("perfect square");
this is what will happen:
a = 35
conv = 35.000000
sqrt_a = 5.916079
sqrt_a*sqrt_a = 34.999990734
this is amply clear that sqrt_a^2 is not equal to a.

C/C++ counting the number of decimals?

Let's say that the input from the user is a decimal number, e.g. 5.2155 (having 4 decimal digits). It can be stored freely (int, double), etc.
Is there any clever (or very simple) way to find out how many decimals the number has? (Kind of like the question of how you find out whether a number is even or odd by masking the last bit.)
Two ways I know of, neither very clever unfortunately but this is more a limitation of the environment rather than me :-)
The first is to sprintf the number to a big buffer with a "%.50f" format string, strip off the trailing zeros then count the characters after the decimal point. This will be limited by the printf family itself. Or you could use the string as input by the user (rather than sprintfing a floating point value), so as to avoid floating point problems altogether.
The second is to subtract the integer portion then iteratively multiply by 10 and again subtract the integer portion until you get zero. This is limited by the limits of computer representation of floating point numbers - at each stage you may get the problem of a number that cannot be represented exactly (so .2155 may actually be .215499999998). Something like the following (untested, except in my head, which is about on par with a COMX-35):
count = 0
num = abs(num)
num = num - int(num)
while num != 0:
num = num * 10
count = count + 1
num = num - int(num)
If you know the sort of numbers you'll get (e.g., they'll all be 0 to 4 digits after the decimal point), you can use standard floating point "tricks" to do it properly. For example, instead of:
while num != 0:
use
while abs(num) >= 0.0000001:
Once the number is converted from the user representation (string, OCR-ed gif file, whatever) into a floating point number, you are not dealing with the same number necessarily. So the strict, not very useful answer is "No".
If (case A) you can avoid converting the number from the string representation, the problem becomes much easier, you only need to count the digits after the decimal point and subtract the number of trailing zeros.
If you cannot do it (case B), then you need to make an assumption about the maximum number of decimals, convert the number back into string representation and round it to this maximum number using the round-to-even method. For example, if the user supplies 1.1 which gets represented as 1.09999999999999 (hypothetically), converting it back to string yields, guess what, "1.09999999999999". Rounding this number to, say, four decimal points gives you "1.1000". Now it's back to case A.
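A small sketch of that case B idea, with an assumed maximum of four decimals (count_decimals is an illustrative name):
#include <cstdio>
#include <cstring>

int count_decimals(double x, int max_decimals)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", max_decimals, x);  // round to the assumed maximum
    const char* dot = std::strchr(buf, '.');
    if (!dot) return 0;
    int n = (int)std::strlen(dot + 1);
    while (n > 0 && dot[n] == '0') --n;                       // strip trailing zeros
    return n;
}

// count_decimals(1.09999999999999, 4) -> 1  (rounds to "1.1000", then strips the zeros)
// count_decimals(5.2155, 4)           -> 4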
Off the top of my head:
start with the fractional portion: .2155
repeatedly multiply by 10 and throw away the integer portion of the number until you get zero. The number of steps will be the number of decimals. e.g:
.2155 * 10 = 2.155
.155 * 10 = 1.55
.55 * 10 = 5.5
.5 * 10 = 5.0
4 steps = 4 decimal digits
Something like this might work as well:
#include <iostream>
#include <sstream>
#include <string>
float i = 5.2154f;
std::string s;
std::string t;
std::stringstream out;
out << i;
s = out.str();
t = s.substr(s.find(".")+1);
cout<<"number of decimal places: " << t.length();
What do you mean "stored freely (int"? Once stored in an int, it has zero decimals left, clearly. A double is stored in a binary form, so no obvious or simple relation to "decimals" either. Why don't you keep the input as a string, just long enough to count those decimals, before sending it on to its final numeric-variable destination?
using the Scientific Notation format (to avoid rounding errors):
#include <stdio.h>
#include <stdlib.h> /* atoi */
#include <string.h>
/* Counting the number of decimals
*
* 1. Use Scientific Notation format
* 2. Convert it to a string
* 3. Tokenize it on the exp sign, discard the base part
* 4. convert the second token back to number
*/
int main(){
int counts;
char *sign;
char str[15];
char *base;
char *exp10;
float real = 0.00001;
sprintf (str, "%E", real);
sign= ( strpbrk ( str, "+"))? "+" : "-";
base = strtok (str, sign);
exp10 = strtok (NULL, sign);
counts=atoi(exp10);
printf("[%d]\n", counts);
return 0;
}
[5]
If the decimal part of your number is stored in a separate int, you can just count its decimal digits.
This is an improvement on Andrei Alexandrescu's improvement. His version was already faster than the naive way (dividing by 10 at every digit). The version below is constant time and faster at least on x86-64 and ARM for all sizes, but occupies twice as much binary code, so it is not as cache-friendly.
Benchmarks for this version vs. Alexandrescu's version are in my PR on Facebook Folly.
Works on unsigned, not signed.
#include <cstdint>
inline std::uint32_t digits10(std::uint64_t v) {
return 1
+ (std::uint32_t)(v>=10)
+ (std::uint32_t)(v>=100)
+ (std::uint32_t)(v>=1000)
+ (std::uint32_t)(v>=10000)
+ (std::uint32_t)(v>=100000)
+ (std::uint32_t)(v>=1000000)
+ (std::uint32_t)(v>=10000000)
+ (std::uint32_t)(v>=100000000)
+ (std::uint32_t)(v>=1000000000)
+ (std::uint32_t)(v>=10000000000ull)
+ (std::uint32_t)(v>=100000000000ull)
+ (std::uint32_t)(v>=1000000000000ull)
+ (std::uint32_t)(v>=10000000000000ull)
+ (std::uint32_t)(v>=100000000000000ull)
+ (std::uint32_t)(v>=1000000000000000ull)
+ (std::uint32_t)(v>=10000000000000000ull)
+ (std::uint32_t)(v>=100000000000000000ull)
+ (std::uint32_t)(v>=1000000000000000000ull)
+ (std::uint32_t)(v>=10000000000000000000ull);
}
Years after the fight, but here is my own solution in three lines:
string number = "543.014";
size_t dotFound;
stoi(number, &dotFound);
number.substr(dotFound + 1).size()
Of course you have to test first whether it is really a float
(with stof(number) == stoi(number), for example)
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main()
{
char s[100];
fgets(s,100,stdin);
unsigned i=0,sw=0,k=0,l=0,ok=0;
unsigned length=strlen(s);
for(i=0;i<length;i++)
{
if(isprint(s[i]))
{
if(sw==1)
{
k++;
if(s[i]=='0')
{
ok=0;
}
if(ok==0)
{
if(s[i]=='0')
l++;
else
{
ok=1;
l=0;
}
}
}
if(s[i]=='.')
{
sw=1;
}
}
}
printf("%d",k-l);
return 0;
}
This is a robust C++ 11 implementation suitable for float and double types:
#include <cmath>
#include <cstddef>
#include <limits>
#include <type_traits>
template <typename T>
std::enable_if_t<(std::is_floating_point<T>::value), std::size_t>
decimal_places(T v)
{
std::size_t count = 0;
v = std::abs(v);
auto c = v - std::floor(v);
T factor = 10;
T eps = std::numeric_limits<T>::epsilon() * c;
while ((c > eps && c < (1 - eps)) && count < std::numeric_limits<T>::max_digits10)
{
c = v * factor;
c = c - std::floor(c);
factor *= 10;
eps = std::numeric_limits<T>::epsilon() * v * factor;
count++;
}
return count;
}
It discards the scaled value each iteration and instead keeps track of a power-of-10 multiplier applied to the original value, so rounding errors do not build up. It uses machine epsilon to correctly handle decimal numbers that cannot be represented exactly in binary, such as the value 5.2155 stipulated in the question.
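Example use, assuming the template above is in scope (the printed values assume IEEE doubles):
#include <iostream>

int main()
{
    std::cout << decimal_places(5.2155) << "\n"; // 4
    std::cout << decimal_places(2.5)    << "\n"; // 1
    std::cout << decimal_places(3.0)    << "\n"; // 0
}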
Based on what others wrote, this has worked well for me. This solution does handle the case where a number can't be represented exactly in binary.
As others suggested, the condition of the while loop determines when the remaining fraction stops being meaningful. My update uses the machine epsilon value to test whether the remainder on any iteration is still representable by the numeric format. The test should not compare against 0 or a hardcoded value like 0.000001.
#include <cmath>
#include <limits>
#include <type_traits>
template<class T, std::enable_if_t<std::is_floating_point_v<T>, T>* = nullptr>
unsigned int NumDecimalPlaces(T val)
{
    unsigned int decimalPlaces = 0;
    val = std::abs(val);
    val = val - std::floor(val);                     // keep only the fractional part
    T eps = std::numeric_limits<T>::epsilon() * 10;  // error bound, rescaled every iteration
    while (val > eps && val < (1 - eps) &&
           decimalPlaces <= std::numeric_limits<T>::digits10)
    {
        val = val * 10;
        val = val - std::floor(val);
        eps *= 10;
        ++decimalPlaces;
    }
    return decimalPlaces;
}
As an example, if the input value is 2.1, the correct solution is 1. However, some other answers posted here would output 16 if using double precision because 2.1 can't be precisely represented in double precision.
I would suggest reading the value as a string, searching for the decimal point, and parsing the text before and after it as integers. No floating point or rounding errors.
char* fractpart(double f)
{
static char chrstr[10];
int intary[10]={1,2,3,4,5,6,7,8,9,0};
char charary[10]={'1','2','3','4','5','6','7','8','9','0'};
int count=0,x,y;
f=f-(int)f;
while(f<=1)
{
f=f*10;
for(y=0;y<10;y++)
{
if((int)f==intary[y])
{
chrstr[count]=charary[y];
break;
}
}
f=f-(int)f;
if(f<=0.01 || count==4)
break;
if(f<0)
f=-f;
count++;
}
return(chrstr);
}
Here is the complete program
#include <iostream.h>
#include <conio.h>
#include <string.h>
#include <math.h>
char charary[10]={'1','2','3','4','5','6','7','8','9','0'};
int intary[10]={1,2,3,4,5,6,7,8,9,0};
char chrstr[20]; /* shared work buffer used by intpart() and fractpart() */
char* intpart(double);
char* fractpart(double);
int main()
{
clrscr();
int count = 0;
double d = 0;
char intstr[10], fractstr[10];
cout<<"Enter a number";
cin>>d;
strcpy(intstr,intpart(d));
strcpy(fractstr,fractpart(d));
cout<<intstr<<'.'<<fractstr;
getche();
return(0);
}
char* intpart(double f)
{
static char retstr[10];
int x,y,z,count1=0;
x=(int)f;
while(x>=1)
{
z=x%10;
for(y=0;y<10;y++)
{
if(z==intary[y])
{
chrstr[count1]=charary[y];
break;
}
}
x=x/10;
count1++;
}
for(x=0,y=strlen(chrstr)-1;y>=0;y--,x++)
retstr[x]=chrstr[y];
retstr[x]='\0';
return(retstr);
}
char* fractpart(double f)
{
int count=0,x,y;
f=f-(int)f;
while(f<=1)
{
f=f*10;
for(y=0;y<10;y++)
{
if((int)f==intary[y])
{
chrstr[count]=charary[y];
break;
}
}
f=f-(int)f;
if(f<=0.01 || count==4)
break;
if(f<0)
f=-f;
count++;
}
return(chrstr);
}
One way would be to read the number in as a string. Find the length of the substring after the decimal point, and that's how many decimals the person entered. You can then convert the string into a float using
atof(string.c_str());
On a different note: it's always a good idea when dealing with floating point operations to store them in a special object which has finite precision. For example, you could store the floating point values in a special type of object called "Decimal" where the whole-number part and the decimal part of the number are both ints. This way you have finite precision. The downside to this is that you have to write out methods for arithmetic operations (+, -, *, /, etc.), but you can easily overload operators in C++. I know this deviates from your original question, but it's always better to store your decimals in a finite form. In this way you can also answer your question of how many decimals the number has.
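A minimal sketch of that Decimal idea (the struct name, field names and fixed scale are illustrative assumptions; a real version would also normalize signs and differing scales):
struct Decimal {
    long long whole;   // integer part
    long long frac;    // fractional digits, e.g. 2155 for 5.2155
    int       digits;  // how many decimal places frac represents, e.g. 4

    // Addition sketch: assumes both operands are non-negative and use the
    // same number of digits.
    Decimal operator+(const Decimal& o) const {
        long long scale = 1;
        for (int i = 0; i < digits; ++i) scale *= 10;
        long long sum = (whole * scale + frac) + (o.whole * scale + o.frac);
        return Decimal{ sum / scale, sum % scale, digits };
    }
};
With this representation, "how many decimals does the number have" is simply the digits field once trailing zeros of frac are stripped.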