Newton-Raphson Division With Big Integers - c++

I'm making a BigInt class as a programming exercise. It uses a vector of 2's complement signed ints in base-65536 (so that 32-bit multiplications don't overflow. I will increase the base once I get it fully working).
All of the basic math operations are coded, with one problem: division is painfully slow with the basic algorithm I was able to create. (It kind of works like binary division for each digit of the quotient... I'm not going to post it unless someone wants to see it....)
Instead of my slow algorithm, I want to use Newton-Raphson to find the (shifted) reciprocal and then multiply (and shift). I think I have my head around the basics: you give the formula (x1 = x0(2 - x0 * divisor)) a good initial guess, and then after some amount of iterations, x converges to the reciprocal. This part seems easy enough... but I am running into some problems when trying to apply this formula to big integers:
Problem 1:
Because I am working with integers... well... I can't use fractions. This seems to cause x to always diverge (x0 * divisor must be < 2, it seems?). My intuition tells me there should be some modification to the equation that would allow it to work with integers (to some accuracy), but I am really struggling to find out what it is. (My lack of math skills is beating me up here....) I think I need to find some equivalent equation where instead of d there is d*[base^somePower]? Can there be some equation like (x1 = x0(2 - x0 * d)) that works with whole numbers?
Problem 2:
When I use Newton's formula to find the reciprocal of some numbers, the result ends up being just a small fraction below what the answer should be... e.g., when trying to find the reciprocal of 4 (in decimal):
x0 = 0.3
x1 = 0.24
x2 = 0.2496
x3 = 0.24999936
x4 = 0.2499999999983616
x5 = 0.24999999999999999999998926258176
If I were representing numbers in base-10, I would want a result of 25 (and to remember to right shift product by 2). With some reciprocals such as 1/3, you can simply truncate the result after you know you have enough accuracy. But how can I pull out the correct reciprocal from the above result?
Sorry if this is all too vague or if I'm asking for too much. I looked through Wikipedia and all of the research papers I could find on Google, but I feel like I'm banging my head against a wall. I appreciate any help anyone can give me!
...
Edit: Got the algorithm working, although it is much slower than I expected. I actually lost a lot of speed compared to my old algorithm, even on numbers with thousands of digits... I'm still missing something. It's not a problem with multiplication, which is very fast. (I am indeed using Karatsuba's algorithm).
For anyone interested, here is my current iteration of the Newton-Raphson algorithm:
bigint operator/(const bigint& lhs, const bigint& rhs) {
    if (rhs == 0) throw overflow_error("Divide by zero exception");
    bigint dividend = lhs;
    bigint divisor = rhs;
    bool negative = false;
    if (dividend < 0) {
        negative = !negative;
        dividend.invert();
    }
    if (divisor < 0) {
        negative = !negative;
        divisor.invert();
    }
    int k = dividend.numBits() + divisor.numBits();
    bigint pow2 = 1;
    pow2 <<= k + 1;
    bigint x = dividend - divisor;
    bigint lastx = 0;
    bigint lastlastx = 0;
    while (1) {
        x = (x * (pow2 - x * divisor)) >> k;
        if (x == lastx || x == lastlastx) break;
        lastlastx = lastx;
        lastx = x;
    }
    bigint quotient = dividend * x >> k;
    if (dividend - (quotient * divisor) >= divisor) quotient++;
    if (negative) quotient.invert();
    return quotient;
}
And here is my (really ugly) old algorithm that is faster:
bigint operator/(const bigint& lhs, const bigint& rhs) {
    if (rhs == 0) throw overflow_error("Divide by zero exception");
    bigint dividend = lhs;
    bigint divisor = rhs;
    bool negative = false;
    if (dividend < 0) {
        negative = !negative;
        dividend.invert();
    }
    if (divisor < 0) {
        negative = !negative;
        divisor.invert();
    }
    bigint remainder = 0;
    bigint quotient = 0;
    while (dividend.value.size() > 0) {
        remainder.value.insert(remainder.value.begin(), dividend.value.at(dividend.value.size() - 1));
        remainder.value.push_back(0);
        remainder.unPad();
        dividend.value.pop_back();
        if (divisor > remainder) {
            quotient.value.push_back(0);
        } else {
            int count = 0;
            int i = MSB;
            bigint value = 0;
            while (i > 0) {
                bigint increase = divisor * i;
                bigint next = value + increase;
                if (next <= remainder) {
                    value = next;
                    count += i;
                }
                i >>= 1;
            }
            quotient.value.push_back(count);
            remainder -= value;
        }
    }
    for (int i = 0; i < quotient.value.size() / 2; i++) {
        int swap = quotient.value.at(i);
        quotient.value.at(i) = quotient.value.at((quotient.value.size() - 1) - i);
        quotient.value.at(quotient.value.size() - 1 - i) = swap;
    }
    if (negative) quotient.invert();
    quotient.unPad();
    return quotient;
}

First of all, you can implement division in time O(n^2) with a reasonable constant, so it's not (much) slower than naive multiplication. However, if you use a Karatsuba-like algorithm, or even an FFT-based multiplication algorithm, then you can indeed speed up your division using Newton-Raphson.
A Newton-Raphson iteration for calculating the reciprocal of x is q[n+1]=q[n]*(2-q[n]*x).
Suppose we want to calculate floor(2^k/B) where B is a positive integer. WLOG, B≤2^k; otherwise, the quotient is 0. The Newton-Raphson iteration for x=B/2^k yields q[n+1]=q[n]*(2-q[n]*B/2^k). We can rearrange it as
q[n+1]=q[n]*(2^(k+1)-q[n]*B) >> k
Each iteration of this kind requires only integer multiplications and bit shifts. Does it converge to floor(2^k/B)? Not necessarily. However, in the worst case, it eventually alternates between floor(2^k/B) and ceiling(2^k/B) (prove it!). So you can use some not-so-clever test to see if you are in this case, and extract floor(2^k/B). (This "not-so-clever test" should be a lot faster than the multiplications in each iteration; however, it would still be nice to optimize it.)
Indeed, calculating floor(2^k/B) suffices in order to calculate floor(A/B) for any positive integers A,B. Take k such that A*B≤2^k, and verify floor(A/B)=A*ceiling(2^k/B) >> k.
Lastly, a simple but important optimization for this approach is to truncate multiplications (i.e. calculate only the higher bits of the product) in the early iterations of the Newton-Raphson method. The reason is that the results of the early iterations are far from the quotient, so it does no harm to perform them inaccurately. (Refine this argument and show that if you do this appropriately, you can divide two ≤n-bit integers in time O(M(2n)), assuming you can multiply two ≤k-bit integers in time M(k), and M(x) is an increasing convex function.)
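For concreteness, here is a minimal sketch of this iteration for built-in types, assuming GCC/Clang's unsigned __int128 extension and __builtin_clzll for the wide temporaries and the first guess (a bigint would take their place); recip_newton is a made-up name, and the two-value lookback implements the floor/ceiling cycle test described above:
#include <cstdint>

// Computes floor(2^k / B) for 1 <= B <= 2^k and k <= 62, via the integer
// iteration q <- q*(2^(k+1) - q*B) >> k.
uint64_t recip_newton(uint64_t B, unsigned k) {
    using u128 = unsigned __int128;
    const u128 pow2 = (u128)1 << (k + 1);     // 2^(k+1)
    unsigned msb = 63 - __builtin_clzll(B);   // floor(log2(B))
    u128 q = (u128)1 << (k - msb);            // first guess: q*B is within 2x of 2^k
    u128 last = 0, lastlast = 0;
    while (q != last && q != lastlast) {      // stop once we hit the floor/ceiling cycle
        lastlast = last;
        last = q;
        q = (q * (pow2 - q * B)) >> k;
    }
    if (q * B > ((u128)1 << k)) --q;          // extract floor(2^k/B) from the cycle
    return (uint64_t)q;
}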

If I see this correctly, a major improvement is picking a good starting value for x. Knowing how many digits the divisor has, you know where the most significant bit of the inverse has to be, since
1/x = pow(2, log2(1/x))
1/x = pow(2, -log2(x))
pow(2, -floor(log2(x)) - 1) < 1/x <= pow(2, -floor(log2(x)))
floor(log2(x)) is simply the index of the most significant bit set.
As suggested in the comment by the OP, using a 256-entry lookup table is going to speed up convergence even more, because each step roughly doubles the number of correct digits. Starting with 8 correct bits is better than starting with 1, and much better than starting with even less than that.
template<typename T>
constexpr T fixpoint_integer_inverse(const T& d) {
    // 256-entry table of 8-bit reciprocal estimates, indexed by the top 8 bits of d.
    uint8_t lut[256] = { 255u,254u,253u,252u,251u,250u,249u,248u,247u,246u,245u,244u,243u,242u,241u,
240u,240u,239u,238u,237u,236u,235u,234u,234u,233u,232u,231u,230u,229u,229u,228u,
227u,226u,225u,225u,224u,223u,222u,222u,221u,220u,219u,219u,218u,217u,217u,216u,
215u,214u,214u,213u,212u,212u,211u,210u,210u,209u,208u,208u,207u,206u,206u,205u,
204u,204u,203u,202u,202u,201u,201u,200u,199u,199u,198u,197u,197u,196u,196u,195u,
195u,194u,193u,193u,192u,192u,191u,191u,190u,189u,189u,188u,188u,187u,187u,186u,
186u,185u,185u,184u,184u,183u,183u,182u,182u,181u,181u,180u,180u,179u,179u,178u,
178u,177u,177u,176u,176u,175u,175u,174u,174u,173u,173u,172u,172u,172u,171u,171u,
170u,170u,169u,169u,168u,168u,168u,167u,167u,166u,166u,165u,165u,165u,164u,164u,
163u,163u,163u,162u,162u,161u,161u,161u,160u,160u,159u,159u,159u,158u,158u,157u,
157u,157u,156u,156u,156u,155u,155u,154u,154u,154u,153u,153u,153u,152u,152u,152u,
151u,151u,151u,150u,150u,149u,149u,149u,148u,148u,148u,147u,147u,147u,146u,146u,
146u,145u,145u,145u,144u,144u,144u,144u,143u,143u,143u,142u,142u,142u,141u,141u,
141u,140u,140u,140u,140u,139u,139u,139u,138u,138u,138u,137u,137u,137u,137u,136u,
136u,136u,135u,135u,135u,135u,134u,134u,134u,134u,133u,133u,133u,132u,132u,132u,
132u,131u,131u,131u,131u,130u,130u,130u,130u,129u,129u,129u,129u,128u,128u,128u,
127u
};
    const auto l = log2(d);  // helper (not shown here): index of the most significant set bit
    T x;
    if (l < 8) {
        x = T(1) << (digits(d) - 1 - l);  // digits(d): helper (not shown) giving the bit width
    } else {
        if (digits(d) > (l + 8)) x = T(lut[(d >> (l - 8)) - 256]) << (digits(d) - l - 8);
        else                     x = T(lut[(d >> (l - 8)) - 256]) >> (l + 8 - digits(d));
    }
    if (x == 0) x = 1;
    while (true) {
        const auto lm = long_mul(x, T(1) - x * d);
        const T i = get<0>(lm);
        if (i) x += i;
        else return x;
    }
    return x;
}
// calculate a * b = r0r1 (r0 = high word, r1 = low word)
template<typename T>
typename std::enable_if<std::is_unsigned<T>::value, tuple<T, T>>::type
constexpr long_mul(const T& a, const T& b) {
    const T N  = digits<T>() / 2;
    const T t0 = (a >> N) * (b >> N);
    const T t1 = ((a << N) >> N) * (b >> N);
    const T t2 = (a >> N) * ((b << N) >> N);
    const T t3 = ((a << N) >> N) * ((b << N) >> N);
    const T t4 = t3 + (t1 << N);
    const T r1 = t4 + (t2 << N);
    const T r0 = (r1 < t4) + (t4 < t3) + (t1 >> N) + (t2 >> N) + t0;
    return {r0, r1};
}

Newton-Raphson is an approximation algorithm, not one suited to exact integer math. You will get rounding errors, which result in the kind of problems you are seeing. You could do the problem with floating-point numbers and then see if you get an integer, precise to a specified number of digits (see the next paragraph).
As to the second problem, pick a precision (number of decimal places) you want for accuracy and round to that precision. If you picked twenty digits of precision in your example, you would round to 0.25. You simply need to iterate until your required digits of precision are stable. In general, representing irrational numbers on a computer often introduces imprecision.

Related

How to write a loop that calculates power?

I'm trying to write a loop that calculates power without using the pow() function, but I'm stuck on how to do it. Doing base *= base works for even powers up to 4, so there is something totally weird that I can't seem to figure out.
int Fast_Power(int base, int exp){
    int i = 2;
    int result;
    if(exp == 0){
        result = 1;
    }
    if(exp == 1){
        result = base;
    }
    else{
        for(i = 2; i < exp; i++){
            base *= base;
            result = base;
        }
    }
    return result;
}
base *= base;
Your problem lies with that statement, you should not be changing base at all. Rather, you should be adjusting result based on the constant value of base.
To do powers, you need repeated multiplication, but base *= base gives you repeated squaring of the value, so you'll get a much bigger value than desired. This happens to work for a power of four, since you iterate 4 - 2 = 2 times, squaring each iteration, and x^4 == (x^2)^2.
It will not work for higher powers like six, since you iterate 6 - 2 = 4 times, and x^6 != (((x^2)^2)^2)^2. That latter value is actually equivalent to x^16.
As an aside (despite your contention), it's actually not guaranteed to work for powers of two. If you follow the code in that case, you'll see that result is never assigned a value so the return value will be arbitrary. If it's working for you, that's accidental and likely to bite you at some point.
The algorithm you can use should be something like:
float power(float base, int exponent):
    # 0^0 is undefined.
    if base == 0 and exponent == 0:
        throw bad_input

    # Handle negative exponents.
    if exponent < 0:
        return 1 / power(base, -exponent)

    # Repeated multiplication to get power.
    float result = 1
    while exponent > 0:
        # Use checks to detect overflow.
        float oldResult = result
        result *= base
        if result / base is not close to oldResult:
            throw overflow
        exponent -= 1
    return result
This algorithm handles:
negative integral exponents (since x^-y = 1/x^y);
the undefined case of 0^0; and
overflow if you do not have arbitrary-precision values (basically, if (x * y) / y != x, you can be reasonably certain an overflow has occurred). Note the use of "not close to": it's unwise to check floats for exact equality due to potential errors from precision limits, so it's far better to implement an "is close enough to" check of some description.
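A direct C++ rendering of this pseudocode might look like the following sketch (the relative tolerance and the exception types are arbitrary choices of mine; the INT_MIN caveat discussed next still applies):
#include <cmath>
#include <stdexcept>

float power(float base, int exponent) {
    if (base == 0.0f && exponent == 0)
        throw std::domain_error("0^0 is undefined");    // "bad_input"
    if (exponent < 0)
        return 1.0f / power(base, -exponent);           // negative exponents
    float result = 1.0f;
    while (exponent > 0) {
        float oldResult = result;
        result *= base;
        // "is not close to": undoing the multiply should roughly recover oldResult
        if (std::fabs(result / base - oldResult) > 1e-4f * std::fabs(oldResult))
            throw std::overflow_error("overflow in power()");
        --exponent;
    }
    return result;
}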
One thing to keep in mind when translating to C or C++: a 2's complement implementation will cause issues when using the most negative integer, since its negation is often the same value again due to the imbalance between the positive and negative values. This is likely to lead to infinite recursion.
You can fix that simply by detecting the case early on (before anything else), with something like:
if INT_MIN == -INT_MAX - 1 and exp == INT_MIN:
    throw bad_input
The first part of that detects a 2's complement implementation, while the second detects the (problematic) use of INT_MIN as an exponent.
What you were doing wrong is base *= base each time through the loop, which changes the base itself, each iteration.
Instead you want the base to remain the same, and multiply the final result by that original base "exp" times.
int Fast_Power(int base, int exp){
    int result = 1;
    if(exp == 0){
        result = 1;
    }
    if(exp == 1){
        result = base;
    }
    else{
        for(int i = 0; i < exp; i++){
            result *= base;
        }
    }
    return result;
}
The basic but naive algorithm you are looking for, which is horribly subject to integer overflow, is:
int Fast_Power (int base, int exp)
{
    int result = base;
    if (exp == 0)
        return result ? 1 : 0;
    for (int i = 1; i < exp; i++) {
        result *= base;
    }
    return result;
}
Note: result can very easily overflow. You need to employ some basic check to prevent integer-overflow and Undefined Behavior.
A minimal check (see: Catch and compute overflow during multiplication of two large integers) can be incorporated as follows. You must use a wider type for the temporary calculation here and then compare the result against INT_MIN and INT_MAX (provided in the limits.h header) to determine whether overflow occurred:
#include <limits.h>
...
int Fast_Power (int base, int exp)
{
    int result = base;
    if (exp == 0)
        return result ? 1 : 0;
    for (int i = 1; i < exp; i++) {
        long long int tmp = (long long) result * base; /* tmp of wider type */
        if (tmp < INT_MIN || INT_MAX < tmp) {          /* check for overflow */
            fputs ("error: overflow occurred.\n", stderr);
            return 0;
        }
        result = tmp;
    }
    return result;
}
Now if you attempt, e.g. Fast_Power (2, 31); an error is generated and zero returned.
Additionally as #paxdiablo notes in the comment Zero to the power of zero may be undefined as there is no agreed upon value. You can add a test and issue a warning/error in that case if you desire.
First off, I agree it was probably a mistake to use base *= base. That said, it's not necessarily the mistake. My first impression was that the OP was trying to compute powers the way a human might by hand. For example, if you wanted to compute 3^13, a reasonable way to start is by computing exponents which are powers of 2.
3^1 = 3
3^2 = 3*3 = 9
3^4 = 3^2 * 3^2 = 81
3^8 = 3^4 * 3^4 = 6,561
Then you can use these results to compute 3^13 as
3^13 = 3^1 * 3^4 * 3^8 = 1,594,323
Once you understand the steps you could code this. The hardest part is probably determining when to stop squaring the base, and which squares should be included in the final calculation. Perhaps surprisingly the (unsigned) binary representation of the exponent tells us this! This is because the digits in binary represent the powers of two which sum together to form the number. With that in mind we can write the following.
int Fast_Power(int base, int exp) {
    int result = 1;
    unsigned int expu = exp;
    unsigned int power_of_two = 1;
    while (expu > 0) {
        if (power_of_two & expu) {
            result *= base;
            expu ^= power_of_two;
        }
        power_of_two <<= 1;
        base *= base;
    }
    return result;
}
This code doesn't have overflow protection, though that would be a good idea. Sticking with the original prototype it still accepts negative exponents and returns integers, which is a contradiction. Since OP didn't specify what should occur upon overflow or negative exponents this code doesn't attempt to handle either of those cases. Reasonable methods of addressing these issues are provided by other answers.

Correct way to find nth root using pow() in c++

I have to find the nth root of numbers that can be as large as 10^18, with n as large as 10^4.
I know that using pow() we can find nth roots using:
x = (long int)(1e-7 + pow(number, 1.0 / n))
But this is giving wrong answers on online programming judges, even though it gives correct results on all the cases I have tried. Is there something wrong with this method for the given constraints?
Note: nth root here means the largest integer whose nth power is less than or equal to the given number, i.e., largest 'x' for which x^n <= number.
Following the answers, I know this approach is wrong, so what is the way I should do it instead?
You can just use
x = (long int)pow(number, 1.0 / n)
Given the high value of n, most answers will be 1.
UPDATE:
Following the OP's comment, this approach is indeed flawed, because in most cases 1/n does not have an exact floating-point representation, and the floor of the 1/n-th power can be off by one.
Rounding is not a better solution either: it can make the root off by one in excess.
Another problem is that values up to 10^18 cannot be represented exactly in double precision, whereas 64-bit ints can.
My proposal:
1) truncate the 11 low-order bits of number before the (implicit) cast to double, to avoid rounding up by the FP unit (unsure if this is useful).
2) use the pow function to get a lower estimate of the n-th root; call it r.
3) compute the n-th power of r+1 using integer arithmetic only (by repeated squaring).
4) the solution is r+1 rather than r if its n-th power fits.
There remains a possibility that the FP unit rounds up when computing 1/n, leading to a slightly too large result. I doubt that this "too large" can get as large as one unit in the final result, but this should be checked.
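As a concrete illustration of steps 2) to 4), here is a sketch for values that fit in unsigned 64 bits (nth_root_floor and pow_exceeds are made-up names; the bit truncation of step 1) is omitted):
#include <cmath>
#include <cstdint>

// Does b^n exceed limit? Repeated squaring with early exit; requires b >= 1, n >= 1.
static bool pow_exceeds(uint64_t b, unsigned n, uint64_t limit) {
    uint64_t r = 1;
    while (n) {
        if (n & 1) {
            if (r > limit / b) return true;   // r*b would exceed limit
            r *= b;
        }
        n >>= 1;
        if (n) {
            if (b > limit / b) return true;   // b*b exceeds limit and is still needed
            b *= b;
        }
    }
    return false;                             // r stayed <= limit, so b^n <= limit
}

uint64_t nth_root_floor(uint64_t number, unsigned n) {
    // Step 2: floating-point estimate (safe for number <= 10^18 as in the question).
    uint64_t r = (uint64_t)std::pow((double)number, 1.0 / n);
    // Steps 3-4: correct the estimate using exact integer arithmetic only.
    while (r > 0 && pow_exceeds(r, n, number)) --r;
    while (!pow_exceeds(r + 1, n, number)) ++r;
    return r;
}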
I think I finally understood your problem. All you want to do is raise a value, say X, to the reciprocal of a number, say n (i.e., find the n-th root of X), and round down. If you then raise that answer to the n-th power, it will never be larger than your original X. The problem is that the computer sometimes runs into rounding error.
#include <cmath>
long find_nth_root(double X, int n)
{
    long nth_root = std::trunc(std::pow(X, 1.0 / n));
    // Because of rounding error, it's possible that nth_root + 1 is what we actually want; let's check.
    if (std::pow(nth_root + 1, n) <= X) {
        return nth_root + 1;
    }
    return nth_root;
}
Of course, the original question was to find the largest integer, Y, that satisfies Y^n ≤ X. That's easy enough to write:
long find_nth_root(double x, int d)
{
    long i = 0;
    for (; std::pow(i + 1, d) <= x; ++i) { }
    return i;
}
This will probably run faster than you'd expect. But you can do better with a binary search:
#include <cmath>
long find_nth_root(double x, int d)
{
    long low = 0, high = 1;
    while (std::pow(high, d) <= x) {
        low = high;
        high *= 2;
    }
    while (low != high - 1) {
        long step = (high - low) / 2;
        long candidate = low + step;
        double value = std::pow(candidate, d);
        if (value == x) {
            return candidate;
        }
        if (value < x) {
            low = candidate;
            continue;
        }
        high = candidate;
    }
    return low;
}
I use this routine I wrote. It's the fastest of the ones I've seen here. It also handles up to 64 bits. BTW, n1 is the input number.
for (n3 = 0; ((mnk) < n1); n3 += 0.015625, nmrk++) {
    mk += 0.0073125;
    dad += 0.00390625;
    mnk = pow(n1, 1.0 / (mk + n3 + dad));
    mnk = pow(mnk, (mk + n3 + dad));
}
Although not always perfect, it does come the closest.
You can try this to get the nth_root with unsigned integers in C:
// return a number that, when multiplied by itself nth times, makes N.
unsigned nth_root(const unsigned n, const unsigned nth) {
    unsigned a = n, c, d, r = nth ? n + (n > 1) : n == 1;
    for (; a < r; c = a + (nth - 1) * r, a = c / nth)
        for (r = a, a = n, d = nth - 1; d && (a /= r); --d);
    return r;
}
Yes, it does not include <math.h>. Example output:
24 == (int) pow(15625, 1.0/3)
25 == nth_root(15625, 3)
0 == nth_root(0, 0)
1 == nth_root(1, 0)
4 == nth_root(4096, 6)
13 == nth_root(18446744073709551614, 17) // 64-bit 20 digits
11 == nth_root(340282366920938463463374607431768211454, 37) // 128-bit 39 digits
The default guess is the variable a, set to n.

Multiplication between big integers and doubles

I am managing some big (128~256 bit) integers with GMP. It has come to a point where I would like to multiply them by a double close to 1 (0.1 < double < 10), the result still being an approximated integer. A good example of the operation I need to do is the following:
int i = 1000000000000000000 * 1.23456789
I searched in the gmp documentation but I didn't find a function for this, so I ended up writing this code which seems to work well:
void mpz_mult_d(mpz_class& r, const mpz_class& i, double d, int prec = 10) {
    if (prec > 15) prec = 15; // avoids overflows
    uint_fast64_t m = (uint_fast64_t) floor(d);
    r = i * m;
    uint_fast64_t pos = 1;
    for (uint_fast8_t j = 0; j < prec; j++) {
        const double posd = (double) pos;
        m = ((uint_fast64_t) floor(d * posd * 10.)) -
            ((uint_fast64_t) floor(d * posd)) * 10;
        pos *= 10;
        r += (i * m) / pos;
    }
}
Can you please tell me what you think? Do you have any suggestions to make it more robust or faster?
This is what you wanted:
// BYTE lint[_N] ... lint[0]=MSB, lint[_N-1]=LSB
void mul(BYTE *c, BYTE *a, double b) // c[_N] = a[_N] * b
{
    int i; DWORD cc;
    double q[_N+1], aa, bb;
    for (q[0]=0.0, i=0; i<_N;) // mul, carry down
    {
        bb = double(a[i]) * b; aa = floor(bb); bb -= aa;
        q[i] += aa; i++;
        q[i] = bb * 256.0;
    }
    cc = 0; if (q[_N] > 127.0) cc = 1; // round
    for (i=_N-1; i>=0; i--) // carry up
    {
        cc += q[i];
        c[i] = cc & 255;
        cc >>= 8;
    }
}
_N is the number of BYTEs per large int (bits/8); the large int is an array of _N BYTEs where the first byte is the MSB (most significant BYTE) and the last BYTE is the LSB (least significant BYTE).
The function does not handle the signum, but adding it is only one if and some xor/inc.
The trouble is that a double has low precision even for your number 1.23456789! Due to precision loss the result is not exactly what it should be (1234387129122386944 instead of 1234567890000000000). I think my code is much quicker and even more precise than yours, because I do not need to mul/mod/div numbers by 10; instead I use bit shifting where possible, and work not with 10-digits but with 256-digits (8 bit). If you need more precision, use long arithmetic. You can speed up this code by using larger digits (16, 32, ... bit).
My long arithmetic for precise astro computations usually uses fixed-point 256.256-bit numbers consisting of 2*8 DWORDs + signum, but of course it is much slower, and some goniometric functions are really tricky to implement; still, if you want just the basic functions, coding your own long arithmetic is not that hard.
Also, if you often want numbers in readable form, it is a good compromise between speed/size to consider not binary-coded numbers but BCD-coded numbers.
I am not so familiar with either C++ or GMP that I could suggest source code without syntax errors, but what you are doing is more complicated than it should be and can introduce unnecessary approximation.
Instead, I suggest you write the function mpz_mult_d() like this:
mpz_mult_d(mpz_class & r, const mpz_class & i, double d) {
    d = ldexp(d, 52); /* exact, no overflow because 1 <= d <= 10 */
    unsigned long long l = d; /* exact because d is an integer */
    p = l * i; /* exact, in GMP */
    (quotient, remainder) = p / 2^52; /* in GMP */
And now the next step depends on the kind of rounding you wish. If you wish the multiplication of d by i to give a result rounded toward -inf, just return quotient as result of the function. If you wish a result rounded to the nearest integer, you must look at remainder:
assert(0 <= remainder); /* proper Euclidean division */
assert(remainder < 2^52);
if (remainder < 2^51) return quotient;
if (remainder > 2^51) return quotient + 1; /* in GMP */
if (remainder == 2^51) return quotient + (quotient & 1); /* in GMP, round to “even” */
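For what it's worth, here is one way that sketch could be fleshed out with GMP's C++ interface (my own rendering, assuming 1 <= d <= 10 and i >= 0 so that the shifts act like Euclidean division):
#include <gmpxx.h>
#include <cmath>

mpz_class mpz_mult_d(const mpz_class& i, double d) {
    // d * 2^52 is an exact integer for any double d >= 1 (here 1 <= d <= 10).
    mpz_class l(std::ldexp(d, 52));            // mpz_class(double) truncates; exact here
    mpz_class p = l * i;                       // exact big-integer product
    mpz_class quotient = p >> 52;              // floor(p / 2^52) for p >= 0
    mpz_class remainder = p - (quotient << 52);
    mpz_class half = mpz_class(1) << 51;       // 2^51
    if (remainder < half) return quotient;     // round toward -inf
    if (remainder > half) return quotient + 1; // round up
    return quotient + (quotient % 2);          // tie: round to even
}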
PS: I found your question by random browsing but if you had tagged it “floating-point”, people more competent than me could have answered it quickly.
Try this strategy:
Convert integer value to big float
Convert double value to big float
Make product
Convert result to integer
mpf_set_z(...)
mpf_set_d(...)
mpf_mul(...)
mpz_set_f(...)
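In code, that strategy is short (a minimal sketch assuming GMP's C++ interface from gmpxx.h; mul_by_double is a made-up name, and the default mpf precision may need raising via mpf_set_default_prec for 256-bit operands):
#include <gmpxx.h>

mpz_class mul_by_double(const mpz_class& i, double d) {
    mpf_class f(i);       // mpf_set_z: integer -> big float
    f *= d;               // mpf_mul: product with the double
    return mpz_class(f);  // mpz_set_f: truncate back to an integer
}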

How can I write a power function myself?

I was always wondering how I can make a function which calculates the power (e.g. 2^3) myself. In most languages these are included in the standard library, mostly as pow(double x, double y), but how can I write it myself?
I was thinking about for loops, but I think my brain got in a loop (when I wanted to do a power with a non-integer exponent, like 5^4.5, or negatives like 2^-21) and I went crazy ;)
So, how can I write a function which calculates the power of a real number? Thanks
Oh, maybe important to note: I cannot use functions which use powers (e.g. exp), which would make this ultimately useless.
Negative powers are not a problem, they're just the inverse (1/x) of the positive power.
Floating point powers are just a little bit more complicated; as you know a fractional power is equivalent to a root (e.g. x^(1/2) == sqrt(x)) and you also know that multiplying powers with the same base is equivalent to add their exponents.
With all the above, you can:
Decompose the exponent into an integer part and a rational part.
Calculate the integer power with a loop (you can optimise it decomposing in factors and reusing partial calculations).
Calculate the root with any algorithm you like (any iterative approximation like bisection or Newton method could work).
Multiply the result.
If the exponent was negative, apply the inverse.
Example:
2^(-3.5) = (2^3 * 2^(1/2))^-1 = 1 / (2*2*2 * sqrt(2))
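To make the recipe concrete, here is a minimal sketch for the half-integer case of the example above (exponent = ±(n + 1/2)); the function names and the fixed iteration count are my own choices, with the root computed by Newton's method as suggested:
// Newton's method for sqrt: g <- (g + v/g) / 2.
double newton_sqrt(double v) {
    double g = v > 1.0 ? v / 2.0 : 1.0;       // crude starting guess
    for (int i = 0; i < 60; ++i)
        g = 0.5 * (g + v / g);
    return g;
}

// x^(n + 0.5) = x^n * sqrt(x); a negative exponent inverts the result.
double pow_half_integer(double base, int n, bool plus_half, bool negative_exp) {
    double r = 1.0;
    for (int i = 0; i < n; ++i) r *= base;    // integer part with a plain loop
    if (plus_half) r *= newton_sqrt(base);    // fractional part: base^(1/2)
    return negative_exp ? 1.0 / r : r;        // apply the inverse if needed
}
For the example, pow_half_integer(2.0, 3, true, true) gives 1 / (8 * sqrt(2)) ≈ 0.0884.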
A^B = Log^-1(Log(A) * B)
Edit: yes, this definition really does provide something useful. For example, on an x86, it translates almost directly to FYL2X (Y * Log2(X)) and F2XM1 (2^x - 1):
fyl2x
fld st(0)
frndint
fsubr st(1),st
fxch st(1)
fchs
f2xm1
fld1
faddp st(1),st
fscale
fstp st(1)
The code ends up a little longer than you might expect, primarily because F2XM1 only works with numbers in the range -1.0..1.0. The fld st(0)/frndint/fsubr st(1),st piece subtracts off the integer part, so we're left with only the fraction. We apply F2XM1 to that, add the 1 back on, then use FSCALE to handle the integer part of the exponentiation.
Typically the implementation of the pow(double, double) function in math libraries is based on the identity:
pow(x,y) = pow(a, y * log_a(x))
Using this identity, you only need to know how to raise a single number a to an arbitrary exponent, and how to take a logarithm base a. You have effectively turned a complicated multi-variable function into two functions of a single variable and a multiplication, which is pretty easy to implement. The most commonly chosen values of a are e or 2 -- e because e^x and log_e(1+x) have some very nice mathematical properties, and 2 because it has some nice properties for implementation in floating-point arithmetic.
The catch of doing it this way is that (if you want to get full accuracy) you need to compute the log_a(x) term (and its product with y) to higher accuracy than the floating-point representation of x and y. For example, if x and y are doubles, and you want to get a high accuracy result, you'll need to come up with some way to store intermediate results (and do arithmetic) in a higher-precision format. The Intel x87 format is a common choice, as are 64-bit integers (though if you really want a top-quality implementation, you'll need to do a couple of 96-bit integer computations, which are a little bit painful in some languages). It's much easier to deal with this if you implement powf(float,float), because then you can just use double for intermediate computations. I would recommend starting with that if you want to use this approach.
The algorithm that I outlined is not the only possible way to compute pow. It is merely the most suitable for delivering a high-speed result that satisfies a fixed a priori accuracy bound. It is less suitable in some other contexts, and is certainly much harder to implement than the repeated-square[root]-ing algorithm that some others have suggested.
If you want to try the repeated square[root] algorithm, begin by writing an unsigned integer power function that uses repeated squaring only. Once you have a good grasp on the algorithm for that reduced case, you will find it fairly straightforward to extend it to handle fractional exponents.
There are two distinct cases to deal with: Integer exponents and fractional exponents.
For integer exponents, you can use exponentiation by squaring.
def pow(base, exponent):
    if exponent == 0:
        return 1
    elif exponent < 0:
        return 1 / pow(base, -exponent)
    elif exponent % 2 == 0:
        half_pow = pow(base, exponent // 2)
        return half_pow * half_pow
    else:
        return base * pow(base, exponent - 1)
The second "elif" is what distinguishes this from the naïve pow function. It allows the function to make O(log n) recursive calls instead of O(n).
For fractional exponents, you can use the identity a^b = C^(b*log_C(a)). It's convenient to take C=2, so a^b = 2^(b * log2(a)). This reduces the problem to writing functions for 2^x and log2(x).
The reason it's convenient to take C=2 is that floating-point numbers are stored in base-2 floating point. log2(a * 2^b) = log2(a) + b. This makes it easier to write your log2 function: You don't need to have it be accurate for every positive number, just on the interval [1, 2). Similarly, to calculate 2^x, you can multiply 2^(integer part of x) * 2^(fractional part of x). The first part is trivial to store in a floating point number, for the second part, you just need a 2^x function over the interval [0, 1).
The hard part is finding a good approximation of 2^x and log2(x). A simple approach is to use Taylor series.
By definition:
a^b = exp(b ln(a))
where exp(x) = 1 + x + x^2/2 + x^3/3! + x^4/4! + x^5/5! + ...
where n! = 1 * 2 * ... * n.
In practice, you could store an array of the first 10 values of 1/n!, and then approximate
exp(x) = 1 + x + x^2/2 + x^3/3! + ... + x^10/10!
because 10! is a huge number, so 1/10! is very small (2.7557319224⋅10^-7).
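As a sketch, that truncation with precomputed 1/n! coefficients could look like this (exp_taylor is a made-up name; with a fixed 10-term cutoff the accuracy is only good near x = 0):
double exp_taylor(double x) {
    // 1/n! for n = 0..10, precomputed as the text suggests.
    static const double inv_fact[11] = {
        1.0, 1.0, 1.0/2, 1.0/6, 1.0/24, 1.0/120, 1.0/720,
        1.0/5040, 1.0/40320, 1.0/362880, 1.0/3628800
    };
    double xn = 1.0, sum = 0.0;
    for (int n = 0; n <= 10; ++n) {
        sum += xn * inv_fact[n];   // add x^n / n!
        xn *= x;                   // next power of x
    }
    return sum;
}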
Wolfram functions gives a wide variety of formulae for calculating powers. Some of them would be very straightforward to implement.
For positive integer powers, look at exponentiation by squaring and addition-chain exponentiation.
Using three self implemented functions iPow(x, n), Ln(x) and Exp(x), I'm able to compute fPow(x, a), x and a being doubles. Neither of the functions below use library functions, but just iteration.
Some explanation about functions implemented:
(1) iPow(x, n): x is double, n is int. This is a simple iteration, as n is an integer.
(2) Ln(x): This function uses a Taylor series iteration. The series used in the iteration is Σ (from i = 0 to n) {(1 / (2 * i + 1)) * ((x - 1) / (x + 1)) ^ (2 * i + 1)}. The symbol ^ denotes the power function Pow(x, n) implemented in the 1st function, which uses simple iteration.
(3) Exp(x): This function, again, uses a Taylor series iteration. The series used in the iteration is Σ (from i = 0 to n) {x^i / i!}. Here, the ^ denotes the power function, but it is not computed by calling the 1st Pow(x, n) function; instead it is implemented within the 3rd function, concurrently with the factorial, using d *= x / i. I felt I had to use this trick, because in this function the iteration takes some more steps relative to the other functions, and the factorial (i!) overflows most of the time. In order to make sure the iteration does not overflow, the power function in this part is iterated concurrently with the factorial. This way, I overcame the overflow.
(4) fPow(x, a): x and a are both doubles. This function does nothing but just call the other three functions implemented above. The main idea in this function depends on some calculus: fPow(x, a) = Exp(a * Ln(x)). And now, I have all the functions iPow, Ln and Exp with iteration already.
n.b. I used a constant MAX_DELTA_DOUBLE in order to decide in which step to stop the iteration. I've set it to 1.0E-15, which seems reasonable for doubles. So, the iteration stops if (delta < MAX_DELTA_DOUBLE) If you need some more precision, you can use long double and decrease the constant value for MAX_DELTA_DOUBLE, to 1.0E-18 for example (1.0E-18 would be the minimum).
Here is the code, which works for me.
#include <stdio.h>  /* printf */
#include <stdlib.h> /* exit */

#define MAX_DELTA_DOUBLE 1.0E-15
#define EULERS_NUMBER 2.718281828459045

double MathAbs_Double (double x) {
    return ((x >= 0) ? x : -x);
}

int MathAbs_Int (int x) {
    return ((x >= 0) ? x : -x);
}

double MathPow_Double_Int(double x, int n) {
    double ret;
    if ((x == 1.0) || (n == 1)) {
        ret = x;
    } else if (n < 0) {
        ret = 1.0 / MathPow_Double_Int(x, -n);
    } else {
        ret = 1.0;
        while (n--) {
            ret *= x;
        }
    }
    return (ret);
}

double MathLn_Double(double x) {
    double ret = 0.0, d;
    if (x > 0) {
        int n = 0;
        do {
            int a = 2 * n + 1;
            d = (1.0 / a) * MathPow_Double_Int((x - 1) / (x + 1), a);
            ret += d;
            n++;
        } while (MathAbs_Double(d) > MAX_DELTA_DOUBLE);
    } else {
        printf("\nerror: x < 0 in ln(x)\n");
        exit(-1);
    }
    return (ret * 2);
}

double MathExp_Double(double x) {
    double ret;
    if (x == 1.0) {
        ret = EULERS_NUMBER;
    } else if (x < 0) {
        ret = 1.0 / MathExp_Double(-x);
    } else {
        int n = 2;
        double d;
        ret = 1.0 + x;
        do {
            d = x;
            for (int i = 2; i <= n; i++) {
                d *= x / i;
            }
            ret += d;
            n++;
        } while (d > MAX_DELTA_DOUBLE);
    }
    return (ret);
}

double MathPow_Double_Double(double x, double a) {
    double ret;
    if ((x == 1.0) || (a == 1.0)) {
        ret = x;
    } else if (a < 0) {
        ret = 1.0 / MathPow_Double_Double(x, -a);
    } else {
        ret = MathExp_Double(a * MathLn_Double(x));
    }
    return (ret);
}
It's an interesting exercise. Here are some suggestions, which you should try in this order:
1. Use a loop.
2. Use recursion (not better, but interesting nonetheless).
3. Optimize your recursion vastly by using divide-and-conquer techniques.
4. Use logarithms.
You can build the pow function like this:
static double pows (double p_nombre, double p_puissance)
{
    double nombre = p_nombre;
    double i = 0;
    for (i = 0; i < (p_puissance - 1); i++) {
        nombre = nombre * p_nombre;
    }
    return (nombre);
}
You can build the floor function like this:
static double floors(double p_nomber)
{
    double x = p_nomber;
    long partent = (long) x;
    if (x < 0)
    {
        return (partent - 1);
    }
    else
    {
        return (partent);
    }
}
Best regards
A better algorithm to efficiently calculate positive integer powers is to repeatedly square the base, while keeping track of the extra remainder multiplicands. Here is a sample solution in Python that should be relatively easy to understand and translate into your preferred language:
def power(base, exponent):
    remaining_multiplicand = 1
    result = base
    while exponent > 1:
        remainder = exponent % 2
        if remainder > 0:
            remaining_multiplicand = remaining_multiplicand * result
        exponent = (exponent - remainder) / 2
        result = result * result
    return result * remaining_multiplicand
To make it handle negative exponents, all you have to do is calculate the positive version and divide 1 by the result, so that should be a simple modification to the above code. Fractional exponents are considerably more difficult, since it means essentially calculating an nth-root of the base, where n = 1/abs(exponent % 1) and multiplying the result by the result of the integer portion power calculation:
power(base, exponent - (exponent % 1))
You can calculate roots to a desired level of accuracy using Newton's method. Check out the Wikipedia article on the algorithm.
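For instance, a sketch of Newton's method for the n-th root, iterating on f(x) = x^n - a (the start value and the iteration cap are arbitrary choices of mine):
// Newton step for x^n = a: x <- ((n-1)*x + a / x^(n-1)) / n; valid for a > 0, n >= 1.
double nth_root_newton(double a, int n) {
    double x = a > 1.0 ? a : 1.0;             // crude starting guess
    for (int it = 0; it < 200; ++it) {
        double xn1 = 1.0;
        for (int i = 0; i < n - 1; ++i)       // x^(n-1) with a plain loop
            xn1 *= x;
        double next = ((n - 1) * x + a / xn1) / n;
        if (next == x) break;                 // converged at double precision
        x = next;
    }
    return x;
}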
I am using fixed-point long arithmetic and my pow is log2/exp2 based. Numbers consist of:
int sig = { -1; +1 } signum
DWORD a[A+B] number
A is number of DWORDs for integer part of number
B is number of DWORDs for fractional part
My simplified solution is this:
//---------------------------------------------------------------------------
longnum exp2 (const longnum &x)
{
    int i,j;
    longnum c,d;
    c.one();
    if (x.iszero()) return c;
    i=x.bits()-1;
    for (d=2,j=_longnum_bits_b;j<=i;j++,d*=d)
        if (x.bitget(j))
            c*=d;
    for (i=0,j=_longnum_bits_b-1;i<_longnum_bits_b;j--,i++)
        if (x.bitget(j))
            c*=_longnum_log2[i];
    if (x.sig<0) { d.one(); c=d/c; }
    return c;
}
//---------------------------------------------------------------------------
longnum log2 (const longnum &x)
{
    int i,j;
    longnum c,d,dd,e,xx;
    c.zero(); d.one(); e.zero(); xx=x;
    if (xx.iszero()) return c; //**** error: log2(0) = infinity
    if (xx.sig<0) return c;    //**** error: log2(negative x) ... no result possible
    if (d.geq(x,d)==0) { xx=d/xx; xx.sig=-1; }
    i=xx.bits()-1;
    e.bitset(i); i-=_longnum_bits_b;
    for (;i>0;i--,e>>=1) // integer part
    {
        dd=d*e;
        j=dd.geq(dd,xx);
        if (j==1) continue; // dd > xx
        c+=i; d=dd;
        if (j==2) break;    // dd == xx
    }
    for (i=0;i<_longnum_bits_b;i++) // fractional part
    {
        dd=d*_longnum_log2[i];
        j=dd.geq(dd,xx);
        if (j==1) continue; // dd > xx
        c.bitset(_longnum_bits_b-i-1); d=dd;
        if (j==2) break;    // dd == xx
    }
    c.sig=xx.sig;
    c.iszero();
    return c;
}
//---------------------------------------------------------------------------
longnum pow (const longnum &x,const longnum &y)
{
    // x^y = exp2(y*log2(x))
    int ssig=+1; longnum c; c=x;
    if (y.iszero()) { c.one(); return c; } // ?^0 = 1
    if (c.iszero()) return c;              // 0^? = 0
    if (c.sig<0)
    {
        c.overflow(); c.sig=+1;
        if (y.isreal()) { c.zero(); return c; } //**** error: negative x ^ noninteger y
        if (y.bitget(_longnum_bits_b)) ssig=-1;
    }
    c=exp2(log2(c)*y); c.sig=ssig; c.iszero();
    return c;
}
//---------------------------------------------------------------------------
where:
_longnum_bits_a = A*32
_longnum_bits_b = B*32
_longnum_log2[i] = 2 ^ (1/(2^(i+1))) ... precomputed sqrt table:
_longnum_log2[0] = sqrt(2)
_longnum_log2[1] = sqrt(_longnum_log2[0])
_longnum_log2[i] = sqrt(_longnum_log2[i-1])
longnum::zero() sets *this=0
longnum::one() sets *this=+1
bool longnum::iszero() returns (*this==0)
bool longnum::isnonzero() returns (*this!=0)
bool longnum::isreal() returns (true if fractional part !=0)
bool longnum::isinteger() returns (true if fractional part ==0)
int longnum::bits() returns the number of used bits in the number, counted from the LSB
longnum::bitget()/bitset()/bitres()/bitxor() are bit access
longnum::overflow() rounds the number if there was an overflow: X.FFFFFFFFFF...FFFFFFFFF??h -> (X+1).0000000000000...000000000h
int longnum::geq(x,y) compares |x|,|y| and returns 0,1,2 for (<,>,==)
All you need to understand this code is that numbers in binary form consist of a sum of powers of 2; when you need to compute 2^num, it can be rewritten as
2^(b(-n)*2^(-n) + ... + b(+m)*2^(+m))
where n are fractional bits and m are integer bits. Multiplication/division by 2 in binary form is simple bit shifting, so if you put it all together you get code for exp2 similar to mine. log2 is based on binary search, changing the result bits from MSB to LSB until it matches the searched value (a very similar algorithm to fast sqrt computation). Hope this helps clarify things...
A lot of approaches are given in other answers. Here is something that I thought may be useful in case of integral powers.
In the case of an integer power x of n^x, the straightforward approach would take x-1 multiplications. In order to optimize this, we can use dynamic programming and reuse an earlier multiplication result to avoid all x multiplications. For example, for 5^9, we can, say, make batches of 3, i.e. calculate 5^3 once, get 125, and then cube 125 using the same logic, taking only 4 multiplications in the process instead of 8 multiplications the straightforward way.
The question is what is the ideal size of the batch b so that the number of multiplications is minimum. So let's write the equation for this. If f(x,b) is the function representing the number of multiplications entailed in calculating n^x using the above method, then
f(x,b) = (x/b - 1) + (b - 1) = x/b + b - 2
Explanation: A product of a batch of p numbers takes p-1 multiplications. Dividing the x multiplications into b batches, we need b-1 multiplications to build one batch and (x/b)-1 multiplications to combine the b batch results.
Now we can calculate the first derivative of this function with respect to b and equate it to 0 to get the b for the least number of multiplications: df/db = 1 - x/b^2 = 0, which gives b = sqrt(x).
Putting this value of b back into the function f(x,b) gives the least number of multiplications: f(x, sqrt(x)) = 2*sqrt(x) - 2.
For all positive x, this value is less than the number of multiplications done the straightforward way.
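A sketch of the batching scheme with b = sqrt(x), as derived above (my own rendering, with no overflow handling):
#include <cmath>

// Costs (b - 1) + (x/b - 1) + (x mod b) multiplications, roughly x/b + b - 2.
long long batched_pow(long long n, int x) {
    if (x <= 0) return 1;                         // only positive exponents here
    int b = (int)std::sqrt((double)x);
    if (b < 1) b = 1;
    long long nb = n;
    for (int i = 1; i < b; ++i) nb *= n;          // n^b: b - 1 multiplications
    long long result = nb;
    for (int i = 1; i < x / b; ++i) result *= nb; // (n^b)^(x/b)
    for (int i = 0; i < x % b; ++i) result *= n;  // leftover n^(x mod b)
    return result;
}
For 5^9 this computes 5^3 = 125 with two multiplications and then 125^3 with two more, matching the example.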
Maybe you can use a Taylor series expansion. The Taylor series of a function is an infinite sum of terms that are expressed in terms of the function's derivatives at a single point. For most common functions, the function and the sum of its Taylor series are equal near this point. Taylor series are named after Brook Taylor, who introduced them in 1715.

Can I rely on this to judge a square number in C++?

Can I rely on
sqrt((float)a)*sqrt((float)a)==a
or
(int)sqrt((float)a)*(int)sqrt((float)a)==a
to check whether a number is a perfect square? Why or why not?
int a is the number to be judged. I'm using Visual Studio 2005.
Edit: Thanks for all these rapid answers. I see that I can't rely on float type comparison. (If I wrote as above, will the last a be cast to float implicitly?) If I do it like
(int)sqrt((float)a)*(int)sqrt((float)a) - a < e
How small should I take that e value?
Edit2: Hey, why don't we leave the comparison part aside and decide whether the (int) is necessary? As I see it, with it, the difference might be great for squares; but without it, the difference might be small for non-squares. Perhaps neither will do. :-(
Actually, this is not a C++ question but a math question.
With floating point numbers, you should never rely on equality. Where you would test a == b, just test against abs(a - b) < eps, where eps is a small number (e.g. 1E-6) that you would treat as a good enough approximation.
If the number you are testing is an integer, you might be interested in the Wikipedia article about Integer square root
EDIT:
As Krugar said, the article I linked does not answer anything. Sure, there is no direct answer to your question there, phoenie. I just thought that the underlying problem you have is floating point precision and maybe you wanted some math background to your problem.
For the impatient, there is a link in the article to a lengthy discussion about implementing isqrt. It boils down to the code karx11erx posted in his answer.
If you have integers which do not fit into an unsigned long, you can modify the algorithm yourself.
If you don't want to rely on float precision, you can use the following code, which uses integer math.
The Isqrt is taken from here and is O(log n):
// Finds the integer square root of a positive number
static int Isqrt(int num)
{
    if (0 == num) { return 0; }  // Avoid zero divide
    int n = (num / 2) + 1;       // Initial estimate, never low
    int n1 = (n + (num / n)) / 2;
    while (n1 < n)
    {
        n = n1;
        n1 = (n + (num / n)) / 2;
    } // end while
    return n;
} // end Isqrt()

static bool IsPerfectSquare(int num)
{
    return Isqrt(num) * Isqrt(num) == num;
}
To avoid doing the same calculation twice, I would use a temporary number:
int b = (int)sqrt((float)a);
if ((b*b) == a)
{
    //perfect square
}
edit:
dav made a good point. Instead of relying on the cast, you'll need to round off the float first, so it should be:
int b = (int) (sqrt((float)a) + 0.5f);
if ((b*b) == a)
{
    //perfect square
}
Your question has already been answered, but here is a working solution.
Your 'perfect squares' are implicitly integer values, so you could easily solve floating point format related accuracy problems by using some integer square root function to determine the integer square root of the value you want to test. That function will return the biggest number r for a value v where r * r <= v. Once you have r, you simply need to test whether r * r == v.
unsigned short isqrt (unsigned long a)
{
    unsigned long rem = 0;
    unsigned long root = 0;
    for (int i = 16; i; i--) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        if (root < rem)
            rem -= ++root;
    }
    return (unsigned short) (root >> 1);
}

bool PerfectSquare (unsigned long a)
{
    unsigned short r = isqrt (a);
    return r * r == a;
}
I didn't follow the formula, I apologize.
But you can easily check whether a floating point number is an integer by casting it to an integer type and comparing the result against the floating point number. So,
bool isSquare(long val) {
    double root = sqrt(val);
    if (root == (long) root)
        return true;
    else return false;
}
Naturally this is only doable if you are working with values that you know will fit within the integer type range. But being that the case, you can solve the problem this way, saving you the inherent complexity of a mathematical formula.
As reinier says, you need to add 0.5 to make sure it rounds to the nearest integer, so you get
int b = (int) (sqrt((float)a) + 0.5f);
if ((b*b) == a) /* perfect square */
For this to work, b has to be (exactly) equal to the square root of a if a is a perfect square. However, I don't think you can guarantee this. Suppose that int is 64 bits and float is 32 bits (I think that's allowed). Then a can be of the order of 2^60, so its square root is of order 2^30. However, a float only stores 24 bits in the significand, so the rounding error is of order 2^(30-24) = 2^6. This is larger than 1, so b may contain the wrong integer. For instance, I think that the above code does not identify a = (2^30+1)^2 as a perfect square.
I would do:
// sqrt always returns a positive value, so casting to int is equivalent to floor()
int down = static_cast<int>(sqrt(value));
int up = down + 1; // This is the ceil(sqrt(value))

// Because of rounding problems I would test the floor() and ceil()
// of the value returned from sqrt().
if (((down*down) == value) || ((up*up) == value))
{
    // We have a winner.
}
The more obvious, if slower -- O(sqrt(n)) -- way:
bool is_perfect_square(int i) {
    int d = 1;
    for (int x = 0; x <= i; x += d, d += 2) {
        if (x == i) return true;
    }
    return false;
}
While others have noted that you should not test for equality with floats, I think you are missing out on chances to take advantage of the properties of perfect squares. First there is no point in re-squaring the calculated root. If a is a perfect square then sqrt(a) is an integer and you should check:
b = sqrt((float)a)
b - floor(b) < e
where e is set sufficiently small. There are also a number of integers that you can cross off as non-squares before taking the square root. Checking Wikipedia, you can see some necessary conditions for a to be square:
A square number can only end with digits 00, 1, 4, 6, 9, or 25 in base 10.
Another simple check would be to see that a % 4 == 1 or 0 before taking the root since:
Squares of even numbers are even, since (2n)^2 = 4n^2.
Squares of odd numbers are odd, since (2n + 1)^2 = 4(n^2 + n) + 1.
These would essentially eliminate half of the integers before taking any roots.
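Put together, those two pre-filters could look like this sketch (passes_square_prefilter is a made-up name; passing only means the number might be a square and still needs the root test):
bool passes_square_prefilter(unsigned a) {
    if (a % 4 == 2 || a % 4 == 3)              // squares are 0 or 1 mod 4
        return false;
    unsigned d = a % 10;
    if (d == 2 || d == 3 || d == 7 || d == 8)  // squares never end in 2, 3, 7, 8
        return false;
    return true;                               // candidate: do the sqrt check
}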
The cleanest solution is to use an integer sqrt routine, then do:
bool isSquare( unsigned int a ) {
    unsigned int s = isqrt( a );
    return s * s == a;
}
This will work in the full int range and with perfect precision. A few cases:
a = 0, s = 0, s * s = 0 (add an exception if you don't want to treat 0 as square)
a = 1, s = 1, s * s = 1
a = 2, s = 1, s * s = 1
a = 3, s = 1, s * s = 1
a = 4, s = 2, s * s = 4
a = 5, s = 2, s * s = 4
Won't fail either as you approach the maximum value for your int size. E.g. for 32-bit ints:
a = 0x40000000, s = 0x00008000, s * s = 0x40000000
a = 0xFFFFFFFF, s = 0x0000FFFF, s * s = 0xFFFE0001
Using floats you run into a number of issues. You may find that sqrt( 4 ) = 1.999999..., and similar problems, although you can round-to-nearest instead of using floor().
Worse though, a float has only 24 significant bits which means you can't cast any int larger than 2^24-1 to a float without losing precision, which introduces false positives/negatives. Using doubles for testing 32-bit ints, you should be fine, though.
But remember to cast the result of the floating-point sqrt back to an int and compare the result to the original int. Comparisons between floats are never a good idea; even for square values of x in a limited range, there is no guarantee that sqrt( x ) * sqrt( x ) == x, or that sqrt( x * x) = x.
Basics first:
If you (int) a number in a calculation, it will remove ALL post-comma data. If I remember my C correctly, if you have an (int) in any calculation (+/-/*) it will automatically presume int for all other numbers.
So in your case you want float on every number involved, otherwise you will lose data:
sqrt((float)a)*sqrt((float)a)==(float)a
is the way you want to go
Floating point math is inaccurate by nature.
So consider this code:
int a = 35;
float conv = (float)a;
float sqrt_a = sqrt(conv);
if (sqrt_a*sqrt_a == conv)
    printf("perfect square");
This is what will happen:
a = 35
conv = 35.000000
sqrt_a = 5.916079
sqrt_a*sqrt_a = 34.999990734
It is amply clear that sqrt_a^2 is not equal to a.