Fastest way to get the integer part of sqrt(n)? - c++
As we know if n is not a perfect square, then sqrt(n) would not be an integer. Since I need only the integer part, I feel that calling sqrt(n) wouldn't be that fast, as it takes time to calculate the fractional part also.
So my question is,
Can we get only the integer part of sqrt(n) without calculating the actual value of sqrt(n)? The algorithm should be faster than sqrt(n) (defined in <math.h> or <cmath>)?
If possible, you can write the code in asm block also.
I would try the Fast Inverse Square Root trick.
It's a way to get a very good approximation of 1/sqrt(n) without any branches. It relies on bit-twiddling, so it is not portable (notably between 32-bit and 64-bit platforms).
Once you have it, you just need to invert the result and take the integer part.
There might be faster tricks, of course, since this one is a bit roundabout.
EDIT: let's do it!
First a little helper:
// benchmark.h
#include <cstddef> // size_t
#include <sys/time.h>
template <typename Func>
double benchmark(Func f, size_t iterations)
{
f();
timeval a, b;
gettimeofday(&a, 0);
for (; iterations --> 0;)
{
f();
}
gettimeofday(&b, 0);
return (b.tv_sec * (unsigned int)1e6 + b.tv_usec) -
(a.tv_sec * (unsigned int)1e6 + a.tv_usec);
}
Then the main body:
#include <cassert> // assert
#include <cstdlib> // atoi
#include <iostream>
#include <cmath>
#include "benchmark.h"
class Sqrt
{
public:
Sqrt(int n): _number(n) {}
int operator()() const
{
double d = _number;
return static_cast<int>(std::sqrt(d) + 0.5);
}
private:
int _number;
};
// http://www.codecodex.com/wiki/Calculate_an_integer_square_root
class IntSqrt
{
public:
IntSqrt(int n): _number(n) {}
int operator()() const
{
int remainder = _number;
if (remainder < 0) { return 0; }
int place = 1 <<(sizeof(int)*8 -2);
while (place > remainder) { place /= 4; }
int root = 0;
while (place)
{
if (remainder >= root + place)
{
remainder -= root + place;
root += place*2;
}
root /= 2;
place /= 4;
}
return root;
}
private:
int _number;
};
// http://en.wikipedia.org/wiki/Fast_inverse_square_root
class FastSqrt
{
public:
FastSqrt(int n): _number(n) {}
int operator()() const
{
float number = _number;
float x2 = number * 0.5F;
float y = number;
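int i = *(int*)&y; // int rather than long: the 32-bit magic constant needs a 32-bit view of the float, and long is 64 bits on LP64 platforms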
//i = (long)0x5fe6ec85e7de30da - (i >> 1);
i = 0x5f3759df - (i >> 1);
y = *(float*)&i;
y = y * (1.5F - (x2*y*y));
y = y * (1.5F - (x2*y*y)); // let's be precise
return static_cast<int>(1/y + 0.5f);
}
private:
int _number;
};
int main(int argc, char* argv[])
{
if (argc != 3) {
std::cerr << "Usage: %prog integer iterations\n";
return 1;
}
int n = atoi(argv[1]);
int it = atoi(argv[2]);
assert(Sqrt(n)() == IntSqrt(n)() &&
Sqrt(n)() == FastSqrt(n)() && "Different Roots!");
std::cout << "sqrt(" << n << ") = " << Sqrt(n)() << "\n";
double time = benchmark(Sqrt(n), it);
double intTime = benchmark(IntSqrt(n), it);
double fastTime = benchmark(FastSqrt(n), it);
std::cout << "Number iterations: " << it << "\n"
"Sqrt computation : " << time << "\n"
"Int computation : " << intTime << "\n"
"Fast computation : " << fastTime << "\n";
return 0;
}
And the results:
sqrt(82) = 9
Number iterations: 4096
Sqrt computation : 56
Int computation : 217
Fast computation : 119
// Note had to tweak the program here as Int here returns -1 :/
sqrt(2147483647) = 46341 // real answer sqrt(2 147 483 647) = 46 340.95
Number iterations: 4096
Sqrt computation : 57
Int computation : 313
Fast computation : 119
Where as expected the Fast computation performs much better than the Int computation.
Oh, and by the way, sqrt is faster :)
Edit: this answer is foolish - use (int) sqrt(i)
After profiling with proper settings (-march=native -m64 -O3) the above was a lot faster.
Alright, this is a bit of an old question, but the "fastest" answer has not been given yet. The fastest (I think) is the Binary Square Root algorithm, explained fully in this Embedded.com article.
It basically comes down to this:
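#include <stdint.h>
// a, rem and root must be exactly 32 bits wide: the shifts below assume it,
// and a 64-bit unsigned long gives wrong results (e.g. isqrt(100) == 15).
uint16_t isqrt(uint32_t a) {
uint32_t rem = 0;
uint32_t root = 0;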
int i;
for (i = 0; i < 16; i++) {
root <<= 1;
rem <<= 2;
rem += a >> 30;
a <<= 2;
if (root < rem) {
root++;
rem -= root;
root++;
}
}
return (unsigned short) (root >> 1);
}
On my machine (Q6600, Ubuntu 10.10) I profiled by taking the square root of the numbers 1-100000000. Using isqrt(i) took 2750 ms. Using (unsigned short) sqrt((float) i) took 3600 ms. This was done using g++ -O3. With the -ffast-math compile option the times were 2100 ms and 3100 ms respectively. Note that this is without even a single line of assembler, so it could probably still be made much faster.
The above code works for both C and C++ and with minor syntax changes also for Java.
What works even better for a limited range is a binary search. On my machine this blows the version above out of the water by a factor of 4. Sadly it's very limited in range:
#include <stdint.h>
const uint16_t squares[] = {
0, 1, 4, 9,
16, 25, 36, 49,
64, 81, 100, 121,
144, 169, 196, 225,
256, 289, 324, 361,
400, 441, 484, 529,
576, 625, 676, 729,
784, 841, 900, 961,
1024, 1089, 1156, 1225,
1296, 1369, 1444, 1521,
1600, 1681, 1764, 1849,
1936, 2025, 2116, 2209,
2304, 2401, 2500, 2601,
2704, 2809, 2916, 3025,
3136, 3249, 3364, 3481,
3600, 3721, 3844, 3969,
4096, 4225, 4356, 4489,
4624, 4761, 4900, 5041,
5184, 5329, 5476, 5625,
5776, 5929, 6084, 6241,
6400, 6561, 6724, 6889,
7056, 7225, 7396, 7569,
7744, 7921, 8100, 8281,
8464, 8649, 8836, 9025,
9216, 9409, 9604, 9801,
10000, 10201, 10404, 10609,
10816, 11025, 11236, 11449,
11664, 11881, 12100, 12321,
12544, 12769, 12996, 13225,
13456, 13689, 13924, 14161,
14400, 14641, 14884, 15129,
15376, 15625, 15876, 16129,
16384, 16641, 16900, 17161,
17424, 17689, 17956, 18225,
18496, 18769, 19044, 19321,
19600, 19881, 20164, 20449,
20736, 21025, 21316, 21609,
21904, 22201, 22500, 22801,
23104, 23409, 23716, 24025,
24336, 24649, 24964, 25281,
25600, 25921, 26244, 26569,
26896, 27225, 27556, 27889,
28224, 28561, 28900, 29241,
29584, 29929, 30276, 30625,
30976, 31329, 31684, 32041,
32400, 32761, 33124, 33489,
33856, 34225, 34596, 34969,
35344, 35721, 36100, 36481,
36864, 37249, 37636, 38025,
38416, 38809, 39204, 39601,
40000, 40401, 40804, 41209,
41616, 42025, 42436, 42849,
43264, 43681, 44100, 44521,
44944, 45369, 45796, 46225,
46656, 47089, 47524, 47961,
48400, 48841, 49284, 49729,
50176, 50625, 51076, 51529,
51984, 52441, 52900, 53361,
53824, 54289, 54756, 55225,
55696, 56169, 56644, 57121,
57600, 58081, 58564, 59049,
59536, 60025, 60516, 61009,
61504, 62001, 62500, 63001,
63504, 64009, 64516, 65025
};
inline int isqrt(uint16_t x) {
const uint16_t *p = squares;
if (p[128] <= x) p += 128;
if (p[ 64] <= x) p += 64;
if (p[ 32] <= x) p += 32;
if (p[ 16] <= x) p += 16;
if (p[ 8] <= x) p += 8;
if (p[ 4] <= x) p += 4;
if (p[ 2] <= x) p += 2;
if (p[ 1] <= x) p += 1;
return p - squares;
}
A 32 bit version can be downloaded here: https://gist.github.com/3481770
While I suspect you can find plenty of options by searching for "fast integer square root", here are some potentially new ideas that might work well (each is independent, or maybe you can combine them):
Make a static const array of all the perfect squares in the domain you want to support, and perform a fast branchless binary search on it. The resulting index in the array is the square root.
Convert the number to floating point and break it into mantissa and exponent. Halve the exponent and multiply the mantissa by some magic factor (your job to find it). This should be able to give you a very close approximation. Include a final step to adjust it if it's not exact (or use it as a starting point for the binary search above).
If you don't mind an approximation, how about this integer sqrt function I cobbled together.
int sqrti(int x)
{
union { float f; int x; } v;
// convert to float
v.f = (float)x;
// fast approx sqrt
// assumes float is in IEEE 754 single precision format
// assumes int is 32 bits
// b = exponent bias
// m = number of mantissa bits
v.x -= 1 << 23; // subtract 2^m
v.x >>= 1; // divide by 2
v.x += 1 << 29; // add ((b + 1) / 2) * 2^m
// convert to int
return (int)v.f;
}
It uses the algorithm described in this Wikipedia article.
On my machine it's almost twice as fast as sqrt :)
To compute an integer sqrt you can use this specialization of Newton's method:
def isqrt(N):
    a = 1
    b = N
    while |a - b| > 1:
        b = N / a
        a = (a + b) / 2
    return a
Basically, for any x the sqrt lies in the range (x ... N/x), so at every iteration we bisect that interval to get the new guess. It is similar to binary search but converges much faster. Note that the returned a can be one above floor(sqrt(N)) (for N = 3 it returns 2), so a final check is needed if you want the exact floor.
This converges in O(loglog(N)) which is very fast. It also doesn't use floating point at all, and it will also work well for arbitrary precision integers.
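As a minimal C++ sketch of the same idea (not the pseudocode above verbatim): this variant starts from the upper bound x = n instead of a = 1 and stops as soon as the iterate no longer decreases, which yields the exact floor. The name isqrt_newton and the choice of a 32-bit input are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Integer Newton iteration for floor(sqrt(n)); no floating point involved.
// Assumption: we start from the upper bound x = n and iterate downwards,
// stopping when the next iterate is no longer smaller than the current one.
std::uint32_t isqrt_newton(std::uint32_t n) {
    std::uint64_t x = n;            // 64-bit so that x + 1 cannot overflow
    std::uint64_t y = (x + 1) / 2;  // first Newton step from x = n
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;        // x >= 1 here, so the division is safe
    }
    // Postcondition: x is the exact floor of the square root.
    assert(x * x <= n && (x + 1) * (x + 1) > n);
    return static_cast<std::uint32_t>(x);
}
```

For example, isqrt_newton(99) walks through 99, 50, 25, 14, 10, 9 and returns 9.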
This is so short that it 99% inlines:
static inline int sqrtn(int num) {
int i = 0;
__asm__ (
"pxor %%xmm0, %%xmm0\n\t" // clean xmm0 for cvtsi2ss
"cvtsi2ss %1, %%xmm0\n\t" // convert num to float, put it to xmm0
"sqrtss %%xmm0, %%xmm0\n\t" // square root xmm0
"cvttss2si %%xmm0, %0" // float to int
:"=r"(i):"r"(num):"%xmm0"); // i: result, num: input, xmm0: scratch register
return i;
}
Why clean xmm0? Documentation of cvtsi2ss
The destination operand is an XMM register. The result is stored in the low doubleword of the destination operand, and the upper three doublewords are left unchanged.
GCC Intrinsic version (runs only on GCC):
#include <xmmintrin.h>
int sqrtn2(int num) {
register __v4sf xmm0 = {0, 0, 0, 0};
xmm0 = __builtin_ia32_cvtsi2ss(xmm0, num);
xmm0 = __builtin_ia32_sqrtss(xmm0);
return __builtin_ia32_cvttss2si(xmm0);
}
Intel Intrinsic version (tested on GCC, Clang, ICC):
#include <xmmintrin.h>
int sqrtn2(int num) {
register __m128 xmm0 = _mm_setzero_ps();
xmm0 = _mm_cvt_si2ss(xmm0, num);
xmm0 = _mm_sqrt_ss(xmm0);
return _mm_cvtt_ss2si(xmm0);
}
^^^^ All of them require SSE 1 (not even SSE 2).
Note: This is exactly how GCC calculates (int) sqrt((float) num) with -Ofast. If you want higher accuracy for larger i, then we can calculate (int) sqrt((double) num) (as noted by Gumby The Green in the comments):
static inline int sqrtn(int num) {
int i = 0;
__asm__ (
"pxor %%xmm0, %%xmm0\n\t"
"cvtsi2sd %1, %%xmm0\n\t"
"sqrtsd %%xmm0, %%xmm0\n\t"
"cvttsd2si %%xmm0, %0"
:"=r"(i):"r"(num):"%xmm0");
return i;
}
or
#include <emmintrin.h> // the double-precision (sd) builtins and __v2df need SSE2
int sqrtn2(int num) {
register __v2df xmm0 = {0, 0};
xmm0 = __builtin_ia32_cvtsi2sd(xmm0, num);
xmm0 = __builtin_ia32_sqrtsd(xmm0);
return __builtin_ia32_cvttsd2si(xmm0);
}
The following solution computes the integer part, meaning floor(sqrt(x)) exactly, with no rounding errors.
Problems With Other Approaches
using float or double is neither portable nor precise enough
@orlp's isqrt gives insane results like isqrt(100) = 15
approaches based on huge lookup tables are not practical beyond 32 bits
using a fast inverse sqrt is very imprecise, you're better off using sqrtf
Newton's approach requires expensive integer division and a good initial guess
My Approach
Mine is based on the bit-guessing approach proposed on Wikipedia. Unfortunately the pseudo-code provided on Wikipedia has some errors so I had to make some adjustments:
#include <type_traits> // std::enable_if_t, std::is_unsigned_v
// C++20 also provides std::bit_width in its <bit> header
unsigned char bit_width(unsigned long long x) {
return x == 0 ? 1 : 64 - __builtin_clzll(x);
}
template <typename Int, std::enable_if_t<std::is_unsigned_v<Int>, int> = 0>
Int sqrt(const Int n) {
unsigned char shift = bit_width(n);
shift += shift & 1; // round up to next multiple of 2
Int result = 0;
do {
shift -= 2;
result <<= 1; // make space for the next guessed bit
result |= 1; // guess that the next bit is 1
result ^= result * result > (n >> shift); // revert if guess too high
} while (shift != 0);
return result;
}
bit_width can be evaluated in constant time and the loop will iterate at most ceil(bit_width / 2) times. So even for a 64-bit integer, this will be at worst 32 iterations of basic arithmetic and bitwise operations.
The compile output is only around 20 instructions.
Performance
I have benchmarked my methods against float-based ones by generating inputs uniformly. Note that in the real world most inputs would be much closer to zero than to std::numeric_limits<...>::max().
for uint32_t this performs about 25x worse than using std::sqrt(float)
for uint64_t this performs about 30x worse than using std::sqrt(double)
Accuracy
This method is always perfectly accurate, unlike approaches using floating point math.
Using sqrtf can provide incorrect rounding in the [2^28, 2^32) range. For example, sqrtf(0xffffffff) = 65536, when the square root is actually 65535.99999.
Double precision doesn't work consistently for the [2^60, 2^64) range. For example, sqrt(0x3fff...) = 2147483648, when the square root is actually 2147483647.999999.
The only thing that covers all 64-bit integers is x86 extended precision long double, simply because it can fit an entire 64-bit integer.
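The 32-bit case can be reproduced with a short sketch. The two helper names below are hypothetical and exist only to contrast single and double precision on the same input.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: integer sqrt through single precision.
// A float has a 24-bit significand, so 0xffffffff is rounded to 2^32 first.
std::uint32_t isqrt_via_float(std::uint32_t n) {
    return static_cast<std::uint32_t>(std::sqrt(static_cast<float>(n)));
}

// Hypothetical helper: integer sqrt through double precision.
// A double represents every 32-bit integer exactly.
std::uint32_t isqrt_via_double(std::uint32_t n) {
    return static_cast<std::uint32_t>(std::sqrt(static_cast<double>(n)));
}

// Example use:
//   assert(isqrt_via_float(0xffffffffu) == 65536);   // one too large
//   assert(isqrt_via_double(0xffffffffu) == 65535);  // correct floor
```

The float version is wrong before the square root is even taken, because the conversion of the input already rounds up to 2^32.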
Conclusion
As I said, this is the only solution that handles all inputs correctly, avoids integer division and doesn't require lookup tables.
In summary, if you need a method that is independent of precision and doesn't require gigantic lookup tables, this is your only option.
It might be especially useful in a constexpr context where performance isn't critical and where it could be much more important to get a 100% accurate result.
Alternative Approach Using Newton's Method
Newton's method can be quite fast when starting with a good guess. For our guess, we will round down to the nearest power of 2 and compute its square root in constant time. For any number 2^x, we can obtain the square root using 2^(x/2).
template <typename Int, std::enable_if_t<std::is_unsigned_v<Int>, int> = 0>
Int sqrt_guess(const Int n)
{
Int log2floor = bit_width(n) - 1;
// sqrt(pow(2, x)) is equivalent to pow(2, x / 2), and x / 2 = x >> 1
// pow(2, x) is equivalent to 1 << x
// shift an Int rather than an int so that large results cannot overflow
return Int{1} << (log2floor >> 1);
}
Note that this is not exactly 2^(x/2) because we lost some precision during the right shift. Instead it is 2^floor(x/2).
Also note that sqrt_guess(0) = 1 which is actually necessary to avoid division by zero in the first iteration:
template <typename Int, std::enable_if_t<std::is_unsigned_v<Int>, int> = 0>
Int sqrt_newton(const Int n)
{
Int a = sqrt_guess(n);
Int b = n;
// compute unsigned difference
while (std::max(a, b) - std::min(a, b) > 1) {
b = n / a;
a = (a + b) / 2;
}
// a is now either floor(sqrt(n)) or ceil(sqrt(n))
// we decrement in the latter case
// this is overflow-safe as long as we start with a lower bound guess
return a - (a * a > n);
}
This alternative approach performs roughly the same as the first proposal, but is usually a few percentage points faster. However, it relies heavily on efficient hardware division, and results can vary widely.
The use of sqrt_guess makes a huge difference. It is roughly five times faster than using 1 as the initial guess.
In many cases an exact integer sqrt value is not even needed; a good approximation is enough. (For example, this often happens in DSP optimization, when a 32-bit signal should be compressed to 16 bits, or a 16-bit signal to 8 bits, without losing much precision around zero.)
I've found this useful equation:
k = ceil(MSB(n)/2); - MSB(n) is the most significant bit of "n"
sqrt(n) ~= 2^(k-2) + (2^(k-1))*n/(2^(2*k)); - all multiplications and divisions here are very DSP-friendly, as they are only by powers of 2.
This equation generates a smooth curve (n, sqrt(n)). Its values do not differ much from the real sqrt(n), so it can be useful when approximate accuracy is enough.
Why does nobody suggest the quickest method?
If:
the range of numbers is limited
memory consumption is not crucial
application launch time is not critical
then create int[MAX_X] filled (on launch) with sqrt(x) (you don't need to use the function sqrt() for it).
All these conditions fit my program quite well.
Particularly, an int[10000000] array is going to consume 40MB.
What are your thoughts on this?
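One way to fill such a table without calling sqrt() is sketched below. The function name make_sqrt_table and the use of std::vector are assumptions for illustration; the idea is only that floor(sqrt(x)) increases by one exactly when x reaches the next perfect square.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds table[x] = floor(sqrt(x)) for 0 <= x < size without calling sqrt().
// The root is bumped each time x hits the next perfect square, and the next
// square is obtained by adding an odd number: (r + 1)^2 = r^2 + 2r + 1.
std::vector<std::uint32_t> make_sqrt_table(std::size_t size) {
    std::vector<std::uint32_t> table(size);
    std::uint32_t root = 0;
    std::uint64_t next_square = 1;  // always (root + 1)^2
    for (std::size_t x = 0; x < size; ++x) {
        if (x == next_square) {
            ++root;
            next_square += 2 * std::uint64_t{root} + 1;
        }
        table[x] = root;
    }
    return table;
}

// Example use:
//   auto table = make_sqrt_table(10000000);
//   assert(table[99] == 9 && table[100] == 10);
```

The fill is a single linear pass, so the launch cost is proportional to the table size and involves no multiplication or division.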
On my computer with gcc, with -ffast-math, converting a 32-bit integer to float and using sqrtf takes 1.2 s per 10^9 ops (without -ffast-math it takes 3.54 s).
The following algorithm uses 0.87 s per 10^9 at the expense of some accuracy: errors can be as much as -7 or +1 although the RMS error is only 0.79:
uint16_t SQRTTAB[65536];
inline uint16_t approxsqrt(uint32_t x) {
const uint32_t m1 = 0xff000000;
const uint32_t m2 = 0x00ff0000;
if (x&m1) {
return SQRTTAB[x>>16];
} else if (x&m2) {
return SQRTTAB[x>>8]>>4;
} else {
return SQRTTAB[x]>>8;
}
}
The table is constructed using:
void maketable() {
for (int x=0; x<65536; x++) {
double v = x/65535.0;
v = sqrt(v);
int y = int(v*65535.0+0.999);
SQRTTAB[x] = y;
}
}
I found that refining the bisection using further if statements does improve accuracy, but it also slows things down to the point that sqrtf is faster, at least with -ffast-math.
Or just do a binary search; I can't write a simpler version, IMO:
uint16_t sqrti(uint32_t num)
{
uint16_t ret = 0;
for(int32_t i = 15; i >= 0; i--)
{
uint16_t temp = ret | (1 << i);
if((uint32_t)temp * temp <= num) // widen first: temp * temp is computed as int and overflows for temp >= 46341
{
ret = temp;
}
}
return ret;
}
If you need performance on computing square root, I guess you will compute a lot of them.
Then why not cache the answers? I don't know the range of N in your case, nor whether you will compute the square root of the same integer many times, but if you do, you can cache the result each time your method is called (an array would be the most efficient choice if it is not too large).
This is an addition for those in need of a precise square root for very large integers. The trick is to leverage the fast floating point square root of modern processors and to fix round-off errors.
template <typename T>
T preciseIntegerSqrt(T n)
{
if (sizeof(T) <= 4)
{
return std::sqrt((double)n);
}
else if (sizeof(T) <= 8)
{
T r = std::sqrt((double)n);
return r - (r*r-1 >= n);
}
else
{
if (n == 0) return 0;
T r = 0;
for (T b = (T(1)) << ((std::bit_width(n)-1) / 2); b != 0; b >>= 1)
{
T const k = (b + 2*r) * b;
r |= (n >= k) * b;
n -= (n >= k) * k;
}
return r;
}
}
Explanation: Integers of up to 32 bits do not need a correction, since they can be represented precisely as double-precision floating point numbers. 64-bit integers get along with a very cheap correction. For the general case, refer to Jan Schultke's excellent answer. The code provided here is very slightly faster than that one (10% on my machine, may vary with integer type and hardware).
Related
Correct way to find nth root using pow() in c++
I have to find the nth root of numbers that can be as large as 10^18, with n as large as 10^4. I know that using pow() we can find nth roots with x = (long int)(1e-7 + pow(number, 1.0 / n)). But this gives wrong answers on online programming judges, although it gives correct results on all the cases I have tried. Is there something wrong with this method for the given constraints?
Note: nth root here means the largest integer whose nth power is less than or equal to the given number, i.e., the largest 'x' for which x^n <= number.
Following the answers, I know this approach is wrong; what is the right way to do it?
You can just use x = (long int)pow(number, 1.0 / n). Given the high value of n, most answers will be 1.
UPDATE: Following the OP's comment, this approach is indeed flawed, because in most cases 1/n does not have an exact floating-point representation, and the floor of the 1/n-th power can be off by one. Rounding is no better: it can make the root one too large. Another problem is that values up to 10^18 cannot all be represented exactly in double precision, whereas 64-bit ints can hold them.
My proposal:
1) Truncate the 11 low-order bits of number before the (implicit) cast to double, to avoid rounding up by the FP unit (unsure if this is useful).
2) Use the pow function to get a lower estimate r of the n-th root.
3) Compute the n-th power of r+1 using integer arithmetic only (by repeated squaring).
4) The solution is r+1 rather than r if the n-th power fits.
There remains a possibility that the FP unit rounds up when computing 1/n, leading to a slightly too large result. I doubt that this "too large" can get as large as one unit in the final result, but this should be checked.
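A rough sketch of the estimate-then-verify idea follows. It departs from the proposal in two labeled ways: the integer power check uses a plain multiplication loop with an early exit instead of repeated squaring, and the estimate is corrected in both directions rather than only upwards. The names pow_fits and nth_root are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Returns true if base^n <= limit, using integer arithmetic only.
// Overflow is avoided by comparing against limit / base before multiplying.
bool pow_fits(std::uint64_t base, unsigned n, std::uint64_t limit) {
    std::uint64_t result = 1;
    for (unsigned i = 0; i < n; ++i) {
        if (base != 0 && result > limit / base) return false;  // product would exceed limit
        result *= base;
    }
    return result <= limit;
}

// Largest x with x^n <= number. pow() only supplies a starting estimate;
// the integer checks below decide the final answer.
std::uint64_t nth_root(std::uint64_t number, unsigned n) {
    assert(n >= 1);
    if (n == 1) return number;
    auto r = static_cast<std::uint64_t>(
        std::pow(static_cast<double>(number), 1.0 / n));
    while (r > 0 && !pow_fits(r, n, number)) --r;  // estimate was too high
    while (pow_fits(r + 1, n, number)) ++r;        // estimate was too low
    return r;
}
```

For any base of 2 or more the check exits after at most 64 multiplications, so only base 1 ever runs the full n iterations.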
I think I finally understood your problem. All you want to do is raise a value, say X, to the reciprocal of a number, say n (i.e., find ⁿ√X̅), and round down. If you then raise that answer to the n-th power, it will never be larger than your original X. The problem is that the computer sometimes runs into rounding error.
#include <cmath>
long find_nth_root(double X, int n) {
    long nth_root = std::trunc(std::pow(X, 1.0 / n));
    // because of rounding error, it's possible that nth_root + 1 is what we actually want; let's check
    if (std::pow(nth_root + 1, n) <= X) {
        return nth_root + 1;
    }
    return nth_root;
}
Of course, the original question was to find the largest integer, Y, that satisfies the inequality Yⁿ ≤ X. That's easy enough to write:
long find_nth_root(double x, int d) {
    long i = 0;
    for (; std::pow(i + 1, d) <= x; ++i) { }
    return i;
}
This will probably run faster than you'd expect. But you can do better with a binary search:
#include <cmath>
long find_nth_root(double x, int d) {
    long low = 0, high = 1;
    while (std::pow(high, d) <= x) {
        low = high;
        high *= 2;
    }
    while (low != high - 1) {
        long step = (high - low) / 2;
        long candidate = low + step;
        double value = std::pow(candidate, d);
        if (value == x) {
            return candidate;
        }
        if (value < x) {
            low = candidate;
            continue;
        }
        high = candidate;
    }
    return low;
}
I use this routine I wrote. It's the faster of the ones I've seen here. It also handles up to 64 bits. BTW, n1 is the input number. for (n3 = 0; ((mnk) < n1) ; n3+=0.015625, nmrk++) { mk += 0.0073125; dad += 0.00390625; mnk = pow(n1, 1.0/(mk+n3+dad)); mnk = pow(mnk, (mk+n3+dad)); } Although not always perfect, it does come the closest.
You can try this to get the nth_root with unsigned in C : // return a number that, when multiplied by itself nth times, makes N. unsigned nth_root(const unsigned n, const unsigned nth) { unsigned a = n, c, d, r = nth ? n + (n > 1) : n == 1 ; for (; a < r; c = a + (nth - 1) * r, a = c / nth) for (r = a, a = n, d = nth - 1; d && (a /= r); --d); return r; } Yes it does not include <math.h>, example of output : 24 == (int) pow(15625, 1.0/3) 25 == nth_root(15625, 3) 0 == nth_root(0, 0) 1 == nth_root(1, 0) 4 == nth_root(4096, 6) 13 == nth_root(18446744073709551614, 17) // 64-bit 20 digits 11 == nth_root(340282366920938463463374607431768211454, 37) // 128-bit 39 digits The default guess is the variable a, set to n.
single-word division algorithm
I develop software for an embedded platform and need a single-word division algorithm. The problem is as follows: given a large integer represented by a sequence of 32-bit words (possibly many), we need to divide it by another 32-bit word, i.e. compute the quotient (also a large integer) and the remainder (32 bits). Certainly, if I were developing this algorithm on x86, I could simply take GNU MP, but this library is way too large for an embedded platform. Furthermore, our processor does not have a hardware integer divider (integer division is performed in software). However, the processor has quite a fast FPU, so the trick is to use floating-point arithmetic wherever possible. Any ideas how to implement this?
Sounds like a classic optimization. Instead of dividing by D, multiply by 0x100000000/D and then divide by 0x100000000. The latter is just a wordshift, i.e. trivial. Calculating the multiplier is a bit harder, but not a lot. See also this detailed article for a far more detailed background.
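A minimal sketch of the multiply-by-reciprocal idea for a single 32-bit word is shown below. The Reciprocal struct and both function names are hypothetical. Because floor(2^32 / d) is slightly smaller than the true reciprocal, the shifted product can be one less than the real quotient, so one correction step is added. Computing the multiplier here still costs one real division, done once per divisor.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed reciprocal of a fixed 32-bit divisor, scaled by 2^32.
struct Reciprocal {
    std::uint32_t d;  // the divisor
    std::uint64_t m;  // floor(2^32 / d); needs 33 bits when d == 1
};

Reciprocal make_reciprocal(std::uint32_t d) {
    assert(d != 0);
    return Reciprocal{d, (std::uint64_t{1} << 32) / d};
}

// Computes x / d with a multiply, a word shift and one correction step.
std::uint32_t divide(std::uint32_t x, const Reciprocal& r) {
    // x is promoted to 64 bits; the product is below 2^64 for all inputs.
    std::uint64_t q = (x * r.m) >> 32;  // either floor(x / d) or one less
    if (x - q * r.d >= r.d) ++q;        // fix the possible off-by-one
    return static_cast<std::uint32_t>(q);
}
```

Extending this to a multi-word dividend needs a wider product per limb, which this sketch does not attempt.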
Take a look at this one: the algorithm divides an integer a[0..n-1] by a single word 'c' using floating-point for 64x32->32 division. The limbs of the quotient 'q' are just printed in a loop, you can save then in an array if you like. Note that you don't need GMP to run the algorithm - I use it just to compare the results. #include <gmp.h> // divides a multi-precision integer a[0..n-1] by a single word c void div_by_limb(const unsigned *a, unsigned n, unsigned c) { typedef unsigned long long uint64; unsigned c_norm = c, sh = 0; while((c_norm & 0xC0000000) == 0) { // make sure the 2 MSB are set c_norm <<= 1; sh++; } // precompute the inverse of 'c' double inv1 = 1.0 / (double)c_norm, inv2 = 1.0 / (double)c; unsigned i, r = 0; printf("\nquotient: "); // quotient is printed in a loop for(i = n - 1; (int)i >= 0; i--) { // start from the most significant digit unsigned u1 = r, u0 = a[i]; union { struct { unsigned u0, u1; }; uint64 x; } s = {u0, u1}; // treat [u1, u0] as 64-bit int // divide a 2-word number [u1, u0] by 'c_norm' using floating-point unsigned q = floor((double)s.x * inv1), q2; r = u0 - q * c_norm; // divide again: this time by 'c' q2 = floor((double)r * inv2); q = (q << sh) + q2; // reconstruct the quotient printf("%x", q); } r %= c; // adjust the residue after normalization printf("; residue: %x\n", r); } int main() { mpz_t z, quo, rem; mpz_init(z); // this is a dividend mpz_set_str(z, "9999999999999999999999999999999", 10); unsigned div = 9; // this is a divisor div_by_limb((unsigned *)z->_mp_d, mpz_size(z), div); mpz_init(quo); mpz_init(rem); mpz_tdiv_qr_ui(quo, rem, z, div); // divide 'z' by 'div' gmp_printf("compare: Quo: %Zx; Rem %Zx\n", quo, rem); mpz_clear(quo); mpz_clear(rem); mpz_clear(z); return 1; }
I believe that a look-up table and Newton Raphson successive approximation is the canonical choice used by hardware designers (who generally can't afford the gates for a full hardware divide). You get to choose the trade off the between accuracy and execution time.
Calculating pow(a,b) mod n
I want to calculate a^b mod n for use in RSA decryption. My code (below) returns incorrect answers. What is wrong with it?
unsigned long int decrypt2(int a,int b,int n) {
    unsigned long int res = 1;
    for (int i = 0; i < (b / 2); i++) {
        res *= ((a * a) % n);
        res %= n;
    }
    if (b % n == 1) res *=a;
    res %=n;
    return res;
}
You can try this C++ code. I've used it with 32 and 64-bit integers. I'm sure I got this from SO.
template <typename T>
T modpow(T base, T exp, T modulus) {
    base %= modulus;
    T result = 1;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % modulus;
        base = (base * base) % modulus;
        exp >>= 1;
    }
    return result;
}
You can find this algorithm and related discussion in the literature on p. 244 of Schneier, Bruce (1996). Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition (2nd ed.). Wiley. ISBN 978-0-471-11709-4.
Note that the multiplications result * base and base * base are subject to overflow in this simplified version. If the modulus is more than half the width of T (i.e. more than the square root of the maximum T value), then one should use a suitable modular multiplication algorithm instead - see the answers to Ways to do modulo multiplication with primitive types.
In order to calculate pow(a,b) % n to be used for RSA decryption, the best algorithm I came across is Primality Testing 1) which is as follows: int modulo(int a, int b, int n){ long long x=1, y=a; while (b > 0) { if (b%2 == 1) { x = (x*y) % n; // multiplying with base } y = (y*y) % n; // squaring the base b /= 2; } return x % n; } See below reference for more details. 1) Primality Testing : Non-deterministic Algorithms – topcoder
Usually it's something like this: while (b) { if (b % 2) { res = (res * a) % n; } a = (a * a) % n; b /= 2; } return res;
The only actual logic error that I see is this line: if (b % n == 1) which should be this: if (b % 2 == 1) But your overall design is problematic: your function performs O(b) multiplications and modulus operations, but your use of b / 2 and a * a implies that you were aiming to perform O(log b) operations (which is usually how modular exponentiation is done).
Doing the raw power operation is very costly, hence you can apply the following logic to simplify the decryption. From here, Now say we want to encrypt the message m = 7, c = m^e mod n = 7^3 mod 33 = 343 mod 33 = 13. Hence the ciphertext c = 13. To check decryption we compute m' = c^d mod n = 13^7 mod 33 = 7. Note that we don't have to calculate the full value of 13 to the power 7 here. We can make use of the fact that a = bc mod n = (b mod n).(c mod n) mod n so we can break down a potentially large number into its components and combine the results of easier, smaller calculations to calculate the final value. One way of calculating m' is as follows:- Note that any number can be expressed as a sum of powers of 2. So first compute values of 13^2, 13^4, 13^8, ... by repeatedly squaring successive values modulo 33. 13^2 = 169 ≡ 4, 13^4 = 4.4 = 16, 13^8 = 16.16 = 256 ≡ 25. Then, since 7 = 4 + 2 + 1, we have m' = 13^7 = 13^(4+2+1) = 13^4.13^2.13^1 ≡ 16 x 4 x 13 = 832 ≡ 7 mod 33
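The worked example above can be checked with a short square-and-multiply sketch. The function name mod_pow and the restriction to moduli of at most 2^32 (so that every intermediate product fits in 64 bits) are assumptions for illustration; real RSA moduli need a big-integer type.

```cpp
#include <cassert>
#include <cstdint>

// Square-and-multiply: computes (base^exp) mod n.
// Assumption: n <= 2^32, so result * base and base * base fit in 64 bits.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t n) {
    assert(n != 0 && n <= (std::uint64_t{1} << 32));
    std::uint64_t result = 1 % n;  // 0 when n == 1
    base %= n;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % n;  // this power of two is part of exp
        base = (base * base) % n;                   // 13, 13^2, 13^4, ... modulo n
        exp >>= 1;
    }
    return result;
}

// Example use, matching the numbers in the text:
//   assert(mod_pow(7, 3, 33) == 13);   // encryption: c = 7^3 mod 33
//   assert(mod_pow(13, 7, 33) == 7);   // decryption: m' = 13^7 mod 33
```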
Are you trying to calculate (a^b)%n, or a^(b%n) ? If you want the first one, then your code only works when b is an even number, because of that b/2. The "if b%n==1" is incorrect because you don't care about b%n here, but rather about b%2. If you want the second one, then the loop is wrong because you're looping b/2 times instead of (b%n)/2 times. Either way, your function is unnecessarily complex. Why do you loop until b/2 and try to multiply in 2 a's each time? Why not just loop until b and mulitply in one a each time. That would eliminate a lot of unnecessary complexity and thus eliminate potential errors. Are you thinking that you'll make the program faster by cutting the number of times through the loop in half? Frankly, that's a bad programming practice: micro-optimization. It doesn't really help much: You still multiply by a the same number of times, all you do is cut down on the number of times testing the loop. If b is typically small (like one or two digits), it's not worth the trouble. If b is large -- if it can be in the millions -- then this is insufficient, you need a much more radical optimization. Also, why do the %n each time through the loop? Why not just do it once at the end?
Calculating pow(a,b) mod n A key problem with OP's code is a * a. This is int overflow (undefined behavior) when a is large enough. The type of res is irrelevant in the multiplication of a * a. The solution is to ensure either: the multiplication is done with 2x wide math or with modulus n, n*n <= type_MAX + 1 There is no reason to return a wider type than the type of the modulus as the result is always represent by that type. // unsigned long int decrypt2(int a,int b,int n) int decrypt2(int a,int b,int n) Using unsigned math is certainly more suitable for OP's RSA goals. Also see Modular exponentiation without range restriction // (a^b)%n // n != 0 // Test if unsigned long long at least 2x values bits as unsigned #if ULLONG_MAX/UINT_MAX - 1 > UINT_MAX unsigned decrypt2(unsigned a, unsigned b, unsigned n) { unsigned long long result = 1u % n; // Insure result < n, even when n==1 while (b > 0) { if (b & 1) result = (result * a) % n; a = (1ULL * a * a) %n; b >>= 1; } return (unsigned) result; } #else unsigned decrypt2(unsigned a, unsigned b, unsigned n) { // Detect if UINT_MAX + 1 < n*n if (UINT_MAX/n < n-1) { return TBD_code_with_wider_math(a,b,n); } a %= n; unsigned result = 1u % n; while (b > 0) { if (b & 1) result = (result * a) % n; a = (a * a) % n; b >>= 1; } return result; } #endif
ints are generally not enough for RSA (unless you are dealing with small simplified examples). You need a data type that can store integers up to 2^256 (for 256-bit RSA keys), 2^512 for 512-bit keys, etc.
Here is another way. Remember that the modular multiplicative inverse of a under mod m exists only when a and m are coprime; it can be computed with the extended gcd algorithm.
Computing a^b mod m when a and b can have more than 10^5 digits is tricky. The code below does the computing part:
#include <iostream>
#include <string>
using namespace std;
/* * May this code live long. */
long pow(string,string,long long);
long pow(long long ,long long ,long long);
int main() {
    string _num,_pow;
    long long _mod;
    cin>>_num>>_pow>>_mod;
    //cout<<_num<<" "<<_pow<<" "<<_mod<<endl;
    cout<<pow(_num,_pow,_mod)<<endl;
    return 0;
}
long pow(string n,string p,long long mod){
    long long num=0,_pow=0;
    for(char c: n){ num=(num*10+c-48)%mod; }
    for(char c: p){ _pow=(_pow*10+c-48)%(mod-1); }
    return pow(num,_pow,mod);
}
long pow(long long a,long long p,long long mod){
    long res=1;
    if(a==0)return 0;
    while(p>0){
        if((p&1)==0){ p/=2; a=(a*a)%mod; }
        else{ p--; res=(res*a)%mod; }
    }
    return res;
}
This code works because a^b mod m can be written as (a mod m)^(b mod (m-1)) mod m (this holds when m is prime, by Fermat's little theorem). Hope it helped { :)
Use fast exponentiation, maybe. It gives the same O(log n) as the template above.
int power(int base, int exp, int mod)
{
    if(exp == 0)
        return 1 % mod;
    // 64-bit intermediates so that p*p and base*p cannot overflow int
    long long p = power(base, exp/2, mod);
    p = (p*p) % mod;
    return (int)((exp%2 == 0) ? p : (base * p) % mod);
}
This (encryption) is more of an algorithm design problem than a programming one. The important missing part is familiarity with modern algebra. I suggest that you look for a huge optimization in group theory and number theory.
If n is a prime number and a is not divisible by n, pow(a,n-1)%n==1 (assuming infinite digit integers). So, basically you need to calculate pow(a,b%(n-1))%n.
According to group theory, you can find e such that every other number is equivalent to a power of e modulo n. Therefore the range [1..n-1] can be represented as a permutation on powers of e. Given the algorithm to find e for n and the logarithm of a base e, calculations can be significantly simplified.
Cryptography needs a ton of math background; I'd rather stay off that ground without enough background.
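The exponent reduction described above can be sketched as follows. This is only an illustration, not the answerer's code: mod_pow and mod_pow_prime are hypothetical names, the modulus is assumed to be prime and below 2^32 so that every product fits in 64 bits, and real RSA moduli are far larger and would need a big-integer library.

```cpp
#include <cstdint>

// Square-and-multiply. Assumes mod is nonzero and below 2^32,
// so that every intermediate product fits in 64 bits.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1 % mod;  // 0 when mod == 1
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// For a prime p and a not divisible by p, Fermat's little theorem gives
// a^b = a^(b mod (p-1)) (mod p), so the exponent can be reduced first.
// Primality of p is assumed, not checked.
std::uint64_t mod_pow_prime(std::uint64_t a, std::uint64_t b, std::uint64_t p) {
    if (a % p == 0) return b == 0 ? 1 % p : 0;  // the theorem does not apply here
    return mod_pow(a, b % (p - 1), p);
}
```

With p = 1000000007, for example, mod_pow_prime(2, 10 * (p - 1) + 5, p) reduces the exponent to 5 before doing any multiplication.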
My code for a^k mod n in PHP:
function pmod($a, $k, $n)
{
    if ($n==1) return 0;
    $power = 1;
    for($i=1; $i<=$k; $i++)
    {
        $power = ($power*$a) % $n;
    }
    return $power;
}
#include <cmath>
...
static_cast<int>(std::pow(a,b))%n
But my best bet is that you are overflowing int (i.e. the number is too large for the int) on the power. I had the same problem creating the exact same function.
I'm using this function:
int CalculateMod(int base, int exp ,int mod){
    int result;
    result = (int) pow(base,exp);
    result = result % mod;
    return result;
}
I cast the result because pow gives you back a double, and to use mod you need two variables of type int. Anyway, in an RSA decryption, you should just use integer numbers.
Quickly and safely determine random number within range
How would I quickly and safely* determine a random number within a range of 0 (inclusive) to r (exclusive)?
In other words, an optimized version of rejection sampling:
u32 myrand(u32 x)
{
    u32 ret = rand();
    while(ret >= x)
        ret = rand();
    return(ret);
}
*By safely, I mean a uniform distribution.
Rejection sampling is the way to go if you want to have a uniform distribution on the result. It is notoriously difficult to do anything smarter. Using the modulo operator, for instance, results in an uneven distribution of the result values for any number that's not a power of 2.
The algorithm in your post, however, can be improved by discarding the unnecessary most significant bits. (See below.)
This is how the standard Java API implements Random.nextInt(int n):
public int nextInt(int n) {
    [...]
    if ((n & -n) == n)  // i.e., n is a power of 2
        return (int)((n * (long)next(31)) >> 31);
    int bits, val;
    do {
        bits = next(31);
        val = bits % n;
    } while (bits - val + (n-1) < 0);
    return val;
}
And in the comments you can read:
The algorithm is slightly tricky. It rejects values that would result in an uneven distribution (due to the fact that 2^31 is not divisible by n). The probability of a value being rejected depends on n. The worst case is n=2^30+1, for which the probability of a reject is 1/2, and the expected number of iterations before the loop terminates is 2.
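For comparison, the same idea can be sketched in C++. This is not a translation of the Java code: it uses the closely related variant that rejects the lowest 2^32 mod n values instead of the highest ones, bounded_rand is a hypothetical name, and std::mt19937 is assumed as the generator because it yields uniform 32-bit values.

```cpp
#include <cstdint>
#include <random>

// Uniform integer in [0, n) for n > 0.
// Values below (2^32 mod n) are rejected, so the accepted values form a
// whole number of blocks of size n and r % n is unbiased.
std::uint32_t bounded_rand(std::mt19937& gen, std::uint32_t n) {
    // 2^32 mod n computed in 32 bits: (2^32 - n) mod n has the same value
    const std::uint32_t threshold = static_cast<std::uint32_t>(0u - n) % n;
    for (;;) {
        const std::uint32_t r = static_cast<std::uint32_t>(gen());
        if (r >= threshold) return r % n;
    }
}
```

Fewer than half of the draws are rejected for any n, so the expected number of iterations stays below 2, in line with the worst case quoted above.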
u32 myrand(u32 x)
{
    return rand() % (x+1);
}
Since the question has been changed to include even distribution, this would need something more like this:
u32 myrand(u32 x)
{
    assert(x <= RAND_MAX && x > 0);
    int numOfRanges = (RAND_MAX / x);       // number of complete ranges of size x
    int maxAcceptedRand = numOfRanges * x;  // values at or above this are rejected
    int randNumber;
    do
    {
        randNumber = rand();
    }
    while(randNumber >= maxAcceptedRand);
    return randNumber / numOfRanges;
}
Can I rely on this to judge a square number in C++?
Can I rely on sqrt((float)a)*sqrt((float)a)==a or (int)sqrt((float)a)*(int)sqrt((float)a)==a to check whether a number is a perfect square? Why or why not? int a is the number to be judged. I'm using Visual Studio 2005. Edit: Thanks for all these rapid answers. I see that I can't rely on float type comparison. (If I wrote as above, will the last a be cast to float implicitly?) If I do it like (int)sqrt((float)a)*(int)sqrt((float)a) - a < e How small should I take that e value? Edit2: Hey, why don't we leave the comparison part aside, and decide whether the (int) is necessary? As I see, with it, the difference might be great for squares; but without it, the difference might be small for non-squares. Perhaps neither will do. :-(
Actually, this is not a C++, but a math question. With floating point numbers, you should never rely on equality. Where you would test a == b, just test against abs(a - b) < eps, where eps is a small number (e.g. 1E-6) that you would treat as a good enough approximation. If the number you are testing is an integer, you might be interested in the Wikipedia article about Integer square root EDIT: As Krugar said, the article I linked does not answer anything. Sure, there is no direct answer to your question there, phoenie. I just thought that the underlying problem you have is floating point precision and maybe you wanted some math background to your problem. For the impatient, there is a link in the article to a lengthy discussion about implementing isqrt. It boils down to the code karx11erx posted in his answer. If you have integers which do not fit into an unsigned long, you can modify the algorithm yourself.
If you don't want to rely on float precision then you can use the following code that uses integer math.
The Isqrt is taken from here and is O(log n)
// Finds the integer square root of a positive number
static int Isqrt(int num)
{
    if (0 == num) { return 0; }  // Avoid zero divide
    int n = (num / 2) + 1;       // Initial estimate, never low
    int n1 = (n + (num / n)) / 2;
    while (n1 < n)
    {
        n = n1;
        n1 = (n + (num / n)) / 2;
    } // end while
    return n;
} // end Isqrt()

static bool IsPerfectSquare(int num)
{
    return Isqrt(num) * Isqrt(num) == num;
}
To avoid doing the same calculation twice, I would do it with a temporary number:
int b = (int)sqrt((float)a);
if((b*b) == a)
{
    //perfect square
}
Edit: dav made a good point. Instead of relying on the cast, you'll need to round off the float first, so it should be:
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a)
{
    //perfect square
}
Your question has already been answered, but here is a working solution.
Your 'perfect squares' are implicitly integer values, so you could easily solve floating point format related accuracy problems by using some integer square root function to determine the integer square root of the value you want to test. That function will return the biggest number r for a value v where r * r <= v. Once you have r, you simply need to test whether r * r == v.
// Assumes a 32-bit value in a
unsigned short isqrt (unsigned long a)
{
    unsigned long rem = 0;
    unsigned long root = 0;
    for (int i = 16; i; i--) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        if (root < rem)
            rem -= ++root;
    }
    return (unsigned short) (root >> 1);
}

bool PerfectSquare (unsigned long a)
{
    unsigned short r = isqrt (a);
    // Widen before multiplying: r * r would otherwise be computed in int and can overflow
    return (unsigned long) r * r == a;
}
I didn't follow the formula, I apologize. But you can easily check if a floating point number is an integer by casting it to an integer type and comparing the result against the floating point number. So,
bool isSquare(long val) {
    double root = sqrt(val);
    if (root == (long) root)
        return true;
    else
        return false;
}
Naturally this is only doable if you are working with values that you know will fit within the integer type range. But that being the case, you can solve the problem this way, saving you the inherent complexity of a mathematical formula.
As reinier says, you need to add 0.5 to make sure it rounds to the nearest integer, so you get
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a) /* perfect square */
For this to work, b has to be (exactly) equal to the square root of a if a is a perfect square. However, I don't think you can guarantee this. Suppose that int is 64 bits and float is 32 bits (I think that's allowed). Then a can be of the order 2^60, so its square root is of order 2^30. However, a float only stores 24 bits in the significand, so the rounding error is of order 2^(30-24) = 2^6. This is larger than 1, so b may contain the wrong integer. For instance, I think that the above code does not identify a = (2^30+1)^2 as a perfect square.
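The failure case suggested above can be checked with a small sketch. The assumptions here are mine: a is held in a std::int64_t rather than a 64-bit int, float is the usual 32-bit IEEE type, and is_square_float and is_square_exact are hypothetical names. The second function treats the floating-point root only as a first guess and corrects it with integer arithmetic.

```cpp
#include <cmath>
#include <cstdint>

// The check from the answer, applied to a 64-bit value through a 32-bit float.
// For demonstration only: precision is lost in the conversion to float.
bool is_square_float(std::int64_t a) {
    const std::int64_t b =
        static_cast<std::int64_t>(std::sqrt(static_cast<float>(a)) + 0.5f);
    return b * b == a;
}

// Uses the double-precision root only as a starting guess, then corrects it
// with exact unsigned 64-bit arithmetic before comparing.
bool is_square_exact(std::int64_t a) {
    if (a < 0) return false;
    const std::uint64_t n = static_cast<std::uint64_t>(a);
    std::uint64_t r = static_cast<std::uint64_t>(std::sqrt(static_cast<double>(a)));
    while (r * r > n) --r;               // guess was too high
    while ((r + 1) * (r + 1) <= n) ++r;  // guess was too low
    return r * r == n;
}
```

For a = (2^30+1)^2 the conversion to float rounds a down to 2^60, the root comes out as exactly 2^30, and the float-based check reports that a is not a square, while the corrected version accepts it.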
I would do:
// sqrt always returns positive value. So casting to int is equivalent to floor()
int down = static_cast<int>(sqrt(value));
int up = down+1;  // This is the ceil(sqrt(value))

// Because of rounding problems I would test the floor() and ceil()
// of the value returned from sqrt().
if (((down*down) == value) || ((up*up) == value))
{
    // We have a winner.
}
The more obvious, if slower (O(sqrt(n))), way:
bool is_perfect_square(int i) {
    int d = 1;
    // x runs through 0, 1, 4, 9, ... by adding successive odd numbers
    for (int x = 0; x <= i; x += d, d += 2) {
        if (x == i) return true;
    }
    return false;
}
While others have noted that you should not test for equality with floats, I think you are missing out on chances to take advantage of the properties of perfect squares. First, there is no point in re-squaring the calculated root. If a is a perfect square then sqrt(a) is an integer and you should check:
b = sqrt((float)a)
b - floor(b) < e
where e is set sufficiently small. There are also a number of integers that you can cross off as non-square before taking the square root. Checking Wikipedia, you can see some necessary conditions for a to be square:
A square number can only end with digits 00, 1, 4, 6, 9, or 25 in base 10
Another simple check would be to see that a % 4 == 1 or 0 before taking the root since:
Squares of even numbers are even, since (2n)^2 = 4n^2.
Squares of odd numbers are odd, since (2n + 1)^2 = 4(n^2 + n) + 1.
These would essentially eliminate half of the integers before taking any roots.
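The a % 4 pre-check could be combined with a root test along these lines. This is a sketch under my own assumptions: is_square_filtered is a hypothetical name, the input is an unsigned 32-bit value, and the final step uses an exact integer comparison rather than the b - floor(b) < e test described above.

```cpp
#include <cmath>
#include <cstdint>

bool is_square_filtered(std::uint32_t a) {
    // Squares are 0 or 1 modulo 4, so remainders 2 and 3 are rejected
    // without taking any root.
    if ((a & 3u) > 1u) return false;

    // Exact check for the remaining candidates: take a double-precision
    // root as a first guess and correct it with integer arithmetic.
    const std::uint64_t n = a;
    std::uint64_t r = static_cast<std::uint64_t>(std::sqrt(static_cast<double>(a)));
    while (r * r > n) --r;
    while ((r + 1) * (r + 1) <= n) ++r;
    return r * r == n;
}
```

The filter costs a single bit mask, so it pays off mainly when many non-squares are tested in a loop.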
The cleanest solution is to use an integer sqrt routine, then do:
bool isSquare( unsigned int a ) {
    unsigned int s = isqrt( a );
    return s * s == a;
}
This will work in the full int range and with perfect precision. A few cases:
a = 0, s = 0, s * s = 0 (add an exception if you don't want to treat 0 as square)
a = 1, s = 1, s * s = 1
a = 2, s = 1, s * s = 1
a = 3, s = 1, s * s = 1
a = 4, s = 2, s * s = 4
a = 5, s = 2, s * s = 4
It won't fail either as you approach the maximum value for your int size. E.g. for 32-bit ints:
a = 0x40000000, s = 0x00008000, s * s = 0x40000000
a = 0xFFFFFFFF, s = 0x0000FFFF, s * s = 0xFFFE0001
Using floats you run into a number of issues. You may find that sqrt( 4 ) = 1.999999..., and similar problems, although you can round-to-nearest instead of using floor().
Worse though, a float has only 24 significant bits, which means you can't cast every int larger than 2^24 to a float without losing precision, which introduces false positives/negatives. Using doubles for testing 32-bit ints, you should be fine, though.
But remember to cast the result of the floating-point sqrt back to an int and compare the result to the original int. Comparisons between floats are never a good idea; even for square values of x in a limited range, there is no guarantee that sqrt( x ) * sqrt( x ) == x, or that sqrt( x * x ) == x.
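The isqrt routine used above is not defined in the answer. One possible definition, sketched here for 32-bit unsigned input, is the binary digit-by-digit method; the use of std::uint32_t is my assumption, and any correct integer square root would serve equally well.

```cpp
#include <cstdint>

// Largest s with s * s <= a, computed with shifts, adds and compares only.
std::uint32_t isqrt(std::uint32_t a) {
    std::uint32_t root = 0;
    std::uint32_t bit = std::uint32_t{1} << 30;  // highest power of four in 32 bits
    while (bit > a) bit >>= 2;                   // skip leading zero digit pairs
    while (bit != 0) {
        if (a >= root + bit) {
            a -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

bool isSquare(std::uint32_t a) {
    const std::uint32_t s = isqrt(a);
    return s * s == a;  // s <= 0xFFFF, so s * s fits in 32 bits
}
```

The cases listed in the answer can be used directly as checks, for example isqrt(0x40000000) == 0x8000 and isqrt(0xFFFFFFFF) == 0xFFFF.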
Basics first: if you (int) a number in a calculation it will remove ALL post-comma data. If I remember my C correctly, if you have an (int) in any calculation (+/-*) it will automatically presume int for all other numbers.
So in your case you want float on every number involved, otherwise you will lose data:
sqrt((float)a)*sqrt((float)a)==(float)a
is the way you want to go.
Floating point math is inaccurate by nature.
So consider this code:
int a=35;
float conv = (float)a;
float sqrt_a = sqrt(conv);
if( sqrt_a*sqrt_a == conv )
    printf("perfect square");
This is what will happen:
a = 35
conv = 35.000000
sqrt_a = 5.916079
sqrt_a*sqrt_a = 34.999990734
This makes it amply clear that sqrt_a^2 is not equal to a.