Quickly and safely determine random number within range

Quickly and safely determine random number within range - c++

How would I quickly and safely* determine a random number within a range of 0 (inclusive) to r (exclusive)?
In other words, an optimized version of rejection sampling:
u32 myrand(u32 x)
{
u32 ret = rand();
while(ret >= x)
ret = rand();
return(ret);
}
*By safely, I mean a uniform distribution.

Rejection sampling is the way to go if you want to have a uniform distribution on the result. It is notoriously difficult to do anything smarter. Using the modulo operator for instance results in an uneven distribution of the result values for any number that's not a power of 2.
The algorithm in you post however can be improved by discarding the unnecessary most significant bits. (See below.)
This is how the standard Java API implements Random.nextInt(int n):
public int nextInt(int n) {
[...]
if ((n & -n) == n) // i.e., n is a power of 2
return (int)((n * (long)next(31)) >> 31);
int bits, val;
do {
bits = next(31);
val = bits % n;
} while (bits - val + (n-1) < 0);
return val;
}
And in the commens you can read:
The algorithm is slightly tricky. It rejects values that would result in an uneven distribution (due to the fact that 231 is not divisible by n). The probability of a value being rejected depends on n. The worst case is n=230+1, for which the probability of a reject is 1/2, and the expected number of iterations before the loop terminates is 2.

u32 myrand(u32 x)
{
return rand() % (x+1);
}
Since the question has been changed to include even distribution, this would need something more like this:
u32 myrand(u32 x)
{
assert(x <= RAND_MAX && x > 0);
int numOfRanges = (RAND_MAX % x);
int maxAcceptedRand = numOfRanges * x;
int randNumber;
do
{
randNumber = rand();
}
while(randNumber <= maxAcceptedRand);
return number / numOfRanges;
}

Related

Correct way to find nth root using pow() in c++

I have to find nth root of numbers that can be as large as 10^18, with n as large as 10^4.
I know using pow() we can find the nth roots using,
x = (long int)(1e-7 + pow(number, 1.0 / n))
But this is giving wrong answers on online programming judges, but on all the cases i have taken, it is giving correct results. Is there something wrong with this method for the given constraints
Note: nth root here means the largest integer whose nth power is less than or equal to the given number, i.e., largest 'x' for which x^n <= number.
Following the answers, i know this approach is wrong, then what is the way i should do it?

You can just use
x = (long int)pow(number, 1.0 / n)
Given the high value of n, most answers will be 1.
UPDATE:
Following the OP comment, this approach is indeed flawed, because in most cases 1/n does not have an exact floating-point representation and the floor of the 1/n-th power can be off by one.
And rounding is not better solution, it can make the root off by one in excess.
Another problem is that values up to 10^18 cannot be represented exactly using double precision, whereas 64 bits ints do.
My proposal:
1) truncate the 11 low order bits of number before the (implicit) cast to double, to avoid rounding up by the FP unit (unsure if this is useful).
2) use the pow function to get an inferior estimate of the n-th root, let r.
3) compute the n-th power of r+1 using integer arithmetic only (by repeated squaring).
4) the solution is r+1 rather than r in case that the n-th power fits.
There remains a possibility that the FP unit rounds up when computing 1/n, leading to a slightly too large result. I doubt that this "too large" can get as large as one unit in the final result, but this should be checked.

I think I finally understood your problem. All you want to do is raise a value, say X, to the reciprocal of a number, say n (i.e., find ⁿ√X̅), and round down. If you then raise that answer to the n-th power, it will never be larger than your original X. The problem is that the computer sometimes runs into rounding error.
#include <cmath>
long find_nth_root(double X, int n)
{
long nth_root = std::trunc(std::pow(X, 1.0 / n));
// because of rounding error, it's possible that nth_root + 1 is what we actually want; let's check
if (std::pow(nth_root + 1, n) <= X) {
return nth_root + 1;
}
return nth_root;
}
Of course, the original question was to find the largest integer, Y, that satisfies the equation X ≤ Yⁿ. That's easy enough to write:
long find_nth_root(double x, int d)
{
long i = 0;
for (; std::pow(i + 1, d) <= x; ++i) { }
return i;
}
This will probably run faster than you'd expect. But you can do better with a binary search:
#include <cmath>
long find_nth_root(double x, int d)
{
long low = 0, high = 1;
while (std::pow(high, d) <= x) {
low = high;
high *= 2;
}
while (low != high - 1) {
long step = (high - low) / 2;
long candidate = low + step;
double value = std::pow(candidate, d);
if (value == x) {
return candidate;
}
if (value < x) {
low = candidate;
continue;
}
high = candidate;
}
return low;
}

I use this routine I wrote. It's the faster of the ones I've seen here. It also handles up to 64 bits. BTW, n1 is the input number.
for (n3 = 0; ((mnk) < n1) ; n3+=0.015625, nmrk++) {
mk += 0.0073125;
dad += 0.00390625;
mnk = pow(n1, 1.0/(mk+n3+dad));
mnk = pow(mnk, (mk+n3+dad));
}
Although not always perfect, it does come the closest.

You can try this to get the nth_root with unsigned in C :
// return a number that, when multiplied by itself nth times, makes N.
unsigned nth_root(const unsigned n, const unsigned nth) {
unsigned a = n, c, d, r = nth ? n + (n > 1) : n == 1 ;
for (; a < r; c = a + (nth - 1) * r, a = c / nth)
for (r = a, a = n, d = nth - 1; d && (a /= r); --d);
return r;
}
Yes it does not include <math.h>, example of output :
24 == (int) pow(15625, 1.0/3)
25 == nth_root(15625, 3)
0 == nth_root(0, 0)
1 == nth_root(1, 0)
4 == nth_root(4096, 6)
13 == nth_root(18446744073709551614, 17) // 64-bit 20 digits
11 == nth_root(340282366920938463463374607431768211454, 37) // 128-bit 39 digits
The default guess is the variable a, set to n.

What is the optimal algorithm for generating an unbiased random integer within a range?

In this StackOverflow question:
Generating random integer from a range
the accepted answer suggests the following formula for generating a random integer in between given min and max, with min and max being included into the range:
output = min + (rand() % (int)(max - min + 1))
But it also says that
This is still slightly biased towards lower numbers ... It's also
possible to extend it so that it removes the bias.
But it doesn't explain why it's biased towards lower numbers or how to remove the bias. So, the question is: is this the most optimal approach to generation of a random integer within a (signed) range while not relying on anything fancy, just rand() function, and in case if it is optimal, how to remove the bias?
EDIT:
I've just tested the while-loop algorithm suggested by #Joey against floating-point extrapolation:
static const double s_invRandMax = 1.0/((double)RAND_MAX + 1.0);
return min + (int)(((double)(max + 1 - min))*rand()*s_invRandMax);
to see how much uniformly "balls" are "falling" into and are being distributed among a number of "buckets", one test for the floating-point extrapolation and another for the while-loop algorithm. But results turned out to be varying depending on the number of "balls" (and "buckets") so I couldn't easily pick a winner. The working code can be found at this Ideone page. For example, with 10 buckets and 100 balls the maximum deviation from the ideal probability among buckets is less for the floating-point extrapolation than for the while-loop algorithm (0.04 and 0.05 respectively) but with 1000 balls, the maximum deviation of the while-loop algorithm is lesser (0.024 and 0.011), and with 10000 balls, the floating-point extrapolation is again doing better (0.0034 and 0.0053), and so on without much of consistency. Thinking of the possibility that none of the algorithms consistently produces uniform distribution better than that of the other algorithm, makes me lean towards the floating-point extrapolation since it appears to perform faster than the while-loop algorithm. So is it fine to choose the floating-point extrapolation algorithm or my testings/conclusions are not completely correct?

The problem is that you're doing a modulo operation. This would be no problem if RAND_MAX would be evenly divisible by your modulus, but usually that is not the case. As a very contrived example, assume RAND_MAX to be 11 and your modulus to be 3. You'll get the following possible random numbers and the following resulting remainders:
0 1 2 3 4 5 6 7 8 9 10
0 1 2 0 1 2 0 1 2 0 1
As you can see, 0 and 1 are slightly more probable than 2.
One option to solve this is rejection sampling: By disallowing the numbers 9 and 10 above you can cause the resulting distribution to be uniform again. The tricky part is figuring out how to do so efficiently. A very nice example (one that took me two days to understand why it works) can be found in Java's java.util.Random.nextInt(int) method.
The reason why Java's algorithm is a little tricky is that they avoid slow operations like multiplication and division for the check. If you don't care too much you can also do it the naïve way:
int n = (int)(max - min + 1);
int remainder = RAND_MAX % n;
int x, output;
do {
x = rand();
output = x % n;
} while (x >= RAND_MAX - remainder);
return min + output;
EDIT: Corrected a fencepost error in above code, now it works as it should. I also created a little sample program (C#; taking a uniform PRNG for numbers between 0 and 15 and constructing a PRNG for numbers between 0 and 6 from it via various ways):
using System;
class Rand {
static Random r = new Random();
static int Rand16() {
return r.Next(16);
}
static int Rand7Naive() {
return Rand16() % 7;
}
static int Rand7Float() {
return (int)(Rand16() / 16.0 * 7);
}
// corrected
static int Rand7RejectionNaive() {
int n = 7, remainder = 16 % n, x, output;
do {
x = Rand16();
output = x % n;
} while (x >= 16 - remainder);
return output;
}
// adapted to fit the constraints of this example
static int Rand7RejectionJava() {
int n = 7, x, output;
do {
x = Rand16();
output = x % n;
} while (x - output + 6 > 15);
return output;
}
static void Test(Func<int> rand, string name) {
var buckets = new int[7];
for (int i = 0; i < 10000000; i++) buckets[rand()]++;
Console.WriteLine(name);
for (int i = 0; i < 7; i++) Console.WriteLine("{0}\t{1}", i, buckets[i]);
}
static void Main() {
Test(Rand7Naive, "Rand7Naive");
Test(Rand7Float, "Rand7Float");
Test(Rand7RejectionNaive, "Rand7RejectionNaive");
}
}
The result is as follows (pasted into Excel and added conditional coloring of cells so that differences are more apparent):
Now that I fixed my mistake in above rejection sampling it works as it should (before it would bias 0). As you can see, the float method isn't perfect at all, it just distributes the biased numbers differently.

The problem occurs when the number of outputs from the random number generator (RAND_MAX+1) is not evenly divisible by the desired range (max-min+1). Since there will be a consistent mapping from a random number to an output, some outputs will be mapped to more random numbers than others. This is regardless of how the mapping is done - you can use modulo, division, conversion to floating point, whatever voodoo you can come up with, the basic problem remains.
The magnitude of the problem is very small, and undemanding applications can generally get away with ignoring it. The smaller the range and the larger RAND_MAX is, the less pronounced the effect will be.
I took your example program and tweaked it a bit. First I created a special version of rand that only has a range of 0-255, to better demonstrate the effect. I made a few tweaks to rangeRandomAlg2. Finally I changed the number of "balls" to 1000000 to improve the consistency. You can see the results here: http://ideone.com/4P4HY
Notice that the floating-point version produces two tightly grouped probabilities, near either 0.101 or 0.097, nothing in between. This is the bias in action.
I think calling this "Java's algorithm" is a bit misleading - I'm sure it's much older than Java.
int rangeRandomAlg2 (int min, int max)
{
int n = max - min + 1;
int remainder = RAND_MAX % n;
int x;
do
{
x = rand();
} while (x >= RAND_MAX - remainder);
return min + x % n;
}

It's easy to see why this algorithm produces a biased sample. Suppose your rand() function returns uniform integers from the set {0, 1, 2, 3, 4}. If I want to use this to generate a random bit 0 or 1, I would say rand() % 2. The set {0, 2, 4} gives me 0, and the set {1, 3} gives me 1 -- so clearly I sample 0 with 60% and 1 with 40% likelihood, not uniform at all!
To fix this you have to either make sure that your desired range divides the range of the random number generator, or otherwise discard the result whenever the random number generator returns a number that's larger than the largest possible multiple of the target range.
In the above example, the target range is 2, the largest multiple that fits into the random generation range is 4, so we discard any sample that is not in the set {0, 1, 2, 3} and roll again.

By far the easiest solution is std::uniform_int_distribution<int>(min, max).

You have touched on two points involving a random integer algorithm: Is it optimal, and is it unbiased?
Optimal
There are many ways to define an "optimal" algorithm. Here we look at "optimal" algorithms in terms of the number of random bits it uses on average. In this sense, rand is a poor method to use for randomly generated numbers because, among other problems with rand(), it need not necessarily produce random bits (because RAND_MAX is not exactly specified). Instead, we will assume we have a "true" random generator that can produce unbiased and independent random bits.
In 1976, D. E. Knuth and A. C. Yao showed that any algorithm that produces random integers with a given probability, using only random bits, can be represented as a binary tree, where random bits indicate which way to traverse the tree and each leaf (endpoint) corresponds to an outcome. (Knuth and Yao, "The complexity of nonuniform random number generation", in Algorithms and Complexity, 1976.) They also gave bounds on the number of bits a given algorithm will need on average for this task. In this case, an optimal algorithm to generate integers in [0, n) uniformly, will need at least log2(n) and at most log2(n) + 2 bits on average.
There are many examples of optimal algorithms in this sense. See the following answer of mine:
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
Unbiased
However, any optimal integer generator that is also unbiased will, in general, run forever in the worst case, as also shown by Knuth and Yao. Going back to the binary tree, each one of the n outcomes labels leaves in the binary tree so that each integer in [0, n) can occur with probability 1/n. But if 1/n has a non-terminating binary expansion (which will be the case if n is not a power of 2), this binary tree will necessarily either—
Have an "infinite" depth, or
include "rejection" leaves at the end of the tree,
And in either case, the algorithm won't run in constant time and will run forever in the worst case. (On the other hand, when n is a power of 2, the optimal binary tree will have a finite depth and no rejection nodes.)
And for general n, there is no way to "fix" this worst case time complexity without introducing bias. For instance, modulo reductions (including the min + (rand() % (int)(max - min + 1)) in your question) are equivalent to a binary tree in which rejection leaves are replaced with labeled outcomes — but since there are more possible outcomes than rejection leaves, only some of the outcomes can take the place of the rejection leaves, introducing bias. The same kind of binary tree — and the same kind of bias — results if you stop rejecting after a set number of iterations. (However, this bias may be negligible depending on the application. There are also security aspects to random integer generation, which are too complicated to discuss in this answer.)

Without loss of generality, the problem of generating random integers on [a, b] can be reduced to the problem of generating random integers on [0, s). The state of the art for generating random integers on a bounded range from a uniform PRNG is represented by the following recent publication:
Daniel Lemire,"Fast Random Integer Generation in an Interval." ACM Trans. Model. Comput. Simul. 29, 1, Article 3 (January 2019) (ArXiv draft)
Lemire shows that his algorithm provides unbiased results, and motivated by the growing popularity of very fast high-quality PRNGs such as Melissa O'Neill's PCG generators, shows how to the results can be computed fast, avoiding slow division operations almost all of the time.
An exemplary ISO-C implementation of his algorithm is shown in randint() below. Here I demonstrate it in conjunction with George Marsaglia's older KISS64 PRNG. For performance reasons, the required 64×64→128 bit unsigned multiplication is typically best implemented via machine-specific intrinsics or inline assembly that map directly to appropriate hardware instructions.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* PRNG state */
typedef struct Prng_T *Prng_T;
/* Returns uniformly distributed integers in [0, 2**64-1] */
uint64_t random64 (Prng_T);
/* Multiplies two 64-bit factors into a 128-bit product */
void umul64wide (uint64_t, uint64_t, uint64_t *, uint64_t *);
/* Generate in bias-free manner a random integer in [0, s) with Lemire's fast
algorithm that uses integer division only rarely. s must be in [0, 2**64-1].
Daniel Lemire, "Fast Random Integer Generation in an Interval," ACM Trans.
Model. Comput. Simul. 29, 1, Article 3 (January 2019)
*/
uint64_t randint (Prng_T prng, uint64_t s)
{
uint64_t x, h, l, t;
x = random64 (prng);
umul64wide (x, s, &h, &l);
if (l < s) {
t = (0 - s) % s;
while (l < t) {
x = random64 (prng);
umul64wide (x, s, &h, &l);
}
}
return h;
}
#define X86_INLINE_ASM (0)
/* Multiply two 64-bit unsigned integers into a 128 bit unsined product. Return
the least significant 64 bist of the product to the location pointed to by
lo, and the most signfiicant 64 bits of the product to the location pointed
to by hi.
*/
void umul64wide (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#if X86_INLINE_ASM
uint64_t l, h;
__asm__ (
"movq %2, %%rax;\n\t" // rax = a
"mulq %3;\n\t" // rdx:rax = a * b
"movq %%rax, %0;\n\t" // l = (a * b)<31:0>
"movq %%rdx, %1;\n\t" // h = (a * b)<63:32>
: "=r"(l), "=r"(h)
: "r"(a), "r"(b)
: "%rax", "%rdx");
*lo = l;
*hi = h;
#else // X86_INLINE_ASM
uint64_t a_lo = (uint64_t)(uint32_t)a;
uint64_t a_hi = a >> 32;
uint64_t b_lo = (uint64_t)(uint32_t)b;
uint64_t b_hi = b >> 32;
uint64_t p0 = a_lo * b_lo;
uint64_t p1 = a_lo * b_hi;
uint64_t p2 = a_hi * b_lo;
uint64_t p3 = a_hi * b_hi;
uint32_t cy = (uint32_t)(((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32);
*lo = p0 + (p1 << 32) + (p2 << 32);
*hi = p3 + (p1 >> 32) + (p2 >> 32) + cy;
#endif // X86_INLINE_ASM
}
/* George Marsaglia's KISS64 generator, posted to comp.lang.c on 28 Feb 2009
https://groups.google.com/forum/#!original/comp.lang.c/qFv18ql_WlU/IK8KGZZFJx4J
*/
struct Prng_T {
uint64_t x, c, y, z, t;
};
struct Prng_T kiss64 = {1234567890987654321ULL, 123456123456123456ULL,
362436362436362436ULL, 1066149217761810ULL, 0ULL};
/* KISS64 state equations */
#define MWC64 (kiss64->t = (kiss64->x << 58) + kiss64->c, \
kiss64->c = (kiss64->x >> 6), kiss64->x += kiss64->t, \
kiss64->c += (kiss64->x < kiss64->t), kiss64->x)
#define XSH64 (kiss64->y ^= (kiss64->y << 13), kiss64->y ^= (kiss64->y >> 17), \
kiss64->y ^= (kiss64->y << 43))
#define CNG64 (kiss64->z = 6906969069ULL * kiss64->z + 1234567ULL)
#define KISS64 (MWC64 + XSH64 + CNG64)
uint64_t random64 (Prng_T kiss64)
{
return KISS64;
}
int main (void)
{
int i;
Prng_T state = &kiss64;
for (i = 0; i < 1000; i++) {
printf ("%llu\n", randint (state, 10));
}
return EXIT_SUCCESS;
}

If you really want to get a perfect generator assuming rand() function that you have is perfect, you need to apply the method explained bellow.
We will create a random number, r, from 0 to max-min=b-1, which is then easy to move to the range that you want, just take r+min
We will create a random number where b < RAND_MAX, but the procedure can be easily adopted to have a random number for any base
PROCEDURE:
Take a random number r in its original RAND_MAX size without any truncation
Display this number in base b
Take first m=floor(log_b(RAND_MAX)) digits of this number for m random numbers from 0 to b-1
Shift each by min (i.e. r+min) to get them into the range (min,max) as you wanted
Since log_b(RAND_MAX) is not necessarily an integer, the last digit in the representation is wasted.
The original approach of just using mod (%) is mistaken exactly by
(log_b(RAND_MAX) - floor(log_b(RAND_MAX)))/ceil(log_b(RAND_MAX))
which you might agree is not that much, but if you insist on being precise, that is the procedure.

Calculating pow(a,b) mod n

I want to calculate ab mod n for use in RSA decryption. My code (below) returns incorrect answers. What is wrong with it?
unsigned long int decrypt2(int a,int b,int n)
{
unsigned long int res = 1;
for (int i = 0; i < (b / 2); i++)
{
res *= ((a * a) % n);
res %= n;
}
if (b % n == 1)
res *=a;
res %=n;
return res;
}

You can try this C++ code. I've used it with 32 and 64-bit integers. I'm sure I got this from SO.
template <typename T>
T modpow(T base, T exp, T modulus) {
base %= modulus;
T result = 1;
while (exp > 0) {
if (exp & 1) result = (result * base) % modulus;
base = (base * base) % modulus;
exp >>= 1;
}
return result;
}
You can find this algorithm and related discussion in the literature on p. 244 of
Schneier, Bruce (1996). Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition (2nd ed.). Wiley. ISBN 978-0-471-11709-4.
Note that the multiplications result * base and base * base are subject to overflow in this simplified version. If the modulus is more than half the width of T (i.e. more than the square root of the maximum T value), then one should use a suitable modular multiplication algorithm instead - see the answers to Ways to do modulo multiplication with primitive types.

In order to calculate pow(a,b) % n to be used for RSA decryption, the best algorithm I came across is Primality Testing 1) which is as follows:
int modulo(int a, int b, int n){
long long x=1, y=a;
while (b > 0) {
if (b%2 == 1) {
x = (x*y) % n; // multiplying with base
}
y = (y*y) % n; // squaring the base
b /= 2;
}
return x % n;
}
See below reference for more details.
1) Primality Testing : Non-deterministic Algorithms – topcoder

Usually it's something like this:
while (b)
{
if (b % 2) { res = (res * a) % n; }
a = (a * a) % n;
b /= 2;
}
return res;

The only actual logic error that I see is this line:
if (b % n == 1)
which should be this:
if (b % 2 == 1)
But your overall design is problematic: your function performs O(b) multiplications and modulus operations, but your use of b / 2 and a * a implies that you were aiming to perform O(log b) operations (which is usually how modular exponentiation is done).

Doing the raw power operation is very costly, hence you can apply the following logic to simplify the decryption.
From here,
Now say we want to encrypt the message m = 7, c = m^e mod n = 7^3 mod 33
= 343 mod 33 = 13. Hence the ciphertext c = 13.
To check decryption we compute m' = c^d mod n = 13^7 mod 33 = 7. Note
that we don't have to calculate the full value of 13 to the power 7
here. We can make use of the fact that a = bc mod n = (b mod n).(c mod
n) mod n so we can break down a potentially large number into its
components and combine the results of easier, smaller calculations to
calculate the final value.
One way of calculating m' is as follows:- Note that any number can be
expressed as a sum of powers of 2. So first compute values of 13^2,
13^4, 13^8, ... by repeatedly squaring successive values modulo 33. 13^2
= 169 ≡ 4, 13^4 = 4.4 = 16, 13^8 = 16.16 = 256 ≡ 25. Then, since 7 = 4 + 2 + 1, we have m' = 13^7 = 13^(4+2+1) = 13^4.13^2.13^1 ≡ 16 x 4 x 13 = 832
≡ 7 mod 33

Are you trying to calculate (a^b)%n, or a^(b%n) ?
If you want the first one, then your code only works when b is an even number, because of that b/2. The "if b%n==1" is incorrect because you don't care about b%n here, but rather about b%2.
If you want the second one, then the loop is wrong because you're looping b/2 times instead of (b%n)/2 times.
Either way, your function is unnecessarily complex. Why do you loop until b/2 and try to multiply in 2 a's each time? Why not just loop until b and mulitply in one a each time. That would eliminate a lot of unnecessary complexity and thus eliminate potential errors. Are you thinking that you'll make the program faster by cutting the number of times through the loop in half? Frankly, that's a bad programming practice: micro-optimization. It doesn't really help much: You still multiply by a the same number of times, all you do is cut down on the number of times testing the loop. If b is typically small (like one or two digits), it's not worth the trouble. If b is large -- if it can be in the millions -- then this is insufficient, you need a much more radical optimization.
Also, why do the %n each time through the loop? Why not just do it once at the end?

Calculating pow(a,b) mod n
A key problem with OP's code is a * a. This is int overflow (undefined behavior) when a is large enough. The type of res is irrelevant in the multiplication of a * a.
The solution is to ensure either:
the multiplication is done with 2x wide math or
with modulus n, n*n <= type_MAX + 1
There is no reason to return a wider type than the type of the modulus as the result is always represent by that type.
// unsigned long int decrypt2(int a,int b,int n)
int decrypt2(int a,int b,int n)
Using unsigned math is certainly more suitable for OP's RSA goals.
Also see Modular exponentiation without range restriction
// (a^b)%n
// n != 0
// Test if unsigned long long at least 2x values bits as unsigned
#if ULLONG_MAX/UINT_MAX - 1 > UINT_MAX
unsigned decrypt2(unsigned a, unsigned b, unsigned n) {
unsigned long long result = 1u % n; // Insure result < n, even when n==1
while (b > 0) {
if (b & 1) result = (result * a) % n;
a = (1ULL * a * a) %n;
b >>= 1;
}
return (unsigned) result;
}
#else
unsigned decrypt2(unsigned a, unsigned b, unsigned n) {
// Detect if UINT_MAX + 1 < n*n
if (UINT_MAX/n < n-1) {
return TBD_code_with_wider_math(a,b,n);
}
a %= n;
unsigned result = 1u % n;
while (b > 0) {
if (b & 1) result = (result * a) % n;
a = (a * a) % n;
b >>= 1;
}
return result;
}
#endif

int's are generally not enough for RSA (unless you are dealing with small simplified examples)
you need a data type that can store integers up to 2256 (for 256-bit RSA keys) or 2512 for 512-bit keys, etc

Here is another way. Remember that when we find modulo multiplicative inverse of a under mod m.
Then
a and m must be coprime with each other.
We can use gcd extended for calculating modulo multiplicative inverse.
For computing ab mod m when a and b can have more than 105 digits then its tricky to compute the result.
Below code will do the computing part :
#include <iostream>
#include <string>
using namespace std;
/*
* May this code live long.
*/
long pow(string,string,long long);
long pow(long long ,long long ,long long);
int main() {
string _num,_pow;
long long _mod;
cin>>_num>>_pow>>_mod;
//cout<<_num<<" "<<_pow<<" "<<_mod<<endl;
cout<<pow(_num,_pow,_mod)<<endl;
return 0;
}
long pow(string n,string p,long long mod){
long long num=0,_pow=0;
for(char c: n){
num=(num*10+c-48)%mod;
}
for(char c: p){
_pow=(_pow*10+c-48)%(mod-1);
}
return pow(num,_pow,mod);
}
long pow(long long a,long long p,long long mod){
long res=1;
if(a==0)return 0;
while(p>0){
if((p&1)==0){
p/=2;
a=(a*a)%mod;
}
else{
p--;
res=(res*a)%mod;
}
}
return res;
}
This code works because ab mod m can be written as (a mod m)b mod m-1 mod m.
Hope it helped { :)

use fast exponentiation maybe..... gives same o(log n) as that template above
int power(int base, int exp,int mod)
{
if(exp == 0)
return 1;
int p=power(base, exp/2,mod);
p=(p*p)% mod;
return (exp%2 == 0)?p:(base * p)%mod;
}

This(encryption) is more of an algorithm design problem than a programming one. The important missing part is familiarity with modern algebra. I suggest that you look for a huge optimizatin in group theory and number theory.
If n is a prime number, pow(a,n-1)%n==1 (assuming infinite digit integers).So, basically you need to calculate pow(a,b%(n-1))%n; According to group theory, you can find e such that every other number is equivalent to a power of e modulo n. Therefore the range [1..n-1] can be represented as a permutation on powers of e. Given the algorithm to find e for n and logarithm of a base e, calculations can be significantly simplified. Cryptography needs a tone of math background; I'd rather be off that ground without enough background.

For my code a^k mod n in php:
function pmod(a, k, n)
{
if (n==1) return 0;
power = 1;
for(i=1; i<=k; $i++)
{
power = (power*a) % n;
}
return power;
}

#include <cmath>
...
static_cast<int>(std::pow(a,b))%n
but my best bet is you are overflowing int (IE: the number is two large for the int) on the power I had the same problem creating the exact same function.

I'm using this function:
int CalculateMod(int base, int exp ,int mod){
int result;
result = (int) pow(base,exp);
result = result % mod;
return result;
}
I parse the variable result because pow give you back a double, and for using mod you need two variables of type int, anyway, in a RSA decryption, you should just use integer numbers.

Fastest way to get the integer part of sqrt(n)?

As we know if n is not a perfect square, then sqrt(n) would not be an integer. Since I need only the integer part, I feel that calling sqrt(n) wouldn't be that fast, as it takes time to calculate the fractional part also.
So my question is,
Can we get only the integer part of sqrt(n) without calculating the actual value of sqrt(n)? The algorithm should be faster than sqrt(n) (defined in <math.h> or <cmath>)?
If possible, you can write the code in asm block also.

I would try the Fast Inverse Square Root trick.
It's a way to get a very good approximation of 1/sqrt(n) without any branch, based on some bit-twiddling so not portable (notably between 32-bits and 64-bits platforms).
Once you get it, you just need to inverse the result, and takes the integer part.
There might be faster tricks, of course, since this one is a bit of a round about.
EDIT: let's do it!
First a little helper:
// benchmark.h
#include <sys/time.h>
template <typename Func>
double benchmark(Func f, size_t iterations)
{
f();
timeval a, b;
gettimeofday(&a, 0);
for (; iterations --> 0;)
{
f();
}
gettimeofday(&b, 0);
return (b.tv_sec * (unsigned int)1e6 + b.tv_usec) -
(a.tv_sec * (unsigned int)1e6 + a.tv_usec);
}
Then the main body:
#include <iostream>
#include <cmath>
#include "benchmark.h"
class Sqrt
{
public:
Sqrt(int n): _number(n) {}
int operator()() const
{
double d = _number;
return static_cast<int>(std::sqrt(d) + 0.5);
}
private:
int _number;
};
// http://www.codecodex.com/wiki/Calculate_an_integer_square_root
class IntSqrt
{
public:
IntSqrt(int n): _number(n) {}
int operator()() const
{
int remainder = _number;
if (remainder < 0) { return 0; }
int place = 1 <<(sizeof(int)*8 -2);
while (place > remainder) { place /= 4; }
int root = 0;
while (place)
{
if (remainder >= root + place)
{
remainder -= root + place;
root += place*2;
}
root /= 2;
place /= 4;
}
return root;
}
private:
int _number;
};
// http://en.wikipedia.org/wiki/Fast_inverse_square_root
class FastSqrt
{
public:
FastSqrt(int n): _number(n) {}
int operator()() const
{
float number = _number;
float x2 = number * 0.5F;
float y = number;
long i = *(long*)&y;
//i = (long)0x5fe6ec85e7de30da - (i >> 1);
i = 0x5f3759df - (i >> 1);
y = *(float*)&i;
y = y * (1.5F - (x2*y*y));
y = y * (1.5F - (x2*y*y)); // let's be precise
return static_cast<int>(1/y + 0.5f);
}
private:
int _number;
};
int main(int argc, char* argv[])
{
if (argc != 3) {
std::cerr << "Usage: %prog integer iterations\n";
return 1;
}
int n = atoi(argv[1]);
int it = atoi(argv[2]);
assert(Sqrt(n)() == IntSqrt(n)() &&
Sqrt(n)() == FastSqrt(n)() && "Different Roots!");
std::cout << "sqrt(" << n << ") = " << Sqrt(n)() << "\n";
double time = benchmark(Sqrt(n), it);
double intTime = benchmark(IntSqrt(n), it);
double fastTime = benchmark(FastSqrt(n), it);
std::cout << "Number iterations: " << it << "\n"
"Sqrt computation : " << time << "\n"
"Int computation : " << intTime << "\n"
"Fast computation : " << fastTime << "\n";
return 0;
}
And the results:
sqrt(82) = 9
Number iterations: 4096
Sqrt computation : 56
Int computation : 217
Fast computation : 119
// Note had to tweak the program here as Int here returns -1 :/
sqrt(2147483647) = 46341 // real answer sqrt(2 147 483 647) = 46 340.95
Number iterations: 4096
Sqrt computation : 57
Int computation : 313
Fast computation : 119
Where as expected the Fast computation performs much better than the Int computation.
Oh, and by the way, sqrt is faster :)

Edit: this answer is foolish - use (int) sqrt(i)
After profiling with proper settings (-march=native -m64 -O3) the above was a lot faster.
Alright, a bit old question, but the "fastest" answer has not been given yet. The fastest (I think) is the Binary Square Root algorithm, explained fully in this Embedded.com article.
It basicly comes down to this:
unsigned short isqrt(unsigned long a) {
unsigned long rem = 0;
int root = 0;
int i;
for (i = 0; i < 16; i++) {
root <<= 1;
rem <<= 2;
rem += a >> 30;
a <<= 2;
if (root < rem) {
root++;
rem -= root;
root++;
}
}
return (unsigned short) (root >> 1);
}
On my machine (Q6600, Ubuntu 10.10) I profiled by taking the square root of the numbers 1-100000000. Using iqsrt(i) took 2750 ms. Using (unsigned short) sqrt((float) i) took 3600ms. This was done using g++ -O3. Using the -ffast-math compile option the times were 2100ms and 3100ms respectively. Note this is without using even a single line of assembler so it could probably still be much faster.
The above code works for both C and C++ and with minor syntax changes also for Java.
What works even better for a limited range is a binary search. On my machine this blows the version above out of the water by a factor 4. Sadly it's very limited in range:
#include <stdint.h>
const uint16_t squares[] = {
0, 1, 4, 9,
16, 25, 36, 49,
64, 81, 100, 121,
144, 169, 196, 225,
256, 289, 324, 361,
400, 441, 484, 529,
576, 625, 676, 729,
784, 841, 900, 961,
1024, 1089, 1156, 1225,
1296, 1369, 1444, 1521,
1600, 1681, 1764, 1849,
1936, 2025, 2116, 2209,
2304, 2401, 2500, 2601,
2704, 2809, 2916, 3025,
3136, 3249, 3364, 3481,
3600, 3721, 3844, 3969,
4096, 4225, 4356, 4489,
4624, 4761, 4900, 5041,
5184, 5329, 5476, 5625,
5776, 5929, 6084, 6241,
6400, 6561, 6724, 6889,
7056, 7225, 7396, 7569,
7744, 7921, 8100, 8281,
8464, 8649, 8836, 9025,
9216, 9409, 9604, 9801,
10000, 10201, 10404, 10609,
10816, 11025, 11236, 11449,
11664, 11881, 12100, 12321,
12544, 12769, 12996, 13225,
13456, 13689, 13924, 14161,
14400, 14641, 14884, 15129,
15376, 15625, 15876, 16129,
16384, 16641, 16900, 17161,
17424, 17689, 17956, 18225,
18496, 18769, 19044, 19321,
19600, 19881, 20164, 20449,
20736, 21025, 21316, 21609,
21904, 22201, 22500, 22801,
23104, 23409, 23716, 24025,
24336, 24649, 24964, 25281,
25600, 25921, 26244, 26569,
26896, 27225, 27556, 27889,
28224, 28561, 28900, 29241,
29584, 29929, 30276, 30625,
30976, 31329, 31684, 32041,
32400, 32761, 33124, 33489,
33856, 34225, 34596, 34969,
35344, 35721, 36100, 36481,
36864, 37249, 37636, 38025,
38416, 38809, 39204, 39601,
40000, 40401, 40804, 41209,
41616, 42025, 42436, 42849,
43264, 43681, 44100, 44521,
44944, 45369, 45796, 46225,
46656, 47089, 47524, 47961,
48400, 48841, 49284, 49729,
50176, 50625, 51076, 51529,
51984, 52441, 52900, 53361,
53824, 54289, 54756, 55225,
55696, 56169, 56644, 57121,
57600, 58081, 58564, 59049,
59536, 60025, 60516, 61009,
61504, 62001, 62500, 63001,
63504, 64009, 64516, 65025
};
inline int isqrt(uint16_t x) {
const uint16_t *p = squares;
if (p[128] <= x) p += 128;
if (p[ 64] <= x) p += 64;
if (p[ 32] <= x) p += 32;
if (p[ 16] <= x) p += 16;
if (p[ 8] <= x) p += 8;
if (p[ 4] <= x) p += 4;
if (p[ 2] <= x) p += 2;
if (p[ 1] <= x) p += 1;
return p - squares;
}
A 32 bit version can be downloaded here: https://gist.github.com/3481770

While I suspect you can find a plenty of options by searching for "fast integer square root", here are some potentially-new ideas that might work well (each independent, or maybe you can combine them):
Make a static const array of all the perfect squares in the domain you want to support, and perform a fast branchless binary search on it. The resulting index in the array is the square root.
Convert the number to floating point and break it into mantissa and exponent. Halve the exponent and multiply the mantissa by some magic factor (your job to find it). This should be able to give you a very close approximation. Include a final step to adjust it if it's not exact (or use it as a starting point for the binary search above).

If you don't mind an approximation, how about this integer sqrt function I cobbled together.
int sqrti(int x)
{
union { float f; int x; } v;
// convert to float
v.f = (float)x;
// fast aprox sqrt
// assumes float is in IEEE 754 single precision format
// assumes int is 32 bits
// b = exponent bias
// m = number of mantissa bits
v.x -= 1 << 23; // subtract 2^m
v.x >>= 1; // divide by 2
v.x += 1 << 29; // add ((b + 1) / 2) * 2^m
// convert to int
return (int)v.f;
}
It uses the algorithm described in this Wikipedia article.
On my machine it's almost twice as fast as sqrt :)

To do integer sqrt you can use this specialization of newtons method:
Def isqrt(N):
a = 1
b = N
while |a-b| > 1
b = N / a
a = (a + b) / 2
return a
Basically for any x the sqrt lies in the range (x ... N/x), so we just bisect that interval at every loop for the new guess. Sort of like binary search but it converges must faster.
This converges in O(loglog(N)) which is very fast. It also doesn't use floating point at all, and it will also work well for arbitrary precision integers.

This is so short that it 99% inlines:
static inline int sqrtn(int num) {
int i = 0;
__asm__ (
"pxor %%xmm0, %%xmm0\n\t" // clean xmm0 for cvtsi2ss
"cvtsi2ss %1, %%xmm0\n\t" // convert num to float, put it to xmm0
"sqrtss %%xmm0, %%xmm0\n\t" // square root xmm0
"cvttss2si %%xmm0, %0" // float to int
:"=r"(i):"r"(num):"%xmm0"); // i: result, num: input, xmm0: scratch register
return i;
}
Why clean xmm0? Documentation of cvtsi2ss
The destination operand is an XMM register. The result is stored in the low doubleword of the destination operand, and the upper three doublewords are left unchanged.
GCC Intrinsic version (runs only on GCC):
#include <xmmintrin.h>
int sqrtn2(int num) {
register __v4sf xmm0 = {0, 0, 0, 0};
xmm0 = __builtin_ia32_cvtsi2ss(xmm0, num);
xmm0 = __builtin_ia32_sqrtss(xmm0);
return __builtin_ia32_cvttss2si(xmm0);
}
Intel Intrinsic version (tested on GCC, Clang, ICC):
#include <xmmintrin.h>
int sqrtn2(int num) {
register __m128 xmm0 = _mm_setzero_ps();
xmm0 = _mm_cvt_si2ss(xmm0, num);
xmm0 = _mm_sqrt_ss(xmm0);
return _mm_cvtt_ss2si(xmm0);
}
^^^^ All of them require SSE 1 (not even SSE 2).
Note: This is exactly how GCC calculates (int) sqrt((float) num) with -Ofast. If you want higher accuracy for larger i, then we can calculate (int) sqrt((double) num) (as noted by Gumby The Green in the comments):
static inline int sqrtn(int num) {
int i = 0;
__asm__ (
"pxor %%xmm0, %%xmm0\n\t"
"cvtsi2sd %1, %%xmm0\n\t"
"sqrtsd %%xmm0, %%xmm0\n\t"
"cvttsd2si %%xmm0, %0"
:"=r"(i):"r"(num):"%xmm0");
return i;
}
or
#include <xmmintrin.h>
int sqrtn2(int num) {
register __v2df xmm0 = {0, 0};
xmm0 = __builtin_ia32_cvtsi2sd(xmm0, num);
xmm0 = __builtin_ia32_sqrtsd(xmm0);
return __builtin_ia32_cvttsd2si(xmm0);
}

The following solution computes the integer part, meaning floor(sqrt(x)) exactly, with no rounding errors.
Problems With Other Approaches
using float or double is neither portable nor precise enough
#orlp's isqrt gives insane results like isqrt(100) = 15
approaches based on huge lookup tables are not practical beyond 32 bits
using a fast inverse sqrt is very imprecise, you're better off using sqrtf
Newton's approach requires expensive integer division and a good initial guess
My Approach
Mine is based on the bit-guessing approach proposed on Wikipedia. Unfortunately the pseudo-code provided on Wikipedia has some errors so I had to make some adjustments:
// C++20 also provides std::bit_width in its <bit> header
unsigned char bit_width(unsigned long long x) {
return x == 0 ? 1 : 64 - __builtin_clzll(x);
}
template <typename Int, std::enable_if_t<std::is_unsigned<Int, int = 0>>
Int sqrt(const Int n) {
unsigned char shift = bit_width(n);
shift += shift & 1; // round up to next multiple of 2
Int result = 0;
do {
shift -= 2;
result <<= 1; // make space for the next guessed bit
result |= 1; // guess that the next bit is 1
result ^= result * result > (n >> shift); // revert if guess too high
} while (shift != 0);
return result;
}
bit_width can be evaluated in constant time and the loop will iterate at most ceil(bit_width / 2) times. So even for a 64-bit integer, this will be at worst 32 iterations of basic arithmetic and bitwise operations.
The compile output is only around 20 instructions.
Performance
I have benchmarked my methods against float-bases ones by generating inputs uniformly. Note that in the real world most inputs would be much closer to zero than to std::numeric_limits<...>::max().
for uint32_t this performs about 25x worse than using std::sqrt(float)
for uint64_t this performs about 30x worse than using std::sqrt(double)
Accuracy
This method is always perfectly accurate, unlike approaches using floating point math.
Using sqrtf can provide incorrect rounding in the [228, 232) range. For example, sqrtf(0xffffffff) = 65536, when the square root is actually 65535.99999.
Double precision doesn't work consistently for the [260, 264) range. For example, sqrt(0x3fff...) = 2147483648, when the square root is actually 2147483647.999999.
The only thing that covers all 64-bit integers is x86 extended precision long double, simply because it can fit an entire 64-bit integer.
Conclusion
As I said, this the only solution that handles all inputs correctly, avoids integer division and doesn't require lookup tables.
In summary, if you need a method that is independent of precision and doesn't require gigantic lookup tables, this is your only option.
It might be especially useful in a constexpr context where performance isn't critical and where it could be much more important to get a 100% accurate result.
Alternative Approach Using Newton's Method
Newton's method can be quite fast when starting with a good guess. For our guess, we will round down to the next power of 2 and compute the square root in constant time. For any number 2x, we can obtain the square root using 2x/2.
template <typename Int, std::enable_if_t<std::is_unsigned_v<Int>, int> = 0>
Int sqrt_guess(const Int n)
{
Int log2floor = bit_width(n) - 1;
// sqrt(x) is equivalent to pow(2, x / 2 = x >> 1)
// pow(2, x) is equivalent to 1 << x
return 1 << (log2floor >> 1);
}
Note that this is not exactly 2x/2 because we lost some precision during the rightshift. Instead it is 2floor(x/2).
Also note that sqrt_guess(0) = 1 which is actually necessary to avoid division by zero in the first iteration:
template <typename Int, std::enable_if_t<std::is_unsigned_v<Int>, int> = 0>
Int sqrt_newton(const Int n)
{
Int a = sqrt_guess(n);
Int b = n;
// compute unsigned difference
while (std::max(a, b) - std::min(a, b) > 1) {
b = n / a;
a = (a + b) / 2;
}
// a is now either floor(sqrt(n)) or ceil(sqrt(n))
// we decrement in the latter case
// this is overflow-safe as long as we start with a lower bound guess
return a - (a * a > n);
}
This alternative approach performs roughly equivalent to the first proposal, but is usually a few percentage points faster. However, it heavily relies on efficient hardware division and result can vary heavily.
The use of sqrt_guess makes a huge difference. It is roughly five times faster than using 1 as the initial guess.

In many cases, even exact integer sqrt value is not needed, enough having good approximation of it. (For example, it often happens in DSP optimization, when 32-bit signal should be compressed to 16-bit, or 16-bit to 8-bit, without loosing much precision around zero).
I've found this useful equation:
k = ceil(MSB(n)/2); - MSB(n) is the most significant bit of "n"
sqrt(n) ~= 2^(k-2)+(2^(k-1))*n/(2^(2*k))); - all multiplications and divisions here are very DSP-friendly, as they are only 2^k.
This equation generates smooth curve (n, sqrt(n)), its values are not very much different from real sqrt(n) and thus can be useful when approximate accuracy is enough.

Why nobody suggests the quickest method?
If:
the range of numbers is limited
memory consumption is not crucial
application launch time is not critical
then create int[MAX_X] filled (on launch) with sqrt(x) (you don't need to use the function sqrt() for it).
All these conditions fit my program quite well.
Particularly, an int[10000000] array is going to consume 40MB.
What's your thoughts on this?

On my computer with gcc, with -ffast-math, converting a 32-bit integer to float and using sqrtf takes 1.2 s per 10^9 ops (without -ffast-math it takes 3.54 s).
The following algorithm uses 0.87 s per 10^9 at the expense of some accuracy: errors can be as much as -7 or +1 although the RMS error is only 0.79:
uint16_t SQRTTAB[65536];
inline uint16_t approxsqrt(uint32_t x) {
const uint32_t m1 = 0xff000000;
const uint32_t m2 = 0x00ff0000;
if (x&m1) {
return SQRTTAB[x>>16];
} else if (x&m2) {
return SQRTTAB[x>>8]>>4;
} else {
return SQRTTAB[x]>>8;
}
}
The table is constructed using:
void maketable() {
for (int x=0; x<65536; x++) {
double v = x/65535.0;
v = sqrt(v);
int y = int(v*65535.0+0.999);
SQRTTAB[x] = y;
}
}
I found that refining the bisection using further if statements does improve accuracy, but it also slows things down to the point that sqrtf is faster, at least with -ffast-math.

Or just do a binary search, cant write a simpler version imo:
uint16_t sqrti(uint32_t num)
{
uint16_t ret = 0;
for(int32_t i = 15; i >= 0; i--)
{
uint16_t temp = ret | (1 << i);
if(temp * temp <= num)
{
ret = temp;
}
}
return ret;
}

If you need performance on computing square root, I guess you will compute a lot of them.
Then why not caching the answer? I don't know the range for N in your case, nor if you will compute many times the square root of the same integer, but if yes, then you can cache the result each time your method is called (in an array would be the most efficient if not too large).

This is an addition for those in need of a precide square root for very large integers. The trick is to leverage the fast floating point square root of modern processors and to fix round-off errors.
template <typename T>
T preciseIntegerSqrt(T n)
{
if (sizeof(T) <= 4)
{
return std::sqrt((double)n);
}
else if (sizeof(T) <= 8)
{
T r = std::sqrt((double)n);
return r - (r*r-1 >= n);
}
else
{
if (n == 0) return 0;
T r = 0;
for (T b = (T(1)) << ((std::bit_width(n)-1) / 2); b != 0; b >>= 1)
{
T const k = (b + 2*r) * b;
r |= (n >= k) * b;
n -= (n >= k) * k;
}
return r;
}
}
Explanation: Integers of up to 32 bits do not need a correction, since they can be represented precisely as double-precision floating point numbers. 64-bit integers get along with a very cheap correction. For the general case, refer to Jan Schultke's excellent answer. The code provided here is very slightly faster that that one (10% on my machine, may vary with integer type and hardware).

Can I rely on this to judge a square number in C++?

Can I rely on
sqrt((float)a)*sqrt((float)a)==a
or
(int)sqrt((float)a)*(int)sqrt((float)a)==a
to check whether a number is a perfect square? Why or why not?
int a is the number to be judged. I'm using Visual Studio 2005.
Edit: Thanks for all these rapid answers. I see that I can't rely on float type comparison. (If I wrote as above, will the last a be cast to float implicitly?) If I do it like
(int)sqrt((float)a)*(int)sqrt((float)a) - a < e
How small should I take that e value?
Edit2: Hey, why don't we leave the comparison part aside, and decide whether the (int) is necessary? As I see, with it, the difference might be great for squares; but without it, the difference might be small for non-squares. Perhaps neither will do. :-(

Actually, this is not a C++, but a math question.
With floating point numbers, you should never rely on equality. Where you would test a == b, just test against abs(a - b) < eps, where eps is a small number (e.g. 1E-6) that you would treat as a good enough approximation.
If the number you are testing is an integer, you might be interested in the Wikipedia article about Integer square root
EDIT:
As Krugar said, the article I linked does not answer anything. Sure, there is no direct answer to your question there, phoenie. I just thought that the underlying problem you have is floating point precision and maybe you wanted some math background to your problem.
For the impatient, there is a link in the article to a lengthy discussion about implementing isqrt. It boils down to the code karx11erx posted in his answer.
If you have integers which do not fit into an unsigned long, you can modify the algorithm yourself.

If you don't want to rely on float precision then you can use the following code that uses integer math.
The Isqrt is taken from here and is O(log n)
// Finds the integer square root of a positive number
static int Isqrt(int num)
{
if (0 == num) { return 0; } // Avoid zero divide
int n = (num / 2) + 1; // Initial estimate, never low
int n1 = (n + (num / n)) / 2;
while (n1 < n)
{
n = n1;
n1 = (n + (num / n)) / 2;
} // end while
return n;
} // end Isqrt()
static bool IsPerfectSquare(int num)
{
return Isqrt(num) * Isqrt(num) == num;
}

Not to do the same calculation twice I would do it with a temporary number:
int b = (int)sqrt((float)a);
if((b*b) == a)
{
//perfect square
}
edit:
dav made a good point. instead of relying on the cast you'll need to round off the float first
so it should be:
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a)
{
//perfect square
}

Your question has already been answered, but here is a working solution.
Your 'perfect squares' are implicitly integer values, so you could easily solve floating point format related accuracy problems by using some integer square root function to determine the integer square root of the value you want to test. That function will return the biggest number r for a value v where r * r <= v. Once you have r, you simply need to test whether r * r == v.
unsigned short isqrt (unsigned long a)
{
unsigned long rem = 0;
unsigned long root = 0;
for (int i = 16; i; i--) {
root <<= 1;
rem = ((rem << 2) + (a >> 30));
a <<= 2;
if (root < rem)
rem -= ++root;
}
return (unsigned short) (root >> 1);
}
bool PerfectSquare (unsigned long a)
{
unsigned short r = isqrt (a);
return r * r == a;
}

I didn't follow the formula, I apologize.
But you can easily check if a floating point number is an integer by casting it to an integer type and compare the result against the floating point number. So,
bool isSquare(long val) {
double root = sqrt(val);
if (root == (long) root)
return true;
else return false;
}
Naturally this is only doable if you are working with values that you know will fit within the integer type range. But being that the case, you can solve the problem this way, saving you the inherent complexity of a mathematical formula.

As reinier says, you need to add 0.5 to make sure it rounds to the nearest integer, so you get
int b = (int) (sqrt((float)a) + 0.5f);
if((b*b) == a) /* perfect square */
For this to work, b has to be (exactly) equal to the square root of a if a is a perfect square. However, I don't think you can guarantee this. Suppose that int is 64 bits and float is 32 bits (I think that's allowed). Then a can be of the order 2^60, so its square root is of order 2^30. However, a float only stores 24 bits in the significand, so the rounding error is of order 2^(30-24) = 2^6. This is larger to 1, so b may contain the wrong integer. For instance, I think that the above code does not identify a = (2^30+1)^2 as a perfect square.

I would do.
// sqrt always returns positive value. So casting to int is equivalent to floor()
int down = static_cast<int>(sqrt(value));
int up = down+1; // This is the ceil(sqrt(value))
// Because of rounding problems I would test the floor() and ceil()
// of the value returned from sqrt().
if (((down*down) == value) || ((up*up) == value))
{
// We have a winner.
}

The more obvious, if slower -- O(sqrt(n)) -- way:
bool is_perfect_square(int i) {
int d = 1;
for (int x = 0; x <= i; x += d, d += 2) {
if (x == i) return true;
}
return false;
}

While others have noted that you should not test for equality with floats, I think you are missing out on chances to take advantage of the properties of perfect squares. First there is no point in re-squaring the calculated root. If a is a perfect square then sqrt(a) is an integer and you should check:
b = sqrt((float)a)
b - floor(b) < e
where e is set sufficiently small. There are also a number of integers that you can cross of as non-square before taking the square root. Checking Wikipedia you can see some necessary conditions for a to be square:
A square number can only end with
digits 00,1,4,6,9, or 25 in base 10
Another simple check would be to see that a % 4 == 1 or 0 before taking the root since:
Squares of even numbers are even,
since (2n)^2 = 4n^2.
Squares of odd
numbers are odd, since (2n + 1)^2 =
4(n^2 + n) + 1.
These would essentially eliminate half of the integers before taking any roots.

The cleanest solution is to use an integer sqrt routine, then do:
bool isSquare( unsigned int a ) {
unsigned int s = isqrt( a );
return s * s == a;
}
This will work in the full int range and with perfect precision. A few cases:
a = 0, s = 0, s * s = 0 (add an exception if you don't want to treat 0 as square)
a = 1, s = 1, s * s = 1
a = 2, s = 1, s * s = 1
a = 3, s = 1, s * s = 1
a = 4, s = 2, s * s = 4
a = 5, s = 2, s * s = 4
Won't fail either as you approach the maximum value for your int size. E.g. for 32-bit ints:
a = 0x40000000, s = 0x00008000, s * s = 0x40000000
a = 0xFFFFFFFF, s = 0x0000FFFF, s * s = 0xFFFE0001
Using floats you run into a number of issues. You may find that sqrt( 4 ) = 1.999999..., and similar problems, although you can round-to-nearest instead of using floor().
Worse though, a float has only 24 significant bits which means you can't cast any int larger than 2^24-1 to a float without losing precision, which introduces false positives/negatives. Using doubles for testing 32-bit ints, you should be fine, though.
But remember to cast the result of the floating-point sqrt back to an int and compare the result to the original int. Comparisons between floats are never a good idea; even for square values of x in a limited range, there is no guarantee that sqrt( x ) * sqrt( x ) == x, or that sqrt( x * x) = x.

basics first:
if you (int) a number in a calculation it will remove ALL post-comma data. If I remember my C correctly, if you have an (int) in any calculation (+/-*) it will automatically presume int for all other numbers.
So in your case you want float on every number involved, otherwise you will loose data:
sqrt((float)a)*sqrt((float)a)==(float)a
is the way you want to go

Floating point math is inaccurate by nature.
So consider this code:
int a=35;
float conv = (float)a;
float sqrt_a = sqrt(conv);
if( sqrt_a*sqrt_a == conv )
printf("perfect square");
this is what will happen:
a = 35
conv = 35.000000
sqrt_a = 5.916079
sqrt_a*sqrt_a = 34.999990734
this is amply clear that sqrt_a^2 is not equal to a.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js