Faster algorithm for finding primitive roots

I'm trying to find primitive roots with this algorithm:
std::vector<unsigned long long> Keyexchange::primroot(unsigned long long val) {
std::vector<unsigned long long> res;
for (unsigned long long i = 2; i<val - 1; i++) {
unsigned long long start = 1;
bool flag = 1;
for (unsigned long long j = 0; j<val / 2; j++) {
start = (start * i) % val;
if (start % val == 1) {
flag = 0;
break;
}
}
if (flag) {
res.push_back(i);
}
}
return res;
}
It works, but it is very slow.
I want to calculate the primitive roots of big numbers like 1073741789. It would be best if I could restrict the search to a range, because right now I am calculating the whole set.
So basically I am looking for a way (a code snippet would be great) to generate about 100,000 of the largest primitive roots of a given big number.
I know that it is much faster with Euler's φ function, but I have no idea how to implement it.
Thanks a lot.

First, if you pick a random integer from 2 to p-1, it has a decent chance of being a primitive root. So pick a random integer (or start with 2), check it, and if it fails, try the next one, and so on.
To check that x is a primitive root: it means that x^(p-1) = 1 (modulo p), but no smaller positive power of x is. Take for example p = 31, p-1 = 30 = 2 x 3 x 5. If x is not a primitive root, then one of x^(30/2), x^(30/3) and x^(30/5) must be 1 (modulo p).
Factor p-1 into its prime factors, calculate x^((p-1)/f) (modulo p) for every prime factor f, and x is a primitive root if none of the results is 1.
Of course x^y (modulo p) needs to be calculated with repeated squaring/multiplying. For example, to calculate x^10 you would calculate x^2, x^4, x^5, x^10 in that order.
Once you have found a primitive root g, g^k is a primitive root if and only if gcd(k, p-1) = 1. But it would be a rare situation where you care about more than one primitive root.
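The test described above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: it assumes p is prime and below 2^32 (so products of two residues fit in 64 bits), it factors p-1 by plain trial division, and the function names (pow_mod, prime_factors, is_primitive_root, largest_primitive_roots) are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// x^e mod m by repeated squaring. Assumes m < 2^32 so that
// every product of two residues fits in 64 bits.
std::uint64_t pow_mod(std::uint64_t x, std::uint64_t e, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    x %= m;
    while (e > 0) {
        if (e & 1) result = result * x % m;
        x = x * x % m;
        e >>= 1;
    }
    return result;
}

// Distinct prime factors of n, found by trial division.
std::vector<std::uint64_t> prime_factors(std::uint64_t n) {
    std::vector<std::uint64_t> factors;
    for (std::uint64_t f = 2; f * f <= n; ++f) {
        if (n % f == 0) {
            factors.push_back(f);
            while (n % f == 0) n /= f;
        }
    }
    if (n > 1) factors.push_back(n);
    return factors;
}

// True if x (with 1 <= x < p) is a primitive root modulo the prime p.
// `factors` must hold the distinct prime factors of p - 1.
bool is_primitive_root(std::uint64_t x, std::uint64_t p,
                       const std::vector<std::uint64_t>& factors) {
    for (std::uint64_t f : factors)
        if (pow_mod(x, (p - 1) / f, p) == 1) return false;
    return true;
}

// Up to `count` primitive roots of the prime p, largest first.
std::vector<std::uint64_t> largest_primitive_roots(std::uint64_t p, std::size_t count) {
    const std::vector<std::uint64_t> factors = prime_factors(p - 1);
    std::vector<std::uint64_t> roots;
    for (std::uint64_t x = p - 1; x >= 2 && roots.size() < count; --x)
        if (is_primitive_root(x, p, factors)) roots.push_back(x);
    return roots;
}
```

Because each candidate costs only a handful of modular exponentiations, scanning downward from p-1 like this should be far cheaper than the loop in the question, which multiplies up to p/2 times per candidate.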

If the input number is semi-prime and you have its (two) prime factors at hand, then you can use this:
vector<uint64> Roots(uint64 p,uint64 q)
{
vector<uint64> roots;
uint64 zstar = p*q;
for (uint64 y=1; y<zstar; y++)
{
if (GCD(zstar,y) == 1 && InQR(y,p,q))
{
uint64 yp = PowMod(y,(p+1)/4,p);
uint64 yq = PowMod(y,(q+1)/4,q);
uint64 r1 = Map(0+yp,0+yq,p,q);
uint64 r2 = Map(0+yp,q-yq,p,q);
uint64 r3 = Map(p-yp,0+yq,p,q);
uint64 r4 = Map(p-yp,q-yq,p,q);
roots.push_back(r1);
roots.push_back(r2);
roots.push_back(r3);
roots.push_back(r4);
}
}
return roots;
}
Here are the auxiliary functions:
uint64 GCD(uint64 a,uint64 b)
{
uint64 c = a%b;
if (c == 0)
return b;
return GCD(b,c);
}
uint64 PowMod(uint64 x,uint64 e,uint64 n)
{
uint64 y = 1;
while (e > 0)
{
if (e & 1)
y = (y*x)%n;
x = (x*x)%n;
e >>= 1;
}
return y;
}
bool InQR(uint64 y,uint64 p)
{
return PowMod(y,(p-1)/2,p) == 1;
}
bool InQR(uint64 y,uint64 p,uint64 q)
{
return InQR(y,p) && InQR(y,q);
}
uint64 Map(uint64 u,uint64 v,uint64 p,uint64 q)
{
uint64 a = q*Inverse(p,q);
uint64 b = p*Inverse(q,p);
return (u*a+v*b)%(p*q);
}
uint64 Inverse(uint64 n,uint64 a)
{
int64 x1 = 1;
int64 x2 = 0;
int64 y1 = 0;
int64 y2 = 1;
uint64 r1 = n;
uint64 r2 = a;
while (r2 != 0)
{
uint64 r3 = r1%r2;
uint64 q3 = r1/r2;
int64 x3 = x1-q3*x2;
int64 y3 = y1-q3*y2;
x1 = x2;
x2 = x3;
y1 = y2;
y2 = y3;
r1 = r2;
r2 = r3;
}
return (uint64)(y1>0? y1:y1+n);
}

Related

Looking for nbit adder in c++

I am trying to build a 17-bit adder. When overflow occurs, the result should wrap around, just as it does for int32.
e.g. in an int32 add, if a = 2^31 - 1
int res = a+1
res = -2^31
The code I tried is below, but it is not working. Is there a better way? Do I need to convert decimal to binary and then perform the 17-bit operation?
int addOvf(int32_t result, int32_t a, int32_t b)
{
int max = (-(0x01<<16))
int min = ((0x01<<16) -1)
int range_17bit = (0x01<<17);
if (a >= 0 && b >= 0 && (a > max - b)) {
printf("...OVERFLOW.........a=%0d b=%0d",a,b);
}
else if (a < 0 && b < 0 && (a < min - b)) {
printf("...UNDERFLOW.........a=%0d b=%0d",a,b);
}
result = a+b;
if(result<min) {
while(result<min){ result=result + range_17bit; }
}
else if(result>min){
while(result>max){ result=result - range_17bit; }
}
return result;
}
int main()
{
int32_t res,x,y;
x=-65536;
y=-1;
res =addOvf(res,x,y);
printf("Value of x=%0d y=%0d res=%0d",x,y,res);
return 0;
}

You have your constants for max/min int17 reversed and off by one. They should be
max_int17 = (1 << 16) - 1 = 65535
and
min_int17 = -(1 << 16) = -65536.
Then I believe that max_int_n + m == min_int_n + (m-1) and min_int_n - m == max_int_n - (m-1), where n is the bit count and m is some integer in [min_int_n, ... ,max_int_n]. So putting that all together the function to treat two int32's as though they are int17's and add them would be like
int32_t add_as_int17(int32_t a, int32_t b) {
static const int32_t max_int17 = (1 << 16) - 1;
static const int32_t min_int17 = -(1 << 16);
auto sum = a + b;
if (sum < min_int17) {
auto m = min_int17 - sum;
return max_int17 - (m - 1);
} else if (sum > max_int17) {
auto m = sum - max_int17;
return min_int17 + (m - 1);
}
return sum;
}
There is probably some more clever way to do that but I believe the above is correct, assuming I understand what you want.
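One possible "more clever way", offered only as a sketch, is to add in unsigned arithmetic, keep the low n bits with a mask, and then sign-extend. The function name add_wrap and the bits parameter are illustrative and not taken from the question; the sketch assumes 2 <= bits <= 31.

```cpp
#include <cstdint>

// Add a and b and wrap the result into the signed `bits`-bit range
// [-2^(bits-1), 2^(bits-1) - 1]. Assumes 2 <= bits <= 31.
std::int32_t add_wrap(std::int32_t a, std::int32_t b, unsigned bits) {
    const std::uint32_t mask = (1u << bits) - 1;   // low `bits` bits
    const std::uint32_t sign = 1u << (bits - 1);   // sign bit of the narrow type
    // Unsigned addition wraps modulo 2^32, so it is well defined.
    const std::uint32_t sum =
        (static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b)) & mask;
    // Sign-extend: (sum ^ sign) - sign maps [0, 2^bits) onto the signed range.
    return static_cast<std::int32_t>(sum ^ sign) - static_cast<std::int32_t>(sign);
}
```

With bits = 17, the inputs from the question (x = -65536, y = -1) wrap to 65535, and 65535 + 1 wraps to -65536. Unlike the subtraction-based version, this also handles sums that overshoot the range more than once.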

Square Root in C/C++

I am trying to implement my own square root function which gives square root's integral part only e.g. square root of 3 = 1.
I saw the method here and tried to implement the method
int mySqrt(int x)
{
int n = x;
x = pow(2, ceil(log(n) / log(2)) / 2);
int y=0;
while (y < x)
{
y = (x + n / x) / 2;
x = y;
}
return x;
}
The above method fails for input 8. Also, I don't get why it should work.
Also, I tried the method here
int mySqrt(int x)
{
if (x == 0) return 0;
int x0 = pow(2, (log(x) / log(2))/2) ;
int y = x0;
int diff = 10;
while (diff>0)
{
x0 = (x0 + x / x0) / 2; diff = y - x0;
y = x0;
if (diff<0) diff = diff * (-1);
}
return x0;
}
In this second way, for input 3 the loop continues ... indefinitely (x0 toggles between 1 and 2).
I am aware that both are essentially versions of Newton's method, but I can't figure out why they fail in certain cases or how I could make them work for all cases. I believe the logic of my implementation is correct. I debugged my code but still can't find a way to make it work.

This one works for me:
uintmax_t zsqrt(uintmax_t x)
{
if(x==0) return 0;
uintmax_t yn = x; // The 'next' estimate
uintmax_t y = 0; // The result
uintmax_t yp; // The previous estimate
do{
yp = y;
y = yn;
yn = (y + x/y) >> 1; // Newton step
}while(yn ^ yp); // (yn != yp) shortcut for dumb compilers
return y;
}
returns floor(sqrt(x))
Instead of testing for 0 with a single estimate, test with 2 estimates.
When I was writing this, I noticed the result estimate would sometimes oscillate. This is because, if the exact result is a fraction, the algorithm could only jump between the two nearest values. So, terminating when the next estimate is the same as the previous will prevent an infinite loop.

Try this
int n,i;//n is the input number
i=0;
while(i<=n)
{
if((i*i)==n)
{
cout<<"The number has exact root : "<<i<<endl;
break; // stop once the root is found
}
else if((i*i)>n)
{
cout<<"The integer part is "<<(i-1)<<endl;
break; // i-1 is the last value whose square fits
}
i++;
}
Hope this helps.

You can try these C sqrt implementations:
// return the number that was multiplied by itself to reach N.
unsigned square_root_1(const unsigned num) {
unsigned a, b, c, d;
for (b = a = num, c = 1; a >>= 1; ++c);
for (c = 1u << ((c - 1) & ~1u); c; c >>= 2) { // largest power of 4 <= num; avoids shifting by 32
d = a + c;
a >>= 1;
if (b >= d)
b -= d, a += c;
}
return a;
}
// return the number that was multiplied by itself to reach N.
unsigned square_root_2(unsigned n){
unsigned a = n > 0, b;
if (n > 3)
for (a = n >> 1, b = (a + n / a) >> 1; b < a; a = b, b = (a + n / a) >> 1);
return a ;
}
Example of usage :
#include <assert.h>
int main(void){
unsigned num, res ;
num = 1847902954, res = square_root_1(num), assert(res == 42987);
num = 2, res = square_root_2(num), assert(res == 1);
num = 0, res = square_root_2(num), assert(res == 0);
}

Ways to do modulo multiplication with primitive types

Is there a way to build e.g. (853467 * 21660421200929) % 100000000000007 without BigInteger libraries (note that each number fits into a 64 bit integer but the multiplication result does not)?
This solution seems inefficient:
int64_t mulmod(int64_t a, int64_t b, int64_t m) {
if (b < a)
std::swap(a, b);
int64_t res = 0;
for (int64_t i = 0; i < a; i++) {
res += b;
res %= m;
}
return res;
}

You should use Russian Peasant multiplication. It uses repeated doubling to compute all the values (b*2^i)%m, and adds them in if the ith bit of a is set.
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
uint64_t res = 0;
while (a != 0) {
if (a & 1) res = (res + b) % m;
a >>= 1;
b = (b << 1) % m;
}
return res;
}
It improves upon your algorithm because it takes O(log(a)) time, not O(a) time.
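Caveats: the arguments must be unsigned, b must already be less than m, and it works only if m is 63 bits or less, since res + b and b << 1 must not overflow.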

Keith Randall's answer is good, but as he said, a caveat is that it works only if m is 63 bits or less.
Here is a modification which has two advantages:
It works even if m is 64 bits.
It doesn't need to use the modulo operation, which can be expensive on some processors.
(Note that the res -= m and temp_b -= m lines rely on 64-bit unsigned integer overflow in order to give the expected results. This should be fine since unsigned integer overflow is well-defined in C and C++. For this reason it's important to use unsigned integer types.)
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
uint64_t res = 0;
uint64_t temp_b;
/* Only needed if b may be >= m */
if (b >= m) {
if (m > UINT64_MAX / 2u)
b -= m;
else
b %= m;
}
while (a != 0) {
if (a & 1) {
/* Add b to res, modulo m, without overflow */
if (b >= m - res) /* Equiv to if (res + b >= m), without overflow */
res -= m;
res += b;
}
a >>= 1;
/* Double b, modulo m */
temp_b = b;
if (b >= m - b) /* Equiv to if (2 * b >= m), without overflow */
temp_b -= m;
b += temp_b;
}
return res;
}
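If the compiler provides a 128-bit integer type, the whole loop can be avoided. The sketch below assumes GCC or Clang, where unsigned __int128 is available as a compiler extension (it is not standard C++); the name mulmod_u128 is made up for this example.

```cpp
#include <cstdint>

// (a * b) % m using a 128-bit intermediate product.
// unsigned __int128 is a GCC/Clang extension, not standard C++.
std::uint64_t mulmod_u128(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    return static_cast<std::uint64_t>(static_cast<unsigned __int128>(a) * b % m);
}
```

This works for any 64-bit m, at the cost of portability and of a 128-by-64-bit division whose speed depends on the platform.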

Both methods work for me. The first one is the same as yours, but I changed your numbers to explicit ULL. The second one uses inline assembly (x86-64, GCC syntax), which should be faster; it requires the quotient a*b/m to fit in 64 bits, which holds when a < m or b < m.
There are also algorithms used in cryptography (mostly RSA and RSA-based schemes, I guess), such as the already mentioned Montgomery reduction, but I think they take time to implement.
#include <algorithm>
#include <iostream>
__uint64_t mulmod1(__uint64_t a, __uint64_t b, __uint64_t m) {
if (b < a)
std::swap(a, b);
__uint64_t res = 0;
for (__uint64_t i = 0; i < a; i++) {
res += b;
res %= m;
}
return res;
}
__uint64_t mulmod2(__uint64_t a, __uint64_t b, __uint64_t m) {
__uint64_t r;
__asm__
( "mulq %2\n\t"
"divq %3"
: "=&d" (r), "+%a" (a)
: "rm" (b), "rm" (m)
: "cc"
);
return r;
}
int main() {
using namespace std;
__uint64_t a = 853467ULL;
__uint64_t b = 21660421200929ULL;
__uint64_t c = 100000000000007ULL;
cout << mulmod1(a, b, c) << endl;
cout << mulmod2(a, b, c) << endl;
return 0;
}

An improvement to the repeating doubling algorithm is to check how many bits at once can be calculated without an overflow. An early exit check can be done for both arguments -- speeding up the (unlikely?) event of N not being prime.
e.g. 100000000000007 == 0x00005af3107a4007, which allows 16 (or 17) bits to be calculated per each iteration. The actual number of iterations will be 3 with the example.
// just a conceptual routine
int get_leading_zeroes(uint64_t n)
{
int a=0;
while ((n & 0x8000000000000000) == 0) { a++; n<<=1; }
return a;
}
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
uint64_t result = 0;
int N = get_leading_zeroes(n);
uint64_t mask = ((uint64_t)1 << N) - 1; // 64-bit shift: N may exceed the width of int
a %= n;
b %= n; // Make sure all values are originally in the proper range?
// n is not necessarily a prime -- so both a & b can end up being zero
while (a>0 && b>0)
{
result = (result + (b & mask) * a) % n; // no overflow
b>>=N;
a = (a << N) % n;
}
return result;
}

You could try something that breaks the multiplication up into additions:
// compute (a * b) % m:
unsigned int multmod(unsigned int a, unsigned int b, unsigned int m)
{
unsigned int result = 0;
a %= m;
b %= m;
while (b)
{
if (b % 2 != 0)
{
result = (result + a) % m;
}
a = (a * 2) % m;
b /= 2;
}
return result;
}

a * b % m equals a * b - (a * b / m) * m
Use floating point arithmetic to approximate a * b / m. The approximation leaves a value small enough for normal 64 bit integer operations, for m up to 63 bits.
This method is limited by the significand of a double, which is usually 53 bits (52 of them stored explicitly).
uint64_t mod_mul_52(uint64_t a, uint64_t b, uint64_t m) {
uint64_t c = (double)a * b / m - 1;
uint64_t d = a * b - c * m;
return d % m;
}
This method is limited by the significand of a long double, which is usually 64 bits or larger. The integer arithmetic is limited to 63 bits.
uint64_t mod_mul_63(uint64_t a, uint64_t b, uint64_t m) {
uint64_t c = (long double)a * b / m - 1;
uint64_t d = a * b - c * m;
return d % m;
}
These methods require that a and b be less than m. To handle arbitrary a and b, add these lines before c is computed.
a = a % m;
b = b % m;
In both methods, the final % operation could be made conditional.
return d >= m ? d % m : d;

I can suggest an improvement for your algorithm.
You actually calculate a * b iteratively by adding b each time and reducing modulo m after each iteration. It's better to add b * x each time, where x is chosen so that b * x won't overflow.
int64_t mulmod(int64_t a, int64_t b, int64_t m)
{
a %= m;
b %= m;
int64_t x = 1;
int64_t bx = b;
while (x < a)
{
int64_t bb = bx * 2;
if (bb <= bx)
break; // overflow
x *= 2;
bx = bb;
}
int64_t ans = 0;
for (; x < a; a -= x)
ans = (ans + bx) % m;
return (ans + a*b) % m;
}

What is the fastest way to compute large power of 2 modulo a number

For 1 <= N <= 1000000000, I need to compute 2^N mod 1000000007, and it must be really fast!
My current approach is:
ull power_of_2_mod(ull n) {
ull result = 1;
if (n <= 63) {
result <<= n;
result = result % 1000000007;
}
else {
ull one = 1;
one <<= 63;
while (n > 63) {
result = ((result % 1000000007) * (one % 1000000007)) % 1000000007;
n -= 63;
}
for (int i = 1; i <= n; ++i) {
result = (result * 2) % 1000000007;
}
}
return result;
}
but it doesn't seem to be fast enough. Any idea?

This will be faster (code in C):
typedef unsigned long long uint64;
uint64 PowMod(uint64 x, uint64 e, uint64 mod)
{
uint64 res;
if (e == 0)
{
res = 1;
}
else if (e == 1)
{
res = x;
}
else
{
res = PowMod(x, e / 2, mod);
res = res * res % mod;
if (e % 2)
res = res * x % mod;
}
return res;
}

This method is iterative (no recursion) and has O(log(n)) complexity. Check this out.
#define ull unsigned long long
#define MODULO 1000000007
ull PowMod(ull n)
{
ull ret = 1;
ull a = 2;
while (n > 0) {
if (n & 1) ret = ret * a % MODULO;
a = a * a % MODULO;
n >>= 1;
}
return ret;
}
And this is the pseudocode from Wikipedia (see the right-to-left binary method section):
function modular_pow(base, exponent, modulus)
Assert :: (modulus - 1) * (base mod modulus) does not overflow base
result := 1
base := base mod modulus
while exponent > 0
if (exponent mod 2 == 1):
result := (result * base) mod modulus
exponent := exponent >> 1
base := (base * base) mod modulus
return result

You can solve it in O(log n).
For example, for n = 1234 = 10011010010 (in base 2) we have n = 2 + 16 + 64 + 128 + 1024, and thus 2^n = 2^2 * 2^16 * 2^64 * 2^128 * 2 ^ 1024.
Note that 2^1024 = (2^512)^2, so that, given you know 2^512, you can compute 2^1024 in a couple of operations.
The solution would be something like this (pseudocode):
const ulong MODULO = 1000000007;
ulong mul(ulong a, ulong b) {
return (a * b) % MODULO;
}
ulong add(ulong a, ulong b) {
return (a + b) % MODULO;
}
int[] decompose(ulong number) {
//for 1234 it should return [1, 4, 6, 7, 10]
}
//for x it returns 2^(2^x) mod MODULO
// (e.g. for x = 10 it returns 2^1024 mod MODULO)
ulong power_of_power_of_2_mod(int power) {
ulong result = 2; // 2^(2^0); starting from 1 would always yield 1
for (int i = 0; i < power; i++) {
result = mul(result, result);
}
return result;
}
//for x it returns 2^x mod MODULO
ulong power_of_2_mod(int power) {
ulong result = 1;
foreach (int metapower in decompose(power)) {
result = mul(result, power_of_power_of_2_mod(metapower));
}
return result;
}
Note that O(log n) is, in practice, O(1) for ulong arguments (as log n < 63); and that this code is compatible with any uint MODULO (MODULO < 2^32), independent of whether MODULO is prime or not.

It can be solved in O(log n), provided the recursive result is computed only once per level (calling fastspcexp(n/2) twice per level makes the recursion O(n)).
Try this approach:
unsigned long long int fastspcexp(unsigned long long int n)
{
if(n==0)
return 1;
unsigned long long int half = fastspcexp(n/2); // compute once, reuse below
if(n%2==0)
return (half*half)%1000000007;
else
return (half*half*2)%1000000007;
}
This is a recursive approach and is fast enough to meet the time requirements of most programming competitions.

If you also want to store the whole array, i.e. (2^i) % mod for i = 0 up to some maximum, then:
long mod = 1000000007;
long int pow_mod[ele]; //here 'ele' = maximum power upto which you want to store 2^i
pow_mod[0]=1; //2^0 = 1
for(int i=1;i<ele;++i){
pow_mod[i] = (pow_mod[i-1]*2)%mod;
}
I hope it'll be helpful to someone.

Calculating Catalan Numbers mod prime number

The following is the problem description:
Let c[n] be the Catalan number for n and p be a large prime, e.g. 1000000007.
I need to calculate c[n] % p where n ranges over {1,2,3,...,1000}.
The problem I am having is that on a 32-bit machine you get overflow when you calculate Catalan numbers for such large n. I am familiar with modular arithmetic. Also,
(a*b) % p = ((a % p)*(b % p)) % p
This formula lets me avoid overflow in the numerator, but I have no idea how to deal with the denominators.

For a modulus of 1000000007, avoiding overflow with only 32-bit integers is cumbersome. But any decent C implementation provides 64-bit integers (and any decent C++ implementation does too), so that shouldn't be necessary.
Then to deal with the denominators, one possibility is, as KerrekSB said in his comment, to calculate the modular inverse of the denominators modulo the prime p = 1000000007. You can calculate the modular inverse with the extended Euclidean algorithm or, equivalently, the continued fraction expansion of k/p. Then instead of dividing by k in the calculation, you multiply by its modular inverse.
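As a rough sketch of the modular-inverse idea: the code below computes the inverse with Fermat's little theorem, k^(p-2) mod p, rather than with the extended Euclidean algorithm named above (both give the same inverse for a prime p), and it relies on the standard recurrence C(n+1) = C(n) * 2(2n+1) / (n+2). A 64-bit unsigned type is assumed, and the names pow_mod, inverse_mod and catalan_table are illustrative.

```cpp
#include <cstdint>
#include <vector>

const std::uint64_t P = 1000000007ULL;  // prime modulus from the question

// x^e mod P by repeated squaring; products stay below 2^64 because P < 2^32.
std::uint64_t pow_mod(std::uint64_t x, std::uint64_t e) {
    std::uint64_t result = 1;
    x %= P;
    while (e > 0) {
        if (e & 1) result = result * x % P;
        x = x * x % P;
        e >>= 1;
    }
    return result;
}

// Modular inverse of k (k not divisible by P) via Fermat: k^(P-2) mod P.
std::uint64_t inverse_mod(std::uint64_t k) { return pow_mod(k, P - 2); }

// Table of C(0..max_n) mod P using C(n+1) = C(n) * 2(2n+1) / (n+2),
// with the division replaced by a multiplication with the inverse.
std::vector<std::uint64_t> catalan_table(unsigned max_n) {
    std::vector<std::uint64_t> c(max_n + 1);
    c[0] = 1;
    for (unsigned n = 0; n < max_n; ++n) {
        std::uint64_t numerator = c[n] * (2ULL * (2 * n + 1) % P) % P;
        c[n + 1] = numerator * inverse_mod(n + 2) % P;
    }
    return c;
}
```

For n up to 1000 the denominators n+2 are far smaller than p, so every inverse exists.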
Another option is to use Segner's recurrence relation for the Catalan numbers, which gives a calculation without divisions:
C(0) = 1
C(n+1) = ∑_{i=0}^{n} C(i)*C(n-i)
Since you only need the Catalan numbers C(k) for k <= 1000, you can precalculate them, or quickly calculate them at program startup and store them in a lookup table.
If contrary to expectation no 64-bit integer type is available, you can calculate the modular product by splitting the factors into low and high 16 bits,
a = a1 + (a2 << 16) // 0 <= a1, a2 < (1 << 16)
b = b1 + (b2 << 16) // 0 <= b1, b2 < (1 << 16)
a*b = a1*b1 + (a1*b2 << 16) + (a2*b1 << 16) + (a2*b2 << 32)
To calculate a*b (mod m) with m <= (1 << 31), reduce each of the four products modulo m,
p1 = (a1*b1) % m;
p2 = (a1*b2) % m;
p3 = (a2*b1) % m;
p4 = (a2*b2) % m;
and the simplest way to incorporate the shifts is
for(i = 0; i < 16; ++i) {
p2 *= 2;
if (p2 >= m) p2 -= m;
}
the same for p3 and with 32 iterations for p4. Then
s = p1+p2;
if (s >= m) s -= m;
s += p3;
if (s >= m) s -= m;
s += p4;
if (s >= m) s -= m;
return s;
That way is not very fast, but for the few multiplications needed here, it's fast enough. A small speedup should be obtained by reducing the number of shifts; first calculate (p4 << 16) % m,
for(i = 0; i < 16; ++i) {
p4 *= 2;
if (p4 >= m) p4 -= m;
}
then all of p2, p3 and the current value of p4 need to be multiplied with 2^16 modulo m,
p4 += p3;
if (p4 >= m) p4 -= m;
p4 += p2;
if (p4 >= m) p4 -= m;
for(i = 0; i < 16; ++i) {
p4 *= 2;
if (p4 >= m) p4 -= m;
}
s = p4+p1;
if (s >= m) s -= m;
return s;
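Put together, the fragments above might look like the following function. This is only a sketch under the stated constraints (a, b < m <= 2^31, no 64-bit type used in the multiplication itself); the helper names shift16_mod, add_mod and mul_mod32 are invented here.

```cpp
#include <cstdint>

// Multiply x by 2^16 modulo m using only 32-bit arithmetic.
// Requires x < m <= 2^31 so that doubling cannot overflow.
std::uint32_t shift16_mod(std::uint32_t x, std::uint32_t m) {
    for (int i = 0; i < 16; ++i) {
        x *= 2;
        if (x >= m) x -= m;
    }
    return x;
}

// Sum of two residues below m, reduced modulo m (m <= 2^31).
std::uint32_t add_mod(std::uint32_t x, std::uint32_t y, std::uint32_t m) {
    std::uint32_t s = x + y;
    if (s >= m) s -= m;
    return s;
}

// (a * b) % m for a, b < m <= 2^31.
std::uint32_t mul_mod32(std::uint32_t a, std::uint32_t b, std::uint32_t m) {
    const std::uint32_t a1 = a & 0xFFFFu, a2 = a >> 16;  // low and high halves
    const std::uint32_t b1 = b & 0xFFFFu, b2 = b >> 16;
    const std::uint32_t p1 = (a1 * b1) % m;  // each product is below 2^32
    const std::uint32_t p2 = (a1 * b2) % m;
    const std::uint32_t p3 = (a2 * b1) % m;
    const std::uint32_t p4 = (a2 * b2) % m;
    // ((p4 * 2^16 + p2 + p3) * 2^16 + p1) mod m
    std::uint32_t high = shift16_mod(p4, m);
    high = add_mod(high, add_mod(p2, p3, m), m);
    high = shift16_mod(high, m);
    return add_mod(high, p1, m);
}
```

Where a 64-bit type does exist, it is a convenient way to cross-check this routine against (uint64_t)a * b % m.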

What about storing the results using dynamic programming? While populating the lookup table, you can reduce modulo MOD at each step. That takes care of the overflow for the first 1000 Catalan numbers and will also be faster than BigDecimal/BigInteger.
My solution:
public class Catalan {
private static long [] catalan= new long[1001];
private static final int MOD=1000000007;
public static void main(String[] args) {
precalc();
for (int i=1;i<=1000;i++){
System.out.println("Catalan number for "+i+" is: "+catalan[i]);
}
}
private static void precalc(){
for (int i=0;i<=1000;i++){
if (i==0 || i==1){
catalan[i]=1;
}
else{
long sum =0;long left, right;
for (int k=1;k<=i;k++){
left = catalan[k-1] % MOD;
right= catalan[i-k] % MOD;
sum =(sum+ (left * right)%MOD)%MOD;
}
catalan[i]=sum;
}
}
}
}

What about using a library for big integers? Try googling for it...

#include <stdio.h>
#include <stdlib.h>
/*
C(n) = (2n)!/(n+1)!n!
= (2n)(2n-1)(2n-2)..(n+2)/n!
*/
int p = 1000000007;
int gcd(int x, int y){
while(y!=0){
int wk = x % y;
x = y;
y = wk;
}
return x;
}
int catalanMod(int n){
long long c = 1LL;
int i;
int *list,*wk;
//make array [(2n),(2n-1),(2n-2)..(n+2)]
wk = list = (int*)malloc(sizeof(int)*(n-1));
for(i=n+2;i<=2*n;++i){
*wk++ = i;
}
wk=list;
//[(2n),(2n-1),(2n-2)..(n+2)] / [1,2,3,..n]
//E.g C(10)=[13,17,19,4]
for(i=2;i<=n;++i){
int j,k,w;
for(w=i,j=0;j<n-1;++j){
while(1!=(k = gcd(wk[j], w))){
wk[j] /= k;
w /= k;
}
if(w == 1) break;
}
}
wk=list;
//Multiplication and modulo reduce
for(i=0;i<n-1;++i){
if(wk[i]==1)continue;
c = c * wk[i] % p;
}
free(list);
return c;
}

Simply use the property (a * b) % mod = ((a % mod) * (b % mod)) % mod
