Need help implementing a Lucas Pseudoprimality test - primes

I am trying to write a function that determines if a number n is prime or composite using the Lucas pseudoprime test; at the moment, I am working with the standard test, but once I get that working I will then write the strong test. I am reading the paper by Baillie and Wagstaff, and following the implementation by Thomas Nicely in the trn.c file.
I understand that the full test involves several steps: trial division by small primes, checking that n is not a square, performing a strong pseudoprimality test to base 2, then finally the Lucas pseudoprime test. I can handle all the other pieces, but I am having trouble with the Lucas pseudoprime test. Here is my implementation, in Python:
def gcd(a, b):
    while b != 0:
        a, b = b, a % b
    return a

def jacobi(a, m):
    a = a % m; t = 1
    while a != 0:
        while a % 2 == 0:
            a = a / 2
            if m % 8 == 3 or m % 8 == 5:
                t = -1 * t
        a, m = m, a # swap a and m
        if a % 4 == 3 and m % 4 == 3:
            t = -1 * t
        a = a % m
    if m == 1:
        return t
    return 0

def isLucasPrime(n):
    dAbs, sign, d = 5, 1, 5
    while 1:
        if 1 < gcd(d, n) > n:
            return False
        if jacobi(d, n) == -1:
            break
        dAbs, sign = dAbs + 2, sign * -1
        d = dAbs * sign
    p, q = 1, (1 - d) / 4
    print "p, q, d =", p, q, d
    u, v, u2, v2, q, q2 = 0, 2, 1, p, q, 2 * q
    bits = []
    t = (n + 1) / 2
    while t > 0:
        bits.append(t % 2)
        t = t // 2
    h = -1
    while -1 * len(bits) <= h:
        print "u, u2, v, v2, q, q2, bits, bits[h] = ",\
            u, u2, v, v2, q, q2, bits, bits[h]
        u2 = (u2 * v2) % n
        v2 = (v2 * v2 - q2) % n
        if bits[h] == 1:
            u = u2 * v + u * v2
            u = u if u % 2 == 0 else u + n
            u = (u / 2) % n
            v = (v2 * v) + (u2 * u * d)
            v = v if v % 2 == 0 else v + n
            v = (v / 2) % n
        if -1 * len(bits) < h:
            q = (q * q) % n
            q2 = q + q
        h = h - 1
    return u == 0
When I run this, isLucasPrime returns False for such primes as 83 and 89, which is incorrect. It also returns False for the composite 111, which is correct. And it returns False for the composite 323, which I know is a Lucas pseudoprime for which isLucasPrime should return True. In fact, isLucasPrime returns False for every n on which I have tested it.
I have several questions:
1) I'm not expert with C/GMP, but it seems to me that Nicely runs through the bits of (n+1)/2 from right-to-left (least significant to most significant) where other authors run through the bits left-to-right. My code shown above runs through the bits left-to-right, but I have also tried running through the bits right-to-left, with the same result. Which order is correct?
2) It looks odd to me that Nicely only updates the u and v variables for a 1-bit. Is this correct? I expected to update all four of the Lucas-chain variables each time through the loop, since the indexes of the chain increase at each step.
3) What have I done wrong?

1) I'm not expert with C/GMP, but it seems to me that Nicely runs through the bits of (n+1)/2 from right-to-left (least significant to most significant) where other authors run through the bits left-to-right. My code shown above runs through the bits left-to-right, but I have also tried running through the bits right-to-left, with the same result. Which order is correct?
Indeed, Nicely goes from least significant to most significant bit. He computes U(2^k) and V(2^k) (and Q^(2^k); all modulo N, of course) in the mpzU2m and mpzV2m variables, and keeps U((N+1) % 2^k) and V((N+1) % 2^k) stored in mpzU and mpzV, respectively. When a 1-bit is encountered, the remainder (N+1) % 2^k changes, and mpzU and mpzV are updated accordingly.
The other way is to compute U(p), U(p+1), V(p) and (optionally) V(p+1) for a prefix p of N+1 and combine those to compute U(2*p+1) and either U(2*p) or U(2*p+2) [ditto for V] depending on whether the next bit after the prefix p is 0 or 1.
Both methods are correct, just as you can compute the power x^N going from left to right, with x^p and x^(p+1) as state, or from right to left, with x^(2^k) and x^(N % 2^k) as state [and computing U(n) and U(n+1) is basically computing ζ^n where ζ = (1 + sqrt(D))/2].
I, and others apparently, find the left-to-right order simpler. I haven't done or read an analysis; it might be that right-to-left is computationally less expensive on average, and that Nicely chose right-to-left because of that.
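To make the analogy concrete, here is a minimal Python sketch of the two traversal orders for ordinary modular exponentiation (the function names are mine, not Nicely's); the Lucas chain uses the same two state layouts:

def pow_left_to_right(x, n, m):
    # state: (x^p, x^(p+1)) for the prefix p of n's bits, most significant first
    a, b = 1, x   # p = 0
    for bit in bin(n)[2:]:
        if bit == '0':
            a, b = (a * a) % m, (a * b) % m   # p -> 2p
        else:
            a, b = (a * b) % m, (b * b) % m   # p -> 2p + 1
    return a

def pow_right_to_left(x, n, m):
    # state: (x^(2^k), x^(n % 2^k)), least significant bit first
    sq, result = x, 1
    while n > 0:
        if n % 2 == 1:
            result = (result * sq) % m   # the remainder n % 2^k grew by a 1-bit
        sq = (sq * sq) % m
        n //= 2
    return result

Both return the same value as pow(x, n, m); only the bookkeeping differs.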
2) It looks odd to me that Nicely only updates the u and v variables for a 1-bit. Is this correct? I expected to update all four of the Lucas-chain variables each time through the loop, since the indexes of the chain increase at each step.
Yes, that is correct, because (N+1) % 2^k == (N+1) % 2^(k-1) when the bit of weight 2^(k-1) is 0: the remainder only changes when a 1-bit is encountered.
3) What have I done wrong?
A small typo first:
if 1 < gcd(d, n) > n:
should be
if 1 < gcd(d, n) < n:
of course.
More substantially, you use the updates for Nicely's traversal order (right-to-left), but traverse in the other direction. That of course produces wrong results.
Further, when updating v
if bits[h] == 1:
    u = u2 * v + u * v2
    u = u if u % 2 == 0 else u + n
    u = (u / 2) % n
    v = (v2 * v) + (u2 * u * d)
    v = v if v % 2 == 0 else v + n
    v = (v / 2) % n
you use the new value of u, but you ought to use the old value.
def isLucasPrime(n):
    dAbs, sign, d = 5, 1, 5
    while 1:
        if 1 < gcd(d, n) < n:
            return False
        if jacobi(d, n) == -1:
            break
        dAbs, sign = dAbs + 2, sign * -1
        d = dAbs * sign
    p, q = 1, (1 - d) // 4
    u, v, u2, v2, q, q2 = 0, 2, 1, p, q, 2 * q
    bits = []
    t = (n + 1) // 2
    while t > 0:
        bits.append(t % 2)
        t = t // 2
    h = 0
    while h < len(bits):
        u2 = (u2 * v2) % n
        v2 = (v2 * v2 - q2) % n
        if bits[h] == 1:
            uold = u
            u = u2 * v + u * v2
            u = u if u % 2 == 0 else u + n
            u = (u // 2) % n
            v = (v2 * v) + (u2 * uold * d)
            v = v if v % 2 == 0 else v + n
            v = (v // 2) % n
        if h < len(bits) - 1:
            q = (q * q) % n
            q2 = q + q
        h = h + 1
    return u == 0
works (no guarantees, but I think it is correct, and have done some tests, all of which it passed).
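Assuming the fixed version above and Python 2 (as in the question), a quick check against the cases from the question:

>>> isLucasPrime(83), isLucasPrime(89)   # primes
(True, True)
>>> isLucasPrime(111)                    # composite
False
>>> isLucasPrime(323)                    # composite, but a Lucas pseudoprime
True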

Related

How can I make a branchless number cycle?

I need branchless code for a number cycle, like this:
int i = 0;
i = (i + 1) % 4; // 1
i = (i + 1) % 4; // 2
i = (i + 1) % 4; // 3
i = (i + 1) % 4; // 0
i = (i + 1) % 4; // 1
...
But it should work in the reverse order of the code above. (3 > 2 > 1 > 0 > 3 > ...)
I first tried i = (i - 1) % 4, but this worked differently than I wanted: (-1 > -2 > -3 > 0 > -1 > ...).
However, if I use the method of adding 4 when i is negative, this code is no longer branchless.
How can I implement the functionality which I want (without additional variables or arrays)?
(This article has been translated by Google Translate.)
The error happens because in C89 the sign of the remainder for negative operands was implementation-defined, while from C99 on, negative % positive yields a negative (or zero) result. This is unlike some other programming languages, such as Python, where (-1) % 4 does indeed result in 3.
But it is easy to circumvent. Subtracting 1 is the same as adding -1, and since 0 - 1 gives -1, we would get a negative remainder. To stay non-negative, instead of adding -1 we can add a positive number that is congruent to -1 (mod m). The smallest such positive number is m - 1, for m > 1. Therefore we can use:
#define MODULUS 4 // or any other modulus > 1
int i = 0;
i = (i + (MODULUS - 1)) % MODULUS; //3
i = (i + (MODULUS - 1)) % MODULUS; //2
i = (i + (MODULUS - 1)) % MODULUS; //1
i = (i + (MODULUS - 1)) % MODULUS; //0
i = (i + (MODULUS - 1)) % MODULUS; //3
You need to change your expression a bit. It should be:
i = ( i + 3 ) % 4;
In general, if you want a number in the range [0, N-1] with such a cycle, then the expression should be:
i = (i + (N - 1)) % N;
You can see it working below, applied manually several times inside a loop:
int main()
{
    int i = 0;
    i = ( i + 3 ) % 4; //3
    i = ( i + 3 ) % 4; //2
    i = ( i + 3 ) % 4; //1
    i = ( i + 3 ) % 4; //0
    i = ( i + 3 ) % 4; //3
    i = ( i + 3 ) % 4; //2
    i = ( i + 3 ) % 4; //1
    i = ( i + 3 ) % 4; //0
    return 0;
}
You can use unsigned arithmetic; then the numbers won't become negative. (This works here because the modulus 4 is a power of two, so the unsigned wraparound value UINT_MAX is congruent to -1 modulo 4.)
unsigned i = 0;
i = (i - 1) % 4; // 3
i = (i - 1) % 4; // 2
i = (i - 1) % 4; // 1
i = (i - 1) % 4; // 0
i = (i - 1) % 4; // 3
Also, maybe more intuitive, you can use bitwise operations to implement a 2-bit counter.
unsigned i = 0;
i = (i - 1) & 3;
i = (i - 1) & 3;
i = (i - 1) & 3;
i = (i - 1) & 3;
i = (i - 1) & 3;
At the machine-code level, this is identical to the code above.
There are two cases:
Case 1: The modulus is a power of 2
In this case, you can simply use unsigned arithmetic with a bit mask:
unsigned i = ...;
i = (i - 1) & (modulo - 1);
When i = 0, subtracting 1 will yield a value with all bits set in unsigned arithmetic, and the mask operation & will yield the value modulo - 1.
Case 2: The modulus is not a power of 2
In this case, no fancy bit tricks work. You can only avoid going negative:
i = (i + modulo - 1) % modulo;
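The (i + modulo - 1) % modulo form is language-agnostic as long as i stays non-negative; a quick sanity check of the cycle (Python shown just for brevity):

m, i, cycle = 4, 0, []
for _ in range(8):
    i = (i + m - 1) % m
    cycle.append(i)
print(cycle)   # [3, 2, 1, 0, 3, 2, 1, 0]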

split a number n as sum of k distinct numbers

I have a number n and I have to split it into k numbers such that all k numbers are distinct, the sum of the k numbers is equal to n, and k is maximal. For example, if n is 9 then the answer should be 1, 2, 6. If n is 15 then the answer should be 1, 2, 3, 4, 5.
This is what I've tried -
void findNum(int l, int k, vector<int>& s)
{
    if (k <= 2 * l) {
        s.push_back(k);
        return;
    }
    else if (l == 1) {
        s.push_back(l);
        findNum(l + 1, k - 1, s);
    }
    else if (l == 2) {
        s.push_back(l);
        findNum(l + 2, k - 2, s);
    }
    else {
        s.push_back(l);
        findNum(l + 1, k - l, s);
    }
}
Initially k = n and l = 1. The resulting numbers are stored in s. This solution does return n as a sum of distinct numbers, but it is not optimal (k is not maximal). For example, the output for n = 15 is 1, 2, 4, 8. What changes should be made to get the correct result?
A greedy algorithm works for this problem. Just keep summing up 1 + 2 + ... + m while the total stays <= n. As soon as it exceeds n, merge the last two terms m-1 and m and subtract the excess from them. The numbers 1 up to m-2, together with that one adjusted term, are the answer.
eg.
18
1+2+3+4+5 < 18
+6 = 21 > 18
So, answer: 1+2+3+4+(5+6-(21-18))
28
1+2+3+4+5+6+7 = 28
So, answer: 1+2+3+4+5+6+7
Pseudocode (m itself can be found in constant time from the quadratic formula, so apart from listing the terms this is O(1)):
Find the smallest m such that m * (m + 1) > 2 * n
Number of terms = m - 1
Terms: 1, 2, 3, ..., m-2, (m-1 + m - (sum(1...m) - n))
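A minimal Python sketch of this greedy (the helper name split_distinct is mine), assuming n >= 1 and positive distinct parts:

def split_distinct(n):
    # find the smallest m with 1 + 2 + ... + m > n
    m, total = 0, 0
    while total <= n:
        m += 1
        total += m
    # keep 1 .. m-2 and fold the excess into the merged last term
    excess = total - n
    return list(range(1, m - 1)) + [m - 1 + m - excess]

print(split_distinct(9))    # [1, 2, 6]
print(split_distinct(15))   # [1, 2, 3, 4, 5]
print(split_distinct(18))   # [1, 2, 3, 4, 8]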
A sum can be partitioned into k distinct terms in {1, ..., m} if min(k) <= sum <= max(k, m), with
min(k) = 1 + 2 + .. + k = (k*(k+1))/2
max(k,m) = m + (m-1) + .. + (m-k+1) = k*m - (k*(k-1))/2
So, you can use the following pseudo-code:
fn solve(n, k, sum) -> set or error
    s = new_set()
    for m from n down to 1:
        # will the problem be solvable if we add m to s?
        if min(k-1) <= sum-m <= max(k-1, m-1) then
            s.add(m), sum -= m, k -= 1
    if sum = 0 and k = 0 then s else error()
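A direct Python transcription of this pseudocode (the names are mine), in case a runnable version helps; here n is the largest allowed term, so for the original problem you call it with the maximal k satisfying k*(k+1)/2 <= n:

def solve(n, k, total):
    min_sum = lambda k: k * (k + 1) // 2              # 1 + 2 + ... + k
    max_sum = lambda k, m: k * m - k * (k - 1) // 2   # m + (m-1) + ... + (m-k+1)
    s = set()
    for m in range(n, 0, -1):
        # will the problem still be solvable if we add m to s?
        if min_sum(k - 1) <= total - m <= max_sum(k - 1, m - 1):
            s.add(m); total -= m; k -= 1
    if total == 0 and k == 0:
        return s
    raise ValueError("cannot split")

print(sorted(solve(15, 5, 15)))   # [1, 2, 3, 4, 5]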

How should I go about solving this recursion without trial and error

int sum_down(int x)
{
    if (x >= 0)
    {
        x = x - 1;
        int y = x + sum_down(x);
        return y + sum_down(x);
    }
    else
    {
        return 1;
    }
}
What is the smallest integer value of the parameter x such that the returned value is greater than 1,000,000?
Right now I am just doing it by trial and error, but since this question is asked on paper, I don't think I will have enough time for that. How do you visualise this quickly so that it can be solved easily? I am new to programming, so thanks in advance!
The recursion logic:
x = x - 1;
int y = x + sum_down(x);
return y + sum_down(x);
can be simplified to:
x = x - 1;
int y = x + sum_down(x) + sum_down(x);
return y;
which can be simplified to:
int y = (x-1) + sum_down(x-1) + sum_down(x-1);
return y;
which can be simplified to:
return (x-1) + 2*sum_down(x-1);
Put in mathematical form,
f(N) = (N-1) + 2*f(N-1)
with the recursion terminating when N is -1. f(-1) = 1.
Hence,
f(0) = -1 + 2*1 = 1
f(1) = 0 + 2*1 = 2
f(2) = 1 + 2*2 = 5
...
f(18) = 17 + 2*f(17) = 524269
f(19) = 18 + 2*524269 = 1048556
Your program can be written this way (sorry about C#):
public static void Main()
{
    int i = 0;
    int j = 0;
    do
    {
        i++;
        j = sum_down(i);
        Console.Out.WriteLine("j:" + j);
    } while (j < 1000000);
    Console.Out.WriteLine("i:" + i);
}

static int sum_down(int x)
{
    if (x >= 0)
    {
        return x - 1 + 2 * sum_down(x - 1);
    }
    else
    {
        return 1;
    }
}
So at the first iteration you'll get 2, then 5, then 12... So you can neglect the x-1 part, since it stays small compared to the doubling.
So we have:
i = 1 => sum_down ~= 4 (real is 2)
i = 2 => sum_down ~= 8 (real is 5)
i = 3 => sum_down ~= 16 (real is 12)
i = 4 => sum_down ~= 32 (real is 27)
i = 5 => sum_down ~= 64 (real is 58)
So we can say that sum_down(x) ~= 2^(x+1). Then it's just basic math: find the smallest x such that 2^(x+1) > 1 000 000, which gives x = 19.
A bit late, but it's not that hard to get an exact non-recursive formula.
Write it up mathematically, as explained in other answers already:
f(-1) = 1
f(x) = 2*f(x-1) + x-1
This is the same as
f(-1) = 1
f(x+1) = 2*f(x) + x
(just switched from x and x-1 to x+1 and x, difference 1 in both cases)
The first few x and f(x) are:
x:    -1  0  1  2  3  4
f(x):  1  1  2  5 12 27
And while there are many arbitrarily complicated ways to transform this into a non-recursive formula, with easy ones it often helps to write up the difference between consecutive elements:
x:    -1  0  1  2  3  4
f(x):  1  1  2  5 12 27
diff:     0  1  3  7 15
So, for some x
f(x+1) - f(x) = 2^(x+1) - 1
f(x+2) - f(x) = (f(x+2) - f(x+1)) + (f(x+1) - f(x)) = 2^(x+2) + 2^(x+1) - 2
f(x+n) - f(x) = sum[0<=i<n](2^(x+1+i)) - n
With e.g. x = 0 inserted, to turn f(x+n) into f(n):
f(x+n) - f(x) = sum[0<=i<n](2^(x+1+i)) - n
f(0+n) - f(0) = sum[0<=i<n](2^(0+1+i)) - n
f(n) - 1 = sum[0<=i<n](2^(i+1)) - n
f(n) = sum[0<=i<n](2^(i+1)) - n + 1
f(n) = sum[0<i<=n](2^i) - n + 1
f(n) = (2^(n+1) - 2) - n + 1
f(n) = 2^(n+1) - n - 1
No recursion anymore.
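For a quick check of the closed form against the recursion, and of the final answer, in Python:

f = lambda x: 1 if x < 0 else (x - 1) + 2 * f(x - 1)
assert all(f(n) == 2 ** (n + 1) - n - 1 for n in range(-1, 25))
print(min(x for x in range(30) if 2 ** (x + 1) - x - 1 > 1000000))   # 19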
How about this:
int x = 0;
while (sum_down(x) <= 1000000)
{
    x++;
}
The loop increments x until the result of sum_down(x) is greater than 1,000,000.
Edit: the result is 19.
While trying to understand and simplify the recursion logic behind the sum_down() function is enlightening and informative, this snippet is pragmatic: it does not try to solve the problem analytically, but simply measures the result.
Two lines of Python code to answer your question:
>>> from itertools import * # not counted as one of the two lines, but needed for dropwhile() and count()
Define the recursive function (See R Sahu's answer)
>>> f = lambda x: 1 if x<0 else (x-1) + 2*f(x-1)
Then use the dropwhile() function to remove elements from the list [0, 1, 2, 3, ....] for which f(x)<=1000000, resulting in a list of integers for which f(x) > 1000000. Note: count() returns an infinite "list" of [0, 1, 2, ....]
The dropwhile() function returns a Python generator so we use next() to get the first value of the list:
>>> next(dropwhile(lambda x: f(x)<=1000000, count()))
19

shell sort sequence implementation in C++

I am reading about shell sort in Algorithms in C++ by Robert Sedgewick.
Here an outer loop to change the increments leads to this compact shellsort implementation, which uses the increment sequence 1 4 13 40 121 364 1093 3280 9841 ...
template <class Item>
void shellsort(Item a[], int l, int r)
{
    int h;
    for (h = 1; h <= (r - l) / 9; h = 3 * h + 1);
    for (; h > 0; h = h / 3)
    {
        for (int i = l + h; i <= r; i++)
        {
            int j = i; Item v = a[i];
            while (j >= l + h && v < a[j - h])
            {
                a[j] = a[j - h]; j -= h;
            }
            a[j] = v;
        }
    }
}
My question is: on what basis does the author check the condition h <= (r - l) / 9, and why does he divide by 9?
The loop:
for (h = 1; h <= (r - l) / 9; h = 3 * h + 1);
calculates the initial value of h. This value must be smaller than the range it will be used in:
h <= (r - l)
Every time this condition passes, h gets updated to 3 * h + 1, which means that even though h is smaller than (r - l), the updated value might be larger. To prevent this, we could check whether the next value of h would surpass the largest index:
(h * 3) + 1 <= (r - l)
This will make sure h is smaller than range of the array.
For example: say we have an array of size 42, which means indices go from 0 to 41. Using the condition as described above:
h = 1, is (3 * 1 + 1) <= (41 - 0) ? yes! -> update h to 4
h = 4, is (3 * 4 + 1) <= (41 - 0) ? yes! -> update h to 13
h = 13, is (3 * 13 + 1) <= (41 - 0) ? yes! -> update h to 40
h = 40, is (3 * 40 + 1) <= (41 - 0) ? no! => h will begin at 40
This means our initial h is 40. Because h is only marginally smaller than the range of the array, very little work will be done; the algorithm will only check the following:
Does array[40] need to be swapped with array[0]?
Does array[41] need to be swapped with array[1]?
This is a bit useless: the first iteration only performs two checks. A smaller initial value of h means more work will get done in the first iteration.
Using:
h <= (r - l) / 9
ensures the initial value of h to be sufficiently small to allow the first iteration to do useful work. As an extra advantage, it also looks cleaner than the previous condition.
You could replace 9 by any value greater than 3. Why greater than 3? To ensure (h * 3) + 1 <= (r - l) is still true!
But do remember not to make the initial h too small: Shell sort is based on insertion sort, which only performs well on small or nearly sorted arrays. Personally, I would not go beyond h <= (r - l) / 15.
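To see the difference the division by 9 makes, here is a small Python sketch (the helper names are mine) of the first loop under the look-ahead condition versus the book's condition, for the size-42 example above:

def initial_h_lookahead(l, r):
    # grow h while the *next* increment still fits in the range
    h = 1
    while 3 * h + 1 <= (r - l):
        h = 3 * h + 1
    return h

def initial_h_div9(l, r):
    # the book's condition: h <= (r - l) / 9
    h = 1
    while h <= (r - l) // 9:
        h = 3 * h + 1
    return h

print(initial_h_lookahead(0, 41))   # 40: the first pass does almost no work
print(initial_h_div9(0, 41))        # 13: the first pass does useful work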

Sum of submatrices of bigger matrix

I have a big matrix as input, and I have the size of a smaller matrix. I have to compute the sum of all possible smaller matrices which can be formed out of the bigger matrix.
Example.
Input matrix size: 4 × 4
Matrix:
1 2 3 4
5 6 7 8
9 9 0 0
0 0 9 9
Input smaller matrix size: 3 × 3 (not necessarily a square)
Smaller matrices possible:
1 2 3
5 6 7
9 9 0
5 6 7
9 9 0
0 0 9
2 3 4
6 7 8
9 0 0
6 7 8
9 0 0
0 9 9
Their sum, the final output:
14 18 22
29 22 15
18 18 18
I did this:
int** matrix_sum(int **M, int n, int r, int c)
{
    int **res = new int*[r];
    for (int i = 0; i < r; i++) {
        res[i] = new int[c];
        memset(res[i], 0, sizeof(int) * c);
    }
    for (int i = 0; i <= n - r; i++)
        for (int j = 0; j <= n - c; j++)
            for (int k = i; k < i + r; k++)
                for (int l = j; l < j + c; l++)
                    res[k - i][l - j] += M[k][l];
    return res;
}
I guess this is too slow, can anyone please suggest a faster way?
Your current algorithm is O((m - p) * (n - q) * p * q). The worst case is when p = m / 2 and q = n / 2.
The algorithm I'm going to describe will be O(m * n + p * q), which will be O(m * n) regardless of p and q.
The algorithm consists of 2 steps.
Let the input matrix A's size be m x n and the size of the window matrix being p x q.
First, you will create a precomputed matrix B of the same size as the input matrix. Each element of the precomputed matrix B contains the sum of all the elements in the sub-matrix, whose top-left element is at coordinate (1, 1) of the original matrix, and the bottom-right element is at the same coordinate as the element that we are computing.
B[i, j] = Sum[k = 1..i, l = 1..j]( A[k, l] ) for all 1 <= i <= m, 1 <= j <= n
This can be done in O(m * n), by using this relation to compute each element in O(1):
B[i, j] = B[i - 1, j] + Sum[k = 1..j-1]( A[i, k] ) + A[i, j] for all 2 <= i <= m, 1 <= j <= n
B[i - 1, j], which is everything of the sub-matrix we are computing except the current row, has been computed previously. You keep a prefix sum of the current row, so that you can use it to quickly compute the sum of the current row.
This is another way to compute B[i, j] in O(1), using the property of the 2D prefix sum:
B[i, j] = B[i - 1, j] + B[i, j - 1] - B[i - 1, j - 1] + A[i, j] for all 1 <= i <= m, 1 <= j <= n and invalid entry = 0
Then, the second step is to compute the result matrix S, whose size is p x q. If you make some observation, S[i, j] is the sum of all elements in the sub-matrix of size (m - p + 1) x (n - q + 1) whose top-left coordinate is (i, j) and bottom-right is (i + m - p, j + n - q).
Using the precomputed matrix B, you can compute the sum of any sub-matrix in O(1). Apply this to compute the result matrix S:
SubMatrixSum(top-left = (x1, y1), bottom-right = (x2, y2))
= B[x2, y2] - B[x1 - 1, y2] - B[x2, y1 - 1] + B[x1 - 1, y1 - 1]
Therefore, the complexity of the second step will be O(p * q).
The final complexity is as mentioned above, O(m * n), since p <= m and q <= n.
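For reference, a compact Python sketch of both steps (0-indexed, with a padded prefix-sum matrix; the names are mine), checked against the 4 × 4 example from the question:

def matrix_sum_fast(A, p, q):
    m, n = len(A), len(A[0])
    # step 1: 2D prefix sums; B[i][j] = sum of A[0..i-1][0..j-1]
    B = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            B[i + 1][j + 1] = B[i][j + 1] + B[i + 1][j] - B[i][j] + A[i][j]
    def rect(x1, y1, x2, y2):   # sum of A over inclusive corners
        return B[x2 + 1][y2 + 1] - B[x1][y2 + 1] - B[x2 + 1][y1] + B[x1][y1]
    # step 2: each result cell sums one value per window position
    return [[rect(i, j, i + m - p, j + n - q) for j in range(q)]
            for i in range(p)]

A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 9, 0, 0],
     [0, 0, 9, 9]]
print(matrix_sum_fast(A, 3, 3))   # [[14, 18, 22], [29, 22, 15], [18, 18, 18]]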