Finding number of divisors of a big integer using prime/quadratic factorization (C#) - primes

I'm trying to get the number of divisors of a 64-bit integer (larger than 32 bits).
My first method (for small numbers) was to divide the number by successive primes until the result was 1, count how often each prime occurred, and use the formula (1 + P1)*(1 + P2)*...*(1 + Pn) = number of divisors, where Pi is the exponent of the i-th prime.
For example:
24 = 2 * 2 * 2 * 3 = 2^3 * 3^1
==> (3 + 1)*(1 + 1) = 4 * 2 = 8 divisors
public static long[] GetPrimeFactor(long number)
{
    bool Found = false;
    long i = 2;
    List<long> Primes = new List<long>();
    while (number > 1)
    {
        if (number % i == 0)
        {
            number /= i;
            Primes.Add(i);
            i = 1;
        }
        i++;
    }
    return Primes.ToArray();
}
But for large integers this method takes too many iterations. I found a method called the quadratic sieve, which factors a number using square numbers. Combined with my script this could be much easier, because the remaining numbers are much smaller.
My question is, how can I implement this Quadratic Sieve?

The quadratic sieve is a method of finding large factors of large numbers; think 10^75, not 2^64. The quadratic sieve is complicated even in simple pseudocode form, and much more complicated if you want it to be efficient. It is very much overkill for 64-bit integers, and will be slower than other methods that are specialized for such small numbers.
If trial division is too slow for you, the next step up in complexity is John Pollard's rho method; for 64-bit integers, you might want to trial divide up to some small limit, maybe the primes less than a thousand, then switch to rho. Here's simple pseudocode to find a single factor of n; call it repeatedly on the composite cofactors to complete the factorization:
function factor(n, c=1)
    if n % 2 == 0 return 2
    h := 1; t := 1
    repeat
        h := (h*h+c) % n
        h := (h*h+c) % n
        t := (t*t+c) % n
        g := gcd(t-h, n)
    while g == 1
    if g is prime return g
    return factor(g, c+1)
There are other ways to factor 64-bit integers, but this will get you started, and is probably sufficient for most purposes; you might search for Richard Brent's variant of the rho algorithm for a modest speedup. If you want to know more, I modestly recommend the essay Programming with Prime Numbers at my blog.
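As a rough illustration of the recipe above (trial division by small numbers, then rho on whatever is left), here is a minimal sketch. It is written in Python rather than C# only to keep it short, and the names is_prime, rho_factor, factorize and count_divisors are made up for this example. The pseudocode's "g is prime" check is filled in with a Miller-Rabin test using the first twelve primes as bases, which is an assumption of this sketch and not something the answer prescribes.

```python
from math import gcd

_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime(n):
    """Miller-Rabin with the first twelve primes as bases (assumed check, fine for 64-bit n)."""
    if n < 2:
        return False
    for p in _BASES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _BASES:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def rho_factor(n, c=1):
    """Return one prime factor of n, following the pseudocode (loop instead of recursion)."""
    while True:
        if n % 2 == 0:
            return 2
        h = t = g = 1
        while g == 1:
            h = (h * h + c) % n  # hare: two steps
            h = (h * h + c) % n
            t = (t * t + c) % n  # tortoise: one step
            g = gcd(t - h, n)
        if is_prime(g):
            return g
        n, c = g, c + 1  # g is n itself or a composite factor: retry with the next c

def factorize(n):
    """Return {prime: exponent} for n >= 1."""
    factors = {}
    p = 2
    while p < 1000 and p * p <= n:  # trial division up to a small limit
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    while n > 1:  # whatever is left has no factor below 1000
        f = n if is_prime(n) else rho_factor(n)
        while n % f == 0:
            factors[f] = factors.get(f, 0) + 1
            n //= f
    return factors

def count_divisors(n):
    """Multiply (exponent + 1) over the prime factorization."""
    result = 1
    for exponent in factorize(n).values():
        result *= exponent + 1
    return result
```

With these definitions, count_divisors(24) gives 8, matching the worked example in the question.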

Related

Product of three primes divisible by sum of those primes

I found this problem in a cp contest which is over now so it can be answered.
Three primes (p1, p2, p3) (not necessarily distinct) are called special if (p1+p2+p3) divides p1*p2*p3. We have to find the number of these special triples if the primes can't exceed 10^6.
I tried brute force method but it timed out. Can there be any other method?
If you are timing out, then you need to do some smart searching to replace brute force. There are just short of 80,000 primes below a million so it is not surprising you timed out.
So, you need to start looking more carefully.
For example, any triple (2, p, p+2) where p+2 is also prime will meet the criteria:
2 + 3 + 5 = 10; 2 * 3 * 5 = 30; 30 / 10 = 3.
2 + 5 + 7 = 14; 2 * 5 * 7 = 70. 70 / 14 = 5.
...
2 + p + p+2 = 2(p+2); 2 * p * (p+2) = 2p(p+2); 2p(p+2) / 2(p+2) = p.
...
Are there other triples that start with 2? Are there triples that start with 3? What forms do p2 and p3 take if p1= 3? Run your program for triples up to 500 or so and look for patterns in the results. Then extrapolate those results to 10^6.
I assume you are using a Sieve to generate your initial list of primes.
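One way to follow that advice is to brute-force a small limit and look at what turns up. A minimal sketch in Python, assuming a limit of 100; the helper names primes_up_to and special_triples are hypothetical and exist only for this example:

```python
from itertools import combinations_with_replacement

def primes_up_to(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    p = 2
    while p * p <= n:
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
        p += 1
    return [i for i, is_p in enumerate(sieve) if is_p]

def special_triples(limit):
    """All sorted triples of primes <= limit whose sum divides their product."""
    primes = primes_up_to(limit)
    return [t for t in combinations_with_replacement(primes, 3)
            if (t[0] * t[1] * t[2]) % sum(t) == 0]

# Print the small cases so patterns such as (2, p, p+2) can be spotted by eye.
for triple in special_triples(100):
    print(triple, "sum =", sum(triple))
```

Printing the sum next to each triple makes it easier to see how the sum relates to the three primes before trying to extrapolate.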
I've experimented with this problem since you posted it. I've not solved it, but wanted to pass along what insight I have before I move onto something else:
Generating Primes is Not the Issue
With a proper sieve algorithm, we can generate all primes under 10**6 in a fraction of a second. (Less than 1/3 of a second on my Mac mini.) Spending time optimizing prime generation beyond this is time wasted.
The Brute Force Method
If we try to generate all permutations of three primes in Python, e.g.:
for prime_1 in primes:
    for prime_2 in primes:
        if prime_2 < prime_1:
            continue
        for prime_3 in primes:
            if prime_3 < prime_2:
                continue
            pass
Or better yet, push the problem down to the C level via Python's itertools:
from itertools import combinations_with_replacement
for prime_1, prime_2, prime_3 in combinations_with_replacement(primes, 3):
    pass
Then our timings, doing no actual work except generating prime triples, look like:
limit    sec.
10**2    0.04
10**3    0.13
10**4    37.37
10**5    ?
You can see how much time increases with each order of magnitude. Here's my example of a brute force solution:
from itertools import combinations_with_replacement
def sieve_primes(n):  # assumes n > 1
    sieve = [False, False, True] + [True, False] * ((n - 1) // 2)
    p = 3
    while p * p <= n:
        if sieve[p]:
            for i in range(p * p, n + 1, p):
                sieve[i] = False
        p += 2
    return [number for number, isPrime in enumerate(sieve) if isPrime]
primes = sieve_primes(10 ** 3)
print("Finished Sieve:", len(primes), "primes")
special = 0
for prime_1, prime_2, prime_3 in combinations_with_replacement(primes, 3):
    if (prime_1 * prime_2 * prime_3) % (prime_1 + prime_2 + prime_3) == 0:
        special += 1
print(special)
Avoid Generating Triples, but Still Brute Force
Here's an approach that avoids generating triples. We take the smallest and largest primes we generated, cube them, and loop over them with a custom factoring function. This custom factoring function only returns a value for those numbers that are made up of exactly three prime factors. For any number made up of more or less, it returns None. This should be faster than normal factoring as the function can give up early.
Numbers that factor into exactly three primes are easy to test for specialness. We're going to pretend our custom factoring function takes no time at all and simply measure how long it takes us to loop over all the numbers in question:
smallest_factor, largest_factor = primes[0], primes[-1]
for number in range(smallest_factor**3, largest_factor**3):
    pass
Again, some timings:
limit    sec.
10**2    0.14
10**3    122.39
10**4    ?
Doesn't look promising. In fact, worse than our original brute force method. And our custom factoring function in reality adds a lot of time. Here's my example of this solution (copy sieve_primes() from the previous example):
def factor_number(n, count):
    size = 0
    factors = []
    for prime in primes:
        while size < count and n % prime == 0:
            factors.append(prime)
            n //= prime
            size += 1
        if n == 1 or size == count:
            break
    if n > 1 or size < count:
        return None
    return factors
primes = sieve_primes(10 ** 3)
print("Finished Sieve:", len(primes), "primes")
special = 0
smallest_factor, largest_factor = primes[0], primes[-1]
for number in range(smallest_factor**3, largest_factor**3):
    factors = factor_number(number, 3)
    if factors:
        if number % sum(factors) == 0:
            special += 1
print(special)

C++: What are some general ways to make code more efficient for use with large numbers?

Please when answering this question try to be as general as possible to help the wider community, rather than just specifically helping my issue (although helping my issue would be great too ;) )
I seem to be encountering this problem time and time again with the simple problems on Project Euler. Most commonly are the problems that require a computation of the prime numbers - these without fail always fail to terminate for numbers greater than about 60,000.
My most recent issue is with Problem 12:
The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
Let us list the factors of the first seven triangle numbers:
1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28
We can see that 28 is the first triangle number to have over five divisors.
What is the value of the first triangle number to have over five hundred divisors?
Here is my code:
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;
int main() {
    int numberOfDivisors = 500;
    //I begin by looping from 1, with 1 being the 1st triangular number, 2 being the second, and so on.
    for (long long int i = 1;; i++) {
        long long int triangularNumber = (pow(i, 2) + i)/2;
        //Once I have the i-th triangular, I loop from 1 to itself, and add 1 to count each time I encounter a divisor, giving the total number of divisors for each triangular.
        int count = 0;
        for (long long int j = 1; j <= triangularNumber; j++) {
            if (triangularNumber%j == 0) {
                count++;
            }
        }
        //If the number of divisors is 500, print out the triangular and break the code.
        if (count == numberOfDivisors) {
            cout << triangularNumber << endl;
            break;
        }
    }
}
This code gives the correct answers for smaller numbers, and then either fails to terminate or takes an age to do so!
So firstly, what can I do with this specific problem to make my code more efficient?
Secondly, what are some general tips both for myself and other new C++ users for making code more efficient? (I.e. applying what we learn here in the future.)
Thanks!
The key problem is that your end condition is bad. You are supposed to stop when count > 500, but you look for an exact match of count == 500, therefore you are likely to blow past the correct answer without detecting it, and keep going ... maybe forever.
If you fix that, you can post it to code review. They might say something like this:
Break it down into separate functions for finding the next triangle number, and counting the factors of some number.
When you find the next triangle number, you execute pow. I perform a single addition.
For counting the number of factors in a number, a google search might help. (e.g. http://www.cut-the-knot.org/blue/NumberOfFactors.shtml ) You can build a list of prime numbers as you go, and use that to quickly find a prime factorization, from which you can compute the number of factors without actually counting them. When the numbers get big, that loop gets big.
Tldr: 76576500.
About your Euler problem, some math:
Preliminary 1:
Let's call the n-th triangle number T(n).
T(n) = 1 + 2 + 3 + ... + n = (n^2 + n)/2 (sometimes attributed to Gauss, sometimes someone else). It's not hard to figure it out:
1+2+3+4+5+6+7+8+9+10 =
(1+10) + (2+9) + (3+8) + (4+7) + (5+6) =
11 + 11 + 11 + 11 + 11 =
55 =
110 / 2 =
(10*10 + 10)/2
Because of its definition, it's trivial that T(n) + n + 1 = T(n+1), and that with a<b, T(a)<T(b) is true too.
Preliminary 2:
Let's call the divisor count D. D(1)=1, D(4)=3 (because 1 2 4).
For an n with c non-repeating prime factors (not just any divisors, but prime factors, e.g. n = 42 = 2 * 3 * 7 has c = 3), D(n) is 2^c: for each factor, there are two possibilities (use it or not). The 8 possible divisors for the example are: 1, 2, 3, 7, 6 (2*3), 14 (2*7), 21 (3*7), 42 (2*3*7).
More generally with repeating, the solution for D(n) is multiplying (Power+1) together. Example 126 = 2^1 * 3^2 * 7^1: Because it has two 3, the question is no "use 3 or not", but "use it 1 time, 2 times or not" (if one time, the "first" or "second" 3 doesn't change the result). With the powers 1 2 1, D(126) is 2*3*2=12.
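To make the (power + 1) rule concrete, here is a small sketch that factors a number by plain trial division and multiplies the incremented exponents. The function names prime_exponents and divisor_count are invented for this illustration.

```python
def prime_exponents(n):
    """Return {prime: exponent} for n >= 1 using trial division."""
    exponents = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            exponents[p] = exponents.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # what is left is a single prime
        exponents[n] = exponents.get(n, 0) + 1
    return exponents

def divisor_count(n):
    """D(n): product of (exponent + 1) over the prime factorization."""
    count = 1
    for exponent in prime_exponents(n).values():
        count *= exponent + 1
    return count

print(prime_exponents(126))  # the powers 1, 2, 1 from the example
print(divisor_count(126))    # 2 * 3 * 2
```

For 42 = 2 * 3 * 7 every exponent is 1, so the product is 2 * 2 * 2 = 8.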
Preliminary 3:
A number n and n+1 can't have any common prime factor x other than 1 (technically, 1 isn't a prime, but whatever). Because if both n/x and (n+1)/x are natural numbers, (n+1)/x - n/x has to be too, but that is 1/x.
Back to Gauss: If we know the prime factors for a certain n and n+1 (needed to calculate D(n) and D(n+1)), calculating D(T(n)) is easy. T(n) = (n^2 + n) / 2 = n * (n+1) / 2. As n and n+1 don't have common prime factors, just throwing together all factors and removing one 2 because of the "/2" is enough. Example: n is 7, factors 7 = 7^1, and n+1 = 8 = 2^3. Together it's 2^3 * 7^1, removing one 2 is 2^2 * 7^1. Powers are 2 1, D(T(7)) = 3*2 = 6. To check, T(7) = 28 = 2^2 * 7^1, the 6 possible divisors are 1 2 4 7 14 28.
What the program could do now: Loop through all n from 1 to something, always factorize n and n+1, use this to get the divisor count of the n-th triangle number, and check if it is >500.
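The loop described here can be sketched in a few lines. This version is in Python and, as a simplification of my own, halves whichever of n and n+1 is even and multiplies the two divisor counts (valid because the two halves stay coprime) instead of storing every factorization. The function names are hypothetical.

```python
def count_divisors(n):
    """Divisor count via trial division: product of (exponent + 1)."""
    count = 1
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        count *= exponent + 1
        p += 1
    if n > 1:  # one remaining prime factor with exponent 1
        count *= 2
    return count

def first_triangle_with_more_divisors_than(limit):
    """Return (n, T(n)) for the first triangle number with more than `limit` divisors."""
    n = 1
    while True:
        # T(n) = n * (n + 1) / 2; put the "/2" on the even one of the two.
        if n % 2 == 0:
            a, b = n // 2, n + 1
        else:
            a, b = n, (n + 1) // 2
        # a and b share no prime factor, so D(a * b) = D(a) * D(b).
        if count_divisors(a) * count_divisors(b) > limit:
            return n, a * b
        n += 1

print(first_triangle_with_more_divisors_than(5))
print(first_triangle_with_more_divisors_than(500))
```

Because only numbers of size about n are ever factored, never T(n) itself, trial division stays cheap for the range this problem needs.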
There's just the tiny problem that there are no efficient algorithms for prime factorization. But for somewhat small numbers, today's computers are still fast enough, and keeping all found factorizations from 1 to n helps too for finding the next one (for n+1). Potential problem 2 is numbers too large for long long, but again, this is no problem here (as can be found out by trying).
With the described process and the program below, I got
the 12375th triangle number is 76576500 and has 576 divisors
#include <iostream>
#include <vector>
#include <cstdint>
using namespace std;
const int limit = 500;
vector<uint64_t> knownPrimes; //2 3 5 7...
//eg. [14] is 1 0 0 1 ... because 14 = 2^1 * 3^0 * 5^0 * 7^1
vector<vector<uint32_t>> knownFactorizations;
void init()
{
    knownPrimes.push_back(2);
    knownFactorizations.push_back(vector<uint32_t>(1, 0)); //factors for 0 (dummy)
    knownFactorizations.push_back(vector<uint32_t>(1, 0)); //factors for 1 (dummy)
    knownFactorizations.push_back(vector<uint32_t>(1, 1)); //factors for 2
}
void addAnotherFactorization()
{
    uint64_t number = knownFactorizations.size();
    size_t len = knownPrimes.size();
    for(size_t i = 0; i < len; i++)
    {
        if(!(number % knownPrimes[i]))
        {
            //dividing with a prime gets a already factorized number
            knownFactorizations.push_back(knownFactorizations[number / knownPrimes[i]]);
            knownFactorizations[number][i]++;
            return;
        }
    }
    //if this failed, number is a newly found prime
    //because a) it has no known prime factors, so it must have others
    //and b) if it is not a prime itself, then it's factors should've been
    //found already (because they are smaller than the number itself)
    knownPrimes.push_back(number);
    len = knownFactorizations.size();
    for(size_t s = 0; s < len; s++)
    {
        knownFactorizations[s].push_back(0);
    }
    knownFactorizations.push_back(knownFactorizations[0]);
    knownFactorizations[number][knownPrimes.size() - 1]++;
}
uint64_t calculateDivisorCountOfN(uint64_t number)
{
    //factors for number must be known
    uint64_t res = 1;
    size_t len = knownFactorizations[number].size();
    for(size_t s = 0; s < len; s++)
    {
        if(knownFactorizations[number][s])
        {
            res *= (knownFactorizations[number][s] + 1);
        }
    }
    return res;
}
uint64_t calculateDivisorCountOfTN(uint64_t number)
{
    //factors for number and number+1 must be known
    uint64_t res = 1;
    size_t len = knownFactorizations[number].size();
    vector<uint32_t> tmp(len, 0);
    size_t s;
    for(s = 0; s < len; s++)
    {
        tmp[s] = knownFactorizations[number][s]
            + knownFactorizations[number+1][s];
    }
    //remove /2
    tmp[0]--;
    for(s = 0; s < len; s++)
    {
        if(tmp[s])
        {
            res *= (tmp[s] + 1);
        }
    }
    return res;
}
int main()
{
    init();
    uint64_t number = knownFactorizations.size() - 2;
    uint64_t DTn = 0;
    while(DTn <= limit)
    {
        number++;
        addAnotherFactorization();
        DTn = calculateDivisorCountOfTN(number);
    }
    uint64_t tn;
    if(number % 2) tn = ((number+1)/2)*number;
    else tn = (number/2)*(number+1);
    cout << "the " << number << "th triangle number is "
        << tn << " and has " << DTn << " divisors" << endl;
    return 0;
}
About your general question about speed:
1) Algorithms.
How to know them? For (relatively) simple problems, either reading a book/Wikipedia/etc. or figuring it out if you can. For harder stuff, learning more basic things and gaining experience is necessary before it's even possible to understand them, eg. studying CS and/or maths ... number theory helps a lot for your Euler problem. (It will help less to understand how a MP3 file is compressed ... there are many areas, it's not possible to know everything.).
2a) Automated compiler optimizations of frequently used code parts / patterns
2b) Manually timing which program parts are the slowest, and (when not replacing them with another algorithm) changing them in a way that e.g. requires less data sent to slow devices (HDD, network...), fewer RAM accesses, fewer CPU cycles, works better together with OS scheduler and memory management strategies, uses the CPU pipeline/caches better etc. ... this is both education and experience (and a big topic).
And because long variables have a limited size, sometimes it is necessary to use custom types that use eg. a byte array to store a single digit in each byte. That way, it's possible to use the whole RAM for a single number if you want to, but the downside is you/someone has to reimplement stuff like addition and so on for this kind of number storage. (Of course, libs for that exist already, without writing everything from scratch).
Btw., pow is a floating point function and may get you inaccurate results. It's not appropriate to use it in this case.
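The same effect is easy to reproduce outside C++. A small sketch in Python, where math.pow also works in double precision while the ** operator on integers is exact; the value of n is an arbitrary large number picked for the demonstration:

```python
import math

n = 3037000500  # arbitrary value whose square needs more than 53 bits

exact = n ** 2                   # integer arithmetic, no rounding
via_float = int(math.pow(n, 2))  # goes through a double and gets rounded

print(exact)
print(via_float)
print(exact == via_float)
```

In the C++ code the equivalent fix is to stay in integer arithmetic, for example i * (i + 1) / 2, or to add i to the previous triangle number.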

Calculate this factorial term in C++ with basic datatypes

I am solving a programming problem, and in the end the problem boils down to calculating following term:
n!/(n1!n2!n3!....nm!)
n<50000
(n1+n2+n3...nm)<n
I am given that the final answer will fit in 8 byte. I am using C++. How should I calculate this. I am able to come up with some tricks but nothing concrete and generalized.
EDIT:
I would not like to use external libraries.
EDIT1 :
Added conditions and result will be definitely 64 bit int.
If the result is guaranteed to be an integer, work with the factored representation.
By Legendre's theorem, you can express each of these factorials by the sequence of exponents of the primes in the range [2, n].
By subtracting the exponents of the factorials in the denominator from those in the numerator, you obtain the exponents for the whole quotient. The computation then reduces to a product of prime powers that will never overflow the 8 bytes.
For example,
25! = 2^22.3^10.5^6.7^3.11^2.13.17.19.23
15! = 2^11.3^6.5^3.7^2.11.13
10! = 2^8.3^4.5^2.7
yields
25!/(15!.10!) = 2^3.5.11.17.19.23 = 3268760
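The exponents of, say, 3 are found by adding the integer quotients by 3, 9, 27, ... (stopping once the power exceeds n):
25/3 + 25/9 = 8 + 2 = 10
15/3 + 15/9 = 5 + 1 = 6
10/3 + 10/9 = 3 + 1 = 4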
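A minimal sketch of this approach, in Python for brevity (the question asks for C++, so treat it as pseudocode for the structure). The names primes_up_to, legendre_exponent and multinomial are assumptions of this example.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    p = 2
    while p * p <= n:
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
        p += 1
    return [i for i, is_p in enumerate(sieve) if is_p]

def legendre_exponent(n, p):
    """Exponent of the prime p in n!: n//p + n//p^2 + n//p^3 + ..."""
    exponent = 0
    power = p
    while power <= n:
        exponent += n // power
        power *= p
    return exponent

def multinomial(n, parts):
    """n! / (parts[0]! * parts[1]! * ...), built from prime exponents only."""
    result = 1
    for p in primes_up_to(n):
        exponent = legendre_exponent(n, p) - sum(legendre_exponent(k, p) for k in parts)
        result *= p ** exponent  # every partial product divides the final answer
    return result

print(multinomial(25, [15, 10]))
```

In C++ the final loop would multiply into an unsigned 64-bit integer; since each partial product divides the final result, it cannot exceed it.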
If all the input (not necessarily the output) is made of integers, you could try to count prime factors. You create an array of size n+1 (a prime factor of n! can be as large as n) and fill it with the counts of each prime factor in n!:
vector <int> v = vector <int> (n+1,0);
int m = 2;
while (m <=n) {
    int i = 2;
    int a = m;
    while (a >1) {
        while (a%i ==0) {
            v[i] ++;
            a/=i;
        }
        i++;
    }
    m++;
}
Then you iterate over the n_k (1 <= k <= m) and you decrease the count for each prime factor. This is pretty much the same code as above except that you replace the v[i]++ by v[i] --. Of course you need to call it with vector v previously obtained.
After that the vector v contains the list of count of prime factors in your expression and you just need to reconstruct the result as
int result = 1;
for (int i = 2; i < v.size(); i++) {
    result *= pow(i,v[i]);
}
return result;
Note : you should use long long int instead of int above but I stick to int for simplicity
Edit : As mentioned in another answer, it would be better to use Legendre theorem to fill / unfill the vector v faster.
What you can do is to use the properties of the logarithm:
log(AB) = log(A) + log(B)
log(A/B) = log(A) - log(B)
and
X = e^(log(X))
So you can first compute the logarithm of your quantity, then exponentiate back:
log(N!/(n1!n2!...nk!)) = log(1) + ... + log(N) - [log(n1!) + ... + log(nk!)]
then expand log(n1!) etc. so you end up writing everything in terms of logarithm of single numbers. Then take the exponential of your result to obtain the initial value of the factorial.
As @T.C. mentioned, this method may not be too accurate, although in typical scenarios you'll have many terms reduced. Alternatively, you expand each factorial into a list that stores the terms in its product, e.g. 6! will be stored in a list {1,2,3,4,5,6}. You do the same for the denominator terms. Then you start removing common elements. Finally, you can take gcd's and reduce everything to coprime factors, then compute the result.
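For a feel of how the logarithm route behaves, here is a small Python sketch; log_factorial and multinomial_via_logs are names made up for this example. The result is a floating-point approximation, so it has to be rounded, and rounding can only be trusted while the true value is small enough for a double to resolve.

```python
from math import exp, fsum, log

def log_factorial(n):
    """log(n!) as the sum log(2) + log(3) + ... + log(n)."""
    return fsum(log(i) for i in range(2, n + 1))

def multinomial_via_logs(n, parts):
    """Approximate n! / (parts[0]! * parts[1]! * ...) via logarithms."""
    log_value = log_factorial(n) - fsum(log_factorial(k) for k in parts)
    return exp(log_value)

approx = multinomial_via_logs(25, [15, 10])
print(approx, round(approx))
```

For results near the top of the 64-bit range a double does not carry enough digits, which is the accuracy concern raised above.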

Code with prime numbers

I am trying to solve one problem from on-line judging system. I have a solution which works, but not efficient enough. Here is the problem:
What is the least number n that can be written as a product n = a∙b in exactly k ways? The products a∙b and b∙a count as the same way, and all numbers are natural (1 ≤ k ≤ 50).
Input One number k.
Output One number n.
My code did not pass four tests. It is too slow for k=31, 37, 47. I have been thinking on this problem 2 days,but no improvement. Here is my code, please share, if you have any ideas.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
int prime[10000];
long x,j,i,flag,k,length,p,checker,count,number;
int main()
{
    prime[0]=2;
    scanf("%ld",&k);
    //I find prime numbers between 1 and 1000. 1000 can be changed, just for testing
    for (i=3;i<=1000;i=i+2)
    {
        flag=0;
        for (j=2;j<=sqrt(i);j++)
        {
            if(i%j==0)
            {
                flag=1;
                break;
            }
        }
        if(flag==0)
        {
            x++;
            prime[x]=i;
        }
    }
    length=x;
    //this loop is too big I know, again for testing. I suspect, there must be a way to make some changes to this for loop
    for (i=1;i<10000000000;i++)
    {
        number=i;
        p=1;
        for(x=0;x<=length;x++)
        {
            if(prime[x]>sqrt(i))
                break;
            count=0;
            while(number%prime[x]==0)
            {
                number=number/prime[x];
                count++;
            }
            p=p*(count+1);
            //I find prime factors of numbers and their powers, then calculate number of divisors
        }
        //printf("%d\n",p);
        //number of ways is just number of divisors/2 or floor (divisors/2)+1
        if(p%2==0)
            checker=p/2;
        else
            checker=floor(p/2)+1;
        if(checker==k)
        {
            printf("%ld\n",i);
            break;
        }
    }
    return 0;
}
If I understand the problem correctly it's asking you which is the least number n with exactly 2k divisors (should I consider 1 and n?)
in fact if a number has a divisor a, then n / a = b is an integer and n = a* b (counting only one time a and b, so you should divide by two the number of divisors)
edit
Doing that is time consuming indeed. So this is the idea;
for a number n in the form n = p1^(a1)*p2^(a2)...pn^(an) (this is the prime factorization of the number) the number of divisor is (a1 + 1)(a2+1)...(an+1)
Hence, if you want to find a number that has k divisors, factorize k, then assign the biggest factor (minus 1, as the exponent) to the smallest prime; e.g. if k = 2*5*7, then n should be 2^6*3^4*5^1
I know it is not since i didnt take into account that (a, b) is equal to (b, a) but play around it a little and it should work
example
take k = 37. Then double the number - (to consider the symmetry). You get 74.
Now, if you can imagine n as n = n * 1, then you just need to factor 74 (that is 2 * 37);
then give 36 to 2 and 1 to 3, leading n = 2^(36)*3 = 206158430208
if you can't, then you need to add 1 to the number you got previously (in this case, 74 + 1 = 75 = 25*3); this way you get n = 2^24 * 3^2 = 150994944
If it's none of the above, then I am probably wrong...
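The idea can be turned into a small search. The sketch below is in Python and rests on two assumptions of mine: that n = n * 1 counts as one of the ways (as the code in the question assumes), so the divisor count must be 2k or, for perfect squares, 2k - 1; and that every way of splitting the divisor count into factors has to be tried, since the first split that comes to mind is not always the smallest. The function names are hypothetical.

```python
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def smallest_with_divisors(d, prime_index=0, max_factor=None):
    """Smallest n with exactly d divisors, or None if no split fits.

    Tries every factorization d = f1 * f2 * ... with f1 >= f2 >= ...,
    giving exponent f1 - 1 to 2, f2 - 1 to 3, and so on.
    """
    if d == 1:
        return 1
    if max_factor is None:
        max_factor = d
    best = None
    for f in range(2, min(d, max_factor) + 1):
        if d % f == 0:
            rest = smallest_with_divisors(d // f, prime_index + 1, f)
            if rest is not None:
                candidate = PRIMES[prime_index] ** (f - 1) * rest
                if best is None or candidate < best:
                    best = candidate
    return best

def least_n_with_k_ways(k):
    """Least n with exactly k unordered factorizations n = a * b (a, b >= 1)."""
    # k ways means 2k divisors, or 2k - 1 divisors when n is a perfect square.
    return min(smallest_with_divisors(2 * k), smallest_with_divisors(2 * k - 1))

for k in (1, 2, 3, 37):
    print(k, least_n_with_k_ways(k))
```

For k = 37 this search agrees with the value 2^36 * 3 worked out in the example above.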

Factor a large number efficiently with gmp

I need to get all the prime factors of large numbers that can easily get to 1k bits.
The numbers are practically random so it shouldn't be hard.
How do I do it efficiently? I use C++ with GMP library.
EDIT:
I guess you all misunderstood me.
What I mean by prime a number is to get all prime factors of the number.
Sorry for my english, in my language prime and factor are the same :)
clarification (from OP's other post):
What I need is a way to efficiently factor (find the prime factors of) large numbers (may get to 2048 bits) using C++ and GMP (the GNU Multiple Precision library) or, less preferably, any other way.
The numbers are practically random so there is little chance it will be hard to factor, and even if the number is hard to factor, I can re-roll the number(can't choose though).
A good start would be some pre-filtering with small primes, say about all primes lower than 100 000 or so. Simply try to divide by every single one of them (create a table which you then load at runtime or have it as static data in your code). It might seem slow and stupid, but if the number is totally random, this will give you some factors very fast with a huge probability. Then look at the remaining number and decide what to do next. If it is quite small (what "small" means is up to you) you could try a primality test (there is something in GMP i think) and if it gives it is a prime, you can in most of the cases trust it. Otherwise you have to factor it further.
If your numbers are really huge and you care about performance, then you definitely need to implement something more sophisticated than just a stupid division. Look at the Quadratic Sieve (try Wikipedia). It is quite simple but very powerful. If you are up to the challenge, try MPQS, a variant of the quadratic sieve algorithm. This forum is a good source of information. There are even existing implementations of a tool you need - see for example this.
Note though that numbers with 1k bits are huge by all means. Factoring such a number (even with MPQS or others) might take years if you are lucky and forever if not. I think that MPQS performs well with numbers of about 100-400 bits (if they are composed of two primes almost equally large, which is the hardest case of course).
Below is a sample algorithm in Java (it's not C++ with GMP, but converting should be pretty straightforward) that:
1. generates a random number x of bitlength Nbits
2. tries to factor out all prime factors < 100, keeping a list of prime factors that divide x.
3. tests to see if the remaining factor is prime using Java's isProbablePrime method
4. If the remaining factor product is prime with sufficient probability, we have succeeded in factoring x. (STOP)
5. Otherwise the remaining factor product is definitely composite (see the isProbablePrime docs).
6. While we still have time, we run the Pollard rho algorithm until we find a divisor d.
7. If we run out of time, we have failed. (STOP)
8. We have found a divisor d. So we factor out d, add the prime factors of d to the list of prime factors of x, and go to step 4.
All the parameters of this algorithm are near the beginning of the program listing. I looked for 1024-bit random numbers, with a timeout of 250 milliseconds, and I keep running the program until I get a number x with at least 4 prime factors (sometimes the program finds a number with 1, 2, or 3 prime factors first). With this parameter set, it usually takes about 15-20 seconds on my 2.66 GHz iMac.
Pollard's rho algorithm isn't really that efficient, but it's simple, compared to the quadratic sieve (QS) or the general number field sieve (GNFS) -- I just wanted to see how the simple algorithm worked.
Why this works: (despite the claim of many of you that this is a hard problem)
The plain fact of it is that prime numbers aren't that rare. For 1024-bit numbers, the Prime Number Theorem says that about 1 in every 1024 ln 2 (= about 710) numbers is prime.
So if I generate a random number x that is prime, and I accept probabilistic prime detection, I've successfully factored x.
If it's not prime, but I quickly factor out a few small factors, and the remaining factor is (probabilistically) prime, then I've successfully factored x.
Otherwise I just give up and generate a new random number (which the OP says is acceptable).
Most of the numbers successfully factored will have 1 large prime factor and a few small prime factors.
The numbers that are hard to factor are the ones that have no small prime factors and at least 2 large prime factors (these include cryptographic keys that are the product of two large numbers; the OP has said nothing about cryptography), and I can just skip them when I run out of time.
package com.example;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
public class FindLargeRandomComposite {
    final static private int[] smallPrimes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
        31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
        73, 79, 83, 89, 97};
    final static private int maxTime = 250;
    final static private int Nbits = 1024;
    final static private int minFactors = 4;
    final static private int NCERTAINTY = 4096;
    private interface Predicate { public boolean isTrue(); }
    static public void main(String[] args)
    {
        Random r = new Random();
        boolean found = false;
        BigInteger x=null;
        List<BigInteger> factors=null;
        long startTime = System.currentTimeMillis();
        while (!found)
        {
            x = new BigInteger(Nbits, r);
            factors = new ArrayList<BigInteger>();
            Predicate keepRunning = new Predicate() {
                final private long stopTime = System.currentTimeMillis() + maxTime;
                public boolean isTrue() {
                    return System.currentTimeMillis() < stopTime;
                }
            };
            found = factor(x, factors, keepRunning);
            System.out.println((found?(factors.size()+" factors "):"not factored ")+x+"= product: "+factors);
            if (factors.size() < minFactors)
                found = false;
        }
        long stopTime = System.currentTimeMillis();
        System.out.println("Product verification: "+(x.equals(product(factors))?"passed":"failed"));
        System.out.println("elapsed time: "+(stopTime-startTime)+" msec");
    }
    private static BigInteger product(List<BigInteger> factors) {
        BigInteger result = BigInteger.ONE;
        for (BigInteger f : factors)
            result = result.multiply(f);
        return result;
    }
    private static BigInteger findFactor(BigInteger x, List<BigInteger> factors,
        BigInteger divisor)
    {
        BigInteger[] qr = x.divideAndRemainder(divisor);
        if (qr[1].equals(BigInteger.ZERO))
        {
            factors.add(divisor);
            return qr[0];
        }
        else
            return x;
    }
    private static BigInteger findRepeatedFactor(BigInteger x,
        List<BigInteger> factors, BigInteger p) {
        BigInteger xprev = null;
        // reference comparison is intended: findFactor returns the same object when p does not divide x
        while (xprev != x)
        {
            xprev = x;
            x = findFactor(x, factors, p);
        }
        return x;
    }
    private static BigInteger f(BigInteger x, BigInteger n)
    {
        return x.multiply(x).add(BigInteger.ONE).mod(n);
    }
    private static BigInteger gcd(BigInteger a, BigInteger b) {
        while (!b.equals(BigInteger.ZERO))
        {
            BigInteger nextb = a.mod(b);
            a = b;
            b = nextb;
        }
        return a;
    }
    private static BigInteger tryPollardRho(BigInteger n,
        List<BigInteger> factors, Predicate keepRunning) {
        BigInteger x = new BigInteger("2");
        BigInteger y = x;
        BigInteger d = BigInteger.ONE;
        while (d.equals(BigInteger.ONE) && keepRunning.isTrue())
        {
            x = f(x,n);
            y = f(f(y,n),n);
            d = gcd(x.subtract(y).abs(), n);
        }
        // rho found no proper divisor: hand back n unchanged (returning the iterate x would corrupt the cofactor)
        if (d.equals(n))
            return n;
        BigInteger[] qr = n.divideAndRemainder(d);
        if (!qr[1].equals(BigInteger.ZERO))
            throw new IllegalStateException("Huh?");
        // d is a factor of x. But it may not be prime, so run it through the factoring algorithm.
        factor(d, factors, keepRunning);
        return qr[0];
    }
    private static boolean factor(BigInteger x0, List<BigInteger> factors,
        Predicate keepRunning) {
        BigInteger x = x0;
        for (int p0 : smallPrimes)
        {
            BigInteger p = new BigInteger(Integer.toString(p0));
            x = findRepeatedFactor(x, factors, p);
        }
        boolean done = false;
        while (!done && keepRunning.isTrue())
        {
            done = x.equals(BigInteger.ONE) || x.isProbablePrime(NCERTAINTY);
            if (!done)
            {
                x = tryPollardRho(x, factors, keepRunning);
            }
        }
        if (!x.equals(BigInteger.ONE))
            factors.add(x);
        return done;
    }
}
You could use Pollard's p-1 factorization algorithm if the number you want to factor has small prime factors. It has factored out a 30 digit prime factor of the number 2 ^ 740 + 1. ECM is a similar but sub-exponential algorithm, though its implementation is more difficult. The running time of the algorithm depends on what the bound b is set to. It will find a factor p of any number for which p - 1 is b-smooth.
//Pollard p - 1 factorization algorithm
void factor(mpz_t g, mpz_t n, long b)
{
//sieve for primes
std::vector<bool> r;
for(int i = 0; i < b; i++)
r.push_back(true);
for(int i = 2; i < ceil(sqrt(b - 1)); i++)
if(r.at(i) == true)
for(int j = i * i; j < b; j += i)
r.at(j) = false;
std::vector<long> p;
std::vector<long> a;
for(int i = 2; i < b; i++)
if(r[i] == true)
{
p.push_back(i);//Append the prime on to the vector
int temp = floor(log(b) / log(i)); //temp = logb(i)
// put primes in to sieve
// a = the maximum power for p ^ a < bound b
if(temp == 0)
a.push_back(1);
else
a.push_back(temp);
}
int m = p.size();//m = number of primes under bound b
mpz_t c;// c is the number Which will be exponated
mpz_init(c);
long two = 2;
mpz_set_ui(c, two);// set c to 2
int z = 0;
long x = 2;
// loop c until a factor is found
for(;;)
{
mpz_set_si( c, x);
//powering ladder
for(long i = 0; i < m; i++)
for(long j = 0; j < a[i]; j++)
mpz_powm_ui(c , c, (p[i]), n);
//check if a factor has been found;
mpz_sub_ui(c ,c,1);
mpz_gcd(g ,c, n);
mpz_add_ui(c , c, 1);
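//if g is a nontrivial factor return it, else try the next base
if((mpz_cmp_si(g,1)) > 0 && (mpz_cmp(g,n)) < 0)
{
mpz_clear(c);
return;
}
else if (x > b)
{
mpz_set_ui(g, 1);//report failure as 1, which is what main() expects
break;
}
else
x++;
}
mpz_clear(c);
}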
int main()
{
mpz_t x;
mpz_t g;
//initialize g and x
mpz_init(g);
mpz_init_set_str(x,"167698757698757868925234234253423534235342655234234235342353423546435347",10);
//p-1 will factor x as long as it has a factor p where p - 1 is b-smooth(has all prime factors less than bound b)
factor(g , x, 1000);
//output the factor, it will output 1 if algorithm fails
mpz_out_str(NULL, 10, g);
return 0;
}
Outputs - 7465647
Execution time - 0.003 seconds
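To make the smoothness condition concrete, here is a minimal Python sketch of the same idea on a tiny number. It is not a port of the GMP code above: the names small_primes and pollard_p_minus_1 are made up for this illustration, and it tries only one base instead of looping over bases.

```python
from math import gcd

def small_primes(bound):
    """Sieve of Eratosthenes: all primes <= bound (bound >= 2 assumed)."""
    is_prime = [True] * (bound + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(bound ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, bound + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def pollard_p_minus_1(n, bound, base=2):
    """Return a nontrivial factor of n, or None if this bound/base fails."""
    c = base
    for p in small_primes(bound):
        # Raise c to the largest power of p that does not exceed bound.
        pk = p
        while pk * p <= bound:
            pk *= p
        c = pow(c, pk, n)
    g = gcd(c - 1, n)
    return g if 1 < g < n else None

# 299 = 13 * 23. 13 - 1 = 12 = 2^2 * 3 is built from prime powers <= 5,
# while 23 - 1 = 22 = 2 * 11 is not, so bound 5 splits off 13.
print(pollard_p_minus_1(299, 5))   # 13
# 1081 = 23 * 47: neither 22 nor 46 is 5-smooth, so the method gives up.
print(pollard_p_minus_1(1081, 5))  # None
```

With a bound this small the whole exponent is just 4 * 3 * 5 = 60, so you can check by hand that 2^60 is 1 modulo 13 but not modulo 23.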
Another factoring algorithm created by J. Pollard is Pollard's rho algorithm, which is not that quick but requires very little space. There are also ways to parallelize it. Its expected running time is O(n^(1/4)).
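//Pollard rho factoring algorithm
#include <cstdio> //must come before gmp.h so that mpz_out_str is declared
#include <gmp.h>
//f was not shown in the original post; the usual choice x = x^2 + 1 mod n is assumed here
void f(mpz_t x, mpz_t n)
{
mpz_mul(x, x, x);
mpz_add_ui(x, x, 1);
mpz_mod(x, x, n);
}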
void rho(mpz_t g, mpz_t n)
{
mpz_t x;
mpz_t y;
mpz_init_set_ui(x ,2);
mpz_init_set_ui(y ,2);//initialize x and y as 2
mpz_set_ui(g , 1);
mpz_t temp;
mpz_init(temp);
if(mpz_probab_prime_p(n,25) != 0)
return;//test if n is prime with miller rabin test
int count = 0;//must start at 0; reading it uninitialized is undefined behavior
int t1 = 0;
int t2 = 1;
int nextTerm = t1 + t2;
while(mpz_cmp_ui(g,1) < 1)
{
f(x,n);//x is changed
f(y,n);//y is going through the sequence twice as fast
f(y,n);
if(count == nextTerm)//calculate gcd every fibonacci number
{
mpz_sub(temp,x,y);
mpz_gcd(g , temp, n);
t1 = t2;
t2 = nextTerm;
nextTerm = t1 + t2;//calculate next fibonacci number
}
count ++;
}
return;
}
int main()
{
mpz_t x;
mpz_t g;
//initialize g and x
mpz_init(g);
mpz_init_set_str(x,"167698757698757868925234234253423",10);
rho(g , x);
//output the factor, it will output 1 if algorithm fails
mpz_out_str(NULL, 10, g);
return 0;
}
Outputs - 353
Execution time - 0.003s
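For comparison, here is a minimal Python sketch of the plain rho method with Floyd's cycle detection, taking a gcd on every step instead of only at Fibonacci indices as the GMP code above does. The polynomial x^2 + c and the start value 2 are the usual textbook choices and are assumptions here, as is the function name.

```python
from math import gcd

def pollard_rho(n, c=1):
    """Return a nontrivial factor of composite n, or None if this c fails."""
    if n % 2 == 0:
        return 2
    x = y = 2
    g = 1
    while g == 1:
        x = (x * x + c) % n   # tortoise: one step
        y = (y * y + c) % n   # hare: two steps
        y = (y * y + c) % n
        g = gcd(abs(x - y), n)
    # g == n means the two walks met modulo every prime factor at once.
    return g if g != n else None

factor = pollard_rho(8051)    # 8051 = 83 * 97
print(factor, 8051 // factor)
```

Only x, y and g are kept between steps, which is the "very little space" property mentioned above. If the function returns None, retrying with a different c usually helps.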
At the moment GMP does not let you factor a bigint directly. You can convert your bigint for use with other libraries and use their factoring algorithms. Note that factoring integers with many more than 20 digits needs specialized algorithms, and the running time grows nearly exponentially with the number of digits.
Check out:
http://flintlib.org/
http://pari.math.u-bordeaux.fr/
http://ecm.gforge.inria.fr/
Interesting task you have! Thanks!
It was a pleasure for me to spend almost two whole days writing a very advanced solution. I implemented three factorization algorithms from scratch: Trial Division, Pollard's Rho, and the Lenstra Elliptic Curve Method (ECM).
It is well known that ECM (with elliptic curves) is one of the fastest methods for mid-range factors. Trial division is good for very small factors (factors up to 2^20 in about a second), Pollard's Rho is good for bigger yet still small factors (up to 2^40 in about a second), and ECM is good for mid-range factors (up to 2^60 in about 10 seconds).
There are also very advanced methods like the General Number Field Sieve (GNFS) (factors up to 2^700 per month), but they are very difficult to implement. The Quadratic Sieve is also advanced (probably up to 2^400 per month). I implemented it from scratch as well, but the code is very big; it is still manageable to understand, but due to its size I don't attach it here. ECM was the only one of the advanced methods that was quite easy to implement.
Besides the three factorization methods mentioned above, I also used the following algorithms inside the code: Modular Exponentiation, Fermat Primality Test, Barrett Reduction, Euclidean Algorithm, Extended Euclidean Algorithm, Modular Multiplicative Inverse, Elliptic Curve Point Addition and Multiplication.
Actually your task as stated is very easy to solve quickly: it is known that among 2048-bit numbers a prime appears approximately once every ln(2^2048) = 2048 * ln(2) ≈ 1420 numbers, so you can just quickly generate around 1500 numbers and check each one for primality, for example using the Fermat Primality Test, which is very fast. And if a number is prime then by definition it is already factored.
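A rough Python sketch of that search, just to show the idea. The function names are made up for this illustration and are not from my C++ code. Keep in mind that the Fermat test only gives probable primes, since Carmichael numbers can fool it, and that testing only odd candidates roughly halves the expected number of tries.

```python
import math
import random

def fermat_probable_prime(n, rounds=8, rng=random):
    """Fermat test: True if a^(n-1) = 1 (mod n) for several random bases a."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False          # a proves that n is composite
    return True

def random_probable_prime(bits, rng=random):
    """Draw random odd numbers of exactly `bits` bits until one passes."""
    tries = 0
    while True:
        tries += 1
        # Force the top bit (exact size) and the low bit (odd).
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if fermat_probable_prime(candidate, rng=rng):
            return candidate, tries

# Average gap between primes near 2^2048, by the prime number theorem.
print(round(2048 * math.log(2)))  # 1420

prime, tries = random_probable_prime(256)
print(prime.bit_length(), tries)
```

The demo uses 256 bits so that it finishes instantly; random_probable_prime(2048) works the same way but takes noticeably longer in pure Python.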
I extended your task further in my mind, to make it more interesting. I don't search for a prime number, but instead try to find random 2048-bit numbers that have at least several big prime factors. I will call such numbers "interesting". Of course, if a number has several tiny factors and one large prime, then it is not that interesting. But if it has a 60-bit prime factor, then it is interesting to catch such a number, and that's what I do in my code.
You can see in my code that I adapted it for two libraries, Boost Multiprecision and GMP. Both are included in my code (see #include <boost/multiprecision/cpp_int.hpp> and #include <gmpxx.h>), so you should install and link both. Under Linux it is very easy to install both through sudo apt install libboost-all-dev libgmp-dev. Windows is a bit trickier: install the Chocolatey package manager first and then run choco install boost-msvc-14.3. For GMP, install VCPKG as described here and then run vcpkg install gmp. If you want, you may install Boost through VCPKG too: vcpkg install boost.
The ECM (elliptic curve) method is very interesting and simple:
You generate many random curves with random X, Y, A, B, N params, where N is your input number that needs to be factored, and the other params are random values that fit the curve equation Y^2 = X^3 + A * X + B (mod N).
You multiply the point (X, Y) on each curve by all the primes in increasing order (each raised to some small power).
At some point the accumulated multiplier becomes a multiple of the curve order (taken modulo some prime factor of N) for some curve, and by a property of the curve order you then get the so-called point at infinity on that curve.
If you look at the Point Addition formula, you can see that it has a denominator: lambda = (y_q - y_p) / (x_q - x_p). This denominator is handled by computing a Modular Multiplicative Inverse modulo N. When the result would be the point at infinity, the denominator is non-invertible, and non-invertibility is only possible when GCD(N, x_q - x_p) != 1, in which case the GCD gives a non-trivial factor (or sometimes N itself), hence N is factored successfully by yielding its first factor.
If we don't get the point at infinity, then we continue to generate more curves and to multiply by more (bigger and bigger) prime numbers. The more curves we generate and the more primes we multiply by, the higher the chance of a successful factorization.
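The steps above can be sketched in a few dozen lines of Python. This is only a toy illustration and not my C++ implementation: all names here (FactorFound, inverse, add, multiply, ecm) are made up for the sketch, it uses plain affine coordinates, it does not check that the random curve is non-singular, and it has none of the optimizations a real ECM needs.

```python
from math import gcd
import random

class FactorFound(Exception):
    """Raised when a modular inverse fails and exposes a divisor of n."""
    def __init__(self, factor):
        super().__init__(factor)
        self.factor = factor

def inverse(a, n):
    g = gcd(a % n, n)
    if g != 1:
        raise FactorFound(g)       # g can also be n itself (a useless hit)
    return pow(a % n, -1, n)       # modular inverse, needs Python 3.8+

def add(P, Q, a, n):
    """Add points on y^2 = x^3 + a*x + b (mod n); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * inverse(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inverse(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    y3 = (lam * (x1 - x3) - y1) % n
    return (x3, y3)

def multiply(k, P, a, n):
    """Double-and-add scalar multiplication k * P."""
    R = None
    while k:
        if k & 1:
            R = add(R, P, a, n)
        P = add(P, P, a, n)
        k >>= 1
    return R

def ecm(n, bound=1000, curves=50, rng=random):
    """Return a nontrivial factor of n, or None if every curve fails."""
    primes = [p for p in range(2, bound + 1)
              if all(p % q for q in range(2, int(p ** 0.5) + 1))]
    for _ in range(curves):
        # Random point (x, y) and random A; B is implied by the curve equation.
        x, y, a = rng.randrange(n), rng.randrange(n), rng.randrange(n)
        P = (x, y)
        try:
            for p in primes:
                pk = p
                while pk * p <= bound:   # largest power of p within the bound
                    pk *= p
                P = multiply(pk, P, a, n)
                if P is None:            # infinity modulo all of n: give up
                    break
        except FactorFound as hit:
            if hit.factor != n:
                return hit.factor
    return None

print(ecm(455839))   # 455839 = 599 * 761, so this should print 599 or 761
```

B never appears in the addition formulas, which is why the sketch does not compute it. Raising bound and curves corresponds to the last step above: more primes and more curves give a higher chance of success.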
Try it online!
SOURCE CODE HERE. As StackOverflow has a limit of 30K symbols per post and my code alone is about 44K bytes, I couldn't inline it here, so I am sharing it through a GitHub Gist instead (link below). The same code is also available through the Try it online! link above, on the GodBolt server.
GitHub Gist code
Example console output:
TrialDiv time 8.230 sec
Num: 4343663370925180057127849780941698665126534031938076094687921681578209757551374613160773985436765755919464255163981465381983273353052491 (2^453.90)
Factored: 13 (2^3.70, prime), 167 (2^7.38, prime), 3853 (2^11.91, prime), 53831 (2^15.72, prime), 916471 (2^19.81, prime), 9255383 (2^23.14, prime),
UnFactored: 11372390351822722497588418782148940973499109818654526670537593527638523385195910987808859992169924704037636069779 (2^372.24, composite),
PollardRho time 8.51 sec
Num: 11372390351822722497588418782148940973499109818654526670537593527638523385195910987808859992169924704037636069779 (2^372.24)
Factored: 189379811 (2^27.50, prime), 2315962907 (2^31.11, prime), 50213994043 (2^35.55, prime),
UnFactored: 5163708449171395447719565208770850251589387410889704005960043195676697732937073689 (2^278.09, composite),
Curves 1, Ops 12.965 Ki, ExtraPrimes 512, Primes 0.500 Ki, IterTime 0.410 sec
Curves 2, Ops 50.912 Ki, ExtraPrimes 512, Primes 1.000 Ki, IterTime 8.062 sec
Curves 3, Ops 112.586 Ki, ExtraPrimes 464, Primes 1.453 Ki, IterTime 15.093 sec
Curves 4, Ops 162.931 Ki, ExtraPrimes 120, Primes 1.570 Ki, IterTime 4.853 sec
Curves 5, Ops 193.699 Ki, ExtraPrimes 80, Primes 1.648 Ki, IterTime 4.201 sec
ECM time 32.624 sec
Num: 5163708449171395447719565208770850251589387410889704005960043195676697732937073689 (2^278.09)
Factored: 955928964443 (2^39.80, prime),
UnFactored: 540177004907359979630305557131905764121354872876442621652476639261690523 (2^238.29, composite),
Final time 49.385 sec
Num: 4343663370925180057127849780941698665126534031938076094687921681578209757551374613160773985436765755919464255163981465381983273353052491 (2^453.90)
Factored: 13 (2^3.70, prime), 167 (2^7.38, prime), 3853 (2^11.91, prime), 53831 (2^15.72, prime), 916471 (2^19.81, prime), 9255383 (2^23.14, prime), 189379811 (2^27.50, prime), 2315962907 (2^31.11, prime), 50213994043 (2^35.55, prime), 955928964443 (2^39.80, prime),
UnFactored: 540177004907359979630305557131905764121354872876442621652476639261690523 (2^238.29, composite),