I am looking for the fastest algorithm to check if a number is a prime. The algorithm doesn't have to be deterministic as long as the chance of it failing is very small. Preferably it should be possible to control the possibility of failure by some parameter like "iteration count".
It would be enough for the algorithm to work for integers <= 10^18, but it would be better if it worked for all integers representable by a C++ unsigned long long assuming it being 64 bits (18,446,744,073,709,551,615).
There are already some questions like this one, but they require the algorithm to be deterministic, while for me its fine if it is probabilistic, as long as its "mostly accurate".
As others said, consider Miller-Rabin tests.
Here is a link for testing numbers less than 2^64: https://www.techneon.com/
You have to test at most three different bases per candidate. To get something probabilistic but about three time faster, just check one randomly chosen out of those three.
I believe Miller-Rabin primality testing algorithm fits your needs perfectly.
Some resources:
Miller-Rabin Wikipedia
Implementation and extra information
Related
Until now, I've been using sieve of Eratosthenes for generating all 'n' prime numbers.
However, I was wondering does there exist a better algorithm, or can we improve an existing one which performs better ??
For sufficiently large N (e.g. more than a million or so), the best algorithm is to use an approximation (e.g. Logarithmic Integral or Riemann's R function), then use a fast prime count method such as LMO, then sieve the small remainder. This is many orders of magnitude faster than sieving.
See https://math.stackexchange.com/questions/507178/most-efficient-algorithm-for-nth-prime-deterministic-and-probabilistic
There are at least two open source implementations.
https://metacpan.org/pod/ntheory
https://github.com/kimwalisch/primecount
The latter has progressed past the first, and is also multithreaded.
Add: Will Ness has also pointed out a nice post from Daniel Fischer that provides a walkthrough of different ways to solve this: Calculating and printing the nth prime number
I'm using GMP (with MPIR) for arbitrary size datatypes. I also use its primality test function, which uses Miller-Rabin method, but it is not accurate. This is what I want to fix.
I was able to confirm that the number 18446744073709551253 is a prime by using brute-force, with the sqrt approach.
Is there any way of checking large numbers being prime or not, with 100% accuracy?
It should not use too much memory/storage space, few megabytes is acceptable.
It should be faster than the sqrt method I used.
It should work for numbers that are at least 64bit in size, or larger.
Finally, it should be 100% accurate, no maybes!
What are my options ?
I could live with the brute force method (for 64bit numbers) though, but out of interest, I want faster & larger. Also, the 64bit number check was too slow: total 43 seconds!
For very large numbers, the AKS primality test is a deterministic primality test that runs in time O(log7.5n log log n), where n is the number of interest. This is exponentially faster than the O(√n) algorithm. However, the algorithm has large constant factors, so it's not practical until your numbers get rather large.
Hope this helps!
As a general point 100% certainty is not possible on a physical computer since there is a small but finite possibility that some component has failed invisibly and that the answer given at the end is not correct. Given that fact, then you can run enough probabilistic Miller-Rabin tests that the probability of the number being composite is far less than the probability that your hardware has failed. It is not difficult to test up to a 1 in 2^256 level of certainty:
boolean isPrime(num)
limit <- 256
certainty <- 0
while (certainty < limit)
if (millerRabin returns notPrime)
return false
exit
else
certainty <- certainty + 2
endif
endwhile
return true
end isPrime
This will test that the number is prime, up to a certainty of 1 in 2^256. Each M-R test adds a factor of four to the certainty. I have seen the resulting primes called "industrial strength primes", good enough for all practical purposes, but not quite for theoretical mathematical certainty.
For small n, trial division works; the limit there is probably somewhere around 10^12. For somewhat larger n, there are various studies (see works of Gerhard Jaeschke and Zhou Zhang) that calculate the smallest pseudoprime for various collections of Miller-Rabin bases; that will take you to about 10^25. After that, things get hard.
The "big guns" of primality proving are the APRCL method (it may be called Jacobi sums or Gaussian sums) and the ECPP method (based on elliptic curves). Both are complex, so you will want to find an implementation, don't write your own. These methods can both handle numbers of several hundred digits.
The AKS method is proven polynomial time, and is easy to implement, but the constant of proportionality is very high, so it is not useful in practice.
If you can factor n-1, or even partially factor it, Pocklington's method can determine the primality of n. Pocklington's method itself is quick, but the factoring may not be.
For all of these, you want to be reasonably certain that a number is prime before you try to prove it. If your number is not prime, all these methods will correctly determine that, but first they will waste much time trying to prove that a composite number is prime.
I have implementations of AKS and Pocklington at my blog.
The method of proving depends on the type of prime number you are trying to prove (for example, the Mersenne primes have special methods for proving primality that work only with them) and the size in decimal digits. If you are looking at hundreds of digits, then there is only one solution, albeit an inadequate one: The AKS algorithm. It is provably faster than other primality proving algorithms for large enough primes, but by the time it becomes useful, it will take so long that it really isn't worth the trouble.
Primality proving for big numbers is still a problem that is not yet sufficiently solved. If it was, the EFF awards would all be awarded and cryptography would have some problems, not for the list of primes, but for the methods used to find them.
I believe that, in the near future, a new algorithm for proving primality will arise that doesn't depend on a pre-generated list of primes up to the square root of n, and that doesn't do a brute-force method for making sure that all primes (and a lot of non-primes as well) under the square root are used as witnesses to n's primality. This new algorithm will probably depend on math concepts that are much simpler than those used by analytic number theory. There are patterns in the primes, that much is certain. Identifying those patterns is a different matter entirely.
I've got a simple problem, but somehow fail to solve it properly :
I would like to test the primality of long long integers (64bits).
The primality requirement comes from the mixing of several hash values; if not respected, there is some kind of "echo" in the resulting output, which degrades the distribution property of the hash formula.
I've got a few interesting candidates, but cannot test at this stage their primality.
I've found a website which proposes just that :
input a number, and it provides the next value which is prime.
The problem is, this website only works for values within the 32 bits range limit.
I've been roaming SO for the same question, and it was asked several times already. However, all answers i've been consulting up to now only points towards methods and algorithms (such as miller rabin, or AKS), carrying a hidden "do-it-yourself" tag.
And that's not what i'm looking for. I do not need to test primality regularly every day from now on, or for a huge numbers of candidates. I just have this need now, and for a very limited number of candidates.
Therefore a Ready-to-use tool which answer just this question (preferably, an online one), would better fit the bill.
But does that exist ?
You can plug in your number at http://www.alpertron.com.ar/ECM.HTM, which will tell you if it is prime or give you its factors if it is not. Or you could use the Factors[n] function at http://www.wolframalpha.com, which does the same thing. Either can quickly handle 64-bit integers.
I'm searching for an algorithm to primality test large (like 10200) numbers.
Are there any good algorithms?
Ideally, I'd prefer an algorithm that isn't probabalistic.
Note: Numbers have over 50 and less then 200 digits.
If you're looking for a non-probabalistic test, you may want to check out the AKS primality testing algorithm, which runs in roughly O(log6 n) time. For the number of digits you have, this is probably feasible.
That said, probabalistic primality tests are extremely good and many have exponentially small error rates. I would suggest using one of those unless there's a good reason not to.
EDIT: I just found this page containing several C++ implementations of AKS. I have no idea whether they work correctly or not, but they might be a good starting point.
Hope this helps!
Typically we would use a probable prime test. I recommend BPSW, which you can follow by a Frobenius test and/or some random-base Miller-Rabin tests if you want more certainty. This will be fast and arguably more certain than running some proof implementations.
Assume you say that isn't good enough. Then you really want to use ECPP and get a certificate. Reasonable implementations are Primo or ecpp-dj. These can prove primality of 200 digit numbers in well under a second, and return a certificate that can be independently verified.
APR-CL is another reasonable method. The downside is that it doesn't return a certificate so you're trusting the implementation -- you get a "yes" or "no" output that is deterministically correct if the implementation was correct. Pari/GP uses APR-CL with its isprime command, and David Cleaver has an excellent open source implementation: mpz_aprcl. Those implementations have had some code review and daily use in various software so should be good.
AKS is a horrible method to use in practice. It doesn't return a certificate, and it's not too hard to find broken implementations, which completely defeats the point of using a proof method vs. good probable prime tests in the first place. It's also horrendously slow. 200 digit numbers are well past the practical point for any implementation I'm aware of. There is a "fast" one included in the previously mentioned ecpp-dj software so you can try it out, and there are quite a few other implementations to be found.
For some idea of speed, here are times of some implementations. I don't know of any implementations of AKS, APR-CL, or BPSW that are faster than the ones shown (please comment if you know of one). Primo starts off a bit slower than ecpp-dj shown, but at 500 or so digits it is faster, and has a better slope past that. It is the program of choice for large inputs (2,000-30,000 digits).
Specifically, I've got a method picks n items from a list in such a way that a% of them meet one criterion, and b% meet a second, and so on. A simplified example would be to pick 5 items where 50% have a given property with the value 'true', and 50% 'false'; 50% of the time the method would return 2 true/3 false, and the other 50%, 3 true/2 false.
Statistically speaking, this means that over 100 runs, I should get about 250 true/250 false, but because of the randomness, 240/260 is entirely possible.
What's the best way to unit test this? I'm assuming that even though technically 300/200 is possible, it should probably fail the test if this happens. Is there a generally accepted tolerance for cases like this, and if so, how do you determine what that is?
Edit: In the code I'm working on, I don't have the luxury of using a pseudo-random number generator, or a mechanism of forcing it to balance out over time, as the lists that are picked out are generated on different machines. I need to be able to demonstrate that over time, the average number of items matching each criterion will tend to the required percentage.
Random and statistics are not favored in unit tests. Unit tests should always return the same result. Always. Not mostly.
What you could do is trying to remove the random generator of the logic you are testing. Then you can mock the random generator and return predefined values.
Additional thoughts:
You could consider to change the implementation to make it more testable. Try to get as less random values as possible. You could for instance only get one random value to determine the deviation from the average distribution. This would be easy to test. If the random value is zero, you should get the exact distribution you expect in average. If the value is for instance 1.0, you miss the average by some defined factor, for instance by 10%. You could also implement some Gaussian distribution etc. I know this is not the topic here, but if you are free to implement it as you want, consider testability.
According to the Statistical information you have, determine a range instead of a particular single value as a result.
Many probabilistic algorithms in e.g. scientific computing use pseudo-random number generators, instead of a true random number generator. Even though they're not truly random, a carefully chosen pseudo-random number generator will do the job just fine.
One advantage of a pseudo-random number generator is that the random number sequence they produce is fully reproducible. Since the algorithm is deterministic, the same seed would always generate the same sequence. This is often the deciding factor why they're chosen in the first place, because experiments need to be repeatable, results reproducible.
This concept is also applicable for testing. Components can be designed such that you can plug in any source of random numbers. For testing, you can then use generators that are consistently seeded. The result would then be repeatable, which is suitable for testing.
Note that if in fact a true random number is needed, you can still test it this way, as long as the component features a pluggable source of random numbers. You can re-plug in the same sequence (which may be truly random if need be) to the same component for testing.
It seems to me there are at least three distinct things you want to test here:
The correctness of the procedure that generates an output using the random source
That the distribution of the random source is what you expect
That the distribution of the output is what you expect
1 should be deterministic and you can unit test it by supplying a chosen set of known "random" values and inputs and checking that it produces the known correct outputs. This would be easiest if you structure the code so that the random source is passed as an argument rather than embedded in the code.
2 and 3 cannot be tested absolutely. You can test to some chosen confidence level, but you must be prepared for such tests to fail in some fraction of cases. Probably the thing you really want to look out for is test 3 failing much more often than test 2, since that would suggest that your algorithm is wrong.
The tests to apply will depend on the expected distribution. For 2 you most likely expect the random source to be uniformly distributed. There are various tests for this, depending on how involved you want to be, see for example Tests for pseudo-random number generators on this page.
The expected distribution for 3 will depend very much on exactly what you're producing. The simple 50-50 case in the question is exactly equivalent to testing for a fair coin, but obviously other cases will be more complicated. If you can work out what the distribution should be, a chi-square test against it may help.
That depends on the use you make of your test suite. If you run it every few seconds because you embrace test-driven development and aggressive refactoring, then it is very important that it doesn't fail spuriously, because this causes major disruption and lowers productivity, so you should choose a threshold that is practically impossible to reach for a well-behaved implementation. If you run your tests once a night and have some time to investigate failures you can be much stricter.
Under no circumstances should you deploy something that will lead to frequent uninvestigated failures - this defeats the entire purpose of having a test suite, and dramatically reduces its value to the team.
You should test the distribution of results in a "single" unit test, i.e. that the result is as close to the desired distribution as possible in any individual run. For your example, 2 true / 3 false is OK, 4 true / 1 false is not OK as a result.
Also you could write tests which execute the method e.g. 100 times and checks that the average of the distributions is "close enough" to the desired rate. This is a borderline case - running bigger batches may take a significant amount of time, so you might want to run these tests separately from your "regular" unit tests. Also, as Stefan Steinegger points out, such a test is going to fail every now and then if you define "close enough" stricter, or start being meaningless if you define the threshold too loosely. So it is a tricky case...
I think if I had the same problem I probably construct a confidence interval to detect anomalies if you have some statistics about average/stddev and such. So in your case if the average expected value is 250 then create a 95% confidence interval around the average using a normal distribution. If the results are outside that interval you fail the test.
see more
Why not re-factor the random number generation code and let the unit test framework and the source code both use it? You are trying to test your algorithm and not the randomized sequence right?
First you have to know what distribution should result from your random number generation process. In your case you are generating a result which is either 0 or 1 with probability -0.5. This describes a binomial distribution with p=0.5.
Given the sample size of n, you can construct (as an earlier poster suggested) a confidence interval around the mean. You can also make various statements about the probability of getting, for instance, 240 or less of either outcome when n=500.
You could use a normal distribution assumption for values of N greater than 20 as long as p is not very large or very small. The Wikipedia post has more on this.