I've got a simple problem, but somehow fail to solve it properly:
I would like to test the primality of long long integers (64 bits).
The primality requirement comes from the mixing of several hash values; if not respected, there is some kind of "echo" in the resulting output, which degrades the distribution property of the hash formula.
I've got a few interesting candidates, but cannot test their primality at this stage.
I've found a website which proposes just that:
input a number, and it provides the next value which is prime.
The problem is, this website only works for values within the 32-bit range.
I've been roaming SO for the same question, and it has been asked several times already. However, all the answers I've consulted so far only point towards methods and algorithms (such as Miller-Rabin or AKS), carrying a hidden "do-it-yourself" tag.
And that's not what I'm looking for. I do not need to test primality every day from now on, or for a huge number of candidates. I just have this need now, and for a very limited number of candidates.
Therefore a ready-to-use tool which answers just this question (preferably an online one) would fit the bill better.
But does that exist?
You can plug in your number at http://www.alpertron.com.ar/ECM.HTM, which will tell you if it is prime or give you its factors if it is not. Or you could use the Factors[n] function at http://www.wolframalpha.com, which does the same thing. Either can quickly handle 64-bit integers.
Related
I am looking for the fastest algorithm to check if a number is a prime. The algorithm doesn't have to be deterministic as long as the chance of it failing is very small. Preferably it should be possible to control the possibility of failure by some parameter like "iteration count".
It would be enough for the algorithm to work for integers <= 10^18, but it would be better if it worked for all integers representable by a C++ unsigned long long, assuming it is 64 bits (up to 18,446,744,073,709,551,615).
There are already some questions like this one, but they require the algorithm to be deterministic, while for me it's fine if it is probabilistic, as long as it's "mostly accurate".
As others said, consider Miller-Rabin tests.
Here is a link for testing numbers less than 2^64: https://www.techneon.com/
You have to test at most three different bases per candidate. To get something probabilistic but about three times faster, just check a single base randomly chosen out of those three.
I believe Miller-Rabin primality testing algorithm fits your needs perfectly.
Some resources:
Miller-Rabin Wikipedia
Implementation and extra information
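To make the suggestions above concrete, here is a rough sketch (not taken from either answer) of a deterministic Miller-Rabin test for 64-bit inputs. It uses the first twelve primes as bases, a witness set known to be sufficient for every n below 2^64, and relies on the GCC/Clang unsigned __int128 extension for overflow-free modular multiplication; the function names are my own.

    #include <cstdint>

    // Modular multiplication without overflow, via the GCC/Clang __int128 extension.
    static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
        return (uint64_t)((unsigned __int128)a * b % m);
    }

    // Modular exponentiation by repeated squaring.
    static uint64_t powmod(uint64_t a, uint64_t e, uint64_t m) {
        uint64_t r = 1;
        a %= m;
        while (e > 0) {
            if (e & 1) r = mulmod(r, a, m);
            a = mulmod(a, a, m);
            e >>= 1;
        }
        return r;
    }

    // Deterministic Miller-Rabin for 64-bit n: the first twelve primes are a
    // witness set known to be sufficient for every n < 2^64.
    bool is_prime_u64(uint64_t n) {
        static const uint64_t bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
        if (n < 2) return false;
        for (uint64_t p : bases) {
            if (n % p == 0) return n == p;        // small primes and their multiples
        }
        uint64_t d = n - 1;
        int s = 0;
        while ((d & 1) == 0) { d >>= 1; ++s; }    // write n - 1 = d * 2^s with d odd
        for (uint64_t a : bases) {
            uint64_t x = powmod(a, d, n);
            if (x == 1 || x == n - 1) continue;   // this base gives no evidence of compositeness
            bool composite = true;
            for (int i = 1; i < s && composite; ++i) {
                x = mulmod(x, x, n);
                if (x == n - 1) composite = false;
            }
            if (composite) return false;          // a is a witness: n is composite
        }
        return true;
    }

For a probabilistic variant along the lines of the first answer, you would simply pick one or a few of those bases at random instead of looping over all of them.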
I have a function which checks the parity of a 64-bit word. Sadly the input value really could be anything, so I cannot bias my test to cover a known subset of values, and I clearly cannot test every single possible 64-bit value...
I considered using random numbers so that each time the test was run, the function gained more coverage; however, unit tests should be consistent.
Ignoring my specific application, is there a sensible way to ensure a reasonable level of coverage, which is highly likely to expose errors introduced in the future, whilst not taking the best part of a billion years to run?
The following argument assumes that you have written, or at least have access to, the source code and are doing white-box testing.
Depending on the level of confidence you need, you might consider proving the algorithm correct, possibly using automated provers. But, under the assumption that your code is not part of an application which demands this level of confidence, you probably can gain sufficient confidence with a comparably small set of unit-tests.
Let's assume that your algorithm somehow loops over the 64 bits (or is intended to do so, because you still need to test it). This means that the 64 bits are handled in a very regular way. Now, there could be a bug in your code such that, in the body of the loop, a value of 0 is always used by mistake instead of the respective bit from the 64-bit input. This bug would mean that you always get a parity of 0 as the result. This particular bug can be found by any input value that leads to an expected parity of 1.
From this example we can conclude that for every bug that could realistically occur, you need one corresponding test case that can find that bug. Therefore, if you look at your algorithm and think about which bugs might be present, you may come up with, say, x bugs. Then you will need not more than x test cases to find these bugs. (Some of your test cases will likely find more than one of the bugs.)
This basic consideration has led to a number of strategies for deriving test cases, like equivalence partitioning or boundary testing. With boundary testing, for example, you would put special focus on bits 0 and 63, which are at the boundaries of the loop's indices. This way you catch many of the classical off-by-one errors.
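As an illustration of such boundary and "one bit ignored" cases, here is a minimal test sketch. The names parity64 and reference_parity are placeholders, and the loop-based parity64 merely stands in for the real implementation under test.

    #include <cassert>
    #include <cstdint>

    // Stand-in for the function under test: the straightforward loop over all 64 bits.
    static int parity64(uint64_t x) {
        int p = 0;
        for (int i = 0; i < 64; ++i)
            p ^= (int)((x >> i) & 1);
        return p;
    }

    // Independent reference: count set bits and take the low bit of the count.
    static int reference_parity(uint64_t x) {
        int count = 0;
        for (uint64_t t = x; t != 0; t &= t - 1)   // clear the lowest set bit each round
            ++count;
        return count & 1;
    }

    int main() {
        assert(parity64(0) == 0);                      // all bits clear
        assert(parity64(~0ULL) == 0);                  // all 64 bits set
        for (int i = 0; i < 64; ++i) {
            uint64_t one_bit = 1ULL << i;
            assert(parity64(one_bit) == 1);            // walking single bit: catches "bit i ignored"
            assert(parity64(~one_bit) == reference_parity(~one_bit));  // walking single hole
        }
        assert(parity64(0x8000000000000001ULL) == 0);  // boundary bits 0 and 63 together
        return 0;
    }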
Now, what about the situation that an algorithm changes in the future (as you have asked about errors introduced in the future)? Instead of looping over the 64 bits, the parity can be calculated with xor-ing in various ways. For example, to improve the speed you might first xor the upper 32 bits with the lower 32 bits, then take the result and xor the upper 16 bits with the lower 16 bits and so on.
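A sketch of that folding variant might look like the following (the name parity64_fold is mine):

    #include <cstdint>

    // Fold-based parity: xor the upper half onto the lower half, halving the
    // width each time, until the answer sits in bit 0.
    static int parity64_fold(uint64_t x) {
        x ^= x >> 32;
        x ^= x >> 16;
        x ^= x >> 8;
        x ^= x >> 4;
        x ^= x >> 2;
        x ^= x >> 1;
        return (int)(x & 1);
    }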
This alternative algorithm will have a different set of possible bugs. To be future proof with your test cases, you may also have to consider such alternative algorithms and the corresponding bugs. Most likely, however, the test cases for the first algorithm will find a large portion of those bugs as well - so probably the amount of tests will not increase too much. The analysis, however, becomes more complex.
In practice, I would focus on the currently chosen algorithm and rather take the approach to re-design the test suite in the case that the algorithm is changed fundamentally.
Sorry if this answer is too generic. But, as should have become clear, a more concrete answer would require more details about the algorithm that you have chosen.
I'm using GMP (with MPIR) for arbitrary-size datatypes. I also use its primality test function, which uses the Miller-Rabin method, but it is not guaranteed to be accurate. This is what I want to fix.
I was able to confirm that the number 18446744073709551253 is prime by brute force, with the sqrt (trial division) approach.
Is there any way of checking large numbers being prime or not, with 100% accuracy?
It should not use too much memory/storage space; a few megabytes is acceptable.
It should be faster than the sqrt method I used.
It should work for numbers that are at least 64 bits in size, or larger.
Finally, it should be 100% accurate, no maybes!
What are my options?
I could live with the brute-force method (for 64-bit numbers), though, but out of interest I want faster and larger. Also, the 64-bit number check was too slow: 43 seconds in total!
For very large numbers, the AKS primality test is a deterministic primality test that runs in time O(log^7.5 n · log log n), where n is the number of interest. This is exponentially faster than the O(√n) algorithm. However, the algorithm has large constant factors, so it's not practical until your numbers get rather large.
Hope this helps!
As a general point, 100% certainty is not possible on a physical computer, since there is a small but finite possibility that some component has failed invisibly and that the answer given at the end is not correct. Given that fact, you can run enough probabilistic Miller-Rabin tests that the probability of the number being composite is far less than the probability that your hardware has failed. It is not difficult to test up to a 1 in 2^256 level of certainty:
boolean isPrime(num)
    limit <- 256
    certainty <- 0
    while (certainty < limit)
        if (millerRabin returns notPrime)
            return false
        else
            certainty <- certainty + 2
        endif
    endwhile
    return true
end isPrime
This will test that the number is prime, up to a certainty of 1 in 2^256. Each M-R test adds a factor of four to the certainty. I have seen the resulting primes called "industrial strength primes", good enough for all practical purposes, but not quite for theoretical mathematical certainty.
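Since the question already uses GMP, essentially the same loop is available through GMP's built-in mpz_probab_prime_p, whose reps argument plays the role of the iteration count above. A small usage sketch (the number of rounds chosen here is arbitrary):

    #include <gmpxx.h>
    #include <iostream>

    int main() {
        mpz_class n("18446744073709551253");
        // Each extra Miller-Rabin round lowers the error probability further.
        // Return value: 2 = definitely prime, 1 = probably prime, 0 = composite.
        int result = mpz_probab_prime_p(n.get_mpz_t(), 64);
        if (result == 0)
            std::cout << "composite\n";
        else if (result == 2)
            std::cout << "prime\n";
        else
            std::cout << "probably prime\n";
        return 0;
    }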
For small n, trial division works; the limit there is probably somewhere around 10^12. For somewhat larger n, there are various studies (see the works of Gerhard Jaeschke and Zhenxiang Zhang) that calculate the smallest pseudoprime for various collections of Miller-Rabin bases; that will take you to about 10^25. After that, things get hard.
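For that small-n regime, plain trial division is short enough to sketch (a rough, untuned example):

    #include <cstdint>

    // Trial division with a 6k +/- 1 wheel; perfectly adequate up to roughly 10^12,
    // where the loop body runs on the order of a million times.
    bool is_prime_trial(uint64_t n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        if (n % 3 == 0) return n == 3;
        for (uint64_t d = 5; d * d <= n; d += 6) {
            if (n % d == 0 || n % (d + 2) == 0)
                return false;
        }
        return true;
    }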
The "big guns" of primality proving are the APRCL method (it may be called Jacobi sums or Gaussian sums) and the ECPP method (based on elliptic curves). Both are complex, so you will want to find an implementation, don't write your own. These methods can both handle numbers of several hundred digits.
The AKS method is proven polynomial time, and is easy to implement, but the constant of proportionality is very high, so it is not useful in practice.
If you can factor n-1, or even partially factor it, Pocklington's method can determine the primality of n. Pocklington's method itself is quick, but the factoring may not be.
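As a rough illustration of the shape of Pocklington's criterion, here is a sketch of my own using GMP's C++ interface. The interface is hypothetical: the caller must already know a fully factored divisor F of n-1 with F^2 > n, together with F's distinct prime factors, and a return value of false only means "not proved prime here", not necessarily composite.

    #include <gmpxx.h>
    #include <vector>

    bool pocklington_proves_prime(const mpz_class& n, const mpz_class& F,
                                  const std::vector<mpz_class>& prime_factors_of_F)
    {
        mpz_class nm1 = n - 1;
        if (n < 2 || F * F <= n || nm1 % F != 0)
            return false;                          // preconditions not met: inconclusive

        for (const mpz_class& q : prime_factors_of_F) {
            bool witness_found = false;
            for (unsigned long a = 2; a < 100 && !witness_found; ++a) {
                mpz_class base(a), t, g, e = nm1 / q;

                // Fermat condition: a^(n-1) must be 1 (mod n); otherwise n is composite.
                mpz_powm(t.get_mpz_t(), base.get_mpz_t(), nm1.get_mpz_t(), n.get_mpz_t());
                if (t != 1)
                    return false;

                // Pocklington condition: gcd(a^((n-1)/q) - 1, n) == 1 for this factor q.
                mpz_powm(t.get_mpz_t(), base.get_mpz_t(), e.get_mpz_t(), n.get_mpz_t());
                t -= 1;
                mpz_gcd(g.get_mpz_t(), t.get_mpz_t(), n.get_mpz_t());
                if (g == 1)
                    witness_found = true;
            }
            if (!witness_found)
                return false;                      // no witness found: inconclusive
        }
        return true;                               // all conditions satisfied: n is prime
    }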
For all of these, you want to be reasonably certain that a number is prime before you try to prove it. If your number is not prime, all these methods will correctly determine that, but first they will waste much time trying to prove that a composite number is prime.
I have implementations of AKS and Pocklington at my blog.
The method of proving depends on the type of prime number you are trying to prove (for example, the Mersenne primes have special methods for proving primality that work only with them) and the size in decimal digits. If you are looking at hundreds of digits, then there is only one solution, albeit an inadequate one: The AKS algorithm. It is provably faster than other primality proving algorithms for large enough primes, but by the time it becomes useful, it will take so long that it really isn't worth the trouble.
Primality proving for big numbers is still a problem that is not yet sufficiently solved. If it was, the EFF awards would all be awarded and cryptography would have some problems, not for the list of primes, but for the methods used to find them.
I believe that, in the near future, a new algorithm for proving primality will arise that doesn't depend on a pre-generated list of primes up to the square root of n, and that doesn't do a brute-force method for making sure that all primes (and a lot of non-primes as well) under the square root are used as witnesses to n's primality. This new algorithm will probably depend on math concepts that are much simpler than those used by analytic number theory. There are patterns in the primes, that much is certain. Identifying those patterns is a different matter entirely.
I want to test how my application behaves when it is tricked by a forged-but-passing SHA-160 sum, so I would like to compute a change to the data being summed which results in the original SHA-160 sum again and would thus go unnoticed. I am using the Botan library in C++ to compute the sum.
How can I compute a change to a bit stream that is around 1500 bits such that its SHA-160 is identical to the original?
The short answer is: you can't.
The long answer is: you can, but only with vast amounts of computation power. The entire purpose of hash algorithms is to make it hard to find collisions. If it were easy to find a collision, then there'd be little point in using the hash algorithm.
To solve your test problem, I suggest abstracting away the file-reading/hash-computing part of your application into a separate class, and then mocking it with a fake hash implementation in order to test the rest of the application.
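One way that seam could look is sketched below. The names HashProvider and FakeCollidingHash are hypothetical; in production, the concrete subclass would simply forward to Botan's SHA-1 implementation, while tests use the fake to simulate "two different inputs produced the expected digest".

    #include <cstdint>
    #include <utility>
    #include <vector>

    // Hypothetical seam: the application depends on this interface instead of
    // calling the hash library directly, so tests can substitute a fake.
    struct HashProvider {
        virtual ~HashProvider() = default;
        virtual std::vector<uint8_t> sha1(const std::vector<uint8_t>& data) const = 0;
    };

    // Test double: always reports a fixed, caller-chosen digest, letting a test
    // exercise the "hash check passed for tampered data" code path.
    struct FakeCollidingHash : HashProvider {
        std::vector<uint8_t> digest;
        explicit FakeCollidingHash(std::vector<uint8_t> d) : digest(std::move(d)) {}
        std::vector<uint8_t> sha1(const std::vector<uint8_t>&) const override {
            return digest;
        }
    };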
I'm searching for an algorithm to primality test large (like 10^200) numbers.
Are there any good algorithms?
Ideally, I'd prefer an algorithm that isn't probabilistic.
Note: the numbers have more than 50 and fewer than 200 digits.
If you're looking for a non-probabilistic test, you may want to check out the AKS primality testing algorithm, which runs in roughly O(log^6 n) time. For the number of digits you have, this is probably feasible.
That said, probabilistic primality tests are extremely good and many have exponentially small error rates. I would suggest using one of those unless there's a good reason not to.
EDIT: I just found this page containing several C++ implementations of AKS. I have no idea whether they work correctly or not, but they might be a good starting point.
Hope this helps!
Typically we would use a probable prime test. I recommend BPSW, which you can follow by a Frobenius test and/or some random-base Miller-Rabin tests if you want more certainty. This will be fast and arguably more certain than running some proof implementations.
Assume you say that isn't good enough. Then you really want to use ECPP and get a certificate. Reasonable implementations are Primo or ecpp-dj. These can prove primality of 200 digit numbers in well under a second, and return a certificate that can be independently verified.
APR-CL is another reasonable method. The downside is that it doesn't return a certificate so you're trusting the implementation -- you get a "yes" or "no" output that is deterministically correct if the implementation was correct. Pari/GP uses APR-CL with its isprime command, and David Cleaver has an excellent open source implementation: mpz_aprcl. Those implementations have had some code review and daily use in various software so should be good.
AKS is a horrible method to use in practice. It doesn't return a certificate, and it's not too hard to find broken implementations, which completely defeats the point of using a proof method vs. good probable prime tests in the first place. It's also horrendously slow. 200 digit numbers are well past the practical point for any implementation I'm aware of. There is a "fast" one included in the previously mentioned ecpp-dj software so you can try it out, and there are quite a few other implementations to be found.
For some idea of speed, here are times of some implementations. I don't know of any implementations of AKS, APR-CL, or BPSW that are faster than the ones shown (please comment if you know of one). Primo starts off a bit slower than ecpp-dj shown, but at 500 or so digits it is faster, and has a better slope past that. It is the program of choice for large inputs (2,000-30,000 digits).