Is there any deterministic method to check whether a number is prime or not?

And also is this method a deterministic method, written below:
bool isPrime(int a){
if( a <= 0) return false;
if( a == 1) return false;
if( a == 2) return true;
if( a == 3) return true;
int sqr = sqrt(a)+1;
if( a%2 == 0) return false;
for(int i=3;i<=sqr;i+=2){
if( a%i == 0 )
return false;
}
return true;
}

If your input is less than 2^64 (more than enough for your example using an int) there are some good methods:
1) BPSW. Fast, deterministic, correct for all 64-bit inputs, no known counterexamples above this (though we believe they exist)
2) Deterministic Miller-Rabin. The Wikipedia page gives some correct but inefficient base sets -- the ones at Best Known SPRP Bases are the best known for 64-bit inputs. At most 7 tests for any 64-bit input. New (Sep 2015) results give deterministic results for 81-bit input with 13 tests.
3) Hashed deterministic M-R. This is just an optimization of #2. Only a single M-R test needed for 32-bit inputs, 2 or 3 for 64-bit inputs. See Forisek and Jancina 2015 paper and my different hashed implementation.
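As a rough illustration of option 2, here is a minimal sketch of deterministic Miller-Rabin for 64-bit inputs. It does not use the compact base sets from the Best Known SPRP Bases page. Instead it assumes the first twelve primes as bases, which is sufficient for every n below 2^64 but costs more tests than necessary. The names is_prime_u64, mul_mod and pow_mod are made up for this example, and unsigned __int128 is a GCC/Clang extension, not standard C++.

```cpp
#include <cstdint>

using u64 = std::uint64_t;
using u128 = unsigned __int128;  // assumption: GCC/Clang extension is available

// (a * b) mod m without overflow, via a 128-bit intermediate.
static u64 mul_mod(u64 a, u64 b, u64 m) {
    return static_cast<u64>(static_cast<u128>(a) * b % m);
}

// base^exp mod m by square-and-multiply.
static u64 pow_mod(u64 base, u64 exp, u64 m) {
    u64 result = 1 % m;
    base %= m;
    while (exp != 0) {
        if (exp & 1) result = mul_mod(result, base, m);
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    return result;
}

// Deterministic for all 64-bit n, assuming the first twelve primes as bases.
bool is_prime_u64(u64 n) {
    static const u64 bases[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37};
    if (n < 2) return false;
    for (u64 p : bases) {           // small primes double as a trial-division filter
        if (n == p) return true;
        if (n % p == 0) return false;
    }
    u64 d = n - 1;                  // write n - 1 = d * 2^s with d odd
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; ++s; }
    for (u64 a : bases) {
        u64 x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool witness = true;        // a proves n composite unless a square hits n - 1
        for (int r = 1; r < s; ++r) {
            x = mul_mod(x, x, n);
            if (x == n - 1) { witness = false; break; }
        }
        if (witness) return false;
    }
    return true;
}
```

A smaller base set, or the hashed variant from option 3, would cut the number of modular exponentiations; the structure of the test stays the same.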
Trial division, the method you're showing, is quite good for tiny inputs, say under a million or so. It is still computationally ok for a while past that, but it is exponential time in the bit length of the input. It slows down very rapidly, and really isn't usable past 26 or so digits (just because of the huge time growth). In my test, at 25 digits it is 400M times slower than BPSW (a PRP test at this size), 13M times slower than ECPP, 3M times slower than APR-CL.
[Figure: graph of run times for primality tests on large inputs]
If your input is larger than 64-bit, some options include:
BLS75 methods (from the seminal 1975 paper), including N-1, N+1, and hybrid methods based on partial factoring. These are still used, and are surprisingly fast for numbers up to ~40 digits. Generalized Pocklington is a special case of one of the theorems. Since this relies on partial factoring of n-1 and/or n+1, it doesn't scale well in general and fizzles out around 80-100 digits for practical use.
APR-CL. Quite fast (e.g. half a second for a 200 digit number). Open source in Pari/GP and mpz_aprcl.
ECPP. Fastest method for large inputs not of special form. Primo (free to use and the gold standard), ecpp-dj (open source). This uses randomization, so it isn't deterministic in some sense, but it is 100% correct, which is what many people mean in this context. It also can provide a certificate for fast third-party validation, making it especially attractive.
AKS. Horrendously slow. Theoretical breakthrough and fascinatingly simple math, but practically useless. It is faster than trial division at 20 or so digits, and eventually will pass the BLS75 methods, but it's nowhere close to the methods we usually use: APR-CL or ECPP. Various implementations exist, with the fastest I'm aware of being in ecpp-dj and Perl/ntheory [caveat: I'm the author]. Polynomial time, but the exponent is higher than APR-CL for inputs under a quadrillion or so digits (ridiculously large sizes).

Yes, I think it is a deterministic algorithm.
Another deterministic primality test, asymptotically more efficient than trial division, is AKS primality testing.
Also, if you want to use a probabilistic (non-deterministic) primality test, you can refer to the Miller-Rabin test.
Hope this helps !

Related

Which operator is faster: != or >

Which operator is faster: > or ==?
Example: I want to test a value (which can have a positive value or -1) against -1 :
if(time > -1)
// or
if (time != -1)
time has type "int"
The standard doesn't say. So it's up to what opcodes the given compiler generates in its given version, and how fast a given CPU executes them.
I.e., implementation / platform defined.
You can find out for a specific compiler / platform combination by looking at / benchmarking the executable code.
But I seriously doubt it will make much of a difference; this is the kind of micro-optimization that is almost always dwarfed by higher-level architectural decisions.
It is platform-dependent. Generally though, those two operations will translate directly to the assembler instructions "branch if greater than" and "branch if not equal". It is unlikely that there is any performance difference between those two, and if there were, it would be insignificant.
The only branch instruction which is ever so slightly faster than the others is usually "branch if zero"/"branch if not zero".
(In the dark ages when compilers sucked, C programmers therefore liked to write loops as down-counting to zero, instead of up-counting, so that comparisons would be done against zero instead of a value, in order to gain a few nanoseconds. Modern compilers can do that optimization themselves, but you still see such loops now and then.)
In general, you shouldn't concern yourself with micro-management of performance. If you spend time pondering if > is faster than !=, instead of pondering about program design, readability and functionality, you need to set your priorities straight asap.
Semantically these conditions are different. The first one checks whether object time is positive or zero.
if(time > -1)
In this case it would be better to write
if( time >= 0 )
However some functions return either a non-negative value or -1. For example a search function can return -1 if it did not find an element in an array. Or -1 can signal an error state or an absence of a value.
In this case it is better to use condition
if ( time != -1 )
As for speed, the compiler can generate a single machine instruction for the comparison in both cases.
This is not a case where you should think about speed. You should think about which condition is more expressive and shows the intention of the programmer.

How can one verify the proper operation of a sieve close to 2^64?

Small primes - up to about 1,000,000,000,000 - are readily available from various sources. The Prime Pages (utm.edu) have lists for the first 50 million primes, primos.mat.br goes up to 10^12, and programs like the one available at primesieve.org go even higher.
However, when it comes to numbers close to 2^64 there's only the ten primes mentioned on the page Primes just less than a power of two at primes.utm.edu and that seems to be it.
The primality test found there refuses to work on numbers that don't fit a double, others - elsewhere - fail to refuse and just print trash. The primesieve.org program refuses to work with numbers that aren't at least some 40 billion below 2^64, which doesn't exactly inspire confidence in the quality of their coding. The same result everywhere: nada, zilch, niente.
The cogs and gears of sieves start creaking around the 2^62 mark, and close to 2^64 there's hardly a cog that doesn't creak loudly threatening to break apart. Hence the need for testing the implementation is greatest where verification is most difficult, because of the scarcity/absence of reliable reference data. The primesieve.org program seems to be the only one that works at least up to 2^63 or thereabouts, but I don't trust it too much because of the above-mentioned issue.
So how then can one verify the proper operation of a sieve close to 2^64? Are there reliable lists somewhere for a million (or ten million or a hundred million) primes just below and above powers of two like 2^64, 2^63 and so on? Or are there reliable (trustworthy, verified, banged-on a lot) programs that yield such sequences or that can verify primes or lists of primes?
Once a sieve has been verified it can be used to produce handy lists with sums/checksums for loads of interesting ranges, but absent such lists the situation seems difficult...
P.S.: I determined the upper limit for the primesieve.org turbo siever to be UINT64_MAX - 10 * UINT32_MAX, or 0xFFFFFFF600000009. That means only the 10 * UINT32_MAX highest primes don't have any reference data at all so far...
Instead of looking for a pre-computed list, you could compare the output of your sieve to a different sieve. A good sieve, written by Tomás Oliveira e Silva, is available at http://sweet.ua.pt/tos/software/prime_sieve.html.
Another way to test your code is by testing the primality of all numbers your sieve reports as prime (or conversely, testing the non-primality of all numbers your sieve does not report as prime). A good way to do that is the Baillie-Wagstaff test. You can find a good-quality implementation by Thomas R. Nicely at http://www.trnicely.net/misc/bpsw.html.
You might also be interested in Jan Feitsma's tables of pseudoprimes at http://www.janfeitsma.nl/math/psp2/index, which are complete to 2^64.
First, thanks for sharing your program and working on correctness. I think it's important to do testing, and sieving near the size boundary was something I spent time working on for my code.
"The same result everywhere: nada, zilch, niente." You're not looking hard enough. There are plenty of tools that do this. It's too bad primesieve doesn't go all the way to 2^64-1, but that doesn't mean nothing else does.
"So how then can one verify the proper operation of a sieve close to 2^64?" One thing I did is make an edge-case test that runs through all combinations of start/end points near 2^64-1, verifying that a number of methods all generate the same results. This relies on having a list of these primes to start, but there are many ways to get these. Not only does this test the sieve at this range, it also tests the start/end conditions to make sure there are no issues there.
Some ways to generate a million primes below 2^64:
time perl -Mntheory=:all -E 'forprimes { say } ~0-44347170,~0' | md5sum
Takes ~2s to generate 1M primes. We can force use of different code (Perl or GMP), use primality tests, etc. Lots of ways to do this, including just looping and calling is_provable_prime($n), for example. There are also other Perl modules including Math::Primality though they are much slower.
echo 'forprime(i=2^64-44347170,2^64-1,print(i))' | time gp -f -q | md5sum
Takes ~13s to generate 1M primes. As with the Perl module, there are lots of alternate ways including looping calling isprime which is a deterministic routine (assuming a non-ancient version of Pari/GP).
#include <stdio.h>
#include <gmp.h>
int main(void) {
  mpz_t n;
  mpz_init_set_str(n,"18446744073665204445",10);
  mpz_nextprime(n, n);
  while (mpz_sizeinbase(n,2) < 65) {
    /* If you don't trust mpz_nextprime, one could add this:
     *   if (!mpz_probab_prime_p(n, 100))
     *     { fprintf(stderr, "Bad nextprime!\n"); return -1; }
     */
    gmp_printf("%Zd\n",n);
    mpz_nextprime(n, n);
  }
  mpz_clear(n);
  return 0;
}
Takes about 30s and gets the same results. This one is more dubious as I don't trust its 25 preset-random base MR test as much as BPSW or one of the proof methods, but it doesn't matter in this case as we see the results match. Adding the extra 100 tests is very expensive in time, but would make it extremely unlikely to have false results (I suspect we have overlapping bases so this is also wasteful).
from sympy import nextprime
n = 2**64 - 44347170
n = nextprime(n)
while n < 2**64:
    print(n)
    n = nextprime(n)
Using Python's SymPy. Unfortunately primerange uses crazy memory when given 2^64-1 so that's not possible to use. Doing the simple nextprime method isn't ideal -- it takes about 5 minutes, but generates the same results (the current SymPy isprime uses 46 prime bases, which is many more than needed for deterministic results under 2^64).
There are other tools, e.g. FLINT, GAP, etc.
I realize that since you're on Windows, the world is wonky and lots of things don't work right. I have tested Perl's ntheory on Windows and with both Cygwin and Strawberry Perl from command prompt I get the same results. The GMP code ought to work the same, assuming GMP works correctly.
Edit add: If your results don't match one of the comparison methods, then one of the two (or both) is wrong. It may be the comparison code that is wrong! It helps everyone if you find and report errors. It's unlikely but possible they are both wrong in the same way, which is why I like to compare with as many other sources as possible. To me that is more robust than picking one "golden" code to compare against. Especially if you're using an oddball platform that may not have been thoroughly tested.
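The edge-case test described earlier (all combinations of start/end points near 2^64-1, with several methods compared against each other) could be sketched roughly as follows. This is not the actual test from ntheory; the RangeFn signature and the name agree_on_all_subranges are assumptions for illustration. The loops test for the upper bound before incrementing, so the counters never wrap past 2^64-1.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using u64 = std::uint64_t;

// Hypothetical generator signature: all primes p with lo <= p <= hi, ascending.
using RangeFn = std::function<std::vector<u64>(u64 lo, u64 hi)>;

// True if both generators return identical lists for every sub-range [s, e]
// with lo <= s <= e <= hi. Requires lo <= hi; hi may be UINT64_MAX.
bool agree_on_all_subranges(u64 lo, u64 hi, const RangeFn& a, const RangeFn& b) {
    for (u64 s = lo; ; ++s) {
        for (u64 e = s; ; ++e) {
            if (a(s, e) != b(s, e)) return false;
            if (e == hi) break;  // check before ++e so e cannot overflow
        }
        if (s == hi) break;      // same guard for the start point
    }
    return true;
}
```

Each generator would wrap one of the sources above (a sieve, a nextprime loop, a BPSW filter); running the check pairwise over a window of a few hundred numbers below 2^64 exercises exactly the start/end handling that tends to break.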
For BPSW, there are a few implementations around:
Pari. AES Lucas, in the Pari source code so not sure how portable it is.
TR Nicely. Strong Lucas, standalone code.
David Cleaver. Standard, Strong or Extra Strong Lucas. Standalone library, very clear, very easy to use.
My non-GMP code, including asm Montgomery math for x86_64. Quite a bit faster than bigint codes of course.
My GMP code. Standard, Strong, AES, or Extra strong Lucas. Faster than the other bigint codes. Also has other Frobenius and other compositeness tests. Can be made standalone.
I have a version using LibTomMath that I hope to get into one of the Perl6 VMs. Only interesting if you want to use LTM.
All verified vs. the Feitsma data. I'm sure there are more implementations around as well. FLINT has a variation that is quite fast, but it isn't really BPSW (but it's been verified for numbers under 2^64).
In general, one must use less naive techniques than trial division, or be very patient.
(gp/PARI documentation)
For 64-bit integers, trial division takes millions of times as long as even a simple sieve, let alone thoroughbreds like Kim Walisch's program (primesieve.org) which is orders of magnitude faster.
The reference sieve I want to verify (there's a standalone .cpp on pastebin) finds about a million primes per second when sieving close to 2^64, whereas the trial division code I lifted out of the gmp implementation takes 20 seconds to find even one. Restricting trial division to presieved primes (stored as deltas with one byte per prime for fast iteration) speeds it up by an order of magnitude, but it still outputs less than one prime per second on my laptop.
Hence, trial division can deliver only homœopathic amounts of reference data, even if I use all cores I can lay hands on including Kindle, phone and toaster.
More sophisticated tests like Miller-Rabin or the Baillie-PSW linked by user448810 are several orders of magnitude faster than trial division. For numbers up to 2^64 the Baillie-PSW test has been verified to be deterministic (no Baillie-PSW pseudoprimes below that threshold). The Miller-Rabin test may or may not be deterministic up to 2^64 if the first 12 primes are used as bases, or the 7-base set found by Jim Sinclair (meaning the 'net offers statements to that effect but apparently no evidence).
With Baillie-PSW verified - and faster to boot - it seems like a good choice. Unfortunately it is also several orders of magnitude more complicated than a sieve, making it even more important to find trustworthy implementations that are ready to compile without lots of twiddling or - ideally - available as binaries.
Thomas Nicely's Baillie-PSW page has source code that uses the gmp, and gp/PARI can use either gmp or its own code. The latter is also available as a binary, which is very fortunate since building gmp code on an exotic, off-beat platform like MinGW under Windows is a non-trivial undertaking, even if MPIR is used instead of gmp.
That gets us some bulk data but still nowhere near enough for verifying the sieve, since it is orders of magnitude too slow even for covering the blank area left by the cap of primesieve.org (10 * 2^32 numbers).
This is where Will Ness's bigint idea comes in. The operation of the sieve can be verified up to 1,000,000,000,000 using reference data from multiple, independent sources. Switching index variables from 32-bit to 64-bit eliminates most of the boundary cases that could cause the code to mess up in higher regions, leaving only a very few places where even uint64_t gets close to its limits. With those places thoroughly inspected and generously covered by test cases derived from the Baillie-PSW undertaking we can have reasonably high confidence that the sieve code is good. Add copious verification against primesieve.org in the range from 10^12 up to its cap, and it should be sufficient to regard the sieve implementation as trustworthy.
With the sieve up and running, it's easy to cover arbitrary ranges with bulk data. Or with digests, as a canned/compressed means of verification that can serve needs of any size and shape. It's what I use in the demo .cpp I mentioned earlier, although my real code uses a mixture between an optimised digest implementation for general work and a special raw memory checksum of 128 bits for quick self-checks of factor sieve bitmaps.
SUMMARY
up to 1,000,000,000,000 verification against primos.mat.br or similar
up to 2^64 - 10 * 2^32 verification against primesieve.org
rest up to 2^64-1: verification of strategically chosen segments using Baillie-PSW (e.g. gp/PARI)

How to detect if it is ok to activate additional processing

Motivation is kind of hard to explain so I'll provide an example: Assume you receive a high number of samples every second and your task is to classify them.
Let's also say this: you have two classifiers, heuristicFast and heuristicSlow. For every sample you run heuristicFast(), and if the result is close to undecided (let's say in the [0.45,0.55] range for a classifier where 0 is class 1 and 1 is class 2) you run the more precise heuristicSlow().
Now the problem is that this is a real-time system, so I want to be sure that I don't overload the CPUs (I'm using threading) even when a high percentage of calls to heuristicFast return results in the [0.45,0.55] range.
What is the best way to accomplish this?
My best idea is to keep an entrycount for heuristicSlow and not enter it if the entrycount is > number_of_cores / 2?
std::atomic<int> entrycount(0);
//...
if (classificationNotClear(result_heuristic_fast) && (entrycount<kMaxConcurrantCalls))
{
entrycount++;
final_result=heuristicSlow();
entrycount--;
}
else
final_result=result_heuristic_fast;
//...
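One caveat with the snippet above: the check and the increment are separate steps, so several threads can pass the check at the same moment and exceed the limit. A possible variant, sketched here under the assumption that briefly overshooting the counter is acceptable, claims a slot first and gives it back if the limit was already reached. SlowPathGate is a hypothetical name.

```cpp
#include <atomic>

// Admission gate for the slow path: at most `limit` callers inside at once.
class SlowPathGate {
public:
    explicit SlowPathGate(int limit) : limit_(limit) {}

    // Claim a slot; returns false (and releases the claim) if none was free.
    bool try_enter() {
        if (in_flight_.fetch_add(1, std::memory_order_acq_rel) >= limit_) {
            in_flight_.fetch_sub(1, std::memory_order_acq_rel);
            return false;
        }
        return true;
    }

    // Must be called exactly once after each successful try_enter().
    void leave() { in_flight_.fetch_sub(1, std::memory_order_acq_rel); }

private:
    std::atomic<int> in_flight_{0};
    const int limit_;
};
```

The caller would run heuristicSlow() only when try_enter() returns true and fall back to the fast result otherwise.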
Since you are building a real-time system, you have crucial information available: The maximum allowed running times for your classification and for both heuristics.
You could simply compute the leftover time for a fully fast heuristic ( total time minus sample count times fast heuristic time ) and determine how many applications of the slow heuristic fit into this time. Write this number into a counter and decrement.
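The arithmetic is simple enough to write down directly. A minimal sketch, assuming all times are worst-case figures in microseconds and that slow_call_budget is a name invented here:

```cpp
#include <cstdint>

// Number of slow-heuristic calls that fit into the time left over after
// running the fast heuristic on every sample. All times in microseconds.
std::uint64_t slow_call_budget(std::uint64_t total_us, std::uint64_t samples,
                               std::uint64_t fast_us, std::uint64_t slow_us) {
    if (slow_us == 0) return 0;                    // avoid division by zero
    const std::uint64_t fast_total = samples * fast_us;
    if (fast_total >= total_us) return 0;          // no time left at all
    return (total_us - fast_total) / slow_us;
}
```

For example, with a 1 s window, 1000 samples, 500 µs per fast call and 10 ms per slow call, 50 slow calls fit.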
Even fancier solution:
Sort your fast heuristic results by uncertainty (i.e. by abs(result-0.5)) and run the slow heuristic for as many cases as you've got time left.
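A sketch of that selection step, assuming the fast results for one batch are available in a vector and the budget has been computed beforehand. The function name most_uncertain is hypothetical; std::partial_sort is used so that only the first budget entries are put in order.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices of the `budget` samples whose fast result is closest to 0.5,
// most uncertain first. These are the candidates for the slow heuristic.
std::vector<std::size_t> most_uncertain(const std::vector<double>& fast,
                                        std::size_t budget) {
    std::vector<std::size_t> idx(fast.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    budget = std::min(budget, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + budget, idx.end(),
                      [&fast](std::size_t a, std::size_t b) {
                          return std::fabs(fast[a] - 0.5) < std::fabs(fast[b] - 0.5);
                      });
    idx.resize(budget);
    return idx;
}
```

This works per batch, so it trades a little latency (the batch must be complete before the slow heuristic starts) for spending the budget on the cases where it helps most.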

make an integer even

Sometimes I need to be sure that some integer is even. As such I could use the following code:
int number = /* magic initialization here */;
// make sure the number is even
if ( number % 2 != 0 ) {
number--;
}
but that does not seem to be the most efficient way to do it, so I could do the following:
int number = /* magic initialization here */;
// make sure the number is even
number &= ~1;
but (besides not being readable) I am not sure that solution is completely portable.
Which solution do you think is best?
Is the second solution completely portable?
Is the second solution considerably faster than the first?
What other solutions do you know for this problem?
What if I do this inside an inline method? It should (theoretically) be as fast as these solutions and readability should no longer be an issue, does that make the second solution more viable?
note: This code is supposed to only work with positive integers but having a solution that also works with negative numbers would be a plus.
Personally, I'd go with an inline helper function.
inline int make_even(int n)
{
return n - n % 2;
}
// ....
int m = make_even(n);
Before accepting an answer I will make my own that tries to summarize and
complete some of the information found here:
Four possible methods were described (and some small variations of these).
1. if (number % 2 != 0) {
       number--;
   }
2. number &= ~1;
3. number = number - (number % 2);
4. number = (number / 2) * 2;
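The four methods agree on non-negative input but not on negative odd input, which the following sketch makes visible. It assumes C++11 or later (integer division truncates toward zero, so -3 % 2 is -1) and a two's complement representation; the function names are invented for this comparison.

```cpp
// Method 1: decrement if odd. Rounds toward negative infinity.
int make_even_1(int n) {
    if (n % 2 != 0) {
        n--;
    }
    return n;
}

// Method 2: clear the lowest bit. On two's complement, rounds toward
// negative infinity as well.
int make_even_2(int n) { return n & ~1; }

// Method 3: subtract the remainder. Rounds toward zero.
int make_even_3(int n) { return n - (n % 2); }

// Method 4: divide and multiply back. Rounds toward zero.
int make_even_4(int n) { return (n / 2) * 2; }
```

Under these assumptions, 7 becomes 6 with every method, while -3 becomes -4 with methods 1 and 2 and -2 with methods 3 and 4.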
Before proceeding any further let me clarify something:
The expected gain for using any of these methods is minimal, even if we could
prove that one method is 200% faster than the others the worst one is so fast
that the only way to have visible gain in speed would be if this method was
called many times in a CPU bound application. As such this is more of an
exercise for fun than a real optimization.
Analysis
Readability
As far as readability goes I would rank method 1 as the most readable,
method 4 as the second best and method 2 as the worst.
People are free to disagree but I ranked them like this because:
In method 1 it is as explicit as possible that if the number is odd you
want to subtract one from it, making it even.
Method 4 is also very explicit, but I ranked it second because at
first glance you might think it is doing nothing, and only a fraction of a
second later you're like "Oh... Integer division.".
Methods 2 and 3 are almost equivalent in terms of readability, but many
people are not used to bitwise operations and as such I ranked method 2 as
the worst.
With that in mind I would add that it is generally accepted that the best way
to implement this is using an inline function, and since none of the options
is that unreadable, readability is not really an issue (direct uses in the code
are explicit and clear and reading the method will never be that hard).
If you don't want to use an inline method I would recommend that you only use
method 1 or method 4.
Compatibility issues
Underflow
It has been mentioned that method 1 may underflow, depending on the way the
processor represents integers. Just to be sure you can add the following
STATIC_ASSERT when using method 1.
STATIC_ASSERT(INT_MIN % 2 == 0, make_even_may_underflow);
As for method 3, even when INT_MIN is not even it may not underflow
depending on whether the result has the same sign of the divisor or the
dividend. Having the same sign of the divisor never underflows because
INT_MIN - (-1) is closer to 0.
Add the following STATIC_ASSERT just to be sure:
STATIC_ASSERT(INT_MIN % 2 == 0 || -1 % 2 < 0, make_even_may_underflow);
Of course you can still use these methods when the STATIC_ASSERT fails since
it would only be a problem when you pass INT_MIN to your make_even method,
but I would STRONGLY advise against it.
(Un)supported bit representations
When using method 2 you should make sure your compiler bit representation
behaves as expected:
STATIC_ASSERT( (1 & ~1) == 0, unsupported_bit_representation);
// two's complement OR sign-and-magnitude.
STATIC_ASSERT( (-3 & ~1) == -4 || (-3 & ~1) == -2 , unsupported_bit_representation);
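With a C++11 or later compiler the same checks can be expressed with the built-in static_assert instead of a STATIC_ASSERT macro. A minimal sketch, where make_even_masked is a hypothetical wrapper around method 2:

```cpp
#include <climits>

// Compile-time checks mirroring the STATIC_ASSERTs above.
static_assert((1 & ~1) == 0, "unsupported bit representation");
static_assert((-3 & ~1) == -4 || (-3 & ~1) == -2,
              "unsupported bit representation");
static_assert(INT_MIN % 2 == 0 || -1 % 2 < 0, "make_even may underflow");

// Method 2 as a constexpr helper; the asserts above document its assumptions.
constexpr int make_even_masked(int n) { return n & ~1; }

static_assert(make_even_masked(7) == 6, "odd input should round down");
```

If any assumption does not hold on the target platform, the build fails instead of the program misbehaving at run time.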
Speed
I also did some naive speed tests using the Unix time utility. I ran every
different method (and its variations) 4 times and recorded the results;
since the results didn't vary much I didn't find it necessary to run more tests.
The obtained results show method 4 and method 2 as the fastest of them
all.
Conclusion
According to the provided information, I would recommend using method 4. It's
readable, I am not aware of any compatibility issues and it performs great.
I hope you enjoy this answer and use the information contained here to make
your own informed choice. :)
The source code is available if you want to check my results. Please note
that the tests were compiled using g++ and run on Mac OS X. Different
platforms and compilers may give different results.
int even_number = (number / 2) * 2;
This should work regardless of architecture as long as the optimizer does not get in the way (it shouldn't but who knows).
I would use the second solution. In any binary representation, regardless of the number of bits, big-endian vs. little-endian, or other architecture differences, that operation will have the effect of setting the lowest bit to zero. It's fast and completely portable. The intent of the code can be explained via comments, if you meet any poor C programmers who can't figure out what it means.
The &= solution looks best to me. If you want to make it more portable and more readable:
const int MakeEven = -2;
int number = /* magic initialization here */
// Make sure number is even
number &= MakeEven;
The second solution should be considerably faster than the first. Is it completely portable? Most likely, although there's probably some computer somewhere that does math differently.
This should work for positive and negative integers.
Use your second solution as an inline function and put a static assert into its implementation to document and test that it works on the platform it is compiled on.
BOOST_STATIC_ASSERT( (1 & ~1) == 0 );
BOOST_STATIC_ASSERT( (-1 & ~1) == -2 );
Your second solution only works if your sign representation is "two's complement" or "sign and magnitude". To do it in place I'd go with suszterpatt's variant, which should (almost) always work
number -= (number % 2);
You don't know for sure in which direction this will "round" for negative values, so in extreme cases you might have an underflow.
even_integer = (any_integer >> 1) << 1;
This solution achieves the goal in the most performant way compared to the other suggested solutions.
In general, bitwise shift is the cheapest possible operation. Some compilers generate the same assembly for "number = (number / 2) * 2" as well but that is not guaranteed on all target platforms and programming languages.
The following approach is simple and requires no multiplication or division.
number = number & ~1;
or
number = (number + 1) & ~1;

Factorizing a number

I've got a number which is less than 500,000,000 and I want to factorize it in an efficient way. What algorithm do you suggest? Note: I have a time limit of 0.01 sec!
I've just written this C++ code but it's absolutely awful!
void factorize(int x,vector<doubly> &factors)
{
for(int i=2;i<=x;i++)
{
if(x%i==0)
{
doubly a;
a.number=i;
a.power=0;
while(x%i==0)
{
a.power++;
x/=i;
}
factors.push_back(a);
}
}
}
and doubly is something like this:
struct doubly
{
int number;
int power;
//and some functions!!
};
just another point: I know that n is not a prime
As you might know, factorization is a hard problem. You might also know that you only have to test divisibility with primes. A small, but well known hint: You only have to test up to the square root of n. I leave the reasoning to you.
Look at the sieve of Eratosthenes. And maybe you find a hint in these questions and answers? How about that one?
If you want to make this even faster - without the full space/time trade-off of this answer - calculate all prime numbers up to the square root of 500,000,000 in advance and put them into an array. Obviously this breaks when the upper limit grows ;)
Start to study the algorithms.
What is the fastest factorization algorithm?
Factorize all the integers up to 500,000,000 in advance (doesn't matter how) and store the factors in a database or fixed-length record format. Your lookup will be fast, and the database ought to fit onto a modern PC.
This is one end of the time/space tradeoff, but you didn't say what you're trying to optimize for.
Alternatively, look at the algorithm for GNU coreutils "factor".
You may try Pollard's rho heuristic; it's suitable for composite numbers with relatively small divisors:
Pollard's rho
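For reference, a minimal sketch of Pollard's rho with Floyd cycle detection. It assumes n is composite and below 2^32 (which covers the 500,000,000 limit) so that x * x fits in 64 bits. The iteration function x^2 + c, the starting value 2 and the retry limit of 20 are conventional choices, not requirements, and pollard_rho is a name picked for this example.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

static u64 gcd_u64(u64 a, u64 b) {
    while (b != 0) { u64 t = a % b; a = b; b = t; }
    return a;
}

// Returns a non-trivial factor of a composite n < 2^32, or 0 if every
// attempted polynomial failed (or n is too small to have one).
u64 pollard_rho(u64 n) {
    if (n < 4) return 0;
    if (n % 2 == 0) return 2;
    for (u64 c = 1; c <= 20; ++c) {             // retry with another polynomial
        u64 x = 2, y = 2, d = 1;
        while (d == 1) {
            x = (x * x + c) % n;                // tortoise: one step
            y = (y * y + c) % n;                // hare: two steps
            y = (y * y + c) % n;
            d = gcd_u64(x > y ? x - y : y - x, n);
        }
        if (d != n) return d;                   // d == n means this c failed
    }
    return 0;
}
```

The returned factor is not necessarily prime, so a full factorization would apply the routine recursively to d and n / d, with a primality check to stop.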
If this is a homework assignment, I believe you should re-read your lecture material.
Anyway, you know your number is composite and very small, that's fine.
For a naive trial-division with all numbers, you need sqrt(500000000) tests at most - that's about 22360 times for worst-case. You can obviously skip even numbers since they're divisible by 2 (check that first). So then this becomes 11180 divisions for 0.01 s. If your computer can do 1.1 M divisions per second then you can just use the naive approach.
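A minimal sketch of that naive approach: divide out 2, then odd candidates up to the square root of what remains. The pair-based return type replaces the question's doubly struct purely to keep the example self-contained.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Prime factorization by trial division; returns (prime, exponent) pairs
// in increasing order of prime. Returns an empty list for n < 2.
std::vector<std::pair<std::uint32_t, int>> factorize(std::uint32_t n) {
    std::vector<std::pair<std::uint32_t, int>> factors;
    if (n < 2) return factors;
    auto strip = [&](std::uint32_t p) {          // divide out every power of p
        int e = 0;
        while (n % p == 0) { n /= p; ++e; }
        if (e > 0) factors.emplace_back(p, e);
    };
    strip(2);
    // 64-bit product avoids overflow of p * p; the bound shrinks as n shrinks.
    for (std::uint32_t p = 3; static_cast<std::uint64_t>(p) * p <= n; p += 2) {
        strip(p);
    }
    if (n > 1) factors.emplace_back(n, 1);       // leftover is a prime factor
    return factors;
}
```

Composite candidates such as 9 or 15 are tried too, but they never divide n because their prime factors have already been removed.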
Or, you can make a list of primes off-line, up to sqrt(500M) and then trial-try each of those. This will cut down on divisions some more.
Or, if the factors are not too far away from each other, you could try Fermat's method.
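Fermat's method looks for a and b with n = a^2 - b^2 = (a - b)(a + b), starting from a = ceil(sqrt(n)). A minimal sketch for odd n, with fermat_factor and isqrt as names chosen here; the input is assumed small enough (as in the question) that a * a fits in 64 bits.

```cpp
#include <cmath>
#include <cstdint>
#include <utility>

using u64 = std::uint64_t;

// Integer square root: largest r with r * r <= v.
static u64 isqrt(u64 v) {
    u64 r = static_cast<u64>(std::sqrt(static_cast<double>(v)));
    while (r * r > v) --r;                   // correct any rounding error
    while ((r + 1) * (r + 1) <= v) ++r;
    return r;
}

// Fermat's method for odd n >= 1: returns (a - b, a + b) with product n.
// For a prime n the result is the trivial pair (1, n).
std::pair<u64, u64> fermat_factor(u64 n) {
    u64 a = isqrt(n);
    if (a * a < n) ++a;                      // a = ceil(sqrt(n))
    for (;;) {
        const u64 b2 = a * a - n;
        const u64 b = isqrt(b2);
        if (b * b == b2) return {a - b, a + b};
        ++a;
    }
}
```

The number of iterations grows with the distance between the two factors, which is why the method only pays off when they are close to sqrt(n).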
If these won't work, you can try to use Pollard's rho and others.
Or, if this is not homework, restate the problem to work around the limitations (as some have suggested, can you precompute the factored numbers beforehand etc.).