Fortran 95: super large numbers for prime test - fortran

I'm pretty new to Fortran, as in started learning it 2 days ago new. I started learning Fortran because I was getting into prime numbers, and I wrote a program in python that was so fast, it could determine 123098237 was a prime in 0.1 seconds.
Impressive, I know.
What's not impressive is when I try to find out if (2^127)-1 or 170141183460469231731687303715884105727 (it is, by the way) is a prime number. The program ran so long, I just ended up having to stop it.
So, I started looking for some faster languages to write it in, so I wrote the program in C.
It was faster, but the problem of super large prime numbers came into play.
I was going to see if there was a solution, but then I heard through the grapevine that, if you're programming with numbers, Fortran is the fastest and best way to go. I vaguely remember my step dad's old Fortran 77 text books from college, but they were basically useless to me, because they were talking about working with punch cards. So, I went online, got gfortran for Ubuntu 12.04 x86, got a couple of PDFs, and started learning. Before I knew it, I had made a program that received input and tested for primality, and it worked!
But, the same old problem came up, the number was too big.
And so, how do I handle big numbers like this with Fortran?

Fortran, like many other compiled languages, doesn't provide such large integers or operations on them out-of-the-box. An up to date compiler ought to provide an integer with 18 decimal digits, but no more than that.
If you want to program data types and operations for such big integers in Fortran, use your favourite search engine on terms such as "Fortran multiple precision". You could even search around here on SO for relevant questions and answers.
If you want to investigate the mathematics of such large integers stick with Python; you'll struggle to write software yourself which matches its speed of operations on multiple precision arithmetic. One of the reasons that Python takes a long time to determine the primality of a large number is that it takes a program, any program written in any language, a long time to determine the primality of a large number. If you dig around you're likely to find that the relevant Python routines actually call code written in C or something similarly low-level. Investigate, if you wish, the topic of the computational complexity of primality testing.
I'm not saying you won't be able to write code to outperform the Python intrinsics, just that you will find it a challenge.

Most languages provide certain standard intrinsic types which are fully adequate for solving standard scientific and engineering problems. You don't need 80 digit numbers to calculate the thickness of a bridge girder or plan a spacecraft orbit. It would be difficult to measure to that accuracy. In Fortran, if you want to do extra precision calculations (e.g., for number theory) you need to look to libraries that augment the language, e.g., mpfun90 at http://crd-legacy.lbl.gov/~dhbailey/mpdist/ or fmlib at http://myweb.lmu.edu/dmsmith/fmlib.html

I'll guess that your algorithm is trial division. If that's true, you need a better algorithm; the implementation language won't matter.
Pseudocode for the Miller-Rabin primality test is shown below. It's probabilistic, but you can reduce the chance of error by increasing the k parameter, up to a maximum of about k=25:
function isPrime(n, k=5)
    if n < 2 then return False
    for p in [2,3,5,7,11,13,17,19,23,29]
        if n % p == 0 then return n == p
    s, d = 0, n-1
    while d % 2 == 0
        s, d = s+1, d/2
    for i from 0 to k
        x = powerMod(randint(2, n-1), d, n)
        if x == 1 or x == n-1 then next i
        for r from 1 to s
            x = (x * x) % n
            if x == 1 then return False
            if x == n-1 then next i
        return False
    return True
I'll leave it to you to translate that to Fortran or some other language; if you're programming in C, there is a library called GMP that is frequently used for handling very large numbers, and the function shown above is built in to that library. It's very fast; even numbers that are hundreds of digits long should be classified as prime or composite almost instantly.
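For readers who would rather stay in Python, here is a minimal sketch of the same Miller-Rabin idea. The function name is_probable_prime and the constant SMALL_PRIMES are my own choices, not part of the pseudocode above; Python's built-in three-argument pow plays the role of powerMod, and Python integers are arbitrary precision, so (2^127)-1 needs no extra library.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

def is_probable_prime(n, k=5):
    """Miller-Rabin with k random bases.

    False means n is definitely composite; True means n is prime with
    error probability at most 4**-k.
    """
    if n < 2:
        return False
    # Cheap trial division by a few small primes first.
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    s, d = 0, n - 1
    while d % 2 == 0:
        s, d = s + 1, d // 2
    for _ in range(k):
        a = random.randrange(2, n - 1)      # random base in [2, n - 2]
        x = pow(a, d, n)                    # modular exponentiation
        if x == 1 or x == n - 1:
            continue                        # this base says "maybe prime"
        for _ in range(s - 1):
            x = (x * x) % n
            if x == n - 1:
                break                       # this base says "maybe prime"
        else:
            return False                    # no break: a proves n composite
    return True

print(is_probable_prime(2**127 - 1, k=20))
```

On ordinary hardware this should answer for (2^127)-1 essentially instantly, which is the point of the answer above: the algorithm matters far more than the language.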
If you want to be certain of the primality of a number, there are other algorithms that can actually provide a proof of primality. But they are much more complicated, and much slower.
You might be interested in the essay Programming with Prime Numbers at my blog.

Related

Fast primality test with 100% certainty?

I'm using GMP (with MPIR) for arbitrary size datatypes. I also use its primality test function, which uses Miller-Rabin method, but it is not accurate. This is what I want to fix.
I was able to confirm that the number 18446744073709551253 is a prime by using brute-force, with the sqrt approach.
Is there any way of checking large numbers being prime or not, with 100% accuracy?
It should not use too much memory/storage space, few megabytes is acceptable.
It should be faster than the sqrt method I used.
It should work for numbers that are at least 64bit in size, or larger.
Finally, it should be 100% accurate, no maybes!
What are my options ?
I could live with the brute force method (for 64bit numbers) though, but out of interest, I want faster & larger. Also, the 64bit number check was too slow: total 43 seconds!
For very large numbers, the AKS primality test is a deterministic primality test that runs in time O(log^7.5 n · log log n), where n is the number of interest. This is exponentially faster than the O(√n) algorithm. However, the algorithm has large constant factors, so it's not practical until your numbers get rather large.
Hope this helps!
As a general point 100% certainty is not possible on a physical computer since there is a small but finite possibility that some component has failed invisibly and that the answer given at the end is not correct. Given that fact, then you can run enough probabilistic Miller-Rabin tests that the probability of the number being composite is far less than the probability that your hardware has failed. It is not difficult to test up to a 1 in 2^256 level of certainty:
boolean isPrime(num)
    limit <- 256
    certainty <- 0
    while (certainty < limit)
        if (millerRabin returns notPrime)
            return false
        else
            certainty <- certainty + 2
        endif
    endwhile
    return true
end isPrime
This will test that the number is prime, up to a certainty of 1 in 2^256. Each M-R test adds a factor of four to the certainty. I have seen the resulting primes called "industrial strength primes", good enough for all practical purposes, but not quite for theoretical mathematical certainty.
For small n, trial division works; the limit there is probably somewhere around 10^12. For somewhat larger n, there are various studies (see works of Gerhard Jaeschke and Zhou Zhang) that calculate the smallest pseudoprime for various collections of Miller-Rabin bases; that will take you to about 10^25. After that, things get hard.
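As an illustration of the fixed-base idea, here is a sketch in Python that runs Miller-Rabin with the first twelve primes as bases. To the best of my knowledge it is an established result that this base set gives no false positives for any n below 2^64, so the sketch refuses larger inputs rather than guess; the names is_prime_64, BASES and LIMIT are hypothetical.

```python
BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
LIMIT = 2**64   # assumed range in which these bases are known to suffice

def is_prime_64(n):
    """Deterministic Miller-Rabin for n < 2**64 using fixed bases."""
    if n >= LIMIT:
        raise ValueError("this base set is only relied on below 2**64")
    if n < 2:
        return False
    for p in BASES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    s, d = 0, n - 1
    while d % 2 == 0:
        s, d = s + 1, d // 2
    for a in BASES:
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = (x * x) % n
            if x == n - 1:
                break
        else:
            return False    # base a is a witness that n is composite
    return True

# 3215031751 is a well-known strong pseudoprime to bases 2, 3, 5 and 7.
print(is_prime_64(3215031751))
print(is_prime_64(18446744073709551253))
```

Only twelve modular exponentiations are needed per number, so a 64-bit input should be settled in well under a millisecond rather than the 43 seconds reported for trial division.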
The "big guns" of primality proving are the APRCL method (it may be called Jacobi sums or Gaussian sums) and the ECPP method (based on elliptic curves). Both are complex, so you will want to find an existing implementation rather than write your own. These methods can both handle numbers of several hundred digits.
The AKS method is proven polynomial time, and is easy to implement, but the constant of proportionality is very high, so it is not useful in practice.
If you can factor n-1, or even partially factor it, Pocklington's method can determine the primality of n. Pocklington's method itself is quick, but the factoring may not be.
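A rough Python sketch of the Pocklington criterion follows. It assumes the caller supplies prime factors of n-1 (their primality is trusted, not checked) whose full powers multiply to a factored part F larger than sqrt(n). For each such prime q it looks for a base a with a^(n-1) ≡ 1 (mod n) and gcd(a^((n-1)/q) - 1, n) = 1. The function name, the three-way return value and the max_base cutoff are my own choices for illustration.

```python
from math import gcd

def pocklington(n, known_prime_factors, max_base=100):
    """Return True if n is proved prime, False if proved composite,
    None if no suitable base was found below max_base.

    known_prime_factors: primes dividing n - 1 (assumed genuinely prime).
    """
    if n < 3 or n % 2 == 0:
        return n == 2
    qs = [q for q in set(known_prime_factors) if (n - 1) % q == 0]
    # F is the fully factored part of n - 1.
    F, rest = 1, n - 1
    for q in qs:
        while rest % q == 0:
            rest //= q
            F *= q
    if F * F <= n:
        raise ValueError("factored part of n - 1 must exceed sqrt(n)")
    for q in qs:
        for a in range(2, min(max_base, n)):
            if pow(a, n - 1, n) != 1:
                return False              # fails Fermat's test: composite
            g = gcd(pow(a, (n - 1) // q, n) - 1, n)
            if g == 1:
                break                     # a is a valid witness for q
            if g != n:
                return False              # g is a nontrivial factor of n
        else:
            return None                   # inconclusive for this q
    return True

# 65537 - 1 = 2**16, so the single prime factor 2 covers all of n - 1.
print(pocklington(65537, [2]))
```

As the answer says, the test itself is cheap (a handful of modular exponentiations); the hard part in general is obtaining enough prime factors of n-1.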
For all of these, you want to be reasonably certain that a number is prime before you try to prove it. If your number is not prime, all these methods will correctly determine that, but first they will waste much time trying to prove that a composite number is prime.
I have implementations of AKS and Pocklington at my blog.
The method of proving depends on the type of prime number you are trying to prove (for example, the Mersenne primes have special methods for proving primality that work only with them) and the size in decimal digits. If you are looking at hundreds of digits, then there is only one solution, albeit an inadequate one: The AKS algorithm. It is provably faster than other primality proving algorithms for large enough primes, but by the time it becomes useful, it will take so long that it really isn't worth the trouble.
Primality proving for big numbers is still a problem that is not yet sufficiently solved. If it was, the EFF awards would all be awarded and cryptography would have some problems, not for the list of primes, but for the methods used to find them.
I believe that, in the near future, a new algorithm for proving primality will arise that doesn't depend on a pre-generated list of primes up to the square root of n, and that doesn't do a brute-force method for making sure that all primes (and a lot of non-primes as well) under the square root are used as witnesses to n's primality. This new algorithm will probably depend on math concepts that are much simpler than those used by analytic number theory. There are patterns in the primes, that much is certain. Identifying those patterns is a different matter entirely.

How to handle big data element in c++?

I want to divide the return value of pow(2.0,(n-8)) by 86399.
The problem is 10 <= n <= 100000000.
How can I handle such a large return value?
I'm on Ubuntu 11.10 64 bits, using C++ 4.0.0-8
You can't unless you use a big numbers library. 64 bits can't hold a number that big. And even then, it will probably take a while. 2^(86392) has about 26000 digits in it.
If you want to get just a modulus, there are some nice algorithms for that. See http://en.wikipedia.org/wiki/Modular_exponentiation.
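To make the modular approach concrete, here is a small Python sketch of square-and-multiply exponentiation. The function name mod_pow is hypothetical; Python's built-in pow(base, exponent, modulus) does the same job and is what you would normally call. The idea carries over directly to C++ with 64-bit integers, since no intermediate value ever exceeds the square of the modulus.

```python
def mod_pow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus without building the huge power."""
    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # current lowest bit of the exponent is set
            result = (result * base) % modulus
        base = (base * base) % modulus
        exponent >>= 1            # move on to the next bit
    return result

n = 100000000                     # upper end of the range in the question
print(mod_pow(2, n - 8, 86399))   # remainder of 2**(n-8) divided by 86399
```

The loop runs once per bit of the exponent, so even the largest n in the question needs only about 27 iterations.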
If you want to try bignums still, check out http://gmplib.org/.
One very easy way would be to use GMP -- http://gmplib.org/
This discussion should answer your question Modular Exponentiation for high numbers in C++
For numbers that large, you'll have to do something clever. There's no way you can represent that full number naively in any reasonable way without bigint libraries, and even then it's really too big for brute force. The number itself would take up tens of megabytes.

Fast way to compute n times 10 raised to the power of minus m

I want to compute 10 raised to the power minus m. Besides using the math function pow(10, -m), is there any fast and efficient way to do that?
The reason I ask such a simple question of the C++ gurus on SO is that, as you know, just like base 2, 10 is also a special base. Multiplying some value n by 10 to the power minus m is equivalent to moving n's decimal point m places to the left. I think there must be a fast and efficient way to do that.
For floating point m, so long as your standard library implementation is well written, then pow will be efficient.
If m is an integer, and you hinted that it is, then you could use an array of pre calculated values.
You should only be worrying about this kind of thing if that routine is a bottleneck in your code. That is if the calls to that routine take a significant proportion of the total running time.
Ten is not a special value on a binary machine, only two is. Use pow or exponentiation by squaring.
Unfortunately there is no fast and efficient way to calculate it using IEEE 754 floating point representation. The fastest way to get the result is to build a table for every value of m that you care about, and then just perform a lookup.
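A minimal sketch of the lookup-table approach, written in Python for brevity (the same structure works with a static array of doubles in C++). The names NEG_POW10, MAX_M and scale_down are hypothetical, and the table size of 300 is just an assumed upper bound on m.

```python
MAX_M = 300   # assumed largest exponent the caller will ever need

# Precompute 10**-m once, for every integer m in [0, MAX_M].
NEG_POW10 = [10.0 ** -m for m in range(MAX_M + 1)]

def scale_down(n, m):
    """Return n * 10**-m using one table lookup and one multiplication."""
    return n * NEG_POW10[m]

print(scale_down(1234.0, 2))
```

Note that most negative powers of ten are not exactly representable in binary floating point, so the result can differ from the exact decimal shift in the last bit.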
If there's a fast and efficient way to do it then I'm sure your CPU supports it, unless you're running on an embedded system in which case I'd hope that the pow(...) implementation is well written.
10 is special to us as most of us have ten fingers. Computers only have two digits, so 2 is special to them. :)
Use a lookup table. There can't be more than about 1000 floats in it, especially if m is an integer.
If you could operate with log n instead of n for a significant time, you could save time because instead of
n = pow(10*n,-m)
you now have to calculate (using the definition l = log10(n))
l = -m*(l+1)
Just some more ideas which may lead you to further solutions...
If you are interested in optimization at the algorithm level, you might look for a parallelized approach.
You may get a speed-up at the system/architectural level by using IPP (for Intel processors) or, e.g., the AMD Core Math Library (ACML) for AMD.
Using the power of your graphics card may be another way (e.g. CUDA for NVIDIA cards).
I think it's also worth looking at OpenCL.
IEEE 754 specifies a bunch of floating-point formats. Those that are in widespread use are binary, which means that base 10 isn't in any way special. This is contrary to your assumption that "10 is also a special base".
Interestingly, IEEE 754-2008 does add decimal floating-point formats (decimal32 and friends). However, I'm yet to come across hardware implementations of those.
In any case, you shouldn't be micro-optimizing your code before you've profiled it and established that this is indeed the bottleneck.

Reinventing The Wheel: Random Number Generator

So I'm new to C++ and am trying to learn some things. As such I am trying to make a Random Number Generator (RNG or PRNG if you will). I have basic knowledge of RNGs, like you have to start with a seed and then send the seed through the algorithm. What I'm stuck at is how people come up with said algorithms.
Here is the code I have to get the seed.
int getSeed()
{
time_t randSeed;
randSeed = time(NULL);
return randSeed;
}
Now I know that there are prebuilt RNGs in C++, but I'm looking to learn, not just copy other people's work and try to figure it out.
So if anyone could lead me to where I could read or show me examples of how to come up with algorithms for this I would be greatly appreciative.
First, just to clarify, any algorithm you come up with will be a pseudo random number generator and not a true random number generator. Since you would be making an algorithm (i.e. writing a function, i.e. making a set of rules), the random number generator would have to eventually repeat itself or do something similar which would be non-random.
Examples of truly random number generators are ones that capture random noise from nature and digitize it. These include:
http://www.fourmilab.ch/hotbits/
http://www.random.org/
You can also buy physical equipment that generates white noise (or some other source of randomness) and digitally captures it:
http://www.lavarnd.org/
http://www.idquantique.com/true-random-number-generator/products-overview.html
http://www.araneus.fi/products-alea-eng.html
In terms of pseudo random number generators, the easiest ones to learn (and ones that an average lay person could probably make on their own) are the linear congruential generators. Unfortunately, these are also some of the worst PRNGs there are.
Some guidelines for determining what is a good PRNG include:
Periodicity (what is the range of available numbers?)
Consecutive numbers (what is the probability that the same number will be repeated twice in a row)
Uniformity (Is it just as likely to pick numbers from a certain sub range as another sub range)
Difficulty in reverse engineering it (If it is close to truly random then someone should not be able to figure out the next number it generates based on the last few numbers it generated)
Speed (how fast can I generate a new number? Does it take 5 or 500 arithmetic operations)
I'm sure there are others I am missing
One of the more popular ones right now that is considered good in most applications (i.e. not cryptography) is the Mersenne Twister. As you can see from the link, it is a simple algorithm, perhaps only 20 or 30 lines of code. However, trying to come up with those 20 or 30 lines of code from scratch takes a lot of brainpower and study of PRNGs. Usually the most famous algorithms are designed by a professor or industry professional who has studied PRNGs for decades.
I hope you do study PRNGs and try to roll your own (try Knuth's Art of Computer Programming or Numerical Recipes as a starting place), but I just wanted to lay this all out so that, at the end of the day (unless PRNGs will be your life's work), you know it's much better to just use something someone else has come up with. Also, along those lines, I'd like to point out that historically compilers, spreadsheets, etc. haven't used what most mathematicians consider good PRNGs, so if you need a high quality PRNG, don't use the standard library one in C++, Excel, .NET, Java, etc. until you have researched what they implement it with.
A linear congruential generator is commonly used and the Wiki article explains it pretty well.
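To show how little code a linear congruential generator takes, here is a sketch in Python (the structure translates line for line to C++). It uses the multiplier 1664525, increment 1013904223 and modulus 2^32 popularised by Numerical Recipes; the class and method names are my own. It is meant for learning only, not as a recommendation of this generator's quality.

```python
class LCG:
    """Linear congruential generator: state = (A * state + C) mod M."""

    A = 1664525          # multiplier (Numerical Recipes constants)
    C = 1013904223       # increment
    M = 2**32            # modulus

    def __init__(self, seed):
        self.state = seed % self.M

    def next_u32(self):
        """Advance the state and return it as an integer in [0, 2**32)."""
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

    def random(self):
        """Return a float in [0.0, 1.0)."""
        return self.next_u32() / self.M

gen = LCG(seed=12345)
print([gen.next_u32() for _ in range(3)])
```

The same seed always reproduces the same sequence, which is exactly what makes the generator pseudo-random rather than random.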
To quote John von Neumann:
Anyone who considers arithmetical methods of producing random digits is of course in a state of sin.
This is taken from Chapter 3 Random Numbers of Knuth's book "The Art of Computer Programming", which must be the most exhaustive overview of the subject available. And once you have read it, you will be exhausted. You will also know why you don't want to write your own random number generator.
The correct solution best fulfills the requirements and the requirements of every situation will be unique. This is probably the simplest way to go about it:
Create a large one dimensional array populated with "real" random values.
"Seed" your pseudo-random generator by calculating the starting index with system time.
Iterate through the array and return the value for each call to your function.
Wrap around when it reaches the end.
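The four steps above can be sketched as follows in Python. Filling the table from os.urandom is my assumption about where the "real" random values come from, and the names TableGenerator and make_table are hypothetical.

```python
import os
import time

def make_table(size=4096):
    """Build a table of byte values from the operating system's entropy source."""
    return list(os.urandom(size))

class TableGenerator:
    """Hands out values from a fixed table, wrapping around at the end."""

    def __init__(self, table, start=None):
        if not table:
            raise ValueError("table must not be empty")
        self.table = list(table)
        if start is None:
            start = int(time.time())          # "seed" from the system time
        self.index = start % len(self.table)

    def next(self):
        value = self.table[self.index]
        self.index = (self.index + 1) % len(self.table)   # wrap around
        return value

gen = TableGenerator(make_table())
print([gen.next() for _ in range(5)])
```

The period of such a generator is simply the table length, so it repeats after at most that many calls.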

What is the maximum theoretically possible compression rate?

This is a theoretical question, so expect that many details here are not computable in practice or even in theory.
Let's say I have a string s that I want to compress. The result should be a self-extracting binary (can be x86 assembler, but it can also be some other hypothetical Turing-complete low level language) which outputs s.
Now, we can easily iterate through all possible such binaries and programs, ordered by size. Let B_s be the sub-list of these binaries that output s (of course B_s is uncomputable).
As every set of positive integers must have a minimum, there must be a smallest program b_min_s in B_s.
For what languages (i.e. set of strings) do we know something about the size of b_min_s? Maybe only an estimation. (I can construct some trivial examples where I can always even calculate B_s and also b_min_s, but I am interested in more interesting languages.)
This is Kolmogorov complexity, and you are correct that it's not computable. If it were, you could create a paradoxical program of length n that printed a string with Kolmogorov complexity m > n.
Clearly, you can bound b_min_s for given inputs. However, as far as I know most of the efforts to do so have been existence proofs. For instance, there is an ongoing competition to compress English Wikipedia.
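One practical way to get such an upper bound is to run any real compressor: the compressed size, plus the fixed size of the decompressor, bounds b_min_s from above. A small Python sketch using the standard zlib module (the helper name compressed_size is hypothetical):

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

repetitive = b"a" * 10000          # highly regular input
noise = os.urandom(10000)          # input with no structure to exploit

print(len(repetitive), compressed_size(repetitive))
print(len(noise), compressed_size(noise))
```

The regular input shrinks to a few dozen bytes, while the random input does not shrink at all. The bound says nothing about how far below it the true minimum lies.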
Claude Shannon estimated the information density of the English language to be somewhere between 0.6 and 1.3 bits per character in his 1951 paper Prediction and Entropy of Printed English (PDF, 1.6 MB. Bell Sys. Tech. J (3) p. 50-64).
The maximal (average) compression rate possible is 1:1.
The number of possible inputs is equal to the number of outputs.
It has to be to be able to map the output back to the input.
To be able to store the output you need container at the same size as the minimal container for the input - giving 1:1 compression rate.
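The counting behind this argument can be checked directly. In the Python sketch below (function names are mine), there are 2^n bit strings of length exactly n but only 2^n - 1 bit strings that are strictly shorter, so no lossless scheme can shorten every input of length n.

```python
def strings_of_length(n):
    """Number of distinct bit strings with exactly n bits."""
    return 2 ** n

def strings_shorter_than(n):
    """Number of distinct bit strings with fewer than n bits (lengths 0..n-1)."""
    return sum(2 ** k for k in range(n))

n = 8
print(strings_of_length(n), strings_shorter_than(n))
```

Since the second count is always one less than the first, at least one input of every length must map to an output that is no shorter than itself.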
Basically, you need enough information to rebuild your original information. I guess the other answers are more helpful for your theoretical discussion, but just keep this in mind.