How can I calculate this prime product faster with PARI/GP?

I want to calculate the product of (1 - 1/p), where p runs over the primes up to 10^10.
I know the approximation exp(-gamma)/ln(10^10), where gamma is the Euler-Mascheroni constant and ln is the natural logarithm, but I want to calculate the exact product to see how close the approximation is.
The problem is that PARI/GP takes a very long time to generate the primes from about 4.2 * 10^9 to 10^10. The prodeuler command also takes very long.
Is there any method to speed up the calculation with PARI/GP?

I'm inclined to think the performance issue has mostly to do with the rational numbers rather than the generation of primes up to 10^10.
As a quick test I ran
a(n)=my(t=0);forprime(p=1,n,t+=p);t
with a(10^10), and it completed in a couple of minutes, which seems reasonable.
The corresponding program for your request is:
a(n)=my(t=1);forprime(p=1,n,t*=(1-1/p));t
and this runs much slower than the first program. So my question back would be: is there a way to reformulate the computation to avoid rationals until the end? Is my formulation above even what you intended? The numerator and denominator are extremely large even for 10^6, so it is no wonder it takes a long time to compute, and perhaps the issue has less to do with the numbers being rational than with their sheer size.
One trick I have used to compute large products is to split the problem so that at each stage the numbers on the left and right of the multiplication are roughly the same size. For example, to compute a large factorial, say 8!, it is much more efficient to compute ((1*8)*(2*7))*((3*6)*(4*5)) rather than to use the obvious left-to-right approach.
The following is a quick attempt to do what you want using exact arithmetic. It takes approximately 8 minutes up to 10^8, but the numerator already has 1.9 million digits, so it is unlikely this could ever get to 10^10 before running out of memory. [Even for this computation I needed to increase the stack size.]
xvecprod(v)={if(#v<=1, if(#v,v[1],1), xvecprod(v[1..#v\2]) * xvecprod(v[#v\2+1..#v]))}
faster(n)={my(b=10^6);xvecprod(apply(i->xvecprod(
apply(p->1-1/p, select(isprime, [i*b+1..min((i+1)*b,n)]))), [0..n\b]))}
Using decimals will definitely speed things up. The following runs reasonably quickly for up to 10^8 with 1000 digits of precision.
xvecprod(v)={if(#v<=1, if(#v,v[1],1), xvecprod(v[1..#v\2]) * xvecprod(v[#v\2+1..#v]))}
fasterdec(n)={my(b=10^6);xvecprod(apply(i->xvecprod(
apply(p->1-1.0/p,select(isprime,[i*b+1..min((i+1)*b,n)]))),[0..n\b]))}
The fastest method using decimals is the simplest:
a(n)=my(t=1);forprime(p=1,n,t*=(1-1.0/p));t
With precision set to 100 decimal digits, this produces a(10^9) in 2 minutes and a(10^10) in 22 minutes.
10^9: 0.02709315486987096878842689330617424348105764850
10^10: 0.02438386113804076644782979967638833694491163817
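For comparison with the approximation mentioned in the question (Mertens' third theorem; the arithmetic below is mine and rounded, so treat it as approximate):
exp(-gamma)/ln(10^10) = 0.5614594836.../23.0258509... which is about 0.024384
so the exact product at 10^10 agrees with the approximation to about five significant digits.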
When working with decimals, the trick of splitting the multiplications does not improve performance, because the numbers always have the same number of digits. However, I have left the code in, since there is a potential for better accuracy (at least in theory).
I am not sure I can give any good advice on the number of digits of precision required. (I'm more of a programmer type and tend to work with whole numbers). However, my understanding is that there is a possibility of losing 1 binary digit of precision with every multiplication, although since rounding can go either way on average it won't be quite so bad. Given that this is a product of over 450 million terms, that would imply all precision is lost.
However, using the algorithm that splits the computation, each value only goes through around 30 multiplications, so that should result in a loss of at most 30 binary digits (about 10 decimal digits) of precision, and working with 100 digits of precision should therefore be sufficient. Surprisingly, I get the same answers either way, so the simple naive method seems to work.
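As a rough back-of-the-envelope check (my estimate, not a measurement): with 100 decimal digits of working precision each multiplication contributes a relative error of at most about 10^-100, and there are about 4.55*10^8 factors, so even the worst-case accumulated relative error is around 4.55*10^8 * 10^-100, roughly 10^-91, which costs at most 9 of the 100 digits; the random-walk estimate is nearer sqrt(4.55*10^8) * 10^-100, about 2*10^-96. That is consistent with the naive and split versions giving the same result.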
During my tests, I noticed that using forprime is much faster than using isprime. (For example, the fasterdec version took almost 2 hours, compared with 22 minutes for the simple version to reach the same result.) Similarly, sum(p=1,10^9,isprime(p)) takes approximately 8 minutes, compared with my(t=1);forprime(p=1,10^9,t++);t, which takes just 11 seconds.

Related

Select ULP value in float comparison

I've read several resources on the network and I understand there is no single value or universal parameter to use when comparing floating-point numbers. I've read several replies here and I found the code from Google Test for comparing floats. I want to better understand the meaning of ULP and its value. Reading the comments in the source code, I read:
The maximum error of a single floating-point operation is 0.5 units in
the last place. On Intel CPU's, all floating-point calculations are
done with 80-bit precision, while double has 64 bits. Therefore, 4
should be enough for ordinary use.
It's not really clear why "therefore 4 should be enough". Can anyone explain why? From my understanding, we are saying that we can tolerate a difference of 4*10^-6 or 4*10^-15 between our numbers when deciding whether they are the same or not, taking into account the number of significant digits of float (6/7) or double (15/16). Is that correct?
It is wrong. Very wrong. Consider that every operation can accumulate some error: ½ ULP is the maximum (in round-to-nearest mode), so ¼ might be an average. So 17 operations are enough to accumulate more than 4 ULP of error just from average effects.1 Today's computers do billions of operations per second. How many operations will a program do between its inputs and some later comparison? That depends on the program, but it could be zero, dozens, thousands, or millions just for "ordinary" use. (Let's say we exclude billions because then it gets slow for a human to use, so we can call that special-purpose software, not ordinary.)
But that is not all. Suppose we add a few numbers around 1 and then subtract a number that happens to be around the sum. Maybe the adds get a total error around 2 ULP. But when we subtract, the result might be around 2^-10 instead of around 1. So the ULP of 2^-10 is 1024 times smaller than the ULP of 1. That error that is 2 ULP relative to 1 is 2048 ULP relative to the result of the subtraction. Oops! 4 ULP will not cut it. It would need to be 4 ULP of some of the other numbers involved, not the ULP of the result.
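To make that concrete, here is a minimal C++ sketch (it assumes 64-bit IEEE 754 doubles; the ulp_distance helper and the specific constants are mine, chosen only for illustration) showing how an error of about 1 ULP at the scale of 1 becomes roughly 1024 ULP after a cancelling subtraction:
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Rough ULP distance between two finite doubles of the same sign
// (good enough for this illustration).
static long long ulp_distance(double a, double b) {
    long long ia, ib;
    std::memcpy(&ia, &a, sizeof ia);
    std::memcpy(&ib, &b, sizeof ib);
    return std::llabs(ia - ib);
}

int main() {
    // Add a few numbers "around 1": ten copies of 0.1 (the exact sum would be 1).
    double sum = 0.0;
    for (int i = 0; i < 10; ++i) sum += 0.1;

    // Subtract a number that happens to be close to the sum.
    double near = 1.0 - 1.0/1024.0;   // 0.9990234375, exactly representable
    double r = sum - near;            // exact answer is 1/1024 = 0.0009765625

    std::printf("sum is %lld ULP away from 1.0\n", ulp_distance(sum, 1.0));
    std::printf("r   is %lld ULP away from 1/1024\n", ulp_distance(r, 1.0/1024.0));
    return 0;
}
On a typical IEEE 754 system this prints 1 and 1024: the same absolute error looks a thousand times larger once the result has been cancelled down to a smaller scale.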
In fact, characterizing the error is difficult in general and is the subject of an entire field of study, numerical analysis. 4 is not the answer.
Footnote
1 Errors will vary in direction, so some will cancel out. The behavior might be modeled as a random walk, and the average error might be proportional to the square root of the number of operations performed.

Adding large numbers returns strange, large numbers

I am trying to do some calculations in Fortran that look like:
large number (order E40) - large number (order E40)
I should get back zero. Most of the time it works, but in a couple of cases I'm getting weird numbers. One answer Fortran gave me was -1E20. Another weird answer I got was 32768, which is 2^15, oddly enough.
Does anyone have any clue as to why this is happening?
It's hard to tell without actual code, but...
This is only to be expected if the numbers are sufficiently similar. While 1e20 is pretty large compared to 1 or 2, it is pretty small compared to 1e40.
In fact, even with double precision, you only have 15-17 digits of precision. Considering that, the values you get are below the accuracy possible with numbers in the range of 1e40.
What you see is numerical noise.
[ Another possibility, of course, is that you are trying to do this in single precision. This is not possible (max. exponent ~38) and anything might happen. ]
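To put a number on that, here is a small C++ sketch of the same effect (the constants are just for illustration, and 64-bit IEEE doubles are assumed; the same applies to Fortran double precision):
#include <cmath>
#include <cstdio>

int main() {
    double a = 1.0e40;

    // Spacing between adjacent double values near 1e40 is about 1.2e24,
    // so any difference much smaller than that is pure rounding noise.
    double spacing = std::nextafter(a, 2.0e40) - a;
    std::printf("spacing near 1e40: %g\n", spacing);

    // Anything well below that spacing simply disappears when added.
    double b = a + 1.0e20;
    std::printf("(1e40 + 1e20) - 1e40 = %g\n", b - a);   // prints 0
    return 0;
}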

controlling overflow and loss in precision while multiplying doubles

Question:
I have a large number of floating point numbers (~10,000 numbers), each having 6 digits after the decimal point. Now, the product of all these numbers would have about 60,000 digits, but a double only holds about 15 significant digits. The output product has to have 6 digits of precision after the decimal point.
My approach:
I thought of multiplying these numbers by 10^6 and then multiplying them and later dividing them by 10^12.
I also thought of multiplying these numbers using arrays to store their digits and later converting them back to decimal, but this seems cumbersome and may not yield a correct result.
Is there an alternate easier way to do this?
I thought of multiplying these numbers by 10^6 and then multiplying them and later dividing them by 10^12.
This would only achieve further loss of accuracy. In floating-point, large numbers are represented approximately just like small numbers are. Making your numbers bigger only means you are doing 19999 multiplications (and one division) instead of 9999 multiplications; it does not magically give you more significant digits.
This manipulation would only be useful if it prevented the partial product from reaching subnormal territory (and in that case, multiplying by a power of two would be recommended, to avoid any loss of accuracy from the scaling itself). There is no indication in your question that this happens, no example data set, no code, so it is only possible to provide the generic explanation below:
Floating-point multiplication is very well behaved when it does not underflow or overflow. To first order, you can assume that relative inaccuracies add up, so that multiplying 10000 values produces a result that's 9999 machine epsilons away from the mathematical result in relative terms(*).
The solution to your problem as stated (no code, no data set) is to use a wider floating-point type for the intermediate multiplications. This solves both the problems of underflow or overflow and leaves you with a relative accuracy on the end result such that once rounded to the original floating-point type, the product is wrong by at most one ULP.
Depending on your programming language, such a wider floating-point type may be available as long double. For 10000 multiplications, the 80-bit “extended double” format, widely available in x86 processors, would improve things dramatically and you would barely see any performance difference, as long as your compiler does map this 80-bit format to a floating-point type. Otherwise, you would have to use a software implementation such as MPFR's arbitrary-precision floating-point format or the double-double format.
(*) In reality, relative inaccuracies compound, so that the real bound on the relative error is more like (1 + ε)^9999 - 1 where ε is the machine epsilon. Also, in reality, relative errors often cancel each other, so that you can expect the actual relative error to grow like the square root of the theoretical maximum error.
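If a wider type is available, the fix can be as small as this C++ sketch (the function name and data are made up for illustration; it assumes long double is the 80-bit x87 format, which is typical with GCC on x86 but not universal):
#include <cstdio>
#include <vector>

// Multiply ~10,000 doubles through a wider intermediate type, then round
// back to double only once at the end.
double product(const std::vector<double>& xs) {
    long double p = 1.0L;
    for (double x : xs) p *= x;        // each step: at most ~0.5 ULP of long double
    return static_cast<double>(p);     // single final rounding to double
}

int main() {
    std::vector<double> xs(10000, 1.000001);   // placeholder data
    std::printf("%.17g\n", product(xs));
    return 0;
}
If long double is just double on your platform (e.g. MSVC), this buys nothing, and a double-double or MPFR product is the fallback, as mentioned above.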

Large doubles/float/numbers

Say I have a huge floating number, say a trillion decimal places out. Obviously a long double can't hold this. Let's also assume I have a computer with more than enough memory to hold it. How do you do something like this?
You need arbitrary-precision arithmetic.
Arbitrary-precision math.
It's easy to say "arbitrary precision arithmetic" (or something similar), but I think it's worth adding that it's difficult to conceive of ways to put numbers anywhere close to this size to use.
Just for example: the current estimates of the size of the universe are somewhere in the vicinity of 150-200 billion light years. At the opposite end of the spectrum, the diameter of a single electron is estimated at a little less than 1 attometer. 1 light year is roughly 9.46x10^15 meters (for simplicity, we'll treat it as 10^16 meters).
So, let's take 1 attometer as our unit, and figure out the size of the number for the diameter of the universe in that unit. 10^18 units/meter * 10^16 meters/light year * 10^11 light years/universe diameter = about a 45-digit number to express the diameter of the universe in units of roughly the diameter of an electron.
Even if we went the next step, and expressed it in terms of the theorized size of a superstring, and added a few extra digits just in case the current estimates are off by a couple orders of magnitude, we'd still end up with a number around 65 digits or so.
This means, for example, that if we knew the diameter of the universe to the size of a single superstring, and we wanted to compute something like volume of the universe in terms of superstring diameters, our largest intermediate result would be something like 600-700 digits or so.
Consider another salient point: if you were to program a 64-bit computer running at, say, 10 GHz to do nothing but count -- increment a register once per clock cycle -- it would take it roughly 58 years just to cycle through the 64-bit numbers and wrap around to 0 again.
The bottom line is that it's incredibly difficult to come up with excuses (much less real reasons) to carry out calculations to anywhere close to millions, billions/milliards or trillions/billions of digits. The universe isn't that big, doesn't contain that many atoms, etc.
Sounds like what logarithms were invented for.
Without knowing what you intend to do with the number, it's impossible to accurately say how to represent it.

Accurate evaluation of 1/1 + 1/2 + ... 1/n row

I need to evaluate the sum of the series 1/1 + 1/2 + 1/3 + ... + 1/n. Considering that in C++ evaluations are not completely accurate, the order of summation plays an important role. The expression 1/n + 1/(n-1) + ... + 1/2 + 1/1 gives a more accurate result.
So I need to find the order of summation that provides the maximum accuracy.
I don't even know where to begin.
The preferred language of implementation is C++.
Sorry for my English if there are any mistakes.
For large n you'd better use asymptotic formulas, like the ones on http://en.wikipedia.org/wiki/Harmonic_number;
Another way is to use exp-log transformation. Basically:
H_n = 1 + 1/2 + 1/3 + ... + 1/n = log(exp(1 + 1/2 + 1/3 + ... + 1/n)) = log(exp(1) * exp(1/2) * exp(1/3) * ... * exp(1/n)).
Exponents and logarithms can be calculated pretty quickly and accurately by your standard library. Using multiplication, you should get much more accurate results.
If this is your homework and you are required to use simple addition, you'd better add from the smallest term to the largest, as others have suggested.
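In case it helps, here is a small C++ sketch of the asymptotic route (the expansion H_n ~ ln n + gamma + 1/(2n) - 1/(12n^2) is the standard one from the Wikipedia page linked above; the function name is just for illustration):
#include <cmath>
#include <cstdio>

// First terms of the asymptotic expansion of H_n = 1 + 1/2 + ... + 1/n.
double harmonic_asymptotic(double n) {
    const double gamma = 0.57721566490153286;   // Euler-Mascheroni constant
    return std::log(n) + gamma + 1.0 / (2.0 * n) - 1.0 / (12.0 * n * n);
}

int main() {
    std::printf("H_10   ~ %.9f\n", harmonic_asymptotic(10));
    std::printf("H_1000 ~ %.9f\n", harmonic_asymptotic(1000));
    return 0;
}
Even at n = 10 this agrees with the exact value 2.9289682... (see the Lisp output further down) to about six decimal places, and the agreement improves quickly as n grows.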
The reason for the lack of accuracy is the precision of the float, double, and long double types. They only store so many "decimal" places. So adding a very small value to a large value has no effect; the small term is "lost" in the larger one.
The series you're summing has a "long tail", in the sense that the small terms should add up to a large contribution. But if you sum in descending order, then after a while each new small term will have no effect (even before that, most of its decimal places will be discarded). Once you get to that point you can add a billion more terms, and if you do them one at a time it still has no effect.
I think that summing in ascending order should give best accuracy for this kind of series, although it's possible there are some odd corner cases where errors due to rounding to powers of (1/2) might just so happen to give a closer answer for some addition orders than others. You probably can't really predict this, though.
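A quick C++ experiment along these lines (single precision on purpose so the effect is easy to see; the value of n is arbitrary):
#include <cstdio>

int main() {
    const int n = 10000000;
    float descending = 0.0f;   // 1/1 + 1/2 + ... + 1/n (largest terms first)
    float ascending  = 0.0f;   // 1/n + ... + 1/2 + 1/1 (smallest terms first)
    for (int i = 1; i <= n; ++i) descending += 1.0f / static_cast<float>(i);
    for (int i = n; i >= 1; --i) ascending  += 1.0f / static_cast<float>(i);
    // The ascending sum lands much closer to the true value (about 16.695);
    // the descending sum stalls once 1/i drops below the sum's resolution.
    std::printf("descending: %.7f\n", descending);
    std::printf("ascending:  %.7f\n", ascending);
    return 0;
}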
I don't even know where to begin.
Here: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Actually, if you're doing the summation for large N, adding in order from smallest to largest is not the best way -- you can still get into a situation where the numbers you're adding are too small relative to the sum to produce an accurate result.
Look at the problem this way: You have N summations, regardless of ordering, and you wish to have the least total error. Thus, you should be able to get the least total error by minimizing the error of each summation -- and you minimize the error in a summation by adding values as nearly close to each other as possible. I believe that following that chain of logic gives you a binary tree of partial sums:
Sum[0,i] = value[i]
Sum[1,i/2] = Sum[0,i] + Sum[0,i+1]
Sum[j+1,i/2] = Sum[j,i] + Sum[j,i+1]
and so on until you get to a single answer.
Of course, when N is not a power of two, you'll end up with leftovers at each stage, which you need to carry over into the summations at the next stage.
(The margins of StackOverflow are of course too small to include a proof that this is optimal. In part because I haven't taken the time to prove it. But it does work for any N, however large, as all of the additions are adding values of nearly identical magnitude. Well, all but log(N) of them in the worst not-power-of-2 case, and that's vanishingly small compared to N.)
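A recursive sketch of that binary tree of partial sums (the bottom-up pairing described above gives the same shape; this form handles N that is not a power of two without special casing, and the function name is mine):
#include <cstddef>
#include <vector>

// Pairwise summation: split the range in half, sum each half, add the halves.
// Each input value passes through only about log2(N) additions.
static double pairwise_sum(const std::vector<double>& v,
                           std::size_t lo, std::size_t hi) {
    if (hi - lo == 1) return v[lo];
    std::size_t mid = lo + (hi - lo) / 2;
    return pairwise_sum(v, lo, mid) + pairwise_sum(v, mid, hi);
}

double pairwise_sum(const std::vector<double>& v) {
    return v.empty() ? 0.0 : pairwise_sum(v, 0, v.size());
}
For the series in the question you would fill v with 1.0/i for i = 1..n before calling it.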
http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic
You can find libraries with ready-to-use implementations for C/C++.
For example http://www.apfloat.org/apfloat/
Unless you use some accurate closed-form representation, a small-to-large ordered summation is likely to be the most accurate simple solution (it's not clear to me why a log-exp would help - that's a neat trick, but you're not winning anything with it here, as far as I can tell).
You can further gain precision by realizing that after a while, the sum will become "quantized": Effectively, when you have 2 digits of precision, adding 1.3 to 41 results in 42, not 42.3 - but you achieve almost a precision doubling by maintaining an "error" term. This is called Kahan Summation. You'd compute the error term (42-41-1.3 == -0.3) and correct that in the next addition by adding 0.3 to the next term before you add it in again.
Kahan Summation in addition to a small-to-large ordering is liable to be as accurate as you'll ever need to get. I seriously doubt you'll ever need anything better for the harmonic series - after all, even after 2^45 iterations (crazy many) you'd still only be dealing with numbers that are at least 1/2^45 large, and a sum that's on the order of 45 (<2^6), for an order of magnitude difference of 51 powers-of-two - i.e. still representable in a double precision variable even if you add in the "wrong" order.
If you go small-to-large and use Kahan Summation, the sun will probably burn out before today's processors accumulate a percent of error - and you'll run into other tricky accuracy issues due to the error in the individual terms at that scale first anyhow (given that a number on the order of 2^53 or larger cannot be represented exactly as a double at all anyhow).
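Here is what that looks like in code: a sketch of Kahan summation applied to the harmonic series, small-to-large as suggested (the function name is mine):
#include <cstdio>

// Compensated (Kahan) summation of 1/1 + 1/2 + ... + 1/n, smallest terms
// first; c carries the low-order error that the running sum cannot hold.
double harmonic_kahan(long n) {
    double sum = 0.0, c = 0.0;
    for (long i = n; i >= 1; --i) {
        double y = 1.0 / static_cast<double>(i) - c;
        double t = sum + y;       // low-order bits of y are lost here...
        c = (t - sum) - y;        // ...and recovered into c for the next term
        sum = t;
    }
    return sum;
}

int main() {
    std::printf("H_1000 = %.15f\n", harmonic_kahan(1000));   // 7.4854708605...
    return 0;
}
Note that aggressive compiler options such as -ffast-math can optimize the compensation step away, so compile without them.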
I'm not sure about the order of summation playing an important role; I haven't heard that before. I guess you want to do this in floating point arithmetic, so the first thing is to write it more along the lines of (1.0/1.0 + 1.0/2.0 + 1.0/3.0) - otherwise the compiler will do integer division.
To control the order of evaluation, use a for loop or brackets, e.g.
float f = 0.0;
for (int i = n; i > 0; --i)
{
    f += 1.0 / static_cast<float>(i);   // smallest terms (1/n) added first
}
Oh, I forgot to say: compilers will normally have switches that control the floating point evaluation mode. This may be related to what you say about the order of summation - in Visual C++ these are found in the code-generation compile settings, and g++ has floating-point options that handle this.
Actually, the other answer is right - you should do the summation in order of smallest component first, so:
1/n + 1/(n-1) .. 1/1
This is because the precision of a floating point number is linked to its scale: if you start at 1 you'll have 23 bits of precision relative to 1.0, whereas if you start at a smaller number the precision is relative to that smaller number, so you'll get 23 bits of precision relative to 1e-20 or whatever. Then, as the sum gets bigger, rounding error will occur, but the overall error will be less than in the other direction.
As all your numbers are rationals, the easiest approach (and maybe also the fastest, as it needs fewer floating point operations) would be to do the computation with rationals (pairs of integers p, q), and then do just one floating point division at the end.
Update: to use this technique effectively you will need to use bigints for p and q, as they grow quite fast...
A quick prototype in Lisp, which has built-in rationals, shows:
(defun sum_harmonic (n acc)
(if (= n 0) acc (sum_harmonic (- n 1) (+ acc (/ 1 n)))))
(sum_harmonic 10 0)
7381/2520
[2.9289682]
(sum_harmonic 100 0)
14466636279520351160221518043104131447711/278881500918849908658135235741249214272
[5.1873775]
(sum_harmonic 1000 0)
53362913282294785045591045624042980409652472280384260097101349248456268889497101
75750609790198503569140908873155046809837844217211788500946430234432656602250210
02784256328520814055449412104425101426727702947747127089179639677796104532246924
26866468888281582071984897105110796873249319155529397017508931564519976085734473
01418328401172441228064907430770373668317005580029365923508858936023528585280816
0759574737836655413175508131522517/712886527466509305316638415571427292066835886
18858930404520019911543240875811114994764441519138715869117178170195752565129802
64067621009251465871004305131072686268143200196609974862745937188343705015434452
52373974529896314567498212823695623282379401106880926231770886197954079124775455
80493264757378299233527517967352480424636380511370343312147817468508784534856780
21888075373249921995672056932029099390891687487672697950931603520000
[7.485471]
So, the next best option could be to maintain the list of floating point values and reduce it by summing the two smallest numbers at each step...
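A small C++ sketch of that last idea, using a min-heap so the two smallest values are always combined first (my own illustration, not part of the answer above):
#include <functional>
#include <queue>
#include <vector>

// Repeatedly pop the two smallest partial sums, add them, and push the
// result back; the last remaining element is the total.
double smallest_pairs_sum(const std::vector<double>& xs) {
    std::priority_queue<double, std::vector<double>, std::greater<double>>
        heap(xs.begin(), xs.end());
    if (heap.empty()) return 0.0;
    while (heap.size() > 1) {
        double a = heap.top(); heap.pop();
        double b = heap.top(); heap.pop();
        heap.push(a + b);
    }
    return heap.top();
}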