Precision problems with very large reals - Fortran

The problem I'm attempting to tackle at the moment involves computing the order of 10 modulo n, where n can be any number less than 1000. I have a function to do exactly that; however, I am unable to obtain accurate results as the value of the order increases.
The function works correctly as long as the order is sufficiently small, but returns incorrect values for large orders. So I stuck some output statements into the code to locate the problem, and discovered that the exponentiation is where the accuracy of my reals is being lost.
I declared ALL variables in the function, and in the program I tested it from, as real(kind=nkind) where nkind = selected_real_kind(p=18, r=308). Any numbers explicitly referenced are also written with that kind, for example 1.0_nkind. However, when I print out 10**n for n counting up from 1, the values are correct up through 10**27, but 10**28 comes out as 9999999999999999999731564544. All higher powers are similarly distorted, and this inaccuracy is the source of my problem.
So, my question is: is there a way to work around the error? I don't know of any way to use more extended precision than I'm already using in the calculations.
Thanks,
Sean
*EDIT: There's not much to see in the code, but here you go:
integer, parameter :: nkind = selected_real_kind(p=18, r=308)
real(kind=nkind) function order_ten_modulo(n)
    real(kind=nkind) :: n, power
    power = 1.0_nkind
    if (mod(n, 5.0_nkind) == 0 .or. mod(n, 2.0_nkind) == 0) then
        order_ten_modulo = 0
        return
    end if
    do
        if (power > 300.0) then ! Just picked this number as a safeguard against endless looping
            exit
        end if
        if (mod(10.0_nkind**power, n) == 1.0_nkind) then
            order_ten_modulo = power
            exit
        end if
        power = power + 1.0_nkind
    end do
    return
end function order_ten_modulo
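For reference, the usual workaround (a sketch of my own, not part of the function above, and with an illustrative name) is to avoid forming 10**power altogether and instead reduce modulo n after every multiplication. With n below 1000, no intermediate value ever exceeds 10*n, so plain default integers are exact:

integer function order_ten_mod(n)
    implicit none
    integer, intent(in) :: n
    integer :: r, k
    order_ten_mod = 0
    if (mod(n, 2) == 0 .or. mod(n, 5) == 0) return  ! 10 must be coprime to n
    r = 1
    do k = 1, n              ! the order divides phi(n), which is less than n
        r = mod(r * 10, n)   ! remainder stays below n, product below 10*n
        if (r == 1) then
            order_ten_mod = k
            return
        end if
    end do
end function order_ten_mod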

Related

Use CEILING Without the Effect of Rounding Error

I'm trying to use the intrinsic function CEILING, but the rounding error sometimes makes it difficult to get what I want. The sample code is very simple:
PROGRAM MAIN
    IMPLICIT NONE
    INTEGER, PARAMETER :: ppm_kind_double = KIND(1.0D0)
    REAL(ppm_kind_double) :: before, after, dx
    before = -0.112
    dx = 0.008
    after = CEILING(before/dx)
    WRITE(*,*) before, dx, before/dx, after
END
And I got results:
The values I give to 'before' and 'dx' in the code are just for demonstration. When before/dx = -13.5, for example, I want CEILING to give -13. But for the case shown in the picture, I actually want to get -14. I have considered adding a guard like
IF(ABS(NINT(before/dx) - before/dx) < 0.001)
But that's simply not beautiful. Is there any better way to do this?
Update:
I was surprised to find that the problem doesn't occur if I write the constants with the ppm_kind_double kind suffix. So I guess this 'rounding error' only happens when the rounding precision of the machine I use is greater than what's defined by ppm_kind_double. I actually run my program (not this demo code) on a cluster whose machine precision I don't know, so maybe it's quad precision on that machine that leads to the problem?
After I set constants to double precision:
before = -0.112_ppm_kind_double
dx = 0.008_ppm_kind_double
This is a bit tricky, because you never know where the rounding error comes from. If dx were just a tiny bit larger than 0.008, the division before/dx might still round to the same value, but then -13 would be the correct answer.
That said, the most common workaround I have seen is to just nudge the value ever so slightly in the opposite direction. Something like this:
program sign_test
    use iso_fortran_env
    implicit none
    real(kind=real64) :: a, b
    integer(kind=int32) :: c
    a = -0.112
    b = 0.008
    c = my_ceiling(a/b)
    print*, a, b, c
contains
    function my_ceiling(v)
        implicit none
        real(kind=real64), intent(in) :: v
        integer(kind=int32) :: my_ceiling
        my_ceiling = ceiling(v - 1d-6, kind=int32)
    end function my_ceiling
end program sign_test
This won't have any impact on the vast majority of values, but there are now a few values that will get rounded up by more than intended.
Note that if your reals are notionally "exact" to a specified precision, you might do something like this:
after = nint(1000*before)/nint(1000*dx)
This works for your example. You haven't said what you'd expect when both values are positive, and so on, so you might need to adapt it a bit.
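To flesh that idea out a little (this wrapper is my own sketch, not code from the answers above): scale both operands to integers, then do an exact integer ceiling division, so floating-point rounding can no longer flip the result.

integer function ceiling_ratio(before, dx)
    use iso_fortran_env, only: real64
    implicit none
    real(real64), intent(in) :: before, dx
    integer :: p, q
    p = nint(1000.0_real64 * before)   ! both values assumed exact to 3 decimals
    q = nint(1000.0_real64 * dx)
    ceiling_ratio = p / q              ! integer division truncates toward zero
    ! round up only when there is a remainder and the true quotient is positive
    if (mod(p, q) /= 0 .and. (p > 0 .eqv. q > 0)) ceiling_ratio = ceiling_ratio + 1
end function ceiling_ratio

For the values in the question this gives ceiling_ratio(-0.112_real64, 0.008_real64) = -14, while a ratio of -13.5 still gives -13.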

Handling large digit number variables in object-oriented C++

I am working on numerical analysis using a solver (the programming is based on object-oriented C++) compiled with double precision on a 64-bit machine. My problem is that when the solver computes a large number - say -1.45e21, to take an actual example - and stores it in allocated memory by assigning it to an existing variable, the value is converted to 0. So of course, when I later use this variable in a division, I get a segmentation fault. I do not understand how this process works, and because I use double precision I do not see how to fix the issue. Could anyone give me a hand with this, please?
In case it helps: I just ran a test where I set a = -1.45e+21 and print the value, which is returned correctly by the solver. But when I do not use the "e" exponent and enter the full value (with 19 zeros), I get 0 in return. So I guess the issue/limitation comes from the number of digits. Any ideas? Thanks!
Edit: Here is a summary of the steps I go through to compute one of the variables that poses an issue; the others are defined similarly.
First I initialise the field pointer lists:
PtrList<volScalarField> fInvFluids(fluidRegions.size());
Where the class of volScalarField is just an array of double. Then I populate the field pointer lists:
fInvFluids.set
(
    i,
    new volScalarField
    (
        IOobject
        (
            "fInv",
            runTime.timeName(),
            fluidRegions[i],
            IOobject::NO_READ,
            IOobject::AUTO_WRITE
        ),
        fluidRegions[i],
        dimensionedScalar
        (
            "fInv",
            dimensionSet(3,1,-9,-1,0,0,0),
            scalar(0)
        )
    )
);
After this, I set the field regions:
volScalarField& fInv = fInvFluids[i];
And finally I compute the value:
// Info<< " ** Calculating fInv **\n";
fInv = gT*pow(pow(T/Tlambda, 5.7)*(1 - pow(T/Tlambda, 5.7)), 3);
where T is a field variable and Tlambda is a scalar value defined at run time.
A double variable probably can't hold a number with 19 zeroes exactly either. A (decimal) digit takes more than 3 bits, so 19 zeroes take at least 57 bits, and a double typically has a mantissa of only 53 bits.
However, that doesn't sound like the problem you have. In C++ code, expressions have a type as well: 1 is not the same as 1.0. The first is an int and the second a double. While you can convert an int to a double, they're not the same. An int most likely can hold values up to about 2 billion, but formally its limit may be as low as 32767. The solution to your problem might be as simple as adding a twentieth zero: 100000000000000000000.0, making it a double.

pseudo random distribution which guarantees all possible permutations of value sequence - C++

Random question.
I am attempting to create a program which would generate a pseudo-random distribution. I am trying to find the right pseudo-random algorithm for my needs. These are my concerns:
1) I need one input to generate the same output every time it is used.
2) It needs to be random enough that a person who looks at the output from input 1 sees no connection between that and the output from input 2 (etc.), but there is no need for it to be cryptographically secure or truly random.
3) Its output should be a number between 0 and (29^3200)-1, with every possible integer in that range a possible and equally (or close to it) likely output.
4) I would like to be able to guarantee that every possible permutation of sequences of 410 outputs is also a potential output of consecutive inputs. In other words, all the possible groupings of 410 integers between 0 and (29^3200)-1 should be potential outputs of sequential inputs.
5) I would like the function to be invertible, so that I could take an integer, or a series of integers, and say which input or series of inputs would produce that result.
The method I have developed so far is to run the input through a simple Halton sequence:
boost::multiprecision::mpz_int denominator = 1;
boost::multiprecision::mpz_int numerator = 0;
while (input > 0) {
    denominator *= 3;
    numerator = numerator * 3 + (input % 3);
    input = input / 3;
}
and multiply the result by 29^3200. It meets requirements 1-3, but not 4. And it is invertible only for single integers, not for series (since not all sequences can be produced by it). I am working in C++, using boost multiprecision.
Any advice someone can give me concerning a way to generate a random distribution meeting these requirements, or just a class of algorithms worth researching towards this end, would be greatly appreciated. Thank you in advance for considering my question.
----UPDATE----
Since multiple commenters have focused on the size of the numbers in question, I just wanted to make clear that I recognize the practical problems such sets pose, but in asking this question I'm interested only in the theoretical or conceptual approach to the problem. For example, imagine working with a much smaller set of integers, like 0 to 99, and permutations of output sequences of length 10. How would you design an algorithm to meet these five conditions: 1) the input is deterministic, 2) the output appears random (at least to the human eye), 3) every integer in the range is a possible output, 4) not only all values, but also all permutations of value sequences, are possible outputs, 5) the function is invertible?
---second update---
With many thanks to @Severin Pappadeux, I was able to invert an LCG. I thought I'd add a little about what I did, to hopefully make it easier for anyone seeing this in the future. First of all, these are excellent sources on inverting modular functions:
https://www.khanacademy.org/computing/computer-science/cryptography/modarithmetic/a/modular-inverses
https://www.khanacademy.org/computer-programming/discrete-reciprocal-mod-m/6253215254052864
If you take the equation next = (a*x + c) mod m, then running the following code with your values of a and m will print out the Euclidean equations you need to find ainverse, as well as the value of ainverse:
int qarray[12];
qarray[0] = 0;
qarray[1] = 1;
int i = 2;
int reset = m;
while (m % a > 0) {
    int remainder = m % a;
    int quotient = m / a;
    std::cout << m << " = " << quotient << "*" << a << " + " << remainder << "\n";
    qarray[i] = qarray[i-2] - (qarray[i-1] * quotient);
    m = a;
    a = remainder;
    i++;
}
if (qarray[i-1] < 0) { qarray[i-1] += reset; }
std::cout << qarray[i-1] << "\n";
The other thing that took me a while to figure out is that if you get a negative result, you should add m to it. You need the same adjustment in the inverted equation:
prev = (ainverse * (next - c)) % m;
if (prev < 0) { prev += m; }
I hope that helps anyone who ventures down this road in the future.
OK, I'm not sure there is a general answer, so I would concentrate on a random number generator with, say, a 64-bit internal state/seed, producing 64-bit output and having a 2^64-1 period. In particular, I would look at a linear congruential generator (aka LCG) of the form
next = (a * prev + c) mod m
where a and m are coprime to each other.
So:
1) Check
2) Check
3) Check (well, for 64bit space of course)
4) Check (again, except 0 I believe, but each and every permutation of 64bits is output of LCG starting with some seed)
5) Check. LCG is known to be reversible, i.e. one could get
prev = (next - c) * a_inv mod m
where a_inv can be computed from a and m using Euclid's algorithm.
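To make the inversion concrete, here is a worked example with deliberately tiny numbers of my own (not values you would use in practice): take m = 9, a = 7, c = 5. Euclid's algorithm gives a_inv = 4, since 7*4 = 28 ≡ 1 (mod 9). Starting from prev = 3, the forward step gives next = (7*3 + 5) mod 9 = 26 mod 9 = 8, and the inverse step gives prev = (8 - 5) * 4 mod 9 = 12 mod 9 = 3, recovering the original value.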
Well, if that looks OK to you, you could try to implement an LCG in your 15546-bit space.
UPDATE
And a quick search turns up a reversible LCG discussion with code here:
Reversible pseudo-random sequence generator
In your update, "appears random (to the human eye)" is the phrasing you use. What "appears random" means is not a well-agreed-upon topic; there are varying degrees of tests for randomness.
However, if you're just looking to make it appear random to the human eye, you can just use ring multiplication.
Start with the idea of generating N! values between 0 and M (N >= 410, M >= 29^3200).
Group these together into one big number; we're going to generate a single number ranging from 0 to M^N!. If we can show that the pseudorandom number generator generates every value from 0 to M^N!, we guarantee your permutation rule.
Now we need to make it "appear random." To the human eye, linear congruential generators are enough. Pick an LCG with a period greater than or equal to 410!*M^N satisfying the rules that ensure a complete period. The easiest way to ensure fairness is to pick an LCG of the form x' = (ax + c) mod M^N!
That'll do the trick. Now, the hard part is proving that what you did was worth your time. Consider that the period of just a 29^3200-long sequence is outside the realm of physical reality; you'll never actually use it all. Ever. Consider that a supercomputer made of Josephson junctions (10^-12 kg, processing 10^11 bits/s) weighing the mass of the entire universe (3*10^52 kg) could process roughly 10^75 bits/s. A number that can count to 29^3200 is roughly 15545 bits long, so that supercomputer could process roughly 6.5x10^71 numbers/s. This means it would take roughly 10^4600 s merely to count that high, or somewhere around 10^4592 years. Somewhere around 10^12 years from now the stars are expected to wink out, permanently, so it could be a while.
There are M**N sequences of N numbers between 0 and M-1.
You can imagine writing all of them one after the other in a (pseudorandom) sequence and placing your read pointer randomly in the resulting loop of N*(M**N) numbers between 0 and M-1...
def output(input):
    total_length = N*(M**N)
    index = input % total_length
    permutation_index = shuffle(index / N, M**N)
    element = input % N
    return (permutation_index / (N**element)) % M
Of course, for every permutation of N elements between 0 and M-1 there is a sequence of N consecutive inputs that produces it (just un-shuffle the permutation index). I'd also say (purely from symmetry reasoning) that given any starting input, the output of the next N elements is equally probable (each number and each sequence of N numbers is equally represented over the total period).

Test of lot of math operations in a class

Is there an easy way to test that the functions inside a class produce correct results? I have been looking at Google Test, but it seems aimed more at finding failures in classes and functions than at checking the expected numerical result.
For example, from math theory one knows what the square root of any number should be. Now you want to check a sqrt function for floating-point precision errors, and you also want to check lots of other functions that use floats and look for any precision error. Is there a way to make this easy and fast?
I can think of 2 direct solutions
1)
One of the easiest ways to test the accuracy of a mathematical function is similar to the definition work used for limits in calculus: take the value to be tested, and also use values that are "close" to it on both sides. I have heard analogies drawn between limit analysis and unit testing, but keep in mind that if you're looking for speed this will not be your best option, that it only works on continuous operations, and that the analogy holds for definition work only.
So you would have a "limitDomain" variable defined per function (because some operations are more accurate than others; for the reasoning, look up the Taylor approximation of the function in question), and use that as your limiter. Then evaluate at the low value, the high value, and the value itself, and take the average of all three within a given margin of error:
float testMathOpX(float _input){
    float low = 0.0f;
    float high = 0.0f;
    low = _input - limitDomainOpX;
    high = _input + limitDomainOpX;
    low = OpX(low);
    _input = OpX(_input);
    high = OpX(high);
    // doing 3 separate averages with division by 2 means the worst decimal you
    // will have is a trailing 5, or in some cases a trailing 25
    low = (low + _input)/2;
    high = (_input + high)/2;
    _input = (low + high)/2;
    return _input;
}
2)
The other method I can think of is more of a table-of-values approach: take the input, check where on the domain of the operation it lies, and if it lies within certain bounds, use value replacement. The thing to realize is that you need to do a lot of work ahead of time to build that table of values; after that, it becomes just domain testing of the value you're taking in, in the form of:
if( (_input > valLow) && (_input < valHigh)){
    // ... replace the value with an empirically found value
}
The problem with this is that you need to find those empirical values in the first place.
Do you have requirements on the precision or do you want to find the precision?
If it is the former, then it is not hard to create test cases using any test framework.
y = myfunc(x);
if (y > expected_y + allowed_error || y < expected_y - allowed_error) {
    // Test failed
    ...
}
Edit:
There are two routes to finding the precision, through testing and through algorithm analysis.
Testing should be straightforward: Compare the output with the correct values (which you have to obtain in some way).
Algorithm analysis is when you calculate the expected size of the error, by working out the error inherent in the algorithm and the error caused by the limited precision of floating-point arithmetic.

Efficient convergence check

I have a grid with thousands of double precision reals.
It's iterating through, and I need it to stop when it's reached convergence to 3 decimal places.
The target is to have it run as fast as possible, but it needs to give the same result (to 3 dp) every time.
At the minute I'm doing something like this
REAL(KIND=DP) :: TOL = 0.001_DP
DO WHILE(.NOT. CONVERGED)
    CONVERGED = .TRUE.
    DO I = 1, NUM_POINTS
        NEW_POTENTIAL = !blah blah blah
        IF (CONVERGED) THEN
            IF (NEW_POTENTIAL < OLD_POTENTIAL - TOL .OR. NEW_POTENTIAL > OLD_POTENTIAL + TOL) THEN
                CONVERGED = .FALSE.
            END IF
        END IF
        OLD_POTENTIAL = NEW_POTENTIAL
    END DO
END DO
I'm thinking that this many IF statements can't be great for performance. I thought about checking for convergence at the end instead: finding the average value (summing the whole grid and dividing by NUM_POINTS) and checking whether that has converged in the same way as above, but I'm not convinced this will always be accurate.
What is the best way of doing this?
If I understand correctly you've got some kind of time-stepping going on, where you create the values in new_potential by calculations on old_potential. Then make old equal to new and carry on.
You could replace your existing convergence tests with the single statement
converged = all(abs(new_potential - old_potential)<tol)
which might be faster. If the speed of the test is a major concern, you could test only every other (or every third, or fourth, ...) iteration.
A few comments:
1) If you used a potential array with 2 planes, instead of separate old_ and new_potential arrays, you could transfer new_ into old_ by simply swapping indices at the end of each iteration (see the sketch after these comments). As your code stands, there's a lot of data movement going on.
2) While semantically you are right to have a while loop, I'd always use a do loop with a maximum number of iterations, just in case the convergence criterion is never met.
3) In your declaration REAL(KIND=DP) :: TOL = 0.001_DP the specification of DP on the numerical value of TOL is redundant, REAL(KIND=DP) :: TOL = 0.001 is adequate. I'd also make this a parameter, the compiler may be able to optimise its use if it knows that it is immutable.
4) You don't really need to execute CONVERGED = .TRUE. inside the outermost loop, set it before the first iteration -- this will save you a nanosecond or two.
Finally, if your convergence criterion is that every element in the potential array has converged to 3dp then that is what you should test for. It would be relatively easy to construct counterexamples for your suggested averages. However, my concern would be that your system will never converge on every element and that you should be using some matrix norm computation to determine convergence. SO is not the place for a lesson in that topic.
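A minimal sketch of suggestion 1) above (the update formula is a stand-in and the names are mine, not from the question): keep both time levels in one array and swap two integer indices at the end of each sweep, so nothing is copied.

program two_plane_sweep
    use iso_fortran_env, only: dp => real64
    implicit none
    integer, parameter :: num_points = 1000, max_iter = 10000
    real(kind=dp), parameter :: tol = 0.001_dp
    real(kind=dp) :: potential(num_points, 2)
    integer :: old, new, tmp, iter
    logical :: converged

    potential(:, 1) = 0.0_dp                  ! illustrative initial guess
    old = 1
    new = 2
    converged = .false.
    do iter = 1, max_iter
        ! stand-in update; the real sweep over the grid would go here
        potential(:, new) = 0.5_dp * (potential(:, old) + 1.0_dp)
        converged = all(abs(potential(:, new) - potential(:, old)) < tol)
        if (converged) exit
        tmp = old                             ! swap the plane indices:
        old = new                             ! no data movement at all
        new = tmp
    end do
    if (.not. converged) print *, "did not converge in", max_iter, "iterations"
end program two_plane_sweep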
What is the cost of the convergence test? Unless it is more expensive than the calculations that advance the potential, it is probably better to keep the IF statement and terminate the loop as soon as possible, rather than guess at a very large number of iterations to be sure of obtaining a good solution.
Re High Performance Mark's suggestion 1): if the copying operation is a significant portion of the run time, you could also use pointers; a sketch follows below.
The only way to be sure about this stuff is to measure the run time. Fortran provides intrinsic functions to measure both CPU and wall-clock time. Otherwise you may modify some portion of your code to make it faster, perhaps making it harder to understand and possibly introducing a bug, without much improvement in runtime: if that portion was taking only a small fraction of the total runtime, no amount of cleverness will make much difference.
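A minimal sketch of the pointer alternative (again with made-up names and a stand-in update), where the two grids are exchanged by swapping pointers rather than copying values:

program pointer_swap_demo
    use iso_fortran_env, only: dp => real64
    implicit none
    real(dp), pointer :: old_p(:), new_p(:), tmp_p(:)
    integer :: iter

    allocate(old_p(1000), new_p(1000))
    old_p = 0.0_dp
    do iter = 1, 3
        new_p = old_p + 1.0_dp                ! stand-in for the real update
        tmp_p => old_p                        ! swap the pointers instead of
        old_p => new_p                        ! copying thousands of reals
        new_p => tmp_p
    end do
    print *, old_p(1)                         ! prints 3.0 after three updates
end program pointer_swap_demo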
As High Performance Mark says, though the current semantics are elegant, you probably want to guard against an infinite loop. One approach:
PotentialLoop: do i = 1, MaxIter
    blah
    Converged = test...
    if (Converged) exit PotentialLoop
    blah
end do PotentialLoop
if (.NOT. Converged) write (*, *) "error, did not converge"
I = 1
DO
    NEWPOT = !bla bla bla
    IF (ABS(NEWPOT-OLDPOT).LT.TOL) EXIT
    OLDPOT = NEWPOT
    I = MOD(I,NUMPOINTS) + 1
END DO
Maybe better:
I = 1
DO
    NEWPOT = !bla bla bla
    IF (ABS(NEWPOT-OLDPOT).LT.TOL) EXIT
    OLDPOT = NEWPOT
    IF (I.EQ.NUMPOINTS) THEN
        I = 1
    ELSE
        I = I + 1
    END IF
END DO