FORALL causes stack overflow - Fortran

I am implementing a genetic algorithm to numerically estimate some coefficients in a system of ODEs based on experimental data. I am just learning Fortran as I go while implementing the algorithm. My setup is Intel Fortran 2015 running on Windows 7/Visual Studio 2013, on an old i7 processor.
I have the following piece among a multitude of lines of code:
DO i = 1, N_CROMOSOMES
IF (population(9,i) < 0.0_DOUBLE) population(9, i) = square_error(population(1:8, i))
END DO
Where I just defined DOUBLE to be:
INTEGER, PARAMETER :: DOUBLE = 16
N_CROMOSOMES is an INTEGER argument to the function that defines the size of the array population, which in turn is a (9 x N_CROMOSOMES) array of type REAL(KIND=DOUBLE). For each column of this array, the first 8 elements are the 8 coefficients I am estimating, and the ninth element is the error associated with that particular set of 8 guesses. square_error is the function that computes it.
At this point in the program, I have marked columns that were just created or altered as having an error of -1. Hence the "IF (population(9,i) < 0.0_DOUBLE)": I am checking the columns whose error is -1 in order to compute their error.
The thing is, I just rewrote most of my code and spent the past few days correcting mysterious bugs. Before this, the code worked just fine with a FORALL instead of a DO. Now it gives a "stack overflow" error when I use FORALL, but works with DO, although it takes a lot more time to do its job.
Does anyone know the cause of this, and how to solve it? It is clear to me that my code could benefit greatly from parallelization, but I am not sure how to do it.
Thanks for your time.

Related

Different least-square errors with Armadillo functions

Hello Stack Overflow community,
I am having trouble understanding a least-square-error problem with the C++ Armadillo package.
I have a matrix A with many more rows than columns (5000 by 100, for example), so it is overdetermined.
I want to find x so that A*x = b gives me the least-square error.
If I use Armadillo's solve function on my data, as in "x = solve(A,b)", the error (A*x-b)^2 is sometimes way too high.
If, on the other hand, I solve for x with the analytical form "x = (A^T * A)^-1 * A^T * b", the results are always right.
The results for x in the two cases can differ by 10 orders of magnitude.
I had thought that Armadillo would use this analytical form in the background if the system is overdetermined.
Now I would like to understand why these two methods give such different results.
I wanted to give a short example program, but I can't reproduce this behavior with a short program.
I thought about posting the matrix here, but at 5000 by 100 it is also very big. I can provide the values for which this happens if needed.
As a short bit of background:
The matrix I get from my program is the numerically solved response of a nonlinear oscillator, into which I put information by wiggling a parameter of the system.
Because the influence of this parameter on the system is small, the values in my different rows are very similar but never identical; otherwise Armadillo should throw an error.
I still think this is the problem, but the solve function never threw any error.
Another thing that confuses me is that in a short example program with a random matrix, the analytical form is far slower than the solve function.
But in my program, both are nearly equally fast.
I guess this has something to do with the numerical behavior of the pseudo-inverse and the special structure of my matrix, but I don't know enough about how Armadillo works to say.
I hope someone can help me with this problem; thanks a lot in advance.
Thanks for the replies. I think I figured the problem out and wanted to give some feedback for everybody who runs into the same problem.
The Armadillo solve function gives me the x that minimizes (A*x-b)^2.
I looked at the values of x, and they are sometimes on the order of 10^13.
This comes from the fact that the rows of my matrix change only slightly (so they are nearly, but not exactly, linearly dependent).
Because of that I was running into the numerical precision limits of my doubles, and as a result the error sometimes jumped around.
If I use the rearranged analytical formula (A^T * A) * x = A^T * b with the solve function, this problem no longer occurs, because the fitted values of x are on the order of 10^4. The least-square error is a little higher, but that is okay, as I want to avoid overfitting.
I have now additionally added Tikhonov regularization by solving (A^T * A + lambda * Identity_Matrix) * x = A^T * b with Armadillo's solve function.
Now the weight vectors are on the order of 1, and the error barely changes compared to the formula without regularization.
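For anyone who wants to try this, here is a minimal, self-contained sketch of the regularized normal-equation approach described above. The matrix is random and lambda is a placeholder value, so it only illustrates the mechanics (compile and link against Armadillo, e.g. with -larmadillo):

#include <armadillo>
#include <iostream>

int main() {
    const arma::uword n_rows = 5000, n_cols = 100;          // placeholder sizes
    arma::mat A = arma::randu<arma::mat>(n_rows, n_cols);   // stand-in for the real matrix
    arma::vec b = arma::randu<arma::vec>(n_rows);
    const double lambda = 1e-3;                              // regularization strength (placeholder)

    // Plain least squares via Armadillo's solver.
    arma::vec x_ls = arma::solve(A, b);

    // Rearranged normal equations with Tikhonov regularization:
    // (A^T * A + lambda * I) * x = A^T * b
    arma::vec x_reg = arma::solve(A.t() * A + lambda * arma::eye(n_cols, n_cols),
                                  A.t() * b);

    std::cout << "residual (plain):          " << arma::norm(A * x_ls  - b) << '\n'
              << "residual (regularized):    " << arma::norm(A * x_reg - b) << '\n'
              << "largest |x| (plain):       " << arma::max(arma::abs(x_ls))  << '\n'
              << "largest |x| (regularized): " << arma::max(arma::abs(x_reg)) << '\n';
    return 0;
}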

Writing out arrays fixes a problem for a reason unknown to me

When I run certain FORTRAN 77 programs with the GNU gfortran compiler, I have come across the same problem several times, and I'm hoping someone has insight. The value I want should be ~1, but at the end of the program it is written out as something well over 10^100. For me, this problem is generally restricted to arrays. The improper value often goes away when I write out the value at some earlier stage in the program (which inevitably happens while trying to troubleshoot the issue).
I have tried initializing arrays explicitly and have tried some array bound checking as well as some internal logical checks in my program. The issue, to me and my research supervisor, seems to be pathological.
WRITE(*,*)"OK9999", INV,INVV
vs
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
The former gives incorrect results for the variables INV and INVV (relative to what I expect), the latter correct ones. This is the newest example of this problem, which has affected me on and off for about a year.
The greater context of these lines is:
WRITE(*,*)"AFTER ENERGY",I1STOR(1),I2STOR(1)
DO 167 NP=1!,NV
IF(I1STOR(NP).NE.0) THEN
INV = I1STOR(NP)
INVV = I2STOR(NP)
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
PAUSE
ENDIF
I1STOR(1) and I2STOR(1) are written correctly in the first case "AFTER ENERGY" above. If I write out the value of NP after the DO 167 line, this will also remedy the situation.
My expectation would be that writing out a variable shouldn't affect its value. Often I am doing large, time-intensive calculations where the final value is way off, and in many cases the cause has traced back to a situation where writing the value out (to screen or file) magically alleviates the problem. Any help would be sincerely appreciated.

Univac Math pack subroutines in old-school FORTRAN (pre-77)

I have been looking at an engineering paper here which describes an old FORTRAN code for solving pipe flow equations (it's dated 1974, before FORTRAN was standardised as Fortran 77). On page 42 of this document the old code calls the following subroutine:
C SYSTEM SUBROUTINE FROM UNIVAC MATH-PACK TO
C SOLVE LINEAR SYSTEM OF EQ.
CALL GJR(A,51,50,NP,NPP,$98,JC,V)
It's a bit of a long shot, but do any veterans or ancient-code buffs recall this system subroutine and its input arguments? I'm having trouble finding any information about it.
If I can adapt the old code to my current application, I may rewrite it in C++ or VBA, and will be looking for an equivalent function in those languages.
I'll add to this answer if I find anything more detailed, but I have a place to start looking for the arguments to GJR.
This function is part of the Sperry UNIVAC MATH-PACK library; a full list of functions in the library can be found in http://www.dtic.mil/dtic/tr/fulltext/u2/a170611.pdf. GJR is described as "determinant; inverse; solution of simultaneous equations". Marginally helpful.
A better description comes from http://nvlpubs.nist.gov/nistpubs/jres/74B/jresv74Bn4p251_A1b.pdf
A FORTRAN subroutine, one of the Univac 1108 Math Pack programs,
available on the library tapes at the University of Maryland computing
center. It solves simultaneous equations, computes a determinant, or
inverts a matrix or any combination of the three above by using a
Gauss-Jordan elimination technique with column pivoting.
This is slightly more useful, but what we really want is "MATH-PACK, Programmer Reference", UP-7542 Rev. 1 from Sperry-UNIVAC (Unisys). I find a lot of references to this document but no full-text PDF of the document itself.
I'd take a look at the arguments in the function call, how they are set up and how the results are used, then look for equivalent routines in LAPACK or BLAS. See http://www.netlib.org/lapack/
I have a few books on piping networks including "Analysis of Flow in Pipe Networks" by Jeppson (same author as in the original PDF hosted by USU) https://books.google.com/books/about/Analysis_of_flow_in_pipe_networks.html?id=peZSAAAAMAAJ - I'll see if I can dig that up. The book may have a more portable matrix solver than the proprietary Sperry-UNIVAC library.
Update:
From p. 41 of http://ngds.egi.utah.edu/files/GL04099/GL04099_1.pdf I found documentation for the CGJR function, the complex version of GJR from the same library. It is likely that the only difference in the arguments is the variable type (COMPLEX instead of REAL):
CGJR is a subroutine which solves simultaneous equations, computes a determinant, inverts a matrix, or does any combination of these three operations, by using a Gauss-Jordan elimination technique with column pivoting.
The procedure for using CGJR is as follows:
Calling statement: CALL CGJR(A,NC,NR,N,MC,$K,JC,V)
where
A is the matrix whose inverse or determinant is to be determined. If simultaneous equations are solved, the last MC-N columns of the matrix are the constant vectors of the equations to be solved. On output, if the inverse is computed, it is stored in the first N columns of A. If simultaneous equations are solved, the last MC-N columns contain the solution vectors. A is a complex array.
NC is an integer representing the maximum number of columns of the array A.
NR is an integer representing the maximum number of rows of the array A.
N is an integer representing the number of rows of the array A to be operated on.
MC is the number of columns of the array A, representing the coefficient matrix if simultaneous equations are being solved; otherwise it is a dummy variable.
K is a statement number in the calling program to which control is returned if an overflow or singularity is detected.
1) If an overflow is detected, JC(1) is set to the negative of the last correctly completed row of the reduction and control is then returned to statement number K in the calling program.
2) If a singularity is detected, JC(1) is set to the number of the last correctly completed row, and V is set to (0.,0.) if the determinant was to be computed. Control is then returned to statement number K in the calling program.
JC is a one-dimensional permutation array of N elements which is used for permuting the rows and columns of A if an inverse is being computed. If an inverse is not computed, this array must have at least one cell for the error return identification. On output, JC(1) is N if control is returned normally.
V is a complex variable. On input REAL(V) is the option indicator, set as follows:
1. invert matrix
2. compute determinant
3. do 1. and 2.
4. solve system of equations
5. do 1. and 4.
6. do 2. and 4.
7. do 1., 2. and 4.
Notes on usage of row dimension arguments N and NR:
The arguments N and NR refer to the row dimensions of the A matrix.
N gives the number of rows operated on by the subroutine, while NR
refers to the total number of rows in the matrix as dimensioned by the
calling program. NR is used only in the dimension statement of the
subroutine. Through proper use of these parameters, the user may specify that only a submatrix, instead of the entire matrix, be operated on by the subroutine.
In your application (pipe flow), look at how matrix A and vector V are populated before the call to GJR and how they are used after the call.
You may be able to replace the call to GJR with a call to LAPACK's SGESV or DGESV without much difficulty.
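Since you mention possibly rewriting in C++, here is a rough sketch of what a DGESV-based replacement could look like through the LAPACKE C interface. The 3x3 system is made up and is not taken from the pipe-flow code; you would link against LAPACKE/LAPACK:

#include <lapacke.h>
#include <cstdio>

int main() {
    const lapack_int n = 3, nrhs = 1;
    // Coefficient matrix in row-major order; overwritten with its LU factors.
    double A[9] = { 3.0, 1.0, 2.0,
                    1.0, 4.0, 1.0,
                    2.0, 1.0, 5.0 };
    double b[3] = { 6.0, 6.0, 8.0 };  // right-hand side; overwritten with the solution
    lapack_int ipiv[3];               // pivot indices from the LU factorization

    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, n, nrhs, A, n, ipiv, b, nrhs);
    if (info != 0) {
        // A nonzero info plays roughly the role of GJR's error return ($98 branch).
        std::printf("dgesv failed, info = %d\n", (int)info);
        return 1;
    }
    for (int i = 0; i < n; ++i)
        std::printf("x[%d] = %g\n", i, b[i]);
    return 0;
}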
Aside: The Fortran community really needs a drop-in 'Rosetta library' that wraps LAPACK, etc. for replacing legacy/proprietary IBM, UNIVAC, and Numerical Recipes math functions. Ideally, maintainers would replace legacy functions with de facto standard math functions, but in the real world many of these older programs are un(der)maintained and there simply isn't the will (or, as in this case, the ability) to update them.
Update 2:
I started work on a compatibility library for the Sperry MATH-PACK and STAT-PACK routines as well as a few other legacy libraries, posted at https://bitbucket.org/apthorpe/alfc
Further, I located my copy of Jeppson's Analysis of Flow in Pipe Networks, which is a slightly more legible version of the PDF Steady Flow Analysis of Pipe Networks: An Instructional Manual, and modernized the codes listed in the text. I have posted those at https://bitbucket.org/apthorpe/jeppson_pipeflow
Note that I found a number of errors in both the code listings and in the example problems given for many of the codes. If you're trying to learn how to write a pipe flow solver based on Jeppson's paper or text, I'd strongly suggest reviewing my updated codes and test cases because they will save you hours of effort trying to understand why the code doesn't work and why you can't replicate the example cases. This took a fair amount of forensic computing to sort out.
Update 3:
The source to CGJR and DGJR can be found in http://www.dtic.mil/dtic/tr/fulltext/u2/a110089.pdf. DGJR is the closest to what you want, though it references more routines that aren't available (proprietary UNIVAC error-handling routines). It should be easy to convert DGJR to single precision and skip the proprietary calls. Otherwise, use the compatibility library mentioned above.

A unique type of data conversion

In the following code:
tt=5;
for(i=0;i<tt;i++)
{
int c,d,l;
scanf("%lld%lld%lld",&c,&d,&l);
printf("%d %d %d %d",c,d,l,tt);
}
In the first iteration, the value of tt changes to 0 automatically.
I know that I have declared c, d, l as int but am reading them as long long, so c and d become 0. But I am still not able to understand how tt becomes 0.
A small but obligatory announcement: as was said in the comments, you are facing undefined behavior, so
don't be surprised by tt being set to zero
don't be surprised by tt not being set to zero after insignificant code changes (e.g. reordering the declaration from "int i,tt;" to "int tt,i;" or vice versa)
don't be surprised by tt not being set to zero after compiling with different flags, a different compiler version, for a different platform, or when testing with different input
don't be surprised by anything. Any behavior is possible.
You can't expect this code to work one way or another, so don't ever use it in a real program.
However, you seem to be OK with that, and the question is "what is actually happening with tt?". IMHO this is a really great question: it reveals a passion to understand programming more deeply, and it helps with digging into the lower layers. So let's get started.
Possible explanation
I failed to reproduce the behavior on VS2015, but the situation is quite clear. The actual data alignment, variable sizes, endianness, stack growth direction, and other details may differ on your PC, but the general idea should be the same.
The variables i, tt, c, d, l are local, so they are stored on the stack. Let's assume sizeof(int) is 4 and sizeof(long long) is 8, which is quite common. Then one possible data alignment is shown in the picture (addresses grow from left to right, each cell represents one byte):
When calling scanf, you pass the address of c (blue arrow in the next picture) for filling with data. But the data is 8 bytes wide, so both c and tt are overwritten (blue cells in the picture). With a little-endian representation, you always write zeroes to tt unless the user enters a really big number, while c actually gets valid data for small numbers.
However, the valid data in c will be overwritten in the same way while filling d, and the same happens to d while filling l. So only l gets a nonzero value in the described case. Easy test: enter large numbers for c, d, and l and check whether tt is still zero.
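If you want to check the layout on your own machine rather than trust the picture, a tiny probe like this (my addition, not part of the original program; the declarations are hoisted into one scope here, so the exact layout may differ from the real program) prints the sizes and addresses of the locals, so you can see whether tt sits within 8 bytes of c:

#include <cstdio>

int main() {
    int i = 0, tt = 5;
    int c = 0, d = 0, l = 0;
    std::printf("sizeof(int) = %zu, sizeof(long long) = %zu\n",
                sizeof(int), sizeof(long long));
    // If &tt lies less than 8 bytes above &c, an 8-byte store through &c clobbers tt.
    std::printf("&i  = %p\n&tt = %p\n&c  = %p\n&d  = %p\n&l  = %p\n",
                (void*)&i, (void*)&tt, (void*)&c, (void*)&d, (void*)&l);
    return 0;
}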
How to get a precise answer
You can get all the answers from the assembly code. Enable a disassembly listing (the exact steps depend on the toolchain: gcc has the -S option, Visual Studio has a "Go To Disassembly" item in the context menu while stopped at a breakpoint) and analyze the listing. It's really helpful to see the exact instructions your CPU is going to execute, and some debuggers allow executing instructions one by one. You need to find out how the variables are aligned on the stack and when exactly they are overwritten. Analyzing scanf is hard for beginners, so you can start with a simplified version of your program: replace scanf with the following (I can't test it, but it should work):
*((long long *)(&c)) = 1; //or any other user specified value
*((long long *)(&d)) = 2;
*((long long *)(&l)) = 3;
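For completeness, the straightforward fix (again my sketch, not the asker's code) is simply to declare the variables so that they match the %lld conversions; then nothing is overwritten and tt keeps its value:

#include <cstdio>

int main() {
    int i, tt = 5;
    for (i = 0; i < tt; i++) {
        long long c, d, l;  // now as wide as the %lld conversion expects
        if (std::scanf("%lld%lld%lld", &c, &d, &l) != 3)
            break;          // bail out on malformed input
        std::printf("%lld %lld %lld %d\n", c, d, l, tt);
    }
    return 0;
}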

Same program works on macos but fails on windows [closed]

OK, so I am working on an interface in Qt, using Qt Creator as my IDE. The thing is that the algorithm works normally on macOS, but on Windows the same program gets an error.
The only difference is the compiler: on Windows I am using Visual C++, and on macOS it is Clang (I think).
Is it possible that the same algorithm works on macOS but not on Windows? If so, what could the problem be?
EDIT: I see that I got downvoted; I don't know exactly why. I already know what the error means: vector subscript out of range. The thing is that I don't want to waste time trying to find where the error is, because the program actually works fine on the Mac. Also, the PC is better than the Mac.
EDIT 2: Indeed, it looks like the same code behaves differently on Windows than on macOS. Tomorrow I will test it on the Mac to try to understand this, but the code whose behavior differs is this:
vector<double> create_timeVector(double simulationTime, double step) {
vector<double> time;
time.push_back(0);
double i = 0;
do {
++i;
time.push_back(time[i-1] + step);
} while (time[i] < simulationTime);
return time;
}
The size of the returned vector is one element larger on Windows than on the Mac, and I didn't make any changes to the code.
The probable reason why it works differently is that you're using a floating point calculation to determine when the loop is to stop (or keep going, depending on how you look at it).
time.push_back(time[i-1] + step);
} while (time[i] < simulationTime);
You have step as a double, simulationTime as a double, and a vector<double> called time being used. That is a recipe for loops running inconsistently across compilers, compiler optimizations, etc.
Floating point is not exact. The way to make the loops consistent is to not use any floating point calculations in the looping condition.
In other words, by hook or by crook, compute the number of calculations you need by using integer arithmetic. If you need to step 100 times, then it's 100, not a value computed from floating point arithmetic:
For example:
for (float i = 0.01F; i <= 1.0F; i+=0.01F)
{
// use i in some sort of calculation
}
That loop can execute 99 times or 100 times, depending on the compiler and any floating-point optimizations that may apply. To fix this:
for (int i = 1; i <= 100; ++i )
{
float dI = static_cast<float>(i) / 100.0F;
// use dI instead of i in some sort of calculation
}
As long as i isn't changed in the loop, the loop is guaranteed to always do 100 iterations, regardless of the hardware, compiler optimizations, etc.
See this: Any risk of using float variables as loop counters and their fractional increment/decrement for non "==" conditions?
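Applying that advice to the function in the question, a sketch of a more platform-independent version (my adaptation, not code from the question or the linked post) computes the number of steps once and then loops on an integer counter only:

#include <cmath>
#include <vector>

std::vector<double> create_timeVector(double simulationTime, double step) {
    // Decide the iteration count once, then loop on an integer counter only.
    const int nSteps = static_cast<int>(std::ceil(simulationTime / step));
    std::vector<double> time;
    time.reserve(static_cast<std::size_t>(nSteps) + 1);
    for (int i = 0; i <= nSteps; ++i) {
        time.push_back(i * step);  // recompute from i instead of accumulating rounding error
    }
    return time;
}

The cutoff is still computed from floating point, but it is evaluated only once, so the loop body itself no longer depends on accumulated rounding error.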
vector subscript out of range means that you used [n] on a vector, and n was less than 0 or greater than or equal to the number of elements in the vector.
This causes undefined behaviour, so different compilers may react in different ways.
To get reliable behaviour in this case, one way is to use .at(n) instead of [] and make sure you catch exceptions. Another way is to check your index values before applying them, so that you never access out of bounds in the first place.
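As a small illustration of both options (the vector and the index here are made up):

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<double> v{1.0, 2.0, 3.0};
    std::size_t n = 7;  // deliberately out of range

    try {
        std::cout << v.at(n) << '\n';     // .at() throws instead of silently misbehaving
    } catch (const std::out_of_range& e) {
        std::cerr << "out of range: " << e.what() << '\n';
    }

    if (n < v.size()) {                   // explicit check before using operator[]
        std::cout << v[n] << '\n';
    }
    return 0;
}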