Cannot use SUNDIALS-KINSOL in a Fortran subroutine?

Thank you for having a look at this problem.
Problem:
Segmentation fault when returning from an f90 subroutine that contains the KINSOL solving process, after the correct result has already been computed. There is no problem when the same solving process is in the main program.
Environment:
linux,
gcc,
sundials static libs
How to initiate the problem:
get the attached REDUCED test code
module moduleNonlinearSolve
integer,save::nEq
contains
subroutine solveNonlinear(u)
double precision::u(*)
integer iout(15),ier
double precision rout(2),koefScal(nEq)
koefScal(:)=1d0
call fnvinits(3,nEq,ier)
call fkinmalloc(iout,rout,ier)
call fkinspgmr(50,10,ier)
call fkinsol(u,1,koefScal,koefScal,ier)
call fkinfree()
do i=1,nEq
write(*,*) i,u(i)
end do
end subroutine
end module
subroutine fkfun(u,fval,ier)
use moduleNonlinearSolve
double precision::u(*)
double precision::fval(*)
integer::ier
forall(i=2:nEq-1)
fval(i)=-u(i-1)+2d0*u(i)-u(i+1)-1d0
end forall
fval(1)=u(1)+2d0*u(1)-u(2)-1d0
fval(nEq)=-u(nEq-1)+2d0*u(nEq)+u(nEq)-1d0
ier=0
end subroutine
program test
use moduleNonLinearSolve
double precision u(10)
nEq=size(u)
u(:)=10d0
call solveNonlinear(u)
end program
compile
$ gfortran -c -Wall -g test.f90
$ gfortran -Wall -g -o test test.o -lsundials_fkinsol -lsundials_fnvecserial -lsundials_kinsol -lsundials_nvecserial -llapack -lblas
run
$ ./test
Note: It works flawlessly if all the SUNDIALS calls are put in the main program.
Thank you very much for any input.
Mianzhi

According to the KINSOL documentation, the first argument of fkinmalloc must be of the same integer type as the C type long int. In your case, long int is 8 bytes long, but you are passing in an array of 4-byte integers. This leads to fkinmalloc writing beyond the bounds of the array and into some other memory. That typically causes memory corruption, which has symptoms just like what you are observing: a crash at some random later point, such as when returning from a function. You should be able to confirm this by running the program through valgrind, which will probably report invalid writes of size 8. Anyway, replacing
integer :: iout(15)
with
integer*8 :: iout(15)
should solve the problem.
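A minimal sketch, assuming a compiler that provides the iso_c_binding module (gfortran does): instead of hard-coding integer*8, the kind can be tied to the C long int type. The program below only prints element sizes and does not call KINSOL; the name i_default is made up for illustration.

```fortran
program iout_kind
  use iso_c_binding, only: c_long
  implicit none
  integer :: i_default(15)        ! default integer, usually 4 bytes
  integer(c_long) :: iout(15)     ! same size as C long int on this platform

  i_default = 0
  iout = 0_c_long

  ! storage_size returns the size in bits of one array element
  print *, 'default integer bytes:', storage_size(i_default)/8
  print *, 'c_long integer bytes: ', storage_size(iout)/8
end program iout_kind
```

If the two numbers differ, a default-integer array passed where long int is expected is too small, which matches the overrun described above.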

Related

Fortran performance for complex vs real variable

So, I was wondering if it is preferable to work on the real and imaginary part of the array separately instead of a complex variable for performance gain. For example,
program test
implicit none
integer,parameter :: n = 1e8
real(kind=8),parameter :: pi = 4.0d0*atan(1.0d0)
complex(kind=8),parameter :: i_ = (0.0d0,1.0d0)
double complex :: s
real(kind=8) :: th(n),sz, t1,t2, s1,s2
integer :: i
sz = 2.0d0*pi/n
do i=1,n
th(i) = sz*i
enddo
call cpu_time(t1)
s= sum(exp(th*i_))
call cpu_time(t2)
print *, t2-t1
call cpu_time(t1)
s1 = sum(cos(th))
s2 = sum(sin(th))
call cpu_time(t2)
print *, t2-t1
end program test
And the time it takes
3.7041089999999999
2.6299830000000002
So, the split calculation does take less time. This was a very simple calculation. But I have some long calculations, and using complex variables improves the readability and takes fewer lines of code. Will it sacrifice the performance of my code? Or is it always advisable to work on the real and imaginary parts separately?
It is better to understand what kind of tricks the compiler can do for you. Generally it's not worth the effort to split the computation by hand nowadays. Create a little script to study the CPU time of your code.
#!/bin/bash
src=a.f90
for fcc in gfortran ifort; do
$fcc --version
for flag in "-O0" "-O1" "-O2" "-O3"; do
fexe=$fcc$flag
echo $fcc $src -o "$fcc$flag" $flag
$fcc $src -o $fexe $flag
echo "run $fexe ..."
./$fexe
done
done
You will notice that some of the CPU times may be very close to 0, as the compiler is clever enough to discard a computation whose result you never use. Make the following change to prevent the compiler from optimizing out your computation.
print *, t2-t1, s
print *, t2-t1, s1, s2
Here are the results with ifort. Besides the speed, notice the ACCURACY: speed comes at a price.
ifort (IFORT) 14.0.2
ifort a.f90 -o ifort-O0 -O0
run ifort-O0 ...
3.57999900000000 (-2.319317404797516E-009,7.034712528404704E-009)
4.07666600000000 -2.319317404797516E-009 7.034712528404704E-009
ifort a.f90 -o ifort-O1 -O1
run ifort-O1 ...
3.30333300000000 (-2.319317404797516E-009,7.034712528404704E-009)
3.54666700000000 -2.319317404797516E-009 7.034712528404704E-009
ifort a.f90 -o ifort-O2 -O2
run ifort-O2 ...
3.08000000000000 (-2.319317404797516E-009,7.034712528404704E-009)
1.13666600000000 -6.304215927066537E-009 1.737099880017717E-009
ifort a.f90 -o ifort-O3 -O3
run ifort-O3 ...
3.08333400000000 (-2.319317404797516E-009,7.034712528404704E-009)
1.13666600000000 -6.304215927066537E-009 1.737099880017717E-009
sum 31.999 3.496 0:35.82 99.0% 0
You may wonder what happens between the -O1 and -O2 flags. If you check the compiled object file, the internal functions it links against have changed from:
U cexp
U cos
U sin
to :
U __svml_cos2
U __svml_sin2
U cexp
SVML stands for Short Vector Math Library. Some trade-offs between speed and accuracy are described in Intel IPP Library Fixed-Accuracy Arithmetic Functions

Reading real*8 variable with value 0 with real*4 results in a large number in Fortran without warning

Reading a real*4 variable with value 0 as real*8 results in a large number, sometimes without warning.
I'm not good at Fortran. I was just running a Fortran code I got from someone else, and it produced a segmentation fault. While I was debugging it, I found that one of the subroutines reads a variable with value 0, defined as real*8, as real*4, which results in a large value.
I tried to reproduce it with simple code, but the compiler showed a warning for the argument mismatch. I had to nest the calls to reproduce the suppressed warning in simple code, but I'm not sure what the exact condition for the suppressed warning is.
Actually, for some reason, I'm suspecting it may be the problem of my compiler, as the code (not the example code, original code) ran fine on the PC of the person who gave me the code.
file hello.f:
      implicit none
      call sdo()
      END
file test.f:
      subroutine sdo()
      implicit none
      real*4 dsecs
      dsecs=0
      write(0,*) dsecs
      call sd(dsecs)
      return
      end
file test2.f:
      subroutine sd(dsecs)
      implicit none
      real*8 dsecs
      write(0,*) dsecs
      return
      end
compilation and execution:
$ gfortran -o hello hello.f test.f test2.f
$ ./hello
Expected result:
0.00000000
0.0000000000000000
Actual results:
0.00000000
-5.2153889789423361E+223
It is not a problem with the compiler. It is a problem with the code. Your code did issue a warning for me that you were doing something nefarious, as it should. The subroutine that thinks dsecs is 4 bytes long sent 4 bytes. The subroutine that thinks dsecs is 8 bytes long looked at 8 bytes. What's in the other 4 bytes? Who knows. What does it look like when the two get mixed together? Probably not what you want. It's like accidentally getting served a scoopful of half ice cream and half garbage: unlikely to taste the way you thought.
This is one of those problems that are very simply solved with that classic joke: "Doctor, doctor, it hurts when I do this!" - "Then... don't do that."
EDIT: Sorry, I cheated. I didn't compile them as separate source files. When I do, I don't get warnings. This is also normal: at the compilation step, you didn't specify what the external subroutines look like, so the compiler couldn't complain, and at the linking step nothing checks the arguments any more.
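A sketch of one way to let the compiler catch this, assuming the subroutine can be moved into a module (the module name timing_mod is hypothetical). With an explicit interface, the kinds of the actual and dummy arguments are checked at compile time, so a real*4 passed to sd would be rejected rather than silently read as 8 bytes.

```fortran
module timing_mod
  use iso_fortran_env, only: real64, error_unit
  implicit none
contains
  subroutine sd(dsecs)
    real(real64), intent(in) :: dsecs
    write(error_unit,*) dsecs
  end subroutine sd
end module timing_mod

program hello
  use iso_fortran_env, only: real64
  use timing_mod, only: sd
  implicit none
  real(real64) :: dsecs

  dsecs = 0.0_real64
  ! Declaring dsecs with a different real kind here would be a
  ! compile-time error, because the interface of sd is known.
  call sd(dsecs)
end program hello
```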

When to use iso_fortran_env, selected_int_kind, real(8), or -fdefault-real-8 for writing or compiling Fortran code?

I have this simple code which uses DGEMM routine for matrix multiplication
program check
implicit none
real(8),dimension(2,2)::A,B,C
A(1,1)=4.5
A(1,2)=4.5
A(2,1)=4.5
A(2,2)=4.5
B(1,1)=2.5
B(1,2)=2.5
B(2,1)=2.5
B(2,2)=2.5
c=0.0
call DGEMM('n','n',2,2,2,1.00,A,2,B,2,0.00,C,2)
print *,C(1,1)
print *,C(1,2)
print *,C(2,1)
print *,C(2,2)
end program check
Now when I compile this code with the command
gfortran -o check check.f90 -lblas
I get some random garbage values. But when I add
-fdefault-real-8
to the compiling options I get correct values.
But since that is not a good way to declare variables in Fortran, I used the iso_fortran_env intrinsic module and added two lines to the code
use iso_fortran_env
real(kind=real32),dimension(2,2)::A,B,C
and compiled with
gfortran -o check check.f90 -lblas
Again I got wrong output.
Where am I going wrong in this code?
I'm on 32-bit Linux and using GCC.
DGEMM expects double precision values for ALPHA and BETA.
Without further options, you are feeding single precision floats to BLAS, hence the garbage.
Using -fdefault-real-8 you force every float specified to be double precision by default, and DGEMM is fed correctly.
In your case, the call should be:
call DGEMM('n','n',2,2,2,1.00_8,A,2,B,2,0.00_8,C,2)
which specifies the value for alpha to be 1 as a float of kind 8, and zero of kind 8 for beta.
If you want to perform the matrix-matrix product in single precision, use SGEMM.
Note that the kind value 8 is compiler-specific; you should consider using REAL32/REAL64 from the ISO_Fortran_env module instead (also for the declaration of A, B, and C).
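A sketch of the program written with REAL64, assuming that real64 is the same kind as double precision (true for gfortran) and that a BLAS library is linked with -lblas as in the question. The kind name dp is introduced here only for illustration.

```fortran
program check
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: dp = real64
  real(dp), dimension(2,2) :: A, B, C
  external :: DGEMM               ! provided by the BLAS library

  A = 4.5_dp
  B = 2.5_dp
  C = 0.0_dp

  ! alpha and beta carry the same kind as A, B and C
  call DGEMM('n', 'n', 2, 2, 2, 1.0_dp, A, 2, B, 2, 0.0_dp, C, 2)

  print *, C(1,1), C(1,2)
  print *, C(2,1), C(2,2)
end program check
```

Writing every literal with the _dp suffix keeps the constants and the arrays in step if the kind is ever changed, in which case the call would also have to switch to the matching BLAS routine.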

Program crash for array copy with ifort

This program crashes with Illegal instruction: 4 on MacOSX Lion and ifort (IFORT) 12.1.0 20111011
program foo
real, pointer :: a(:,:), b(:,:)
allocate(a(5400, 5400))
allocate(b(5400, 3600))
a=1.0
b(:, 1:3600) = a(:, 1:3600)
print *, a
print *, b
deallocate(a)
deallocate(b)
end program
The same program works with gfortran. I don't see any problem. Any ideas ? Unrolling the copy and performing the explicit loop over the columns works in both compilers.
Note that with allocatable instead of pointer I have no problems.
The behavior is the same if the statement is either inside a module or not.
I confirm the same behavior on ifort (IFORT) 12.1.3 20120130.
Apparently, no problem occurs with Linux and ifort 12.1.5
I tried to increase the stack size with the following linking options
ifort -Wl,-stack_size,0x40000000,-stack_addr,0xf0000000 test.f90
but I still get the same error. Increasing ulimit -s to the hard limit gives the same problem.
Edit 2: I did some more debugging and apparently the problem happens when the array splicing operation
b(:, 1:3600) = a(:, 1:3600)
involves a value suspiciously close to 16 M of data.
I am comparing the opcodes produced, but if there is a way to see an intermediate code form that is more communicative, I'd gladly appreciate it.
Your program is correct (though I would prefer allocatable to pointer if you do not need to be able to repoint it). The problem is that ifort by default places all array temporaries on the stack, no matter how large they are. And it seems to need an array temporary for the copy operation you are doing here. To work around ifort's stupid default behavior, always use the -heap-arrays flag when compiling. I.e.
ifort -o test test.f90 -heap-arrays 1600
The number behind -heap-arrays is the threshold where it should begin using the heap. For sizes below this, the stack is used. I chose a pretty low number here - you can probably safely use higher ones. In theory stack arrays are faster, but the difference is usually totally negligible. I wish intel would fix this behavior. Every other compiler has sensible defaults for this setting.
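If changing compiler flags is not an option, the question already notes that an explicit loop over the columns works with both compilers. A sketch of that variant, here with allocatable arrays and a one-line check instead of printing the full arrays:

```fortran
program foo
  implicit none
  real, allocatable :: a(:,:), b(:,:)
  integer :: j

  allocate(a(5400, 5400))
  allocate(b(5400, 3600))
  a = 1.0

  ! Copy column by column, so any temporary the compiler decides to
  ! create would cover a single column rather than the whole slice.
  do j = 1, 3600
     b(:, j) = a(:, j)
  end do

  print *, all(b == 1.0)

  deallocate(a)
  deallocate(b)
end program foo
```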
Use "allocatable" instead of "pointer".
real, allocatable :: a(:,:), b(:,:)
Assigning a floating point number to a pointer looks dubious to me.

Variable strangely takes the value zero after the call of a subroutine

I have been facing some issues trying to convert a code previously compiled with compaq visual fortran 6.6 to gfortran.
Here is a specific problem I have met with gfortran :
There is a variable called "et" which takes the value 3E+10. Then the program calls a subroutine. "et" doesn't appear in the subroutine, but after coming back to the main program it has now the value 0.
When compiling with Compaq Visual Fortran I didn't have this problem.
The code I am working on is a huge scientific program, so I put below only a small part of it :
c
c calculate load/unload modulus
c
500 t=(s1-s3)/2.
aa=1.00
if(iconeps.ne.1)bb=1.00
if(smean.lt.ap1) smean=ap1
if(xn.gt.0.000001) aa=(smean/atmp)**xn
if(iconeps.eq.1)go to 220
if(xm.gt.0.000001) bb=(smean/atmp)**xm
220 if(t.ge.0.99*sm1) go to 600
et=xku*aa*atmp+tt*tm1
if(iconeps.ne.1)bt=xkb*atmp*bb
go to 900
600 et=(xkl*aa*atmp+tt*tm1)*(1.0-rf*sr)**2
if(iconeps.ne.1)bt=xkb*atmp*bb
900 continue
btmax=17.0*et
btmin=0.33*et
if(iconeps.ne.1)then
tbt=(alf1+alf3*dtt)*dtt*(1.+vide)*tm2
btf=bt+tbt
bt=btf
endif
if(bt.lt.btmin) bt=btmin
if(bt.gt.btmax) bt=btmax
if(iconeps.eq.1)go to 1100
1000 continue
1050 if(mt.eq.mtyp4c)goto 1100
s=0.0
t=0.0
call shap4n(s,t,f,pfs,pft) ! Modification by NHV
call thick4n(s,t,xe,ye,thick)
call bmat4n(xe,ye,f,pfs,pft,b,detj,thick)
c calculate incremental strains
do 1300 i=1,4
temp=0.0
do 1200 j=1,8
1200 temp=temp+b(i,j)*disp(j)
1300 depi(i)=temp
epsv=0.0
do 1400 i=1,2
1400 epsv=epsv+depi(i)
epsv=epsv+depi(4)
ev=vide-(1.+vide)*epsv
if(ev.lt.0.0)ev=vide*.01
1100 continue
call perm(permws,xkw,coef,rw,tvisc,ev,vide,tt,pp)
"et" keeps the correct value until just before the call to the subroutine "perm". Just after this subroutine returns, it has the value zero.
"et" isn't in any common block
This piece of code is part of a subroutine called by several different subroutines. What is even more strange is that when it is called from other parts of the code I don't have this problem ("et" keeps its value).
So if someone has ever met this kind of problem or has any idea about it, I will be very grateful.
Perhaps you have a memory access error, such as an array bounds violation, or a mismatch between actual and dummy arguments. Are the interfaces of the subroutines explicit, such as being "used" from a module? Also try turning on compiler debugging options ... obviously subscript checking, but others might catch something. An extensive set for gfortran 4.5 or 4.6 is:
-O2 -fimplicit-none -Wall -Wline-truncation -Wcharacter-truncation -Wsurprising -Waliasing -Wimplicit-interface -Wunused-parameter -fwhole-file -fcheck=all -std=f2008 -pedantic -fbacktrace
Subscript checking is included in -fcheck=all.
I had this problem. In my main program, I was using double precision but the numbers I calculated with in my subroutine were single precision. After I changed them to double it fixed the problem and I got actual values instead of 0.