Detection of the ILP mode of BLAS

The BLAS mathematical library is distributed either in the i32lp64 mode (that is, with 4-byte integer*4 integers; this is more commonly called LP64) or in the ilp64 mode (with 8-byte integer*8 integers).
The question is how to distinguish between these two BLAS modes (i32lp64 vs ilp64) in a short Fortran routine, without causing a segfault.

Well, when you link the program below against the ilp64 BLAS library, it crashes. With i32lp64 there is no crash.
That is the distinction between ilp64 and i32lp64 BLAS; not very elegant, but doable.
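program test
implicit none
! 1 + 2**33: the low 32 bits hold 1, while the full 8-byte value is huge
integer*8, parameter :: inc = +1_8 + 2_8**33_8
real*8 :: a(3),d
real*8, external :: ddot   ! without this, implicit typing would make ddot default real
a(1)=1.0d0; a(2)=1.0d0;a(3)=1.0d0
d=ddot(3,a,inc,a,inc)
print *,"inc=",inc
print *,"d=",d
end program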
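
To see why the two libraries behave differently, it may help to look at how the 8-byte increment appears to code that reads only 4 bytes. The sketch below is an illustration, not part of the test itself; the program name and the variable halves are made up here, and it assumes a little-endian machine such as x86-64, where the low-order bytes come first in memory.

```fortran
program show_inc_bytes
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  ! Same increment as in the test program: 1 + 2**33
  integer(int64), parameter :: inc = 1_int64 + 2_int64**33
  integer(int32) :: halves(2)

  ! Reinterpret the 8 bytes of inc as two 4-byte integers
  halves = transfer(inc, halves)

  print *, 'as 8-byte integer:          ', inc
  ! On a little-endian machine this is the low-order half
  print *, 'first 4 bytes as integer*4: ', halves(1)
end program show_inc_bytes
```

On a little-endian machine the first half is 1, so a library that expects 4-byte integers sees an ordinary unit stride. A library that expects 8-byte integers sees the full value and strides far outside the 3-element array, which is what produces the crash.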

Related

What is the need for a C_INT fortran type? Are C and Fortran integers so different?

If I declare an integer in Fortran as:
INTEGER(C_INT) :: i
then, if I understand correctly, it is safe to pass to a C function. Now, disregarding the added headache of always declaring integers this way, are there any reasons not to always declare your variables as C-interoperable? Is there any downside to doing so?
Also, in the case of something as simple as an integer, what exactly does C_INT change compared with a traditional Fortran integer? Are Fortran and C integers actually different?

The C integer size is usually fixed because the OS is compiled using C and the system bindings are published as C headers. The operating system designers choose one of the common models like LLP64, LP64 or ILP64 (for operating systems with 64-bit pointers) and the C compilers then follow this choice.
But Fortran compilers are freer. You can set them to a different configuration. A Fortran compiler set to use 8-byte default integers and 8-byte default reals is still perfectly standard conforming! (C would be as well, but the choice is fixed in the operating system.)
And because the integers in C and in Fortran do not have to match, you need a mechanism to portably select the C-interoperable kind, whatever the default kind is.
This is not just academic. You can find libraries like MKL compiled with 8-byte integers (the ILP64 model, which is not used by common operating systems for C).
So when you call some API, some function whose interface is defined in C, you want to call it properly, not depending on the settings of the Fortran compiler. That is the use case for the C-interoperable types. If a C function requires an int, you give it an integer(c_int) and you do not care if the Fortran default integer is the same or not.
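
As a minimal sketch of that use case, the program below calls the C standard library function abs, which takes and returns an int. The Fortran-side name c_abs and the program name are made up for this illustration; it assumes the executable is linked against the C library, which is normally the case.

```fortran
program call_c_abs
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none

  interface
    ! C prototype: int abs(int x);
    function c_abs(x) bind(C, name='abs') result(r)
      import :: c_int
      integer(c_int), value :: x   ! passed by value, as C expects
      integer(c_int) :: r
    end function c_abs
  end interface

  integer(c_int) :: n

  n = -5_c_int
  print *, 'abs(-5) via C:', c_abs(n)

  ! The two kinds may or may not coincide, depending on compiler options
  print *, 'default integer kind:', kind(0), '  c_int kind:', c_int
end program call_c_abs
```

Because the interface names the kind explicitly, the call should stay correct even if the default integer is changed, for example with an option that promotes default integers to 8 bytes.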

Basically there is no fundamental downside, and on the same compiler there is no fundamental benefit.
Even in icc a long int is different on 32-bit and 64-bit operating systems, so you need to know (or define) what you want on the C side of things.
The only one that is clearly 4 bytes is C_INT32_T, and hence it is the one that I generally use. Using C_INT32_T is more of a "future proofing" endeavour: it defines the integer as you want it, with a fixed number of bits.
These all give a 4-byte integer on ifort.
USE ISO_C_BINDING, ONLY : C_INT, C_INT32_T
Integer ::
INTEGER(KIND=4) :: !"B" At least on iFort
INTEGER*4 !"A"
INTEGER(KIND=C_INT) ::
INTEGER(KIND=C_INT32_T) :: !"C"
Usually one finds older code in style "A".
I routinely use style "B", but even though this is a de facto standard, it does not conform to "the standard".
Then, when I become concerned with portability, I run through the code and change style "B" to style "C", and then I have less heartburn about others who may later compile with gfortran, or even some other compiler.
The single byte:
BYTE :: !Good
INTEGER(KIND=1) ::
INTEGER(KIND=C_SIGNED_CHAR) ::
INTEGER(KIND=C_INT8_T) :: !Best
The two byte:
INTEGER(KIND=2) ::
INTEGER(KIND=C_INT16_T) :: !Best
INTEGER(KIND=C_SHORT) ::
The 8 byte:
INTEGER(KIND=8) ::
INTEGER(KIND=C_INT64_T) :: !Best
INTEGER(KIND=C_LONG_LONG) ::
While this looks somewhat fishy... One can also say:
LOGICAL(KIND=4) ::
LOGICAL(KIND=C_INT32_T) :: !Best here may be C_bool and using 1-byte logical in fortran...? Assuming that there is no benefit with the size of the logical being the same as some float vector
LOGICAL(KIND=C_FLOAT) ::

How can a Fortran-OpenACC routine call another Fortran-OpenACC routine?

I'm currently attempting to accelerate a spectral element fluids solver by porting most of the routines to a GPGPU using OpenACC with the PGI (15.10) compiler. The source code is written in OO-Fortran. This software has "layers" of subroutines that call other functions and subroutines. To bring the code over to a GPU using openacc, I've been first attempting to place "$acc routine" directives in each routine that needs to be ported. During compilation, using "pgf90 -acc -Minfo=accel", I receive the following error :
nvvmCompileProgram error: 9.
Error: /tmp/pgacc2lMnIf9lMqx8.gpu (146, 24): parse invalid forward reference to function 'innerroutine_' with wrong type!
PGF90-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (Test.f90: 1)
The same problem can be reproduced with the following simple Fortran program:
PROGRAM Test
IMPLICIT NONE
CONTAINS
SUBROUTINE OuterRoutine( sol, xF, N )
!$acc routine
IMPLICIT NONE
INTEGER :: N
REAL(KIND=8) :: sol(0:N,1:3)
REAL(KIND=8) :: xF(0:N,1:3)
! LOCAL
INTEGER :: i
DO i = 0, N
xF(i,1:3) = InnerRoutine( sol(i,1:3) )
ENDDO
END SUBROUTINE OuterRoutine
FUNCTION InnerRoutine( sol ) RESULT( xF )
!$acc routine
IMPLICIT NONE
REAL(KIND=8) :: sol(1:3)
REAL(KIND=8) :: xF(1:3)
xF(1) = sol(1)*sol(2)
xF(2) = sol(1)*sol(3)
xF(3) = sol(1)*sol(1)
END FUNCTION InnerRoutine
END PROGRAM Test
Again, compiling the above program with "pgf90 -acc -Minfo=accel" yields the problem.
Does OpenACC support acc-enabled routines calling other acc-enabled routines?
If so, what am I doing wrong?

You're using the OpenACC "routine" directive correctly. The problem here is that we (PGI) don't yet support using "routine" with array-valued functions. Supporting them requires the compiler to create a temp array to hold the return value, which means that every thread would need to allocate this temp array, causing a severe performance penalty. Worse is the question of how to share the temp array if it is a gang or worker level routine.
We do have open requests for this feature, but it may be a while before we can address it. In the meantime, can you try inlining the routine, i.e. compile with "-Minline"?
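
Another possible workaround, which is an assumption on my part and has not been verified with PGI 15.10, is to turn the array-valued function into a subroutine that writes into an output argument, so that no function-result temporary is needed. The sketch below rewrites the reproducer that way; the "seq" clause, the local arrays s and f, and the small driver are additions for illustration.

```fortran
program test_subroutine_form
  implicit none
  integer, parameter :: N = 2
  real(kind=8) :: sol(0:N,1:3), xF(0:N,1:3)

  ! Small host-side driver so the sketch runs on its own
  sol = 2.0d0
  call OuterRoutine(sol, xF, N)
  print *, xF(0,1:3)

contains

  subroutine OuterRoutine(sol, xF, N)
    !$acc routine seq
    integer, intent(in) :: N
    real(kind=8), intent(in)  :: sol(0:N,1:3)
    real(kind=8), intent(out) :: xF(0:N,1:3)
    real(kind=8) :: s(1:3), f(1:3)   ! fixed-size locals instead of a result temp
    integer :: i
    do i = 0, N
      s = sol(i,1:3)
      call InnerRoutine(s, f)
      xF(i,1:3) = f
    end do
  end subroutine OuterRoutine

  ! Formerly an array-valued function; the result is now an intent(out) argument
  subroutine InnerRoutine(sol, xF)
    !$acc routine seq
    real(kind=8), intent(in)  :: sol(1:3)
    real(kind=8), intent(out) :: xF(1:3)
    xF(1) = sol(1)*sol(2)
    xF(2) = sol(1)*sol(3)
    xF(3) = sol(1)*sol(1)
  end subroutine InnerRoutine

end program test_subroutine_form
```

Without -acc the directives are plain comments, so the program can be checked on the host first; whether the device compiler accepts it still needs to be tried with your compiler version.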

Catch integer exceptions in Fortran

Is there a way to catch integer exceptions with gfortran or ifort like there is for catching floating point exceptions?
Consider this simple program to calculate the factorial:
program factorial
use, intrinsic :: iso_fortran_env
implicit none
integer(8) :: fac
real(REAL64) :: facR
integer,parameter :: maxOrder = 30
integer :: i
fac = 1 ; facR = 1.e0_REAL64
do i=2,maxOrder
fac=fac*i ; facR=facR*real(i,REAL64)
write(*,*) i, fac, facR
enddo ! i
end program
At some point there will be an overflow - for integer(8) as shown here, it will occur at around 21. But without the calculation using floats as a reference I couldn't tell for sure...

There is nothing in the Fortran standard that deals with integer overflow. As it stands you can't even rely on integers wrapping round when a computation exceeds the maximum value representable in the chosen kind. So, while a test such as
huge(2_8)+1 < 0_8
is likely to work with most current compilers (at least the ones I have used recently) it's not guaranteed.
I am sure that neither Intel Fortran nor gfortran provide compiler-generated run-time checks for integer overflow either. I'm not sure about other compilers but I'll be (pleasantly) surprised to learn that any of them do.
I think, therefore, that you have to proceed with your current approach.
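
If you would rather not depend on wrap-around or on a floating-point reference, one portable option is to test before each multiplication whether the product would still fit. The sketch below is one way to do that for the factorial example; the program name is made up, and the test relies on both operands being positive.

```fortran
program factorial_checked
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: maxOrder = 30
  integer(int64) :: fac
  integer :: i

  fac = 1_int64
  do i = 2, maxOrder
    ! For positive values, fac*i fits exactly when fac <= huge(fac)/i
    ! (integer division truncates), so test before multiplying.
    if (fac > huge(fac) / int(i, int64)) then
      write(*,*) 'overflow would occur at i =', i
      exit
    end if
    fac = fac * int(i, int64)
    write(*,*) i, fac
  end do
end program factorial_checked
```

For integer(int64) the loop stops and reports i = 21, since 20! still fits and 21! does not.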

gfortran will catch integer overflow with the -ftrapv flag, see man gcc:
-ftrapv
This option generates traps for signed overflow on addition, subtraction, multiplication operations.
ifort does not seem to have that capability.

How to automatically convert a Fortran90 program whose variables are in double precision into real?

Hi everyone,
I have a Fortran90 program whose variables are in double precision or complex*16. Now I have to write another program whose variables are in real or complex, with everything else the same as in the original program.
The straightforward way is to rewrite every declaration, but I'm wondering if there are simpler ways to achieve this. I'm using gfortran as the compiler.
Thanks

Probably the cleanest (although not the easiest) way would be to rewrite your program to have adjustable precision for the variables:
program test
implicit none
integer, parameter :: rp = kind(1.0d0)
real(rp) :: myreal
complex(rp) :: mycomplex
By setting the parameter rp (real precision) to kind(1.0) instead of kind(1.0d0) you can switch from double to single. Alternatively, with Fortran 2003 compatible compilers you can also use the names real64 and real32 after using the iso_fortran_env module. (UPDATE: it needs a Fortran 2008 compatible compiler, not Fortran 2003, see the comment of IanH).
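
A minimal complete sketch of this approach, using the real32/real64 names, could look as follows. The module name precision_mod is made up for the illustration; keeping rp in one module means the precision is switched in a single place.

```fortran
module precision_mod
  use, intrinsic :: iso_fortran_env, only: real32, real64
  implicit none
  ! Change real64 to real32 here to build the single-precision variant
  integer, parameter :: rp = real64
end module precision_mod

program test
  use precision_mod, only: rp
  implicit none
  real(rp)    :: myreal
  complex(rp) :: mycomplex

  ! Literal constants need the kind suffix too, or they stay default real
  myreal    = 1.0_rp / 3.0_rp
  mycomplex = cmplx(myreal, -myreal, kind=rp)

  print *, 'bytes per real:', storage_size(myreal) / 8
  print *, myreal, mycomplex
end program test
```

Note that literals such as 1.0d0 elsewhere in the code have to be rewritten with the _rp suffix as well, otherwise they keep their original precision.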

High Performance Fortran (HPF) without directives?

In High Performance Fortran (HPF), I could specify the distribution of arrays involved in a parallel calculation using the DISTRIBUTE directive. For example, the following minimal subroutine will sum two arrays in parallel:
subroutine mysum(x,y,z)
integer, intent(in) :: y(10000), z(10000)
integer, intent(out) :: x(10000)
!HPF$ DISTRIBUTE x(BLOCK), y(BLOCK), z(BLOCK)
x = y + z
end subroutine mysum
My question is, is the DISTRIBUTE directive necessary? I know in practice this is of little interest, but I'm curious as to whether an unadorned, directive-free Fortran program could also be a valid HPF program.

I do not believe the DISTRIBUTE directive is necessary, and I have never used it.
You can achieve this implicitly by using FORALL statements instead of DO loops where applicable. Originally, DO loops would give explicit order of operation on array elements, whereas FORALL would allow the processor to determine an optimal order at runtime. I do not think this makes much difference nowadays, because modern compilers are able to optimize/vectorize/parallelize DO loops where possible. I cannot tell for sure for other compilers, but I remember using Intel Fortran Compiler to compile and run a program on 2 and 4 processors in parallel without using DISTRIBUTE.
However, depending on the processor architecture and compiler, it is best to try out what you have and see what gives you optimal results or efficiency.
