I have a code that I run on several different clusters, which all have different combinations of MPI & LAPACK.
This can cause problems. For example, I currently use ifort's "-i8" option, which works fine with LAPACK, but breaks all MPI calls, because the MPI library expects integer(4) rather than integer(8).
Is there an elegant & flexible way to adapt the integer type based on the local MPI & LAPACK installation?
Hard-coding the types for every specific call is just very cumbersome and inflexible.
MPI calls do not expect INTEGER(4) nor INTEGER(8), they expect just INTEGER. And, as always, remember what those (4) and (8) actually mean in Fortran: integer*4 vs integer(4) vs integer(kind=4)
With -i8 you are changing what INTEGER means, to which kind it corresponds. You can do that, but you have to compile the MPI library with the same settings. The library may or may not be prepared to be compiled that way, but theoretically it should be possible.
You could also try passing integer(int32) instead of integer to MPI. If that is the correct kind, i.e. the one which corresponds to the default kind of the MPI library, the TKR checks and all other checks should pass OK. But it is not recommended.
To stay strictly within the Fortran standard, when you promote the default integer kind, you should also promote the default real and logical kind.
To stay portable use integers that correspond to the API of the library you use and make sure the library is meant to be used with that particular compiler and with that particular compiler configuration.
Usually, for portability one should not rely on promoting the default kinds but one should use specific kinds which suit the given purpose in the specific part of the code.
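As a sketch of that last point (the module and parameter names here are hypothetical), one can centralize the kinds each library's API expects in a single module instead of changing what the default INTEGER means with -i8:

```fortran
! Hypothetical sketch: name the kinds each library's API expects,
! instead of promoting the default INTEGER with -i8.
module lib_kinds
   use, intrinsic :: iso_fortran_env, only: int32, int64
   implicit none
   ! Most MPI builds use default (32-bit) integers, while an
   ! ILP64 LAPACK build expects 64-bit integers.
   ! Adjust these two parameters per cluster, in one place.
   integer, parameter :: mpik = int32
   integer, parameter :: lpk  = int64
end module lib_kinds
```

Counts passed to MPI would then be declared integer(mpik) and LAPACK dimensions integer(lpk), so adapting to a new cluster means editing only this one module rather than every call site.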
Related
I am working on a numerical solver written in Fortran which uses MPI for parallelization on large clusters (up to about 500 processes). Currently we are including mpi via
#include "mpif.h"
which, from my understanding, is deprecated and strongly discouraged. In an effort to modernize and clean up our MPI communications, we would like to switch to the more modern mpi_f08 module. The issue we are facing is that we need to maintain the possibility of compiling a version based on the old MPI header in order to not break the coupling with another solver. I'd much appreciate some advice on how to elegantly maintain this compatibility.
Question #1: What would be an elegant way to either include the header or use the module depending on a preprocessor flag without having #ifdef statements scattered throughout the code?
My thought so far would be to define a module
module mpi_module
#ifdef MPI_LEGACY
#include "mpif.h"
#else
use mpi_f08
#endif
end module
and use this module everywhere where the mpi header file is currently included. Is this a viable approach or would this have any unwanted effects which I'm currently overlooking?
Question #2: What would be an elegant way to switch between integers and the new derived types from mpi_f08 depending on the preprocessor flag? (Again, without scattering #ifdef statements throughout the code)
My initial thought on this would be to use something like
#ifdef MPI_LEGACY
#define _mpiOp_ integer
#else
#define _mpiOp_ type(MPI_Op)
#endif
so that I can simply replace
integer :: OP
by
_mpiOp_ :: OP
to obtain compatibility with both ways of including MPI. I'm not quite happy with this solution yet either, since, to my understanding, you cannot put these kinds of preprocessor definitions into a module. Thus you end up with a module plus a header file which you always have to remember to include together. Again, I'm grateful for any potential flaws with this approach and any alternatives that you can point out.
Sorry for the long post, but I wanted to make my thoughts as clear as possible. I'm looking forward to your input!
The old and the new way are far too different. Not only do you have a use statement instead of an include statement and a derived type instead of an integer for an Op; many routines also have different signatures and use different types.
So I am afraid the answer is that there is no elegant way. You are trying to make a conglomerate of two things that are too different to be elegantly combined.
As has been mentioned in the comments, the first step to get more modern is to do use mpi instead of include "mpif.h". This already enables the compiler to catch many kinds of bugs when the routines are called incorrectly. The extent to which these checks are possible depends on the details of the MPI library configuration, namely the extent of generic interfaces generated instead of just external statements.
If you have to combine your code with another code that uses the old way, it makes good sense to first do use mpi, see how it goes, and think whether it makes sense to go further.
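A minimal sketch of that incremental step (the routine and its arguments are invented for illustration): only the include line changes, the calls stay exactly as they are.

```fortran
! Hypothetical routine; the only change is `use mpi` replacing
! the old `include 'mpif.h'` line. The call itself is untouched.
subroutine sum_over_ranks(buf, n)
   use mpi                          ! was: include 'mpif.h'
   implicit none
   integer, intent(in) :: n
   double precision, intent(inout) :: buf(n)
   integer :: ierr
   call MPI_Allreduce(MPI_IN_PLACE, buf, n, MPI_DOUBLE_PRECISION, &
                      MPI_SUM, MPI_COMM_WORLD, ierr)
end subroutine sum_over_ranks
```

With use mpi the compiler can now reject, say, a call that passes the arguments in the wrong order, which the bare header could not.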
Do Fortran kind parameters for the same precision change depending on the processor, even with the same compiler? I have already read the post here.
The thing I struggle with is: if we are using the same compiler, say gfortran, why would there be a different set of kind parameters for the same precision? I mean, the compiler's specification is the same, so shouldn't the compiler always give us the same precision for a particular kind parameter no matter what operating system or processor I am using?
EDIT: I read somewhere that, for integers, different CPUs support different integral data types, which means some processors might not directly support a certain precision of integer. I also read that languages like Fortran opt for optimization, so the language is implemented in a way that avoids strange precisions that are not directly supported by the hardware. Does this have to do with my concern?
You are asking "do they change". The answer is "they may".
The meaning of a certain kind value for a certain type is Fortran processor (the language concept, which is not the same thing as a microprocessor) dependent.
The concept of a Fortran processor covers the entire system that is responsible for processing and executing Fortran source - the hardware, operating system, compiler, libraries, perhaps even the human operator - all of it. Change any part of that system, and you can have a different Fortran processor.
Consequently there is no requirement that the interpretation of a particular kind value for a particular type be the same for the same compiler given variations in compiler options or hardware in use.
If you want your code to be portable, then don't make the code depend on particular kind values.
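A sketch of what that means in practice: ask for the precision you need by name and let the processor pick the kind value, rather than hard-coding a number like 8.

```fortran
! Portable: request "at least 15 significant digits, decimal
! exponent range 307" and let the processor choose the kind value.
module working_precision
   implicit none
   integer, parameter :: wp = selected_real_kind(15, 307)
end module working_precision

program kind_demo
   use working_precision, only: wp
   implicit none
   real(wp) :: x
   x = 1.0_wp / 3.0_wp
   ! kind(x) may print 8 on one processor and something else on
   ! another; the precision is what is guaranteed, not the number.
   print *, x, kind(x)
end program kind_demo
```

The code then compiles unchanged on any processor that offers such a real type, whatever kind number that processor happens to assign to it.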
In older fortran code, when .or. is used with two integer types, is the result a bit-wise or of the operands or 0/1?
I'm updating legacy code, and believe I should be replacing these instances of .or. with IOR, but am uncertain if that was the expected result in older code. Should I instead be setting the result to either 0 or 1?
I believe what you are seeing is indeed a custom extension. I haven't seen this one in use before, but I did find a reference on the web about such things actually existing in the wild:
When Fortran programs communicate directly with digital hardware it may be necessary to carry out bit-wise logical operations on bit-patterns. Standard Fortran does not provide any direct way of doing this, since logical variables essentially only store one bit of information and integer variables can only be used for arithmetic. Many systems provide, as an extension, intrinsic functions to perform bit-wise operations on integers. The function names vary: typically they are IAND, IOR, ISHIFT. A few systems allow the normal logical operators such as .AND. and .OR. to be used with integer arguments: this is a much more radical extension and much less satisfactory, not only because it reduces portability, but also because it reduces the ability of the compiler to detect errors in normal arithmetic expressions.
Reference
Compilers with DEC/VMS links or heritage support the extension of allowing integer arguments to .OR. (and other logical operators). That group of compilers define the .OR. operation on integers as being bit wise.
A currently supported compiler with that heritage is Intel Fortran (via Compaq Fortran, via Digital Fortran, etc).
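So the standard-conforming replacement is IOR (a bit-wise result), not 0/1. A minimal before/after sketch:

```fortran
program bitwise_or_demo
   implicit none
   integer :: flags
   ! Legacy DEC-style extension (bit-wise on these compilers):
   !   flags = 5 .or. 3
   ! Standard Fortran equivalent:
   flags = ior(5, 3)
   print *, flags   ! 101 .or. 011 = 111 in binary, i.e. 7
end program bitwise_or_demo
```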
I'm working on a small Fortran library (novel code) which is being called from several C/C++ applications. The library is of such kind when almost every subroutine could be separately called from application. So I need to provide C interface for those subroutines.
I can use modules, which are very comfortable in themselves. But then I need either to decode the module name mangling manually (which isn't very hard for gfortran, but looks bad) or use a bind(C,name="some_name") clause. The latter leads to compiler warnings that a subroutine parameter wasn't explicitly made interoperable (so the compiler wants me to replace double precision with real(kind=C_DOUBLE), for example). In this case I would have to replace almost every variable declaration in the library with such ugly declarations, which results in hard-to-read code.
I can use bare subroutines, where every file in the library consists of several subroutines (this is the way I do it now). Explicit interfaces are shared between them with interface ... include "otherfile_h.f90" ... end interface, which isn't very comfortable. Name mangling is rather simple in this case, and the library subroutines can easily be called directly from C.
The approach I use (bullet #2) requires more typing and is error-prone because of the duplicated definitions in source/header files. Is there a better way to keep the sources clear and readable with a smart C interface?
The modern way to mix Fortran and C is to use Fortran's ISO C Binding. This will make your code portable, since the ISO C Binding is part of the language standard. Manually figuring out the name mangling is compiler-specific and might not work on another compiler. "Double precision" is not considered a best-practice declaration for modern Fortran (see, e.g., Extended double precision and http://fortranwiki.org/fortran/show/Modernizing+Old+Fortran). The modern way is to use "real (kind=XYZ)". The concept of the language is that the programmer typically uses the SELECTED_REAL_KIND intrinsic function to define a constant (e.g., MyDouble) for the precision they need. If the precision that you need is C_DOUBLE, then it is perfectly appropriate Fortran to use that kind value. This is not an ugly declaration. (I don't understand your bullet #2.)
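A minimal sketch of what that looks like (the routine name and signature are invented for illustration):

```fortran
! Hypothetical library routine exported to C via the ISO C Binding.
module mylib
   use, intrinsic :: iso_c_binding, only: c_int, c_double
   implicit none
contains
   subroutine scale_array(n, x, factor) bind(C, name="mylib_scale_array")
      integer(c_int), value :: n
      real(c_double), intent(inout) :: x(n)
      real(c_double), value :: factor
      x = x * factor
   end subroutine scale_array
end module mylib
```

On the C side this is simply void mylib_scale_array(int n, double *x, double factor); with no mangled module prefix to decode, and no duplicated interface blocks to keep in sync.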
I have to work on a fortran program, which used to be compiled using Microsoft Compaq Visual Fortran 6.6. I would prefer to work with gfortran but I have met lots of problems.
The main problem is that the generated binaries have different behaviours. My program takes an input file and then has to generate an output file. But sometimes, when using the binary compiled by gfortran, it crashes before its end, or gives different numerical results.
This is a program written by researchers which uses a lot of floating-point numbers.
So my question is: what are the differences between these two compilers which could lead to this kind of problem?
edit:
My program computes the values of some parameters and there are numerous iterations. At the beginning, everything goes well. After several iterations, some NaN values appear (only when compiled by gfortran).
edit:
Thank you everybody for your answers.
So I used the intel compiler which helped me by giving some useful error messages.
The origin of my problems is that some variables were not initialized properly. It looks like when compiling with Compaq Visual Fortran these variables are automatically set to 0, whereas with gfortran (and Intel) they take random values, which explains some numerical differences that add up over the following iterations.
So now the solution is a better understanding of the program to correct these missing initializations.
There can be several reasons for such behaviour.
What I would do is:
Switch off any optimization
Switch on all debug options. If you have access to e.g. intel compiler, use ifort -CB -CU -debug -traceback. If you have to stick to gfortran, use valgrind, its output is somewhat less human-readable, but it's often better than nothing.
Make sure there are no implicitly typed variables; use implicit none in all the modules and all the code blocks.
Use consistent float types. I personally always use real*8 as the only float type in my codes. If you are using external libraries, you might need to change call signatures for some routines (e.g., BLAS has different routine names for single and double precision variables).
If you are lucky, it's just some variable not being initialized properly, and you'll catch it with one of these techniques. Otherwise, as M.S.B. suggested, a deeper understanding of what the program really does is necessary. And, yes, you might need to check the algorithm manually starting from the point where you say 'some NaN values appear'.
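For illustration, a minimal reproduction of the uninitialized-variable failure mode (hypothetical code, not from the actual program):

```fortran
! `scale` is never assigned, so its value is whatever happens to be
! in that memory location. One compiler may zero it, another may
! not, so results diverge and can eventually turn into NaN.
program uninit_demo
   implicit none
   real :: scale, total
   integer :: i
   total = 1.0
   do i = 1, 10
      total = total * scale   ! bug: scale was never initialized
   end do
   print *, total
end program uninit_demo
```

Options like ifort's -CU or gfortran's -finit-real=snan are designed to flag exactly this kind of use of an undefined value at run time.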
Different compilers can emit different instructions for the same source code. If a numerical calculation is on the boundary of working, one set of instructions might work, and another not. Most compilers have options to use more conservative floating point arithmetic, versus optimizations for speed -- I suggest checking the compiler options that you are using for the available options. More fundamentally this problem -- particularly that the compilers agree for several iterations but then diverge -- may be a sign that the numerical approach of the program is borderline. A simplistic solution is to increase the precision of the calculations, e.g., from single to double. Perhaps also tweak parameters, such as a step size or similar parameter. Better would be to gain a deeper understanding of the algorithm and possibly make a more fundamental change.
I don't know about the crash, but some differences in the results of numerical code on an Intel machine can be due to one compiler using 80-bit doubles and the other 64-bit doubles, if not for variables then perhaps for temporary values. Moreover, floating-point computation is sensitive to the order in which elementary operations are performed. Different compilers may generate different sequences of operations.
Differences in type implementations, differences in various non-standard vendor extensions: it could be a lot of things.
Here are just some of the language features that differ (look at gfortran and Intel). Programs written to the Fortran standard work the same on every compiler, but a lot of people don't know which are standard language features and which are language extensions, and so use them; when the code is compiled with a different compiler, troubles arise.
If you post the code somewhere I could take a quick look at it; otherwise, like this, 'tis hard to say for certain.