Reinterpret a Fortran array as bytes - fortran

I would like to interpret a Fortran (real*8, say) array as an array of bytes, so that it can be sent to a function to process things on the byte level. What's a simple (preferably no-copy) way to accomplish this?

First, it is not clear what "a function working on the byte level" means here. Does it take Fortran characters? Or 1-byte integers? They are different beasts in Fortran.
You could try to lie about the signature of your function and just pass the array as it is. Likely to work, not strictly standard conforming.
Transfer() is the best modern tool for similar purposes, but it may indeed involve temporaries.
If the size of the array is fixed (it is not allocatable or pointer or dummy argument) you could use equivalence which is quite similar to union in C.
But you must be careful about what is allowed; this is a notoriously dodgy area. Even the C union rules differ from the C++ rules. Fortran equivalence has its own rules, and they are stricter, I am afraid. Type punning is not allowed, but a lot of code in the wild does it.
Doing tricks with C pointers and pointing to the same array from different pointers with different types is definitely not standard conforming and may give you expected results in some cases and wrong results in others (undefined behaviour as they call it in C and C++).
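As a minimal sketch of the transfer() route mentioned above (it copies, so it is not the no-copy variant), assuming the byte-level function wants 1-byte integers and that the compiler provides the int8 kind. The program and variable names are made up for illustration:

```fortran
program transfer_bytes
  use, intrinsic :: iso_fortran_env, only: real64, int8
  implicit none
  real(real64) :: x(3), y(3)
  integer(int8), allocatable :: bytes(:)
  integer :: nbytes

  x = [1.0_real64, 2.0_real64, 3.0_real64]

  ! storage_size gives the size of one element in bits
  nbytes = size(x) * storage_size(x) / 8

  ! Copy the bit pattern of x into a rank-1 array of 1-byte integers
  bytes = transfer(x, 0_int8, nbytes)

  ! Reverse direction: reinterpret the bytes as real64 values again
  y = transfer(bytes, x)

  if (any(y /= x)) error stop "round trip through bytes failed"
  print *, "bytes used:", size(bytes)
end program transfer_bytes
```

The compiler may or may not optimise the temporary away, so measure before relying on this for large arrays.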

A "NO_COPY" way...but relies on DEC extensions:
USE ISO_C_BINDING
IMPLICIT NONE
UNION
MAP
REAL(KIND=C_DOUBLE) , DIMENSION(N) :: R8_Data
END MAP
MAP
BYTE , DIMENSION(N*8) :: B_Data
END MAP
MAP
CHARACTER(LEN=1) , DIMENSION(N*8) :: C_Data
END MAP
MAP
INTEGER(KIND=C_Int16_T), DIMENSION(N*4) :: I2_Data
END MAP
MAP
INTEGER(KIND=C_Int32_T), DIMENSION(N*2) :: I4_Data
END MAP
END UNION
@Vladimir: EQUIVALENCE also works if one does not have access to the DEC extensions.
There is a slated upgrade to gfortran to add in the MAP and UNION DEC extensions, so in time it will be there too.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56226
My appreciation of the difference is back on track...
One can use UNION/MAP inside of a structure. Outside a structure/TYPE then EQUIVALENCE does all one needs.
So, as Vladimir mentioned, this is also a "no copy" approach:
REAL(KIND=C_DOUBLE) , DIMENSION(N) :: R8_Data
BYTE , DIMENSION(N*8) :: B_Data
CHARACTER(LEN=1) , DIMENSION(N*8) :: C_Data
INTEGER(KIND=C_Int16_T), DIMENSION(N*4) :: I2_Data
INTEGER(KIND=C_Int32_T), DIMENSION(N*2) :: I4_Data
EQUIVALENCE(R8_Data, I4_Data)
It is almost more dangerous than it is worth, unless one has a specific problem.

Related

What shall be used in Modern Fortran to specify a 8 bytes real variable if iso_fortran_env is not supported?

I want to specify as type of a subroutine a floating point value (real) of 8 bytes precision.
I have read here that the modern way to do it would be:
real(real64), intent(out) :: price_open(length)
However, iso_fortran_env is not supported by f2py (just as it does not support iso_c_binding either).
I get errors of this type:
94 | real(kind=real64) price_open(length)
| 1
Error: Parameter 'real64' at (1) has not been declared or is a variable, which does not reduce to a constant expression
The link referenced before states that using kind would be the proper way if iso_fortran_env is not available and that real*8 shall be avoided.
I have been using real(8). Is that equivalent to using kinds? If not, what shall I use?
What is wrong with real*8 if I want to always enforce 8 bytes floating point values?
You say you are specifically interested in interoperability with C. However, neither iso_c_binding nor iso_fortran_env is supported. These modules have constants that help you to set the right kind constant for a given purpose. The one from iso_fortran_env would NOT be the right one to choose anyway; it would be c_double from iso_c_binding.
If these constants meant to help you in your choice are not available, you are on your own. Now you can choose any other method freely.
It is completely legitimate to use just kind(1.d0) and just check that the connection to C works. Automake had been doing that for ages.
Or use selected_real_kind(), it does not matter, it is really up to you. Just check that double in C ended up being the same numeric type.
The traditional thing in automatic build processes was to do many tests about which (mainly C) constant ended up having which value. You just need to check that double precision in Fortran does indeed correspond to double in C. It is very likely, but just check it. You can have a macro that changes the choice if not, but that is probably needless work until you actually meet such a system.
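A small check along these lines could look as follows. It is only a sketch: the parameter names dp and dp_alt are made up, and the comparison with the C side (sizeof(double) and a value passed through the interface) still has to be done in your own build or test.

```fortran
program check_double_kind
  implicit none
  ! Two ways of picking the kind without iso_fortran_env or iso_c_binding
  integer, parameter :: dp     = kind(1.d0)
  integer, parameter :: dp_alt = selected_real_kind(15, 307)
  real(dp) :: x

  x = 1.0_dp
  print *, "kind(1.d0)                  =", dp
  print *, "selected_real_kind(15, 307) =", dp_alt
  print *, "bits per value              =", storage_size(x)
  print *, "decimal digits              =", precision(x)

  ! Stop loudly if double precision is not an 8-byte type on this system
  if (storage_size(x) /= 64) error stop "double precision is not 8 bytes here"
end program check_double_kind
```

The kind numbers themselves are compiler specific; only the storage size and precision are meaningful to compare.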

What is the need for a C_INT fortran type? Are C and Fortran integers so different?

If I declare an integer in fortran as:
INTEGER(C_INT) :: i
then, if I understand correctly, it's safe to pass it to a C function. Now, disregarding the added headache of always declaring integers this way, is there any reason not to always declare your variables as C-interoperable? Is there any downside to doing so?
Also, in the case of something as simple as an integer, what exactly does C_INT change compared to a traditional Fortran integer? Are Fortran and C integers actually different?
The C integer size is usually fixed because the OS is compiled using C and the system bindings are published as C headers. The operating system designers choose one of the common models like LLP64, LP64, ILP64 (for 64-bit pointer OS's) and the C compilers then follow this choice.
But Fortran compilers are more free. You can set them to a different configuration. Fortran compiler set to use 8-byte default integers and 8-byte default reals are still perfectly standard conforming! (C would be as well but the choice is fixed in the operating system.)
And because the integers in C and in Fortran do not have to match you need a mechanism to portably select the C interoperable kind, whatever the default kind is.
This is not just academic. You can find libraries like MKL compiled with 8-byte integers (the ILP64 model, which is not used by common operating systems for C).
So when you call some API, some function whose interface is defined in C, you want to call it properly, not depending on the settings of the Fortran compiler. That is the use case for the C-interoperable types. If a C function requires an int, you give it an integer(c_int) and you do not care if the Fortran default integer is the same or not.
Basically there is no fundamental downside and on the same compiler there is no fundamental benefit.
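To illustrate the use case, here is a sketch that calls the C standard library function abs (declared in C as int abs(int)) through an explicit interface. The Fortran-side name c_abs is arbitrary, and it is assumed that the C library is linked in, which is the usual default:

```fortran
program call_c_abs
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none

  interface
    ! C prototype: int abs(int);
    function c_abs(x) bind(C, name="abs") result(r)
      import :: c_int
      integer(c_int), value :: x   ! C passes int by value
      integer(c_int) :: r
    end function c_abs
  end interface

  integer(c_int) :: n

  n = -7_c_int
  if (c_abs(n) /= 7_c_int) error stop "unexpected result from C abs"
  print *, "abs(-7) computed by C:", c_abs(n)
end program call_c_abs
```

Because the interface uses integer(c_int), the call stays correct even if the compiler is switched to 8-byte default integers.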
Even in icc a long int is different on 32 and 64 bit OS, so you need to know (or define what you want on the c-side of things).
The only one that is clearly 4 bytes is C_INT32_T, and hence it is the one that I generally use. It is more of a "future proofing" endeavour to use C_INT32_T: you define the integer as you want it to be, with a fixed number of bits.
These all give a 4 byte integer on iFort.
USE ISO_C_BINDING, ONLY : C_INT, C_INT32_T
Integer ::
INTEGER(KIND=4) :: !"B" At least on iFort
INTEGER*4 !"A"
INTEGER(KIND=C_INT) ::
INTEGER(KIND=C_INT32_T) :: !"C"
Usually one finds older code with the style of "A".
I routinely use the style of "B", but even though this is a de facto standard it does not conform to "the standard".
Then when I become concerned with portability I run-through and change the "B" style to "C" style and then I have less heartburn considering others who may be later compiling with the gfortran compiler... Or even some other compiler.
The single byte:
BYTE :: !Good
INTEGER(KIND=1) ::
INTEGER(KIND=C_SIGNED_CHAR) ::
INTEGER(KIND=C_INT8_T) :: !Best
The two byte:
INTEGER(KIND=2) ::
INTEGER(KIND=C_INT16_T) :: !Best
INTEGER(KIND=C_SHORT) ::
The 8 byte:
INTEGER(KIND=8) ::
INTEGER(KIND=C_INT64_T) :: !Best
INTEGER(KIND=C_LONG_LONG) ::
While this looks somewhat fishy... One can also say:
LOGICAL(KIND=4) ::
LOGICAL(KIND=C_INT32_T) :: !Best here may be C_bool and using 1-byte logical in fortran...? Assuming that there is no benefit with the size of the logical being the same as some float vector
LOGICAL(KIND=C_FLOAT) ::
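To see what these kinds amount to on a particular compiler, one can simply print the sizes. This is a sketch only and the output depends on the compiler and its flags:

```fortran
program show_int_kinds
  use, intrinsic :: iso_c_binding, only: c_signed_char, c_short, c_int, &
                                         c_long, c_int32_t, c_int64_t
  implicit none

  ! bit_size returns the number of bits of the integer kind of its argument
  print *, "default integer :", bit_size(0)
  print *, "c_signed_char   :", bit_size(0_c_signed_char)
  print *, "c_short         :", bit_size(0_c_short)
  print *, "c_int           :", bit_size(0_c_int)
  print *, "c_long          :", bit_size(0_c_long)
  print *, "c_int32_t       :", bit_size(0_c_int32_t)
  print *, "c_int64_t       :", bit_size(0_c_int64_t)
end program show_int_kinds
```

Compiling the same file with an option that changes the default integer size should change only the first line of the output.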

Array of unknown rank as subroutine argument

I am designing a module which works with the hdf5 Fortran library. This module contains subroutines to read and write arrays of different types and shapes to/from a file.
e.g. I wish to be able to call writeToHDF5(filepath, array) regardless of what the shape and type of array is. I realise that interfaces have to be used to achieve this with different types. I am however wondering if it is possible to have an assumed shape of the array.
e.g.
if an array was defined such as
integer(kind=4), dimension(*),intent(in) :: array
and a two dimensional array was passed this would work. Is there any way to do this without creating separate subroutines for each shape of the array?
As Vladimir F says, Fortran 2015 (since published as Fortran 2018) adds "assumed-rank" - this is useful Fortran-Fortran (it was requested by MPI for the Fortran bindings), but when you receive such an array, you can't do much with it directly without additional complications. Several compilers support this already, but few (if any?) support the newly added SELECT RANK construct that makes this a bit more useful.
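For reference, an assumed-rank dummy with SELECT RANK could look roughly like this. It is a sketch that needs a compiler with Fortran 2018 support for both features; the module and procedure names are invented for the example:

```fortran
module rank_demo
  implicit none
contains
  subroutine describe(a)
    integer, intent(in) :: a(..)   ! assumed-rank dummy argument

    ! Inside each RANK block, a has the stated rank
    select rank (a)
    rank (0)
      print *, "scalar, value", a
    rank (1)
      print *, "rank 1, size", size(a)
    rank (2)
      print *, "rank 2, shape", shape(a)
    rank default
      print *, "rank", rank(a), "is not handled here"
    end select
  end subroutine describe
end module rank_demo

program test_rank
  use rank_demo
  implicit none
  integer :: s, v(4), m(2,3), c(2,2,2)

  s = 1; v = 2; m = 3; c = 4
  call describe(s)
  call describe(v)
  call describe(m)
  call describe(c)
end program test_rank
```

Each supported rank still needs its own branch, so this removes the duplicated subroutines but not the per-rank code.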
You can, however, use C_LOC and C_F_POINTER to "cast" the assumed-rank dummy to a pointer to an array of whatever rank you like, so that's a possibility.
The standard (even back to Fortran 90) does give you an out here. If you write: call writeToHDF5(filepath, array(1,1)) (assuming array is rank 2 here), the explicit interface of the called procedure can specify any rank for the dummy argument through the magic of "sequence association". There are some restrictions, though - in particular the array is not allowed to be assumed-shape or POINTER.
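A small sketch of sequence association, with a made-up routine sum_flat standing in for the real HDF5 wrapper. The actual arguments are ordinary explicit-shape arrays, so the restriction mentioned above does not apply:

```fortran
module flat_mod
  implicit none
contains
  subroutine sum_flat(n, a, total)
    integer, intent(in)  :: n
    integer, intent(in)  :: a(*)   ! assumed-size: sees the data as a flat sequence
    integer, intent(out) :: total
    total = sum(a(1:n))
  end subroutine sum_flat
end module flat_mod

program test_flat
  use flat_mod
  implicit none
  integer :: a2(2,3), a3(2,2,2), total

  a2 = reshape([1, 2, 3, 4, 5, 6], [2, 3])
  a3 = 1

  ! Rank-2 array passed via its first element
  call sum_flat(size(a2), a2(1,1), total)
  if (total /= 21) error stop "unexpected sum for a2"

  ! Rank-3 array passed the same way
  call sum_flat(size(a3), a3(1,1,1), total)
  if (total /= 8) error stop "unexpected sum for a3"

  print *, "both sums are as expected"
end program test_flat
```

The callee has no shape information, so the element count (or the dimensions) has to be passed separately, as the Data_dims argument does in the HDF5 case.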
I know that this comes out late, but for any future readers - the answer is actually yes.
This is a simple example of a procedure to read a single data-set of integers with any given shape. The inputs needed were read from the HDF5 using a single routine which does not require any special specification and are the same for integer, real, string and so on.
I have tested it on a "0-D" array (so size of (/1/) ), 1-D, 2-D, 3-D and 4-D arrays.
In all cases, the data was retrieved properly.
(Note: I removed some checks regarding the Errorflag, as they are not critical for the example)
subroutine ReadSingleDataset_int(FileName,DataName,DataType,Data_dims,Errorflag,InputArray)
implicit none
character(len=*), intent(in) :: FileName,DataName
integer(HID_T), intent(in) :: DataType
integer(hsize_t), dimension(:), intent(in) :: Data_dims
logical, intent(out) :: ErrorFlag
integer, dimension(*), intent(inout) :: InputArray
integer :: hdferr
integer(HID_T) :: file_id,dset_id
ErrorFlag=.FALSE.
IF (.NOT.HDF5_initialized) THEN
CALL h5open_f(hdferr)
HDF5_initialized=.TRUE.
ENDIF
call h5fopen_f(trim(FileName),H5F_ACC_RDONLY_F, file_id, hdferr)
call h5dopen_f(file_id,trim(DataName),dset_id,hdferr)
call h5dread_f(dset_id,DataType,InputArray,Data_dims,hdferr)
call h5dclose_f(dset_id,hdferr)
call h5fclose_f(file_id,hdferr)
end subroutine ReadSingleDataset_int
My main interest in this now is: would it be possible to replace the type/class-specific declaration (integer, real, etc.) with a generic one along the lines of:
class(*), dimension(*), intent(inout) :: InputArray
As I have several routines like that (for int, real, string) which only vary in that specification of the input type/class. If that limitation could be alleviated as well, that will be even more elegant.
(Just to point out the issue - the h5dread_f does not accept the buffer argument, InputArray in my case, to be unlimited polymorphic)

Best declaration order in Fortran?

I have actually a nested question :
Does the order of variable declaration matter in Fortran?
If yes, what is the best order to declare variables?
For example, is this program :
PROGRAM order1
IMPLICIT NONE
DOUBLE PRECISION,DIMENSION(:,:),ALLOCATABLE:: array_double_2D
DOUBLE PRECISION,DIMENSION(:),ALLOCATABLE:: array_double_1D
INTEGER,DIMENSION(:),ALLOCATABLE:: array_int_1D
INTEGER :: int1,int2
LOGICAL :: boolean1,boolean2
... instructions ...
better than this one :
PROGRAM order2
IMPLICIT NONE
LOGICAL :: boolean1,boolean2
INTEGER :: int1,int2
INTEGER,DIMENSION(:),ALLOCATABLE:: array_int_1D
DOUBLE PRECISION,DIMENSION(:),ALLOCATABLE:: array_double_1D
DOUBLE PRECISION,DIMENSION(:,:),ALLOCATABLE:: array_double_2D
... instructions ...
?
(by "better", I mean efficient in memory management and faster)
Thanks for your answers!
No, the order does not matter, unless your declaration depends on a previously declared entity.
Obviously
integer, parameter :: arr(*) = [1,2,3]
integer :: arr2(size(arr))
must use this order, because you refer to another entity.
If they don't depend on each other it does not matter. It does not matter for efficiency in any way. For style everybody can have his own opinion what is the nicest order, no reason to discuss that here.
It could matter in a common block, because then you can force an array to start at an inconvenient address in memory and be more difficult to vectorize.
It does also matter in certain type declarations:
type t1
sequence
integer(int32) :: field1
integer(int16) :: field2
end type
will be laid out in memory differently than
type t2
sequence
integer(int16) :: field2
integer(int32) :: field1
end type
and that one differently than
type t3
integer(int16) :: field2
integer(int32) :: field1
end type
because without sequence the compiler is free to insert some padding and it will typically do so in t3.
Interoperable types
type, bind(C) :: t3
...
also enforce order of the components, but the compiler can include the padding for performance. It will use the C compiler's rules for padding.
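To inspect what a given compiler actually does, the sizes can be printed. This is only a sketch; the type name t4 is made up for the bind(C) variant, and the numbers depend on the compiler and target, so none are quoted here:

```fortran
program layout_demo
  use, intrinsic :: iso_fortran_env, only: int16, int32
  use, intrinsic :: iso_c_binding,   only: c_int16_t, c_int32_t
  implicit none

  type t1
    sequence
    integer(int32) :: field1
    integer(int16) :: field2
  end type t1

  type t3
    integer(int16) :: field2
    integer(int32) :: field1
  end type t3

  type, bind(C) :: t4
    integer(c_int16_t) :: field2
    integer(c_int32_t) :: field1
  end type t4

  type(t1) :: a
  type(t3) :: b
  type(t4) :: c

  ! storage_size reports the size in bits, including any padding
  print *, "sequence type t1 :", storage_size(a) / 8, "bytes"
  print *, "plain type t3    :", storage_size(b) / 8, "bytes"
  print *, "bind(C) type t4  :", storage_size(c) / 8, "bytes"
end program layout_demo
```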
I would do this....
PROGRAM order1
IMPLICIT NONE
!DIR$ ATTRIBUTES ALIGN: array_double_2D::64
DOUBLE PRECISION,DIMENSION(:,:),ALLOCATABLE:: array_double_2D
!DIR$ ATTRIBUTES ALIGN: array_double_1D::64
DOUBLE PRECISION,DIMENSION(:) ,ALLOCATABLE:: array_double_1D
!DIR$ ATTRIBUTES ALIGN: array_int_1D::64
INTEGER,DIMENSION(:) ,ALLOCATABLE:: array_int_1D
INTEGER :: int1,int2
LOGICAL :: boolean1,boolean2
... instructions ...
Then there is no doubt that the arrays are on 64 byte boundaries.
There are also compiler switch options. In ifort it is '-align array64byte'.
This will only make a difference if/when you vectorise, which you should want to be doing... Hence you should align the arrays/vectors somehow.

Convert FORTRAN DEC UNION/MAP extensions to anything else

Edit: Gfortran 6 now supports these extensions :)
I have some old f77 code that extensively uses UNIONs and MAPs. I need to compile this using gfortran, which does not support these extensions. I have figured out how to convert all non-supported extensions except for these and I am at a loss. I have had several thoughts on possible approaches, but haven't been able to successfully implement anything. I need for the existing UDTs to be accessed in the same way that they currently are; I can reimplement the UDTs but their interfaces must not change.
Example of what I have:
TYPE TEST
UNION
MAP
INTEGER*4 test1
INTEGER*4 test2
END MAP
MAP
INTEGER*8 test3
END MAP
END UNION
END TYPE
Access to the elements has to be available in the following manners: TEST%test1, TEST%test2, TEST%test3
My thoughts thus far:
Replace somehow with fortran EQUIVALENCE.
Define the structs in C/C++ and somehow make them visible to the FORTRAN code (doubt that this is possible)
I imagine that there must have been lots of refactoring of f77 to f90/95 when the UNION and MAP were excluded from the standard. How if at all was/is this handled?
EDIT: The accepted answer has a workaround to allow memory overlap, but as far as preserving the API, it is not possible.
UNION and MAP were never part of any FORTRAN standard; they are vendor extensions. (See, e.g., http://fortranwiki.org/fortran/show/Modernizing+Old+Fortran). So they weren't really excluded from the Fortran 90/95 standard. They cause variables to overlap in memory. If the code actually uses this feature, then you will need to use equivalence. The preferred way to move data between variables of different types without conversion is the transfer intrinsic, but to use that you would have to identify every place where a conversion is necessary, while with equivalence it takes place implicitly. Of course, that makes the code less understandable. If the memory overlays are just to save space and the equivalence of the variables is not used, then you could get rid of this "feature". If the code is like your example, with small integers, then I'd guess that the memory overlay is being used. If the overlays are large arrays, it might have been done to conserve memory. If these declarations were also creating new types, you could use user defined types, which are definitely part of Fortran >=90.
If the code is using memory equivalence of variables of different types, this might not be portable, e.g., the internal representation of integers and reals are probably different between the machine on which this code originally ran and the current machine. Or perhaps the variables are just being used to store bits. There is a lot to figure out.
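As a sketch of the transfer alternative for the two-int32 versus one-int64 case from the question (variable names are made up, and the numeric value of the combined integer depends on the byte order of the machine):

```fortran
program transfer_union
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
  integer(int32) :: halves(2), back(2)
  integer(int64) :: whole

  halves = [2_int32, 4_int32]

  ! Explicit conversion where a union would overlay the storage:
  ! the bits of the two 32-bit integers become one 64-bit integer
  whole = transfer(halves, 0_int64)

  ! And back again
  back = transfer(whole, halves)

  if (any(back /= halves)) error stop "round trip failed"
  print *, "halves:", halves
  print *, "whole :", whole
end program transfer_union
```

Every read through the "other" view needs such an explicit call, which is exactly the bookkeeping cost described above.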
P.S. In response to the question in the comment, here is a code sample. But .... to be clear ... I do not think that using equivalence is good coding practice. With the compiler options that I normally use with gfortran to debug code, gfortran rejects this code. With looser options, gfortran will compile it. So will ifort.
module my_types
use ISO_FORTRAN_ENV
type test_p1_type
sequence
integer (int32) :: int1
integer (int32) :: int2
end type test_p1_type
type test_p2_type
sequence
integer (int64) :: int3
end type test_p2_type
end module my_types
program test
use my_types
type (test_p1_type) :: test_p1
type (test_p2_type) :: test_p2
equivalence (test_p1, test_p2)
test_p1 % int1 = 2
test_p1 % int2 = 4
write (*, *) test_p1 % int1, test_p1 % int2, test_p2 % int3
end program test
The question is whether the union was used to save space or to have alternative representations of the same data. If you are porting, see how it is used. Maybe, because space was limited, the code was written in a way where the variables had to be shared. Nowadays, with larger amounts of memory, this may not be necessary and the union may not be required. In that case, it is just two separate types.
For those just wanting to compile the code with these extensions: Gfortran now supports UNION, MAP and STRUCTURE in version 6. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56226