Impact performance of the TARGET attribute (not the POINTER attribute) - fortran

The question of the performance impact of the TARGET attribute has been asked many times, but the answers lack concrete examples.
The question Why does a Fortran POINTER require a TARGET? has a nice answer:
An item that might be pointed to could be aliased to another item, and the compiler must allow for this. Items without the target attribute should not be aliased and the compiler can make assumptions based on this and therefore produce more efficient code.
I realize that the optimizations depend on the compiler and the processor instruction set, but what actually are these optimizations?
Consider the following code:
subroutine complexSubroutine(tab)
double precision, dimension(:) :: tab
!...
! very mysterious complex instructions performed on tab
!...
end subroutine
What are the optimizations that the compiler could perform for this code
double precision, dimension(very large number) :: A
call complexSubroutine(A)
and not for this code?
double precision, dimension(very large number), target :: A
call complexSubroutine(A)

Note that the dummy argument tab in complexSubroutine in the question does not have the TARGET attribute. Inside the subroutine the compiler can assume that tab is not aliased, and the below discussion is moot. The issue applies to the scope calling the subroutine.
If the compiler knows that there is no possibility of aliasing, then the compiler knows that there is no possibility of the storage for two objects overlapping. The potential for overlap has direct implications for some operations. As a simple example, consider:
INTEGER, xxx :: forward(3)
INTEGER, xxx :: reverse(3)
forward = [1,2,3]
reverse(1:3:1) = forward(3:1:-1)
The Fortran standard defines the semantics of assignment such that reverse should end up with the value [3,2,1].
If xxx does not include POINTER or TARGET, then the compiler knows that forward and reverse do not overlap. It can execute the array assignment in a straightforward manner, perhaps by working backwards through the elements on the right-hand side and forwards through the elements on the left-hand side, as suggested by the subscript triplets, and doing the element-by-element assignment "directly".
However, if forward and reverse are TARGETs, then they may well overlap. The straightforward manner suggested above may fail to produce the result required by the standard. If the two names are associated with exactly the same underlying sequence of data objects, then the transfer of reverse(3) to forward(1) will change the value that reverse(1) later references. With the naive approach above, which fails to accommodate aliasing, reverse would end up with the value [3,2,3].
To deliver the result required by the standard, compilers may create a temporary object to hold the result of evaluating the right hand side of the assignment, effectively:
INTEGER, TARGET :: forward(3)
INTEGER, TARGET :: reverse(3)
INTEGER :: temp(3)
temp = forward(3:1:-1)
reverse(1:3:1) = temp
The presence of the temporary, and the additional operations associated with it, may result in a performance impact.
The potential for aliasing to break the otherwise straightforward and simple approach to an operation is a general problem in many situations. In the absence of compiler and runtime smarts to determine that aliasing isn't a problem in a particular situation, the creation and use of temporaries is a general solution, with the general potential for a performance impact.
The potential for aliasing that is not immediately apparent to the compiler may also prevent the compiler from assuming that the value of an object won't change when it sees no explicit reference to the object that would imply a change.
INTEGER, TARGET :: x
...much later...
x = 4
CALL abc
IF (x == 4) THEN
...
The compiler, not knowing anything about procedure abc, cannot, in general, assume that x is not modified inside procedure abc - perhaps a pointer to x is available to the procedure in some way and the procedure has used that pointer to indirectly modify x. If x did not have the TARGET attribute, then the compiler knows that abc could not legitimately change its value. This has implications for the compiler's ability to analyse possible code paths at compile time, and to resequence operations/move operations out of loops, etc.

Related

Fortran: intent(out) and assumed-size arguments

Suppose a dummy argument is modified in a subroutine, and the subroutine doesn't care about its initial value. For a scalar of the standard numerical types (no components), intent(out) would be correct, right?
Now, if the dummy argument is allocatable and the subroutine depends on its allocation status, or if it has components that are not modified by the subroutine but should maintain their initial values (i.e. not become undefined), or, similarly, if it's an array and not all elements are set in the subroutine... Then the proper intent would be intent(inout).
My question is what happens if the dummy argument is an assumed-size array (dimension(*))? Assuming that not all elements are modified in the subroutine and one wants the other elements to retain their initial values, would intent(out) be appropriate? My reasoning is that the subroutine doesn't "know" the size of the array, so it cannot make the elements undefined as it can with a fixed-size array.
Does the standard say anything about this or are compilers free to "guess" for example that if I set A(23) = 42, then all elements A(1:22) can be made undefined (or NaN, or garbage...)?
Note: When I say "retain their initial values", I mean the values in the actual argument outside the subroutine. The subroutine itself doesn't care about the values of these elements, it never reads them or writes them.
Another question looks at what "becomes undefined" means in terms of the dummy argument. However, exactly the same aspects apply to the actual argument with which the dummy argument is associated: it becomes undefined.
The Fortran standard itself gives a note on this aspect "retaining" untouched values (Fortran 2018, 8.5.10, Note 4):
INTENT (OUT) means that the value of the argument after invoking the procedure is entirely the result of executing that procedure. If an argument might not be redefined and it is desired to have the argument retain its value in that case, INTENT (OUT) cannot be used because it would cause the argument to become undefined
That note goes on further to consider whether intent(inout) would be appropriate for when the procedure doesn't care about values:
however, INTENT (INOUT) can be used, even if there is no explicit reference to the value of the dummy argument.
That the array is assumed-size plays no part in the undefinition of the actual and dummy arguments. Again, to stress from my answer to the other question: "undefined" doesn't mean the values are changed. As the compiler has no required action to perform to undefine values, it doesn't need to work out which values to undefine: with an assumed-size dummy argument the procedure doesn't know how large it is, but this doesn't excuse the compiler from "undefining" it, because there's nothing to excuse.
You may well see that the "undefined" parts of the assumed-size array remain exactly the same, but intent(inout) (or no specified intent) remains the correct choice for a compliant Fortran program. With copy-in/copy-out mechanics with intent(out), for example, the compiler wouldn't be obliged to ensure the copy is defined and that junk isn't copied back.
Finally, yes, a compiler (perhaps in the hope of being a good debugging compiler) may change values which have become undefined. If the procedure references the n-th element of an array, it's allowed to assume that there are n-1 elements before it and set them as it chooses, for an intent(out) dummy, but not an intent(inout) dummy. (If you access A(n) then the compiler is allowed to assume that that element, and all from the declared lower bound, exist.)

Is the storage of COMPLEX in fortran guaranteed to be two REALs?

Many FFT algorithms take advantage of complex numbers stored with alternating real and imaginary part in the array. By creating a COMPLEX array and passing it to a FFT routine, is it guaranteed that it can be cast to a REAL array (of twice the size) with alternating real and imaginary components?
subroutine fft (data, n, isign)
dimension data(2*n)
do 1 i=1,2*n,2
data(i) = ..
data(i+1) = ..
1 continue
return
end
...
complex s(n)
call fft (s, n, 1)
...
(and, btw, is dimension data(2*n) the same as saying it is a REAL?)
I'm only writing this answer because experience has taught me that as soon as I do write this sort of answer one of the real Fortran experts hereabouts piles in to correct me.
I don't think that the current standard, nor any of its predecessors, states explicitly that a complex is to be implemented as two neighbouring-in-memory reals. However, I think that this implementation is a necessary consequence of the standard's definitions of equivalence and of common. I don't think I have ever encountered an implementation in which a complex was not implemented as a pair of reals.
The standard does guarantee, though, that a complex can be converted into a pair of reals. So, given some definitions:
complex :: z
complex, dimension(4) :: zarr
real :: r1, r2
real, dimension(8) :: rarr
the following will do what you might expect
r1 = real(z)
r2 = aimag(z)
Both those functions are elemental and here comes a wrinkle:
real(zarr)
returns a 4-element array of reals, as does
aimag(zarr)
while
[real(zarr), aimag(zarr)]
is an 8-element array of reals with the real parts of zarr followed by the imaginary parts. Perhaps
rarr(1:8:2) = real(zarr)
rarr(2:8:2) = aimag(zarr)
will be OK for you. I'm not sure of any neater way to do this though.
Alexander's not the only one able to quote the standard! The part he quotes left me wondering about non-default complex scalars. So I read on, and I think that para 6 of the section he points us towards is germane:
a nonpointer scalar object of any type not specified in items (1)-(5)
occupies a single unspecified storage unit that is different for each
case and each set of type parameter values, and that is different from
the unspecified storage units of item (4),
I don't think that this has any impact at all on any of the answers here.
To append to Mark's answer this is indeed stated in the Standard: Clause 16.5.3.2 "Storage sequence":
2 In a storage association context
[...]
(2) a nonpointer scalar object that is double precision real or
default complex occupies two contiguous numeric storage units,
emphasis mine.
As for storage association context: Clause 16.5.3.1 "General" reads
1 Storage sequences are used to describe relationships that exist
among variables, common blocks, and result variables. Storage
association is the association of two or more data objects that occurs
when two or more storage sequences share or are aligned with one or
more storage units.
So this occurs for common blocks, explicit equivalence, and result variables. As Mark said, there is no explicit statement for the general case.
My guess is that it is most convenient to always follow this layout, to ensure compatibility.
Thanks to IanH for pointing this out!
No, as far as I know.
You can have storage association between a real single precision array and a complex single precision array by EQUIVALENCE, COMMON, and ENTRY statement.
But in general you cannot pass a complex array to a subroutine that expects a real array.

Undefined components of a derived type in Fortran

This is more a software design question than a technical problem.
I planned to use a derived type to define atomic configurations. These might have a number of properties, but not every type of configuration would have them all defined.
Here's my definition at the moment:
type config
double precision :: potential_energy
double precision :: free_energy
double precision, dimension(:), allocatable :: coords
integer :: config_type
end type config
So, in this example, we may or may not define the free energy (because it's expensive to calculate). Is it safe to still use this derived type, which would be a superset of sensible cases I can think of? Should I set some sort of default values, which mean undefined (like Python None or a null pointer)?
I know about the extends F2003 feature, but I can't guarantee that all of our compilers will be F2003-compliant. Is there another better way to do this?
Formally, operations that require the value of the object as a whole while one of its components is undefined are prohibited (see 16.6.1p4 of the F2008 standard).
Practically you may not run into issues, but it is certainly conceivable that a compiler with appropriate debugging support might flag the undefined nature of the component when operations that require the entire value of the derived type object are carried out.
High Performance Mark's suggestion is a workaround for that, because a derived type scalar still has an "overall" value even if one of its allocatable components is not allocated (see 4.5.8). This suggestion might be useful in cases where the component is heavy in terms of memory use or similar.
But a single double precision component isn't particularly heavy - depending on the platform the descriptor for an allocatable scalar component may be of similar size. An even simpler workaround is to just give the component an arbitrary value. You could even use default initialization to do just that.
(Presumably you have some independent way of indicating whether that component contains a useful value.)
These days Fortran comprehends the concept of allocatable scalars. Like this:
type config
double precision :: potential_energy
double precision, allocatable :: free_energy
double precision, dimension(:), allocatable :: coords
integer :: config_type
end type config
If, however, you can't rely on having a Fortran 2003 compiler available this won't work. But such compilers (or rather versions) are becoming very scarce indeed.
But do go the whole hog, and drop double precision in favour of real(real64) or some other 21st century way of specifying the kind of real numbers. Using the predefined, and standard, constant real64 requires the inclusion of use iso_fortran_env in the scoping unit.

why polymorphism is so costly in haskell(GHC)?

I am asking this question with reference to this SO question.
The accepted answer by Don Stewart starts with "Your code is highly polymorphic, change all float vars to Double ..", and that change gives a 4X performance improvement.
I am interested in doing matrix computations in Haskell, should I make it a habit of writing highly monomorphic code?
But some languages make good use of ad-hoc polymorphism to generate fast code, why GHC won't or can't? (read C++ or D)
why can't we have something like blitz++ or eigen for Haskell? I don't understand how typeclasses and (ad-hoc)polymorphism in GHC work.
With polymorphic code, there is usually a tradeoff between code size and code speed. Either you produce a separate version of the same code for each type that it will operate on, which results in larger code, or you produce a single version that can operate on multiple types, which will be slower.
C++ implementations of templates choose in favor of increasing code speed at the cost of increasing code size. By default, GHC takes the opposite tradeoff. However, it is possible to get GHC to produce separate versions for different types using the SPECIALIZE and INLINABLE pragmas. This will result in polymorphic code that has speed similar to monomorphic code.
I want to supplement Dirk's answer by saying that INLINABLE is usually recommended over SPECIALIZE. An INLINABLE annotation on a function guarantees that the module exports the original source code of the function so that it can be specialized at the point of usage. This usually removes the need to provide separate SPECIALIZE pragmas for every single use case.
Unlike INLINE, INLINABLE does not change GHC's optimization heuristics. It just says "Please export the source code".
I don't understand how typeclasses work in GHC.
OK, consider this function:
linear :: Num x => x -> x -> x -> x
linear a b x = a*x + b
This takes three numbers as input, and returns a number as output. This function accepts any number type; it is polymorphic. How does GHC implement that? Well, essentially the compiler creates a "class dictionary" which holds all the class methods inside it (in this case, +, -, *, etc.) This dictionary becomes an extra, hidden argument to the function. Something like this:
data NumDict x =
NumDict
{
method_add :: x -> x -> x,
method_subtract :: x -> x -> x,
method_multiply :: x -> x -> x,
...
}
linear :: NumDict x -> x -> x -> x -> x
linear dict a b x = a `method_multiply dict` x `method_add dict` b
Whenever you call the function, the compiler automatically inserts the correct dictionary - unless the calling function is also polymorphic, in which case it will have received a dictionary itself, so just pass that along.
In truth, functions that lack polymorphism are typically faster not so much because of a lack of function look-ups, but because knowing the types allows additional optimisations to be done. For example, our polymorphic linear function will work on numbers, vectors, matrices, ratios, complex numbers, anything. Now, if the compiler knows that we want to use it on, say, Double, all the operations become single machine-code instructions, all the operands can be passed in processor registers, and so on. All of which results in fantastically efficient code. Even if it's complex numbers with Double components, we can make it nice and efficient. If we have no idea what type we'll get, we can't do any of those optimisations... That's where most of the speed difference typically comes from.
For a tiny function like linear, it's highly likely it will be inlined every time it's called, resulting in no polymorphism overhead and a small amount of code duplication - rather like a C++ template. For a larger, more complex polymorphic function, there may be some cost. In general, the compiler decides this, not you - unless you want to start sprinkling pragmas around the place. ;-) Or, if you don't actually use any polymorphism, you can just give everything monomorphic type signatures...

Convert FORTRAN DEC UNION/MAP extensions to anything else

Edit: Gfortran 6 now supports these extensions :)
I have some old f77 code that extensively uses UNIONs and MAPs. I need to compile this using gfortran, which does not support these extensions. I have figured out how to convert all non-supported extensions except for these and I am at a loss. I have had several thoughts on possible approaches, but haven't been able to successfully implement anything. I need for the existing UDTs to be accessed in the same way that they currently are; I can reimplement the UDTs but their interfaces must not change.
Example of what I have:
TYPE TEST
UNION
MAP
INTEGER*4 test1
INTEGER*4 test2
END MAP
MAP
INTEGER*8 test3
END MAP
END UNION
END TYPE
Access to the elements has to be available in the following manners: TEST%test1, TEST%test2, TEST%test3
My thoughts thus far:
Replace somehow with fortran EQUIVALENCE.
Define the structs in C/C++ and somehow make them visible to the FORTRAN code (doubt that this is possible)
I imagine that there must have been lots of refactoring of f77 to f90/95 when the UNION and MAP were excluded from the standard. How if at all was/is this handled?
EDIT: The accepted answer has a workaround to allow memory overlap, but as far as preserving the API goes, it is not possible.
UNION and MAP were never part of any FORTRAN standard; they are vendor extensions. (See, e.g., http://fortranwiki.org/fortran/show/Modernizing+Old+Fortran). So they weren't really excluded from the Fortran 90/95 standard. They cause variables to overlap in memory. If the code actually uses this feature, then you will need to use equivalence. The preferred way to move data between variables of different types without conversion is the transfer intrinsic, but to use that you would have to identify every place where a conversion is necessary, while with equivalence it takes place implicitly. Of course, that makes the code less understandable. If the memory overlays are just to save space and the equivalence of the variables is not used, then you could get rid of this "feature". If the code is like your example, with small integers, then I'd guess that the memory overlay is being used. If the overlays are large arrays, it might have been done to conserve memory. If these declarations were also creating new types, you could use user-defined types, which are definitely part of Fortran >= 90.
If the code is using memory equivalence of variables of different types, this might not be portable, e.g., the internal representation of integers and reals are probably different between the machine on which this code originally ran and the current machine. Or perhaps the variables are just being used to store bits. There is a lot to figure out.
P.S. In response to the question in the comment, here is a code sample. But .... to be clear ... I do not think that using equivalence is good coding practice. With the compiler options that I normally use with gfortran to debug code, gfortran rejects this code. With looser options, gfortran will compile it. So will ifort.
module my_types
use ISO_FORTRAN_ENV
type test_p1_type
sequence
integer (int32) :: int1
integer (int32) :: int2
end type test_p1_type
type test_p2_type
sequence
integer (int64) :: int3
end type test_p2_type
end module my_types
program test
use my_types
type (test_p1_type) :: test_p1
type (test_p2_type) :: test_p2
equivalence (test_p1, test_p2)
test_p1 % int1 = 2
test_p1 % int2 = 4
write (*, *) test_p1 % int1, test_p1 % int2, test_p2 % int3
end program test
The question is whether the union was used to save space or to have alternative representations of the same data. If you are porting, see how it is used. Maybe, because space was limited, it was written in a way where the variables had to be shared. Nowadays, with larger amounts of memory, maybe this is not necessary and the union may not be required, in which case it is just two separate types.
For those just wanting to compile the code with these extensions: Gfortran now supports UNION, MAP and STRUCTURE in version 6. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56226