Setting size() of a scalar - fortran

In some legacy code, I found the following line
size(k)=N
What (if anything) does this do? As far as I know, it does not make sense to set size(thing) to a value in Fortran.
Furthermore, k is implicitly defined as a scalar integer (i.e. there it is not declared in the source file), it is used as a loop counter. N is also integer, it is read from a file.
If I surround the statement with debug prints,
write(0,*) size(kkk), N, kkk
size(kkk)=N
write(0,*) size(kkk), N, kkk
I get output like the following:
-619273800 601 1
601 601 1
11007 595 2
595 595 2
-619273800 620 3
620 620 3
11007 595 4
595 595 4
0 617 5
617 617 5
0 618 6
618 618 6
0 612 7
612 612 7
So it seems like something is being set, but I do not know what. Also, I do not think that this is leading to any problems (in fact, I do not think it really does anything), I am just curious.

SIZE is the name of an intrinsic function which returns the size of an array. In that context, you are correct that having it on the left-hand size of an assignment expression makes no sense. And it's not allowed.
However, Fortran allows you to declare variables (and functions, etc.) with the same name as an intrinsic function. When this happens, reference to the name is to the variable and not the function. You can read about this in Fortran 2008's 16.3.1 which talks about local identifiers. In particular, it notes
An intrinsic procedure is inaccessible by its own name in a scoping unit that uses the same name as a local identifier of class (1) for a different entity.
Class (1) covers named variables.
As you say the array variable size comes from module use association, you can rename it to something else
use module_with_size, not_intrinsic_size => size
and refer to it by its new name from there on, leaving size to refer to the intrinsic function.
Having size(k)=N also makes sense for size a function when that is a statement function declaration.

Related

Can overindexing in FORTRAN 77 modify the program itself?

Here is a little program in FORTRAN 77
dimension totlev(20)
do 100 i=1,24
totlev(i)=0.0
write(0,*) 'totlev i=',i, totlev(i)
100 continue
end
I compile it using MinGW by typing gfortran test.f and I do get a warning (not an error):
test.f:4:14:
do 100 i=1,25
2
totlev(i)=0.0
1
Warning: Array reference at (1) out of bounds (25 > 20) in loop beginning at (2)
test.f:5:40:
test.f:3:72:
do 100 i=1,25
2
test.f:5:40:
write(0,*) 'totlev i=',i, totlev(i)
1
Warning: Array reference at (1) out of bounds (25 > 20) in loop beginning at (2)
However, not always such a warning would be produced if it was a longer program. An executable is created. When I run it it behaves like an infinite loop.
And this is my problem: How is an infinite loop even possible with the DO iteration? Isn't it a logical impossibility? My only explanation is that overindexing in this case reaches to the program code itself and changes it. Is that possible?
I use Windows 7 OS if that's relevant.
It's not changing the code, it's changing the variable i. Both the array totlev(20) and the scalar i are local variables, and thus typically stored in the program's stack frame (though the standard leaves this choice to the 'processor', Fortran-speak for implementation). In this case the compiler apparently put i 4 'real's (probably 16 bytes) after the end of totlev, so assigning to totlev(24) actually changes i. Fortran basically requires that an integer and single/default-precision real variable be the same size, and while it doesn't require any particular relationship between the representations for integers and reals, most machines today use 'IEEE 754' floating-point and in that system a real 0.0 has the same representation as an integer 0.
On many though not all computer architectures it is possible to address code by indexing an array out of range, but this almost always requires indexes far out of range: millions or billions or more, not one or two. On older architectures it was often possible both to read and write code this way, but most systems since about 1980 have memory protection so that you can't write to code. In particular all Windows NT-series systems do this, which includes Windows 7.

Where does this Fortran variable come from?

Can anyone tell me about the costh variable used in the following subroutine? From where is the subroutine obtaining the value of this variable? Is it some error or inbuilt function?
The complete subroutine is given below.
SUBROUTINE MOMENT(I,A,AP,MODE,EP,NCASC,ID)
COMMON/MOM/VY(99996),VZ(99996),VX(99996)
DIMENSION ERES(99996),FCT(99996)
COMMON/XQANG/SUM(300,10),MDIR,COSTH
EQUIVALENCE(ERES(1),VY(1)),(FCT(1),VZ(1))
IF(ID-2)10,30,40
C INITIALIZATION
10 N=1
NN=N+NCASC-1
DO 20 J=N,NN
VY(J)=EP*MDIR
VX(J)=0.
20 VZ(J)=EP*(1-MDIR)
GO TO 60
C
C CALCULATION OF MOMENT
C
CKM 30 RN4=RANF(0.)
30 RN4=RAN(ISEED) !KM
PHI=6.28318*RN4
SOX=2.*AP*EP/(A**2*931.5)
C BUG FIX A*(AP+A)REPLACED BY A**2 12/15/82
C IF(SOX.LT.0.)WRITE(*,943)AP,EP,A
VT=SQRT(SOX)
SOX=1.-COSTH**2
IF(SOX.LT.0.)SOX=0.
SINTH=SQRT(SOX)
VZSE=VT*COSTH
VYSE=VT*SINTH*SIN(PHI)
VXSE=VT*SINTH*COS(PHI)
VZ(I)=VZ(I)+VZSE
VY(I)=VY(I)+VYSE
VX(I)=VX(I)+VXSE
VZP=VZ(I)-VZSE*(A/AP+1.)
VYP=VY(I)-VYSE*(A/AP+1.)
VXP=VX(I)-VXSE*(A/AP+1.)
VPP2=VZP*VZP+VYP*VYP+VXP*VXP
EPART=AP*VPP2*469.
SOX=0.
IF(VPP2.GT.0.0)SOX=VZP/SQRT(VPP2)
C DO NOT USE QUICK FUNCTIONS HERE
ANG=ACOS(SOX)*180./3.1415927
CALL OUTEM(2,MODE,EPART,ANG)
GO TO 60
C
C END CALCULATION
C
40 VFTS=VX(I)**2+VY(I)**2+VZ(I)**2
ERES(I)=0.5*A*VFTS*931.5
IF(VFTS.NE.0.)GO TO 50
FCT(I)=0.0
GO TO 60
50 FCT(I)=ACOS(VZ(I)/SQRT(VFTS))*180./3.1415927
60 RETURN
END
This line
COMMON/XQANG/SUM(300,10),MDIR,COSTH
informs the subroutine about a common block called XQANG which has an element called COSTH. In the absence of other information, and in a code of that vintage, this is most likely to be a real variable.
Common blocks are an early-Fortran mechanism for sharing variables across program units, such as between a main program and a subroutine. In straightforward use the same common block declaration will be found in multiple locations, with the same list of variables. Each declaration refers to the same variables.
There is a twist though, and one to watch out for carefully. The common block is actually a shared block of memory, and there is no requirement that each instance of the declaration identify the same variables. One common use of common blocks was to declare, say, an array of 100 reals in one location, but to declare two arrays each of 50 reals elsewhere -- same memory, different variables.
Even better, they could also be used to change the types of variables (sort of). One usage of a common block might contain a real variable occupying 4 bytes, while another usage of the same block might contain a 4-byte integer variable at the same location - same bits, different interpretations.
These twists are among the reasons for common blocks being widely deprecated. Another reason for their deprecation is that they obscure the sharing of variables, these days most of us prefer to explicitly pass arguments into and out of subroutines through their argument lists.
I'd guess very few Fortran programmers under 50 are still writing new code using them. But Fortran programmers from 8 - 80 are still working on codes containing them.

Unexpected Statement Function at 1 in Fortran

I am new to Fortran and writing this small program to write out 100 ordered pairs for a circle.
But I get the error mentioned above and I don't know how to resolve.
implicit real*8(a-h,o-z)
parameter(N=100)
parameter(pi = 3.14159265358979d0)
integer*8 k
dtheta=2*pi/N
r=1.0d0
x00=0.0d0
y00=0.0d0
do k=0,N-1
xb(k)=r*cos(k*dtheta)-x00
yb(k)=r*sin(k*dtheta)-y00
enddo
open(64,file='xbyb.m',status='unknown')
write(64,*) (xb(k),k=0,N-1),(yb(k),k=0,N-1)
close(64)
end
You do not declare the arrays xb and yb.
Although not technically FORTRAN 77 I still suggest using implicit none or at least an equivalent compiler option to be forced to declare everything explicitly. Implicit typing is evil and leads to bugs.
As High Performance Mark reminds, the syntax
f(k) = something
declares a feature (now obsolescent in Fortran 95 and later) called a statement function. It declares a function of one argument k. The only way for the compiler to recognize you mean an array reference instead is to properly declare the array. The compiler complains the statement function is unexpected because the declaration must be placed before the executable statements.
Your implied do loop in the write statement is Fortran 90 anyway, so no need to stick to FORTRAN 77 in the 21st century.
Other tips:
status='unknown' is redundant, that is the default, just leave it out.
You can just write r = 1 and x00 = 0.

HashMap implementation optimization

I have a hashmap containing about half a million entries the key is a string whose values comes as a combination of 5 different inputs. (string concatenation) the domain of each of the input is small but the combination of the 5 inputs gives this huge map (500K items). Now I am thinking of optimizing this structure.
My idea is to hash the input (the combination of 5 inputs) by hashing each individual input and combining that 5 hashes into one single hash (int 32 or 64) and then look up that hash.
My question is there a known data structure that can handle this situation well? and Is it worth doing that optimization? I wanna optimize both memory and run-time.
I am using C++ and std::unordered_map the key is the combined string from the 5 inputs and the output is random. I didn't find any relation between the inputs and outputs (random or serial).
125 458 699 sadsadasd 5 => 56.
125 458 699 sadsadasd 3 => 57.
125 458 699 sadsadasd 4 => 58.
125 458 699 sadsadasd 5 => 25.
125 458 699 gsdfsds 3 => 89.
The domain of each of the inputs is small (the 4th input has 2K different values while the other inputs can have about only 20 different values).
You could use GNU perf to generate a perfect hash function for your keys.
It seems to me that there is no way to reduce the size of your keys that will result in reliable extraction. Hashing the 5 inputs into 1 integer is a one-way function that will prevent you from performing reliable lookups.
The way round this would be to keep a translation table, but that's actually more overhead because each distinct tuple of inputs would require the storage for 2 hashes and the tuple.
I think you're best off using a std::tuple<int, int, int, std::string, int> as the key type in a single map.
If you use a std::map<tuple<>, data_type> you won't need to provide a hashing function. If you stay with the unordered_map, you'll need to provide one since std::tuple has no default hash<> specialisation.

Importing data from file to array

I have 2 dimensional table in file, which look like this:
11, 12, 13, 14, 15
21, 22, 23, 24, 25
I want it to be imported in 2 dimensional array. I wrote this code:
INTEGER :: SMALL(10)
DO I = 1, 3
READ(UNIT=10, FMT='(5I4)') SMALL
WRITE(UNIT=*, FMT='(6X,5I4)') SMALL
ENDDO
But it imports everything in one dimensional array.
EDIT:
I've updated code:
program filet
integer :: reason
integer, dimension(2,5) :: small
open(10, file='boundary.inp', access='sequential', status='old', FORM='FORMATTED')
rewind(10)
DO
READ(UNIT=10, FMT='(5I4)', iostat=reason) SMALL
if (reason /= 0) exit
WRITE(UNIT=*, FMT='(6X,5I4)') SMALL
ENDDO
write (*,*) small(2,1)
end program
Here is output:
11 12 13 14 15
21 22 23 24 25
12
Well, you have defined SMALL to be a 1-D array, and Fortran is just trying to be helpful. You should perhaps have defined SMALL like this;
integer, dimension(2,5) :: small
What happened when the read statement was executed was that the system ran out of edit descriptor (you specified 5 integers) before either SMALL was full or the end of the file was encountered. If I remember rightly Fortran will re-use the edit descriptor until either SMALL is full or the end-of-file is encountered. But this behaviour has been changed over the years, according to Fortran standards, and various compilers have implemented various non-standard features in this part of the language, so you may need to check your compiler's documentation or do some more experiments to figure out exactly what happens.
I think your code is also a bit peculiar in that you read from SMALL 3 times. Why ?
EDIT: OK, we're getting there. You have just discovered that Fortran stores arrays in column-major order. I believe that most other programming languages store them in row-major order. In other words, the first element of your array is small(1,1), the second (in memory) is small(2,1), the third is small(1,2) and so forth. I think that your read (and write) statements are not standard but widely implemented (which is not unusual in Fortran compilers). I may be wrong, it may be standard. Either way, the read statement is being interpreted to read the elements of small in column-major order. The first number read is put in small(1,1), the second in small(2,1), the third in small(1,2) and so on.
Your write statement makes use of the same feature; you might have discovered this for yourself if you had written out the elements in loops with the indices printed too.
The idiomatic Fortran way of reading an array and controlling the order in which elements are placed into the array, is to include an implied-do loop in the read statement, like this:
READ(UNIT=10, FMT='(5I4)', iostat=reason) ((SMALL(row,col), col = 1,numCol), row=1,numRow)
You can also use this approach in write statements.
You should also study your compiler documentation carefully and determine how to switch on warnings for all non-standard features.
Adding to what High Performance Mark wrote...
If you want to use commas to separate the numbers, then you should use list-directed IO rather than formatted IO. (Sometimes this is called format-free IO, but that non-standard term is easy to confuse with binary IO). This is easier to use since you don't have to arrange the numbers precisely in columns and can separate them with spaces or commas. The read is simply "read (10, *) variables"
But sticking to formatted IO, here is some sample code:
program demo1
implicit none
integer, dimension (2,5) :: small
integer :: irow, jcol
open ( unit=10, file='boundary.txt', access='sequential', form='formatted' )
do irow=1, ubound (small, 1)
read (10, '(5I4)') (small (irow, jcol), jcol=1, ubound (small, 2))
end do
write (*, '( / "small (1,2) =", I2, " and small (2,1)=", I2 )' ) small (1,2), small (2,1)
end program demo1
Using the I4 formatted read, the data need to be in columns:
12341234123412341234
11 12 13 14 15
21 22 23 24 25
The data file shouldn't contain the first row "1234..." -- that is in the example to make the alignment required for the format 5I4 clear.
With my example program, there is an outer do loop for irow and an "implied do loop" as part of the read statement. You could also eliminate the outer do loop and use two implied do loops on the read statement, as High Performance Mark showed. In this case, if you kept the format specification (5I4), it would get reused to read the second line -- this is called format reversion. (On a more complicated format, one needs to read the rules to understand which part of the format is reused in format reversion.) This is standard, and has been so at least since FORTRAN 77 and probably FORTRAN IV. (Of course, the declarations and style of my example are Fortran 90).
I used "ubound" so that you neither have to carry around variables storing the dimensions of the array, nor use specific numeric values. The later method can cause problems if you later decide to change the dimension of the array -- then you have to hunt down all of the specific values (here 2 and 5) and change them.
There is no need for a rewind after an open statement.