delete duplicate rows in Fortran77 - fortran

I have a file which is a table of 119 columns (separated by spaces) and around 50000 rows (lines). I would like to remove the duplicated entries, i.e. those rows which have all identical columns (119). I sketched this code:
      PROGRAM deldup
      IMPLICIT NONE
      DOUBLE PRECISION PAR(119),PAR2(119)
      INTEGER I,J,K,LINE,TREP
      CHARACTER filename*40
c     Get the input file name
      CALL getarg(1,filename)
c     File where the results will be stored.
      OPEN(29, FILE="result.dat", STATUS='UNKNOWN')
c     Current line number
      LINE=0
c     counting repeated points
      TREP=0
 101  LINE=LINE+1
      OPEN(27, FILE=filename, STATUS='OLD')
c     Verifying that we are not in the first line... and we read the
c     corresponding one
      IF (LINE.NE.1) THEN
         DO K=1,LINE-1
            READ(27,11,ERR=103,END=9999)
         END DO
      ENDIF
      READ(27,11,ERR=103,END=9999) (PAR(I),I=1,119)
c     Start comparing line by line looking for matches. If a match is
c     found, close the file and open it again to read the next line.
c     If the end of file is reached and no equal rows are found, write
c     the line in "result.dat"
 102  READ(27, 11,END=104, ERR=102) (PAR2(I),I=1,119)
      DO J=1,119
         IF ( PAR(J).NE.PAR2(J) ) THEN
            GOTO 102
         ELSEIF (J.EQ.119) THEN
            TREP=TREP+1
            GOTO 103
         ENDIF
      END DO
 104  WRITE(29,11) (PAR(I),I=1,119)
 103  CLOSE(27)
      GOTO 101
 9999 WRITE(*,*) "DONE!,", TREP, "duplicated points found!"
      CLOSE(27)
      CLOSE(28)
      CLOSE(29)
 11   FORMAT(200E14.6)
      END
This actually works, but it is super slow. Why? Is there any library that I can use? Sorry for my ignorance, I am completely new to Fortran 77.

For each line you open and close the original file, which is very slow! To speed things up, you could just use rewind.
The main issue, though, is the complexity of your algorithm: O(n^2) [You compare each line to every other line]. As a start, I would keep a list of unique lines, and compare against that list. If a new row is already listed, discard it - if not, it is a new unique row. This would reduce the complexity to O(n*m), with (hopefully) m << n (m is the number of unique rows). Sorting the rows will probably speed up the comparison.
The next remark would be to move from I/O to memory! Read in the complete file into an array, or at least keep the list of unique rows in memory. A 50,000x119 double precision array requires ~45MB of RAM, so I think this should be feasible ;-)
Write the result back in one piece in a final step.
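A minimal sketch of the in-memory approach described above, assuming a Fortran 90 or later compiler (e.g. gfortran) rather than strict Fortran 77. The file name input.dat and the bound maxrow are placeholders, not taken from the question. Note that this keeps the first occurrence of each row, whereas the original program keeps the last one.

```fortran
program dedup_in_memory
  implicit none
  integer, parameter :: ncol = 119       ! columns per row, from the question
  integer, parameter :: maxrow = 60000   ! assumed upper bound on unique rows
  double precision, allocatable :: uniq(:,:)
  double precision :: row(ncol)
  integer :: nuniq, nrep, i, ios
  logical :: dup

  ! One table row per column of uniq, so each row is contiguous in memory.
  allocate(uniq(ncol, maxrow))
  nuniq = 0
  nrep = 0

  open(27, file='input.dat', status='old', action='read')    ! placeholder name
  open(29, file='result.dat', status='replace', action='write')

  do
     ! List-directed read of one row; any error or end of file stops the loop.
     read(27, *, iostat=ios) row
     if (ios /= 0) exit

     ! Compare only against the rows already known to be unique: O(n*m).
     dup = .false.
     do i = 1, nuniq
        if (all(uniq(:, i) == row)) then
           dup = .true.
           exit
        end if
     end do

     if (dup) then
        nrep = nrep + 1
     else
        if (nuniq == maxrow) stop 'too many unique rows; increase maxrow'
        nuniq = nuniq + 1
        uniq(:, nuniq) = row
        write(29, '(119E14.6)') row      ! repeat count must match ncol
     end if
  end do

  close(27)
  close(29)
  write(*, *) 'unique rows:', nuniq, ' duplicates:', nrep
end program dedup_in_memory
```

The file is opened once and read once, so the repeated open/close and the skipping of already-processed lines disappear entirely.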

First question: Why stick with Fortran 77? Since g95 and gfortran have come along, there is no real reason to use a standard that has been obsolete for more than twenty years.
The canonical way to remove duplicates is to sort the rows, remove the duplicates, and then output the remaining rows in the original order. If you use a good sorting algorithm such as quicksort or heapsort, this will give you O(n log n) performance.
One additional remark: It is also a good idea to put magic numbers such as 119 in your program into PARAMETER statements.
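One possible sketch of the sort-based approach, again assuming Fortran 90 or later. It heapsorts an index array instead of the rows themselves, so the original order is still available for the output. The table is tiny made-up demo data, and the names compare, before, sort_index and sift_down are chosen here for illustration; for the real file, ncol would be 119 and the rows would be read from disk.

```fortran
program dedup_by_sorting
  implicit none
  integer, parameter :: ncol = 3, nrow = 6   ! demo sizes, kept as named constants
  double precision :: a(ncol, nrow)
  integer :: idx(nrow), i
  logical :: keep(nrow)

  ! Made-up data: one table row per column of a. Rows 3, 5 and 6 are duplicates.
  a = reshape([1d0,2d0,3d0,  4d0,5d0,6d0,  1d0,2d0,3d0, &
               0d0,0d0,1d0,  4d0,5d0,6d0,  1d0,2d0,3d0], [ncol, nrow])

  call sort_index(a, idx)

  ! After sorting, equal rows are adjacent; keep only the first of each run.
  keep = .true.
  do i = 2, nrow
     if (compare(a(:, idx(i-1)), a(:, idx(i))) == 0) keep(idx(i)) = .false.
  end do

  ! Output the surviving rows in their original order.
  do i = 1, nrow
     if (keep(i)) write(*, '(3F6.1)') a(:, i)
  end do

contains

  ! Lexicographic comparison of two rows: returns -1, 0 or +1.
  integer function compare(x, y)
    double precision, intent(in) :: x(:), y(:)
    integer :: k
    compare = 0
    do k = 1, size(x)
       if (x(k) < y(k)) then
          compare = -1
          return
       else if (x(k) > y(k)) then
          compare = 1
          return
       end if
    end do
  end function compare

  ! True if row p sorts before row q; equal rows are ordered by position,
  ! so the earliest occurrence comes first within a run of duplicates.
  logical function before(a, p, q)
    double precision, intent(in) :: a(:,:)
    integer, intent(in) :: p, q
    integer :: c
    c = compare(a(:, p), a(:, q))
    before = c < 0 .or. (c == 0 .and. p < q)
  end function before

  ! Heapsort of the row indices, O(n log n) comparisons.
  subroutine sort_index(a, idx)
    double precision, intent(in) :: a(:,:)
    integer, intent(out) :: idx(:)
    integer :: n, i, tmp
    n = size(idx)
    do i = 1, n
       idx(i) = i
    end do
    do i = n/2, 1, -1
       call sift_down(a, idx, i, n)
    end do
    do i = n, 2, -1
       tmp = idx(1); idx(1) = idx(i); idx(i) = tmp
       call sift_down(a, idx, 1, i-1)
    end do
  end subroutine sort_index

  ! Restore the max-heap property below position start.
  subroutine sift_down(a, idx, start, last)
    double precision, intent(in) :: a(:,:)
    integer, intent(inout) :: idx(:)
    integer, intent(in) :: start, last
    integer :: root, child, tmp
    root = start
    do
       child = 2*root
       if (child > last) exit
       if (child < last) then
          if (before(a, idx(child), idx(child+1))) child = child + 1
       end if
       if (.not. before(a, idx(root), idx(child))) exit
       tmp = idx(root); idx(root) = idx(child); idx(child) = tmp
       root = child
    end do
  end subroutine sift_down

end program dedup_by_sorting
```

With the demo data this should print rows 1, 2 and 4 of the table, i.e. the three distinct rows in their original order.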

Related

write output on a text file in Fortran code

I have a matrix A(3,4) in Fortran , I want to write it on a text file like this:
A(1,1) A(2,1) A(3,1)
A(1,2) A(2,2) A(3,2)
A(1,3) A(2,3) A(3,3)
A(1,4) A(2,4) A(3,4)
I use the code below. It has two problems: first, the file is overwritten for each i, and second, it is written in rows. I would be grateful for guidance on how to solve this. Thanks
do i=1,4
open (unit=10,file="out.txt",action="write")
write (10,*) A(1,i) , A(2,i) , A(3,i)
close (10)
end do
As mentioned by Ian, your file is overwritten for each i because your open statement is inside the loop. Fortran is reopening the file fresh for each i. Move the open statement to before the loop so it is only opened once.
Of course it is written in rows because the first index in a 2-D array is the row index. You can switch the indices if you wish. On the other hand, according to your first box, it appears as though you want the rows across the columns.
You say you need to write just some elements. As long as they are in a contiguous block, you will want to use an implied do loop in the write statement. It is much more concise and you can write large blocks without typing out a lot of variables specifically. It would look like this:
open (unit=10,file="out.txt",action="write")
do i=1,4
write (10,*) (A(j,i), j=1,3)
end do
close (10)
Again, this reverses rows and columns. If you want the traditional representation, switch the i and j.
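To make the difference between the two layouts concrete, here is a small self-contained sketch. The fill values and the F8.2 format are arbitrary choices for illustration, not part of the question.

```fortran
program write_matrix
  implicit none
  real :: a(3,4)
  integer :: i, j

  ! Fill with recognisable values: a(i,j) = 10*i + j, e.g. a(2,3) = 23.
  do j = 1, 4
     do i = 1, 3
        a(i,j) = 10*i + j
     end do
  end do

  ! Open once, before any loop, so earlier lines are not overwritten.
  open(unit=10, file='out.txt', action='write', status='replace')

  ! Layout asked for in the question: one line per second index,
  ! i.e. A(1,j) A(2,j) A(3,j) on line j.
  do j = 1, 4
     write(10, '(3F8.2)') (a(i,j), i=1,3)
  end do

  write(10, *)   ! blank separator line

  ! Traditional layout: one line per matrix row, A(i,1) ... A(i,4) on line i.
  do i = 1, 3
     write(10, '(4F8.2)') (a(i,j), j=1,4)
  end do

  close(10)
end program write_matrix
```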

Controlling newlines when writing out arrays in Fortran

So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days
no_days = get_no_days()
month_data = [fill array with some values]
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month, in a loop I fill the array with a certain number of values based on how many days there are in that month, then write out the results into a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines to make it not exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...) but the breaking-up-lines screws up my other tooling. I've tried reading several Fortran manuals, but while some of them mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(also, while we're at it, how do I control the nr of decimal digits? I know these are all integer values so I'd like to leave out any decimals all together, but I can't change the data type to INTEGER in my code because of reasons).
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, via a label to a FORMAT statement or via a character variable.
Use the F edit descriptor for real variables in decimal form. Its syntax is Fw.d, where w is the total width of the field (including the sign and the decimal point) and d is the number of digits after the decimal point. F6.0 therefore means a field 6 characters wide with no digits after the decimal point.
Spaces can be added with the X control edit descriptor.
Repetitions of edit descriptors can be indicated with the number of repetitions before a symbol.
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
Besides,
New lines could be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
If you want to remove the decimal symbol completely, you would instead need to print the nearest integers using the nint intrinsic and the Iw (integer) descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)
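A small self-contained sketch that puts the three formats above side by side, writing to standard output instead of unit 1000. The fill values mimic the question's example. The last WRITE uses the I0 descriptor (minimal-width integer, Fortran 95 and later), which is an addition not mentioned above; it avoids the padding spaces the question complains about.

```fortran
program format_demo
  implicit none
  real :: month_data(31)
  integer :: d, no_days

  no_days = 30                        ! e.g. a 30-day month
  do d = 1, 31
     month_data(d) = 10.0 * (d - 1)   ! 0, 10, 20, ... as in the question
  end do

  ! All values on a single line, no digits after the decimal point.
  write(*, '(31(F6.0,1X))') (month_data(d), d=1,no_days)

  ! Ten values per line; ":" stops output once the list is exhausted.
  write(*, '(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)

  ! Nearest integers, each printed with the minimum width needed.
  write(*, '(31(I0,:,1X))') (nint(month_data(d)), d=1,no_days)
end program format_demo
```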

How do I read data from a file with description and blank lines with Fortran 77?

I am new to Fortran 77. I need to read the data from a given text file into two arrays, but there are some lines that either are blank or contain descriptive information on the data set before the lines containing the data I need to read. How do I skip those lines?
Also, is there a way my code can count the number of lines containing the data I'm interested in in that file? Or do I necessarily have to count them by hand to build my do-loops for reading the data?
I have tried to find examples online and in Schaum's Programming with Fortran 77, but couldn't find anything too specific on that.
Part of the file I need to read data from follows below. I need to build an array with the entries under each column.
Data from fig. 3 in Klapdor et al., MPLA_17(2002)2409
E(keV) counts_in_bin
2031.5 5.4
2032.5 0
2033.5 0
I am assuming this question is very basic, but I've been fighting with this for a while now, so I thought I would ask.
If you know where the lines are that you don't need/want to read, you can advance the IO with a call to read with no input items.
You can use:
read(input-unit,*)
to read a line from your input file, discard its contents and advance IO to the next line.
It has been a long time since I have looked at F77 code, but in general if your read statement in a DO loop can deal with finding empty lines, or even a record that contains only blanks, then you could write logic to trap that condition and go to a break or continue statement. I just don't recall if read can deal with the situation intelligently.
Alternatively, if you are using a UNIX shell and coreutils, you can use sed to remove empty lines (/^$/) or lines containing only blanks (/^ *$/) to preprocess the file before you send it on to F77.
Something like
$ sed -e '/^ *$/d' infile > outfile
It should look something like this:-
C     Initialise
      integer i
      character*80 t1,t2,t3
      real*8 x,y
      open(unit=1,file='qdata.txt')
C     Read headers
      read(1,100)t1
 100  format(A80)
      write(6,*) t1
      read(1,100)t2
      write(6,*) t2
      read(1,100)t3
      write(6,*) t3
      write(6,*)
C     Read data
      do 10 i=1,10
         read(1,*,end=99) x,y
         write(6,*) x,y
 10   continue
 99   continue
      end
So I've used a classic formatted read to read in the header lines, then free-format to read the numbers. The free-format read with the asterisk skips white space including blank lines so it does what you want, and when there is no more data it will go to statement 99 and finish.
The output looks like this:-
Data from fig. 3 in Klapdor et al., MPLA_17(2002)2409
E(keV) counts_in_bin
2031.5000000000000 5.4000000000000004
2032.5000000000000 0.0000000000000000
2033.5000000000000 0.0000000000000000
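The question also asks how to count the data lines instead of counting them by hand. A minimal sketch, assuming Fortran 90 or later, two header lines as in the excerpt shown in the question, and a hypothetical upper bound maxn on the number of data lines: read until the READ reports end of file or an error, and count as you go.

```fortran
program read_counts
  implicit none
  integer, parameter :: maxn = 1000      ! assumed upper bound on data lines
  integer, parameter :: nheader = 2      ! assumed number of header lines
  double precision :: e(maxn), counts(maxn)
  character(len=80) :: header
  integer :: n, ios, i

  open(unit=1, file='qdata.txt', status='old', action='read')

  ! Skip the descriptive lines at the top of the file.
  do i = 1, nheader
     read(1, '(A)') header
  end do

  ! Read data pairs until end of file (or a line that is not two numbers).
  ! List-directed input skips blank lines on its own.
  n = 0
  do
     if (n == maxn) exit
     read(1, *, iostat=ios) e(n+1), counts(n+1)
     if (ios /= 0) exit
     n = n + 1
  end do
  close(1)

  write(*, *) 'data lines read:', n
end program read_counts
```

After the loop, n holds the number of data lines and e(1:n), counts(1:n) hold the two columns.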

Standard Input and Command line file in Fortran77

I've been given the challenge to port a Fortran 77 program into C#.
I've found out that read(5,*) read from the standard input, i.e. the keyboard.
Now I'm trying to understand how the following works:
1. When I run the program, I have to run it as cheeseCalc<blue.dat>output.txt, which reads a blue.dat file and produces an output.txt file. How does read work in this case?
In the same program, there is READ(5,* )IDUM and later it also has read(5,*)idum,idum,tinit. What is happening in this case?
The blue.dat file has the following lines:
HEAD make new cake
INPUT VARIABLES
MFED MASS-FEED 30 ;1001 1 100 PEOPLE TO FEED
TOVE TEMP-IN-OVEN 150.0 ;1001 20 100 TEMPERATURE OF OVEN, C
UPDATED: Just for context, the initial lines of code in the program are:
      program cheeseCalc
      CHARACTER*76 IDENT
      CHARACTER*1 IDUM
      READ(5,104)IDENT
 104  FORMAT(4X,A)
      READ(5,*)IDUM
c     write start record
      write(6,102)IDENT
 102  format('**START',/,4X,A,/)
      read(5,*)idum,idum,frate
 110  format(f10.0)
      frate2=frate/3.6
      read(5,*)idum,idum,tempo
* Do calculation *
      write(6,*)frate2,tempo
      end
Any help will be appreciated!! Thanks!
The full detail of the general read statement is documented elsewhere, but there is an idiom here which is perhaps worth elaborating on.
The statement read(5,*) ... is list-directed input from the external unit number 5. Let's assume (it's not guaranteed, but it's likely and you seem happy with that for your setup) that this external unit is standard input.
The idiomatic part is the repeated use of a single variable in an input list such as
read(5,*) idum, idum, ...
This (and the fact that idum is an (awfully named) length-1 character variable) signifies that the user doesn't care about the input in the first two fields. The first string, delimited by blanks, is read, then its first character is assigned to idum. Then idum is immediately set to the first character of the next string.
The purpose of this is to set the place in the record to the third field, which is read into the (real) variable frate (in the first case).
Equally
read(5,*) idum
is just skipping the second line (strictly, reading the first character, but that's not used anywhere before the next read into idum): the first blank-delimited field is read but the next read moves on to the next line rather than continuing with that one.
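The field-skipping idiom can be tried in isolation with a short sketch that reads from a character string instead of unit 5. This uses a list-directed internal read, which needs Fortran 90 or later; the line is copied from blue.dat in the question.

```fortran
program skip_fields
  implicit none
  character(len=80) :: line
  character(len=1) :: idum
  real :: frate

  ! One record from blue.dat, as shown in the question.
  line = 'MFED MASS-FEED 30 ;1001 1 100 PEOPLE TO FEED'

  ! The first two blank-delimited fields are read into idum and discarded
  ! (only their first character is kept); the third field goes into frate.
  read(line, *) idum, idum, frate

  write(*, *) 'idum = ', idum, '  frate = ', frate
end program skip_fields
```

Here idum ends up holding 'M' (the first character of MASS-FEED) and frate holds 30.0; the rest of the record is never looked at.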

Fortran randomly writing data in file

How to write a text or dat file in FORTRAN like a 2D array of integers and each time to enter a value, if in any row there is no value just insert in the start but if some values exists insert to the end of values. This insertion of values can be random, i.e. may be line number 100 first then 80 then 101 then 2. The number of entries in each line is also different.
I also need to use this file at the end but I think that will be easy as need line by line information.
Edit (what he possibly meant): How to write a text file in Fortran, like a 2D array of integers, each time adding one value? If there is an empty row with no values, insert one at the beginning of a row, but if there are already some values in that row, append the new value to the end of the row.
Have no idea what he was getting at with those random values and line numbers.
If you want to make decisions based on the input, read the line into a string. Then examine the contents of the string and decide which case of input. If you have numbers that you want to read, use an "internal read" to read them from the string. This question has a code example: Reading comment lines correctly in an input file using Fortran 90
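A minimal sketch of the read-into-a-string-then-decide pattern, assuming Fortran 90 or later. The input line and the choice of three integers per row are made up for illustration; in practice the string would come from a read with an '(A)' format on the file.

```fortran
program classify_line
  implicit none
  character(len=80) :: line
  integer :: vals(3), ios

  line = '  12 7 45'                    ! made-up input line

  if (len_trim(line) == 0) then
     ! Nothing but blanks: treat as an empty row.
     write(*, *) 'empty row'
  else
     ! Internal read: parse numbers from the string, not from a file.
     read(line, *, iostat=ios) vals
     if (ios == 0) then
        write(*, *) 'three integers:', vals
     else
        write(*, *) 'not three integers: ', trim(line)
     end if
  end if
end program classify_line
```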