Fortran95 -- Reading from a formatted text file - fortran

I need to read some values from a table. These are the first five rows, to give you some idea of what it should look like:
1 + 3 98 96 1
2 + 337 2799 2463 1
3 + 2801 3733 933 1
4 + 3734 5020 1287 1
5 + 5234 5530 297 1
My interest is in the first four columns of each row. I need to read these into arrays. I used the following code:
program ----
implicit none
integer, parameter :: totbases = 4639675, totgenes = 4395
integer :: codtot, ks
integer, dimension(totgenes) :: ngene, lend, rend
character :: genome*4639675, sign*4
open(1,file='e_coli_g_info')
open(2,file='e_coli_g_str')
do ks = 1, totgenes
read(1,100) ngene(ks),sign(ks:ks),lend(ks), rend(ks)
end do
100 format(1x,i4,8x,a1, 2(5x,i7), 22x)
do ks = 1, 100
write(*,*) ngene(ks), sign(ks:ks),lend(ks), rend(ks)
end do
end program
The loop at the end of the program prints the first hundred entries to test that they are being read correctly. The problem is that I am getting this garbage (the fourth column is the problem):
1 + 3 757934891
2 + 337 724249387
3 + 2801 757803819
4 + 3734 757803819
5 + 5234 757935405
Clearly, the fourth column is way off. In fact, I cannot find these values anywhere in the file that I am reading from. I am using the gfortran compiler on Ubuntu 12.04. I would greatly appreciate it if somebody could point me in the right direction. I'm probably missing something very obvious, because I'm new to Fortran.

Fortran formats are traditionally fixed format (there's some newer stuff that I won't go into here); that is, they are best suited to files with fixed columns, where column N always starts at character position M, no ifs or buts. If your file is more "free format", that is, the columns are separated by whitespace, it's often easier and more robust to read the data using list-directed input (a format of *). That is, try writing your read loop as
do ks = 1, totgenes
read(1, *) ngene(ks), sign(ks:ks), lend(ks), rend(ks)
end do
Also, as general advice, when opening your own files, start from unit 10 and go upwards from there. Fortran implementations typically preconnect some of the low-numbered units to standard input, output, and error (a common choice is units 5, 6, and 0, respectively). You probably don't want to redirect those.
PS: I haven't tried your code, but it seems that you have an out-of-bounds access on the sign variable. It is declared with length 4, but you then assign to position ks, which goes all the way up to totgenes. Since you're using gfortran on Ubuntu 12.04 (that is, gfortran 4.6), compile with the options "-O1 -Wall -g -fcheck=all" while developing.

Related

Reading records from a file in FORTRAN66 using stdin adding extra unwanted junk

I'm trying to read a file in the format specified below using FORTRAN 66.
1000
MS 1 - Join Grps Group Project 5 5
Four Programs Programming 15 9
Quiz 1 Quizzes 10 7
FORTRAN Programming 25 18
Quiz 2 Quizzes 10 9
HW 1 - Looplang Homework 20 15
I execute and read the file like so:
program < grades.txt
The first line is the total number of points that can be earned in a class
The rest of the lines are assignments in a class
Each line is formatted as such: Assignment name(20 chars) category (20 chars) possible points(14 chars) earned points(14 chars)
For some reason, when the code runs and reads the file, starting at the first assignment record, I get error 5006, and cannot find an explanation of the error code. The output of the program while debugging looks like this:
$ file < grades.txt
MS 1 - Join Grps Group Project 5 6417876
NOT EOF
EOF 5006
NAME CATEGORY POSSIBLE EARNED
My goal is to be able to read each line and put each column into its appropriate array, then reference those arrays later on to print a report for each category, with each assignment, points possible, points earned, and total percentage for the category, then loop, etc.
I do not understand where the "6417876" in the output is coming from. It is definitely not part of the file that's being piped into stdin while the program reads.
The code for the program is as follows:
CHARACTER*20 ASSIGNMENTT(100)
CHARACTER*20 CATEGORY(100)
INTEGER POSSIBLE(100)
INTEGER EARNED(100)
INTEGER TOTALPTS
INTEGER REASON
INTEGER I, N
READ(5,50)TOTALPTS
50 FORMAT(I4)
c Read the arrays in
I=1
100 READ(5,110,IOSTAT=REASON)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
110 FORMAT(2A20x,2I14x)
WRITE(*,110)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
I=I+1
IF (REASON < 0) GOTO 120
WRITE(*,*)"NOT EOF"
IF (I<100 .AND. REASON == 0) GOTO 100
WRITE(*,*)"EOF", REASON
c Get the number of items (For some reason stdin adds an extra item that's not in the file, so I subtract 2 instead of 1
120 N=I-2
c Display the Names and Ages
WRITE(*,200)
200 FORMAT("NAME",T20,"CATEGORY",T40,"POSSIBLE",T54,"EARNED",T68)
DO 300 I=1,N
210 FORMAT(A20,A20,I14,I14)
300 WRITE(*,210)ASSIGNMENTT(I),CATEGORY(I),POSSIBLE(I),EARNED(I)
END
What could be causing the read issues I'm facing?
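The line that reads the file contents was too long. Presumably it ran past column 72, beyond which fixed-form Fortran source is ignored, so the end of the statement was cut off. I shortened the names of the variables to save some space and the problem was solved.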

Writing a Matrix Array of 2 Rows by 3 Columns, To an Output Text File in Fortran 95

I'm currently learning how to write matrix arrays to output text files in Fortran 95. The problem I'm facing is that the 2-row by 3-column matrix I'm working with does not come out formatted the way I want in the output text file. I believe I'm missing one or two lines of code, or failing to add something to the lines I already have. Below are my code, the current output data, and the desired output data. The goal is to get the "Desired Output Data". Kindly show me my mistake(s), what code I'm missing, and where I should add it. Every answer is welcome and appreciated. Thank you, Stackovites.
Lines of Codes:
Program Format2
!illustrates formatting Your Output
Implicit None
Integer:: B(2,3)
Integer:: k,j
!Open Output Text File at Machine 8
Open(8,file="formatoutput2.txt")
Data B/1,3,5,2,4,6/
Do k= 1,2
B(2,3)= k
!Write to File at Machine 8 and show the formatting in Label 11
Write(8,11) B(2,3)
11 format(3i3)
End Do
Do j= 3,6
B(2,3)= j
!Write to File at Machine 8 and show the formatting in Label 12
Write(8,12) B(2,3)
12 format(3i3)
End Do
End Program Format2
Current Output Data
1
2
3
4
5
6
Desired Output Data
1 3 5
2 4 6
B(2,3) refers to only one particular element of the array, namely the element with index 2 in the first dimension and index 3 in the second dimension. To refer to a different element use B(i,j), where i and j are integers with the desired indices. To refer to the whole array use just B, or alternatively B(:,:) for an array section that encompasses the whole array.
So to set the values
do j = 1, 3
do i = 1, 2
B(i,j) = i + (j-1) * 2
end do
end do
and to print them use one of the methods shown in the countless duplicates on this site ("Print a Fortran 2D array as a matrix", "Write matrix with Fortran", "How to write the formatted matrix in a lines with fortran77?", "Output as a matrix in fortran"; search for more, there will be better ones...):
do i = 1, 2
write(8,'(999i3)') B(i,:)
end do
I've seen my mistakes. The output in my text file was exactly what I had told the compiler to produce: each Write printed only the single element B(2,3), one value per line. I was also thinking of the data as two rows, (1,2) and (3,4,5,6), instead of three columns, (1,2), (3,4) and (5,6). Below is the corrected code that produces the desired output data.
Lines of Codes:
Program Format2
!illustrates formatting Your Output
Implicit None
Integer:: B(2,3)
Integer:: k,j
!Open Output Text File at Machine 8
Open(8,file="formatoutput2.txt")
Data B/1,2,3,4,5,6/
!The Data statement fills B in column-major order, so
!Column 1 = (1,2), Column 2 = (3,4) and Column 3 = (5,6).
!No loops are needed to set the values: assigning to B(2,3)
!changes only that one element, not a whole row or column.
!Write to File at Machine 8 and show the formatting in Label 13
Write(8,13) ((B(k,j),j= 1,3),k= 1,2)
13 format(3i3)
End Program Format2
The Above Lines of Codes, gives the Desired Output below:
1 3 5
2 4 6

Write results between different text while adapting spaces in Fortran

In a small program, I am trying to write output in which numerical values appear between pieces of text.
For the moment, I do this:
! Print results
write(*,*)
write(*,*) ' Time step = ',dt
write(*,*)
write(*,1001) epsilon,step
write(*,*)
write(*,*) ' Problem size = ',size_x*size_y
write(*,*)
write(*,1002) elapsed_time
write(*,*)
write(*,*) ' Computed solution in seq.dat file '
write(*,*)
! Formats available to display the computed values on the grid
1001 format(' Convergence = ',f11.9,' after ',i9,' steps ')
1002 format(' Wall Clock = ',f15.6)
which produces at the execution :
Time step = 0.000003755783907217
Convergence = 0.100000000 after 8882 steps
Problem size = 24576
Wall Clock = 5.213814
Computed solution in Seq.dat
My issue is with the line "Wall Clock = 5.213814": I would like to have only one space after "Wall Clock =" and before the value "5.213814". I think the multiple spaces that I currently get come from the "f15.6" in 1002 format(' Wall Clock = ',f15.6).
Here's what I want to get (with another value for steps) :
Time step = 0.000003755783907217
Convergence = 0.100000000 after 20910988821 steps
Problem size = 24576
Wall Clock = 5.213814
Computed solution in Seq.dat
I have set "f15.6" because "Wall Clock" can become a large number; the same goes for the epsilon and step variables.
I don't know how to get, in all cases, just one space between the words and the values written between them, the way printf does in C when different values and words are printed on the same line.
I know there's a simple solution but can't find it.
UPDATE 1 :
I tried the solution indicated in the first answer.
Here's what I have done :
write(*,1001) epsilon,step
write(*,1002) elapsed_time
1001 format(' Convergence = ',f0.9,' after ',i9,' steps ')
1002 format(' Wall Clock = ',f0.6)
and I get :
Convergence = .100000000 after 8882 steps
Problem size = 24576
Wall Clock = 2.492813
As you can see, the "Convergence" value is .100000000 instead of 0.100000000 (the leading zero has disappeared).
And what about integer values: can I write "i0" to use as few characters as possible?
Thanks
Modern Fortran compilers (Fortran 95 and later) understand a field width of 0 in an output edit descriptor to mean "as few characters as possible":
program write_format
use iso_fortran_env, only: real64
implicit none
print 1001, 5.213814
print 1001, 12345678.901234_real64
1001 format("Wall Clock = ", f0.6)
end program write_format
Output:
Wall Clock = 5.213814
Wall Clock = 12345678.901234
Cheers
Usually it's frowned upon to update a question after it has been answered in order to ask additional questions, but since these are quite similar, I think it's okay.
Firstly, yes, format I0 means as few digits as necessary, and probably is what you want.
The second part is trickier, it seems to boil down to 'at least that many digits, but more if needed' -- and I don't think there's a format specifier for that (but I might be wrong).
I'd probably cheat and use something like this:
if (epsilon < 10.) then
write(*, 1002) epsilon
else
write(*, 1003) epsilon
end if
1002 format("Convergence = ", f11.9)
1003 format("Convergence = ", f0.9)
But then again, I also found this answer quite intuitive: How to pad FORTRAN floating point output with leading zeros?
Adapted for you, it would mean splitting the floating point number into an integer and the rest, and putting it back together again:
write(*, 1002) int(epsilon), epsilon-int(epsilon)
1002 format("Convergence = ", I0, F0.9)
This is a bit cumbersome, but one way to get the minimum width and preserve the leading zero is to use an internal write like this:
character*30 val
write(val,'(f11.9)')0.1d0
write(*,'(3a,i0,a)')'converge = ',trim(adjustl(val)),' after ',32432,' steps'
converge = 0.100000000 after 32432 steps

How to search elements of a list in a file

Please check the code for searching list elements in file.
f=open("a.txt","r")
p=open("b.txt","r")
disk=[]
for line in p:
line = line.strip()
disk.append(line)
for line in f.readlines():
for word in disk[0]:
if word in line:
print line
The list is below:
>>> disk
['5000cca025884d5', '5000cca025a1ee6']
I want to search for the elements of this list in the file below, but I am not getting any output for index 0.
c0t5000CCA025A1EE6Cd0 <preSUN30G-A2B0-279.40GB> /scsi_vhci/disk#g5000cca025a1ee6c
1. c0t5000CCA025A28FECd0 <preSUN30G-A2B0-279.40GB> i/disk#g5000cca025a28fec
2. c0t5000CCA0258BA1DCd0 <HsdfdsSUN30G-A2B0 cyl 46873 alt 2 hd 20 sec 625> i/disk#g5000cca0258ba1dc
3. c0t5000CCA025884D5Cd0 <UN300G cyl 46873 alt 2 hd 20 sec 625> solaris i/disk#g5000cca025884d5c
4. c0t5000CCA02592705Cd0 <UN300G cyl 46873 alt 2 hd 20 sec 625> solaris i/disk#g5000cca02592705c
The only error that presents itself in your code is this:
for word in disk[0]:
As I mentioned in the comments, what this does is grab the first string in the disk list and start iterating over the individual characters. This will lead to most of the lines in a.txt getting printed multiple times.
Another possible problem would be getting the two files backwards. I did this accidentally when I was trying to duplicate your problem. When the files are backwards, nothing gets printed, because none of the lines in a.txt are in b.txt (in fact, most of them are much longer).
Here is a project on repl.it that shows the program working.
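A minimal sketch of the corrected loop, iterating over the list itself rather than over the characters of disk[0]. It is written for Python 3 (the question uses the Python 2 print statement). The helper name matching_lines and the in-memory sample lists are assumptions made for illustration; they stand in for the contents of b.txt and a.txt.

```python
def matching_lines(lines, ids):
    """Return the lines that contain at least one of the given IDs."""
    return [line for line in lines if any(word in line for word in ids)]


# Stand-in for the IDs read from b.txt (taken from the question).
disk = ['5000cca025884d5', '5000cca025a1ee6']

# Stand-in for a few lines of a.txt (taken from the question).
listing = [
    "c0t5000CCA025A1EE6Cd0 <preSUN30G-A2B0-279.40GB> /scsi_vhci/disk#g5000cca025a1ee6c",
    "1. c0t5000CCA025A28FECd0 <preSUN30G-A2B0-279.40GB> i/disk#g5000cca025a28fec",
    "3. c0t5000CCA025884D5Cd0 <UN300G cyl 46873 alt 2 hd 20 sec 625> solaris i/disk#g5000cca025884d5c",
]

for line in matching_lines(listing, disk):
    print(line)
```

Each matching line is returned once, even if it contains several of the IDs. The match is case-sensitive, so here it succeeds on the lowercase part at the end of each line.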

Calculating the distance between characters

Problem: I have a large number of scanned documents that are linked to the wrong records in a database. Each image has the correct ID on it somewhere that says where it belongs in the db.
I.E. A DB row could be:
| user_id | img_id | img_loc |
| 1 | 1 | /img.jpg|
img.jpg would have the user_id (1) on the image somewhere.
Method/Solution: Loop through the database. Pull the image text into a variable with OCR and check whether user_id is found anywhere in the variable. If not, flag the record/image in a log; if so, do nothing and move on.
My example is simple; in the real world I have a guarantee that user_id wouldn't accidentally show up on the wrong form (it is of a specific format that has its own significance).
Right now it is working. However, it is incredibly strict. If you've worked with OCR you understand how fickle it can be: sometimes a 7 is read as a 1, or a 9 as a 7, etc. The result is a large number of false positives, especially among images with low-quality scans.
I've addressed some of the image quality issues with some processing on my side (increasing the image size, adjusting the black/white threshold) and had satisfying results. I'd like to add the ability for the program to recognize, for example, that "81*7*23103" is not very far from "81*9*23103".
The only way I know how to do that is to check for strings at least as long as what I'm looking for, calculate the distance between each pair of corresponding characters, average those distances, and set a limit on what counts as a good average.
Some examples:
Ex 1
81723103 - Looking for this
81923103 - Found this
--------
00200000 - distances between characters
0 + 0 + 2 + 0 + 0 + 0 + 0 + 0 = 2
2/8 = .25 (pretty good match. 0 = perfect)
Ex 2
81723103 - Looking
81158988 - Found
--------
00635885 - distances
0 + 0 + 6 + 3 + 5 + 8 + 8 + 5 = 35
35/8 = 4.375 (Not a very good match. 9 = worst)
This way I can tell it "Flag the bottom 30% only" and dump anything with an average distance > 6.
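A minimal sketch of the averaging described above, in Python 3. The function names avg_digit_distance and best_window are hypothetical, and the sketch assumes the ID consists only of ASCII digits and that candidate windows have exactly the same length as the ID.

```python
def avg_digit_distance(expected, found):
    """Mean absolute difference between corresponding digits (0 = identical)."""
    if len(expected) != len(found):
        raise ValueError("strings must have the same length")
    total = sum(abs(int(a) - int(b)) for a, b in zip(expected, found))
    return total / len(expected)


def best_window(expected, text):
    """Slide over text and return (distance, window) for the closest
    all-digit window, or None if no such window exists."""
    n = len(expected)
    windows = (text[i:i + n] for i in range(len(text) - n + 1))
    scored = [
        (avg_digit_distance(expected, w), w)
        for w in windows
        if all(c in "0123456789" for c in w)
    ]
    return min(scored) if scored else None


print(avg_digit_distance("81723103", "81923103"))  # Ex 1
print(avg_digit_distance("81723103", "81158988"))  # Ex 2
print(best_window("81723103", "ID: 81923103 scanned"))
```

The two calls to avg_digit_distance reproduce Ex 1 and Ex 2. best_window scores every window of the OCR text, so the cost grows with the length of the text times the length of the ID.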
I figure I'm reinventing the wheel and wanted to share this for feedback. I see a huge increase in run time and a performance hit doing all these string operations over what I'm currently doing.