If I have a text file where lines contain some non-blank characters followed by spaces, how do I read those lines into a character variable without excess spaces?
character (len=1000) :: text
open (unit=20,file="foo.txt",action="read")
read (20,"(a)") text
will read the first 1000 characters of a line into variable text, which will be padded with spaces at the end if there are fewer than 1000 characters in the line. But if the line length is 100 you have 900 extraneous spaces, and the program does not "know" how long the line read actually was.
Fortran strings are blank-padded. With constant-length Fortran strings there is simply no way to tell significant trailing blanks from padding.
If every whitespace character is important, I suggest treating the file as a stream-access file instead (formatted or unformatted as needed), reading individual characters into some array buffer, and allocating a deferred-length string only after you know the length you actually need.
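A minimal sketch of that approach, assuming the file foo.txt from the question, LF line endings, and an arbitrary 4096-character upper bound on the line length. The names ch, buffer and n are illustrative only. It reads the first line one character at a time through an unformatted stream and allocates the result only once the length is known:

```fortran
program read_line_stream
  implicit none
  character(len=:), allocatable :: line
  character(len=4096) :: buffer          ! assumed upper bound on line length
  character(len=1) :: ch
  integer :: u, ios, n

  open (newunit=u, file="foo.txt", access="stream", form="unformatted", &
        action="read", status="old")
  n = 0
  do
     read (u, iostat=ios) ch
     if (ios /= 0) exit                  ! end of file (or a read error)
     if (ch == new_line("a")) exit       ! end of the current line
     if (n == len(buffer)) exit          ! buffer full; this sketch just stops
     n = n + 1
     buffer(n:n) = ch
  end do
  close (u)

  line = buffer(1:n)                     ! allocated with exactly n characters
  print "(a,i0)", "length = ", len(line)
  print "(3a)", "[", line, "]"
end program read_line_stream
```

Trailing blanks that are really in the file end up in line, and no padding is added. On a file with CR+LF line endings the CR would be kept as the last character, so it would need to be stripped separately.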
character (len=1000) :: text
integer :: s, ios
open (unit=20,file="foo.txt",action="read")
read (20,"(a)", size=s, advance='no', iostat=ios) text
After that last line, s contains the number of characters read, including trailing spaces, which I think is what you wanted.
Notes:
With a size tag, you must also have an advance tag set to 'no', otherwise you get a compilation error. When the line is shorter than text, the read runs into the end of the record, which leaves the file positioned after that record, so the next read statement starts at the next line despite the 'no'. That's fine.
ios stores a negative integer when attempting to read past the end of the line. This will always happen if the line is shorter than the length of text. That's fine.
When attempting to read past the end of the file, ios will store a different negative integer. What those two negative integers are is not set by the standard I think so you may have to experiment a bit. In my case, with the gfortran compiler, ios was -1 when attempting to read past the end of the file and -2 otherwise.
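Rather than hard-coding -1 and -2, the two conditions can be compared against the named constants iostat_end and iostat_eor from the intrinsic module iso_fortran_env (Fortran 2003 and later). The loop below is a sketch along those lines. Skipping the rest of a line longer than text with an extra advancing read is an assumption of this example, not something the answer above prescribes:

```fortran
program read_with_size
  use, intrinsic :: iso_fortran_env, only: iostat_end, iostat_eor
  implicit none
  character(len=1000) :: text
  integer :: u, s, ios

  open (newunit=u, file="foo.txt", action="read", status="old")
  do
     read (u, "(a)", size=s, advance="no", iostat=ios) text
     if (ios == iostat_end) exit                    ! no more lines
     if (ios /= 0 .and. ios /= iostat_eor) stop "read error"
     ! ios == iostat_eor: the line was shorter than text, s is its length
     ! ios == 0: the line filled text completely, the rest is still unread
     print "(a,i0,3a)", "length ", s, " [", text(1:s), "]"
     if (ios == 0) read (u, "(a)")                  ! skip the rest of a long line
  end do
  close (u)
end program read_with_size
```

text(1:s) then holds the line exactly as stored in the file, including any trailing spaces that were really there.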
I have a very long file in which it can be assumed (if it helps) that every line has the same format. I want to read a specific line of the file. Is it possible in C++ to move the pointer to that line via a binary search instead of starting at the top of the file and reading line by line and counting lines? That is, is it possible to access some line_of_file pointer and move it by binary search? If not in C++, is this task possible in assembly language or some other language?
You cannot usefully use binary search to find a line by line number in a text file, because text files are not indexed by line number. In other words, there is no way to figure out the line number of a given offset in the file other than starting at the beginning, reading every character, and counting the number of new line characters.
There is only one exception, and in that case binary search won't help you either. If every line in the file is the exact same length, then you can find the offset of a specific line by multiplying that length by the line number (using 0 as the number of the first line). Don't forget to include the newline character in the line length. You can use istream::seekg or ostream::seekp to position the next input or output operation, respectively. (You need to use the two-argument version. Some other warnings apply on platforms which translate newline characters to multicharacter sequences; here's looking at you, Windows.)
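The offset arithmetic does not depend on the language. To stay with the Fortran used in the rest of this thread, here is a sketch of the same idea using stream access instead of seekg. The file name fixed.txt, the 20-character line width and the line number 41 are made-up values, and a single LF byte per line ending is assumed:

```fortran
program seek_fixed_line
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: width = 20           ! assumed characters per line, newline excluded
  integer, parameter :: reclen = width + 1   ! plus one LF byte; width + 2 for CR+LF files
  character(len=width) :: line
  integer(int64) :: k, offset
  integer :: u, ios

  k = 41_int64                               ! hypothetical 0-based line number
  offset = 1_int64 + k*reclen                ! Fortran stream positions start at 1
  open (newunit=u, file="fixed.txt", access="stream", form="unformatted", &
        action="read", status="old")
  read (u, pos=offset, iostat=ios) line      ! jump straight to the line
  if (ios == 0) then
     print "(3a)", "[", line, "]"
  else
     print "(a)", "requested line is beyond the end of the file"
  end if
  close (u)
end program seek_fixed_line
```

The only difference from the byte offset described above is the leading 1, because Fortran counts stream positions from 1 rather than 0.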
open(NEWUNIT=fId, file=trim(filename), iostat=ierr, action='READ')
if (ierr /= 0) return
read(fID,'(A)') dataArray
close(fId)
Does this code read the complete data in the file and save it in an array, or does it just read a single line?
Consider
character(len=10) name(5)
read(*,'(A)') name
Having the whole array in the input statement is treated the same as specifying the array's elements in order:
read(*,'(A)') name(1), name(2), name(3), name(4), name(5)
The input statement will attempt to read five values, each according to the edit descriptor A. What happens as a result depends on various things.
Format reversion means that when one value is transferred the file is positioned on the next record (line). So, in the case here assuming the file has at least two lines, the first 10 "characters" of the first line are read and put into name(1). Then the first 10 "characters" of the second line are read and put into name(2). And so on for as long as lines remain.
Situations (assuming no errors come about):
there are as many lines as elements of the array: all lines are read in to the array (but only as much of a line as the character length of the variable);
there are more lines than there are elements: only as many lines as there are array elements are read;
there are more elements than there are lines: an end-of-file condition occurs and the array name becomes undefined.
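One way to sidestep the third case is to read one element per statement and check iostat, so that the elements filled before the end of the file stay defined. This is a sketch with a made-up file name names.txt and a hypothetical counter nread, not a change to the code in the question:

```fortran
program read_array
  implicit none
  character(len=10) :: name(5)
  integer :: u, i, ios, nread

  open (newunit=u, file="names.txt", action="read", status="old")
  nread = 0
  do i = 1, size(name)
     read (u, "(a)", iostat=ios) name(i)    ! one line per element
     if (ios /= 0) exit                     ! end of file or error: name(i) is not used
     nread = nread + 1
  end do
  close (u)

  do i = 1, nread
     print "(i0,3a)", i, ": [", name(i), "]"
  end do
end program read_array
```

Only name(1) to name(nread) are referenced afterwards, whether the file has fewer, exactly as many, or more lines than the array has elements.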
I wonder how Fortran's I/O is expected to behave in case of a NULL character ACHAR(0).
The actual task is to fill an ASCII file by blocks of precisely eight characters. The strings are read from a binary and may contain non-printing characters.
I tried with gfortran 4.8, 8.1 and f2c. If there is a NULL character in the string the format specifier FORMAT(A8) does not write eight characters.
Give the following F77 code a try:
c     Print a string of eight characters surrounded by dashes
  100 FORMAT('-',A8,'-')
c     Works fine if empty or any other combination of printing chars
      write(*,100) ''
c     In case of a short string blanks are padded
      write(*,100) '345678'
c     A NULL character does something I did not expect
      write(*,100) '123'//ACHAR(0)//'4567'
c     Not even position editing helps
  101 FORMAT('-',A8,T10,'x')
      write(*,101) '123'//ACHAR(0)//'4567'
      end
My output is:
-        -
-  345678-
-1234567-
-1234567x
Is this expected behavior? Any idea how to get the output eight characters wide in any case?
When using an edit descriptor A8 the field width is eight. For output, eight characters will be written.
In the case of the example, it isn't the writing of the characters that is contrary to your expectations, but how they are displayed by your terminal.
You can examine the output further with tools like hexdump or you can write to an internal file and look at arbitrary substrings.
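A sketch of the internal-file check, using the same layout as FORMAT('-',A8,'-') from the question. The variable name field is illustrative. It writes into a 10-character variable and prints the character code at each position:

```fortran
program null_in_field
  implicit none
  character(len=10) :: field      ! '-' + eight characters + '-'
  integer :: i

  ! Internal write: same edit descriptors as in the question
  write (field, "('-',a8,'-')") "123" // achar(0) // "4567"

  ! Print position and character code for every character written
  do i = 1, len(field)
     print "(i2,1x,i3)", i, iachar(field(i:i))
  end do
end program null_in_field
```

If the A8 field really holds eight characters, position 5 should show code 0 and the closing dash (code 45) should sit at position 10, whatever the terminal chooses to display.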
Yes, that is expected. If there is a null character, the printing of the string on the screen can stop there. The characters will still be sent, but the string does not have to be printed on the screen.
Note that C uses the null character to terminate strings, and the OS may interpret the strings it receives with the same convention. This allows non-printable characters to be interpreted in processor-specific ways, where the processor includes the whole complex of the compiler, the executing environment (OS and programs in the OS) and the hardware.
I have an input file with letters and numbers that I'd like to delimit with numbers in Fortran 90/95. The input file looks like this:
AAAA (spaces) 123BBBB (spaces) 4CCCC (spaces) 5DDDD (spaces) -6EEEE
So on and so forth. I'd like for the numbers after the spaces to be with the four letters prior to the spaces. The problem I'm running into here is that the numbers can either be one, two, three, or four digits, and can have negative signs as well. I'm not sure how to automate delimiting in Fortran to get the appropriate numbers to the correct letters.
So far, I only have written a script which essentially replicates the input file and writes it to an output file. I wanted to accomplish this first before trying delimiting as above.
ALTERNATIVELY, I can try delimiting in Python (if it's easier in Python), and call the Python delimiting script from the Fortran program.
Fortran has the SCAN and VERIFY intrinsics that let you find the location in a string of the first (or optionally last) character that is (or is not) in a specified character set. Your example is malformed as there is no number after EEEE, but I'll ignore that for now.
The way I would handle this is to keep a position value, use INDEX to locate the next blank, which tells me how many letters are there from the current position. Then I would use VERIFY with a set ' -0123456789' to identify the next non-numeric. This tells me what the next number is. I'd use a list-directed READ from that substring to read the number. Repeat until end of string.
There are undoubtedly other ways of doing this, but calling out to another language is wholly unnecessary.
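The steps above can be sketched roughly as follows. The sample line is made up to match the layout in the question, and the variable names (pos, b, e, val) are illustrative. The sketch assumes every number directly follows a run of blanks and is a plain integer:

```fortran
program split_tokens
  implicit none
  character(len=*), parameter :: numset = " -0123456789"
  character(len=80) :: line
  integer :: pos, last, nb, nn, b, e, val, ios

  line = "AAAA   123BBBB  4CCCC     5DDDD  -6EEEE"   ! made-up sample
  last = len_trim(line)
  pos = 1
  do while (pos <= last)
     nb = index(line(pos:last), " ")       ! first blank after the letters
     if (nb == 0) then                     ! trailing letters with no number
        print "(2a)", line(pos:last), " (no number follows)"
        exit
     end if
     b = pos + nb - 1                      ! absolute position of that blank
     nn = verify(line(b:last), numset)     ! first character that is not blank, sign or digit
     if (nn == 0) then
        e = last                           ! number runs to the end of the line
     else
        e = b + nn - 2                     ! last character of the number
     end if
     read (line(b:e), *, iostat=ios) val   ! list-directed read skips the blanks
     if (ios /= 0) exit                    ! no valid number found; give up
     print "(a,1x,i0)", line(pos:b-1), val
     pos = e + 1                           ! next group of letters starts here
  end do
end program split_tokens
```

Each pass pairs one group of letters with the number that follows the blanks, so the varying digit count and the optional minus sign need no special handling.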
I received a textfile created with Notepad++ that I'm trying to read with a Fortran 95 program on both a Mac and a PC. The read line is:
read(lun,'(a)',iostat=io1) input
Since I don't know what the line lengths are I defined input to be 512 in length. With non-notepad++ files when the end of line is found the read "stops" and automatically advances to the next line of text. With the notepad++ file, it reads 512 characters, skipping over the carriage returns. When I open the file using the dos editor on the pc I see carriage return symbols (ASCII char 13) but there is no break between lines, they are all appended to one another.
I've tried searching for ichar(13) and ichar(10), backspacing to the beginning of the line and trying to force an advance to the next line, and reading in with format '(a,/)', but haven't been able to get anything to work.
What you need is a pipeline type design. The basic routine is one called getline, which gets a line of data up to the carriage return. Inside the initialization, what you do is open the file as a binary file and read a buffer of say 1024 characters in. Whenever getline is called, return the next lot of characters until you get to a CR. If there aren't enough characters, move the unprocessed characters to the front and read in the remaining characters.
It is basically how compilers work - they get a stream of tokens, which, in your case is a string of characters ending with a CR, and then they process the tokens.
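A simplified sketch of such a getline routine. To keep it short it reads one character at a time from an unformatted stream instead of refilling a 1024-character buffer, so it shows the interface rather than the buffering. The file name notepad.txt is made up, lines are assumed to end in CR or CR+LF, and characters beyond the 512-character limit are silently dropped:

```fortran
program cr_lines
  implicit none
  character(len=512) :: input
  integer :: lun, n
  logical :: eof

  open (newunit=lun, file="notepad.txt", access="stream", form="unformatted", &
        action="read", status="old")
  do
     call getline(lun, input, n, eof)
     if (eof .and. n == 0) exit            ! nothing left to report
     print "(3a)", "[", input(1:n), "]"
     if (eof) exit                         ! last line had no trailing CR
  end do
  close (lun)

contains

  ! Return the characters up to, but not including, the next CR.
  ! LF characters are skipped, so CR+LF files work as well.
  subroutine getline(unit, line, n, eof)
    integer, intent(in) :: unit
    character(len=*), intent(out) :: line
    integer, intent(out) :: n
    logical, intent(out) :: eof
    character(len=1) :: ch
    integer :: ios

    line = ""
    n = 0
    eof = .false.
    do
       read (unit, iostat=ios) ch
       if (ios /= 0) then                  ! end of file (or a read error)
          eof = .true.
          return
       end if
       if (ch == achar(13)) return         ! CR ends the line
       if (ch == achar(10)) cycle          ! ignore LF
       if (n < len(line)) then
          n = n + 1
          line(n:n) = ch
       end if
    end do
  end subroutine getline

end program cr_lines
```

Because the file is opened for stream access, the runtime no longer decides where records end, so a file whose lines are separated only by CR is split correctly on both the Mac and the PC.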