I've spent hours scouring the internet for a solution to this problem and can't find anything. I have been trying to write unformatted output to a CSV output file with multiple very long lines of varying length and multiple data types. I'm trying to first write a long header that indicates the variables that will be written below, separated by commas. Then, on the lines below that, I write the values specified in the header. However, with sequential access, the long output lines are broken into multiple shorter lines, which is not what I was hoping for. I tried controlling the line length using recl in the open statement, but that only added a bunch of garbled text and symbols after the output, with the same problem still occurring. I also tried using direct access, but the lines are not the same length, so that would not work either. I've read about using stream I/O in Fortran 2003, but I'm using Fortran 90, so that won't work either. I am using Fortran 90 with the Plato IDE, which uses the FTN95 compiler. I've included an example program similar to what I want to do below, using an array and some dummy text, followed by the output illustrating the problem. Does anyone know how I can get just one line per write statement? Any help would be greatly appreciated.
module types
  integer, parameter :: dp=selected_real_kind(15)
end module types
program blah
  use types
  use inputoutput   ! module not shown in this example
  implicit none
  integer :: i
  character(50)::fileNm
  integer :: unitout2=20
  real(dp), dimension(100) :: bigArray
  fileNm='predictout2.csv'
  open(unit=unitout2,file=fileNm,status="replace")
  do i=1,100
    bigArray(i)=i
  end do
  ! header line: one long character constant continued over three source lines
  write(unitout2,*)"word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
  &word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
  &word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word"
  ! data line: list-directed output of the whole array
  write(unitout2,*)bigArray
  close(unitout2)
end program
Here's the output for the program above (without recl):
word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word
,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,wo
rd,word,word,word,word,word
1.00000000000 2.00000000000 3.00000000000 4.00000000000
5.00000000000 6.00000000000 7.00000000000 8.00000000000
9.00000000000 10.0000000000 11.0000000000 12.0000000000
13.0000000000 14.0000000000 15.0000000000 16.0000000000
17.0000000000 18.0000000000 19.0000000000 20.0000000000
21.0000000000 22.0000000000 23.0000000000 24.0000000000
25.0000000000 26.0000000000 27.0000000000 28.0000000000
29.0000000000 30.0000000000 31.0000000000 32.0000000000
33.0000000000 34.0000000000 35.0000000000 36.0000000000
37.0000000000 38.0000000000 39.0000000000 40.0000000000
41.0000000000 42.0000000000 43.0000000000 44.0000000000
45.0000000000 46.0000000000 47.0000000000 48.0000000000
49.0000000000 50.0000000000 51.0000000000 52.0000000000
53.0000000000 54.0000000000 55.0000000000 56.0000000000
57.0000000000 58.0000000000 59.0000000000 60.0000000000
61.0000000000 62.0000000000 63.0000000000 64.0000000000
65.0000000000 66.0000000000 67.0000000000 68.0000000000
69.0000000000 70.0000000000 71.0000000000 72.0000000000
73.0000000000 74.0000000000 75.0000000000 76.0000000000
77.0000000000 78.0000000000 79.0000000000 80.0000000000
81.0000000000 82.0000000000 83.0000000000 84.0000000000
85.0000000000 86.0000000000 87.0000000000 88.0000000000
89.0000000000 90.0000000000 91.0000000000 92.0000000000
93.0000000000 94.0000000000 95.0000000000 96.0000000000
97.0000000000 98.0000000000 99.0000000000 100.000000000
This isn't a problem with the ACCESS used for the file (stream, sequential or direct) - it is a consequence of the format specification that you are using.
Note that you are not doing unformatted output. Formatted versus unformatted is a question of whether the output is intended to be human readable.
The star in the second specifier of the WRITE statement is a specification of list directed formatting. This means that the format used for the output is based on the list of things to be output. Beyond that and a small set of rules in the language for list directed output, you are pretty much leaving the appearance of things up to the Fortran processor (the compiler).
With list directed formatted output the processor is specifically allowed to insert as many records as it sees fit between items. It does that here, quite reasonably, in order to make it easier for people to read the file.
If you want more control over the appearance of your output, then use an explicit format. For example, something like:
write(unitout2,"(9999(G12.5,:,','))") bigArray
might be more appropriate.
(Technically when a sequential file is opened there is a processor defined maximum record length (in the absence of a programmer specified maximum length) that should not be exceeded. Practically, given the way sequential formatted files are stored on disk by nearly all current Fortran compilers, that technicality doesn't cause any problems.)
I have a question about the OPEN statement in Fortran.
OPEN (UNIT = , FILE=file-name, ACCESS=access, FORM=form, RECL=recl)
ACCESS = sequential, direct
FORM = formatted, unformatted
recl is the record length in bytes for a file
I tried searching a lot, but could not work out the meaning of sequential or direct access, formatted or unformatted files, and the record length of a file. Can someone explain what these terms mean?
File access specifies how the file will be written to (or read from) after opening. Opening with one access mode, but reading/writing consistent with another access mode, often results in a runtime error.
Sequential access, naturally enough, implies reading and writing sequentially. Writing sequentially means that output is placed in the output file in the same order that the program produces it, so if X is output before Y, the file will contain X before Y (closer to the beginning of the file). Reading sequentially means that reading occurs from the start toward the end of the file. Appending is a special form of sequential access which starts at the end of the file (so write operations add to the end of the file); in standard Fortran 90 and later it is requested with POSITION='APPEND' in the OPEN statement.
Direct access means that contents of the file can be accessed in any order. This is also called random access. Essentially, when performing input or output, the program must specify the position in the file where the operation is to occur.
The position in the direct access file in Fortran is specified in terms of "records", which all have exactly the same length (specified by the RECL= clause when the file is opened). So, if a file contains 20 records and has record length equal to 30, the total size of data the program can access from the file is 600 bytes, and every read or write operation will access a record containing 30 bytes.
An unformatted file basically means the contents of the file are read and written in the program's internal binary representation, with no conversion to text. An unformatted sequential access file is roughly the equivalent of a binary file in languages like C that is read from beginning to end, although most compilers also store record-length markers around each record. An unformatted direct access file is also binary, but operations can access the file in any order (under control of the program).
A formatted file essentially means that all reading and writing must involve a format specification. There are also some special treatments such as, when writing, a newline marker written to the file at the end of every write statement.
A straight text file is typically opened as a sequential access formatted file. Every Fortran read or write operation acts on a new line (so two write statements will produce two lines in the file, and two corresponding read statements will be needed to read them back in).
It is possible to have a formatted direct access file. This basically means the read and write statements must specify formats to read/write the records, but records can be accessed in any order. The ends of records are typically marked with newlines.
It's easy to find on the web (including discussion here):
A "record" is data, usually in characters. Some files have records which are all the same length, some do not. In between, there are files which store the length of each record as part of the record. It is simplest to work with files having records which are all the same length, because (for many storage devices) you can compute the beginning of a particular record by knowing the record number and the length of the records. If the records are different lengths, it is more work to keep track of the record locations.
Sequential files are accessed one record at a time, like a tape (see this page for length discussion). As a rule, tapes could be rewound and read forward, but reading at a random point was harder. Doing that is direct access. This page makes it clear that there is a distinct choice between the two: you can have one or the other.
Formatted output is just that - making the output follow some report-style format (on the level of lines), while unformatted output does not follow tidy rules. See Fortran unformatted file format for examples of discussion. On a more technical slant, this page at Oracle goes into more depth.
I am getting a run-time error, "Bad real number in item 1 of list input", for this sample problem. Please suggest the correct way.
      implicit double precision (a-h,o-x)
      parameter (ni=150)
      dimension x(ni)
      open(40,file='fortin')
      do 80 i=1,5
         read(40,*)x(i)
         write(*,*)i,x(i)
   80 continue
      stop
      end
The data in the fortin file are arranged in a single column:
1.0
5.0
3.0
5.0
7.0
Your code expects only numbers, but it appears the file contains characters. You can do one of two things to fix this:
Delete the words at the top of the fortin file
Add a single read(40,*) (no need for anything following it) before the loop, so that the first line of the file is skipped
In my case, the problem lies in the data file, not the code.
My problem turned out to be that the file was in Unicode format. When I viewed it in vi, it looked fine, but in a viewer that does not support Unicode, such as Midnight Commander, it looked like a mess. The person who sent me the file later told me that he had saved it in UTF-16.
Are there situations where I should prefer a binary file to a text file? I'm using C++ as the programming language.
For example, if I have to store a large text file, is it better to use a text file or a binary file?
Edit
For the moment, the file does not need to be human-readable. Are there differences in performance, security, and so on?
Edit
Sorry for omitting the other requirements (thanks to Carey Gregory):
The records to save are ASCII-encoded
The file must be encrypted (AES)
The machine can power off at any time, so I have to try to prevent errors.
I have to know if the file changes outside the program; I think I'll use a SHA-1 digest of the file.
As a general rule, define a text format, and use it. It's much
easier to develop and debug, and it's much easier to see what is
going wrong if it doesn't work.
If you find that the files are becoming too big, or taking too
much time to transfer over the wire, consider compressing them.
A compressed text file is often smaller than what you can achieve
with binary. Or consider a less verbose text format; it's possible
to reliably transmit a text representation of your data with
far fewer characters than XML uses.
And finally, if you do end up having to use binary, try to choose
an existing format (e.g. Google's protocol buffers), or base your
format on an existing format. Just remember that:
Binary is a lot more work than text, since you practically
have to write all of the << operators again, including those
in the standard library.
Binary is a lot more difficult to debug, because you can't
easily see what you've actually done.
Concerning your last edit:
Once you've encrypted, the results will be binary. You can
use a text representation of the binary (base64 or some such),
but the results won't be any more readable than the binary, so
it's not worth the bother. If you're encrypting in process,
before writing to disk, you automatically lose all of the
advantages of text.
The issues concerning powering off mean that you cannot use
ofstream directly. You must open or create the file with the
necessary options for full transactional integrity (O_SYNC as
a flag to open under Unix). You must write each record as
a single write request to the system.
It's always a good idea to have a checksum, just in case. If
you're worried about security, SHA1 is a good choice. But keep
in mind that if someone has access to the file, and wants to
intentionally change it, they can recalculate the SHA1 and
insert the new value as well.
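The point above about opening the file with O_SYNC and writing each record as a single request could be sketched roughly as follows. This is only an illustration for POSIX systems: the function name append_record_sync, the file mode 0644 and the choice of O_APPEND are assumptions made for the example, and real code would need more careful error handling (and, per the requirements, would write already-encrypted records).

```cpp
#include <fcntl.h>     // open and the O_* flags (POSIX)
#include <unistd.h>    // write, close (POSIX)
#include <string>

// Hypothetical helper: appends one record to the file at `path`
// using exactly one write() call.
// O_SYNC asks the OS to return from write() only after the data has
// been handed to the storage device, not just to the page cache.
// Returns true only if the whole record was written and the file closed cleanly.
bool append_record_sync(const std::string& path, const std::string& record) {
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0) {
        return false;                      // could not open or create the file
    }
    ssize_t written = ::write(fd, record.data(), record.size());
    bool ok = (written == static_cast<ssize_t>(record.size()));
    if (::close(fd) != 0) {
        ok = false;                        // treat a failed close as an error too
    }
    return ok;
}
```

A call such as append_record_sync("data.log", "some record\n") returns false if the record could not be written in full, so the caller can decide whether to retry or report the failure.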
All files are binary; the data within them is a binary representation of some information. If you have to store a large amount of text then the file will contain the binary representation of that text. The difference between a "binary file" and a "text file" is that creating the latter involves converting data to a text form before saving it. This is typically done so humans can read it.
The distinction between binary and text is usually made when storing data that is for computer consumption. Typically this data would not be text - it might be a list of numerical configuration values, for example: 1, 2, 3.
If you stored this in text format, your file could contain a list of human-readable numbers, and if you opened the file in Notepad you might see one number per line. But what you're actually saving here is not the binary values 1, 2, 3; you're saving the string "1\n2\n3\n". Note that this string is 6 characters long, and the byte values (assuming ASCII) would actually be 49, 10, 50, 10, 51, 10!
If the same data were stored in binary format, you would store the numbers in the smallest useful space, and write the file as individual bytes that can often only be read by the code that created them. Opening this file in Notepad would likely display junk characters, because the data makes no sense as text. In this case you would be saving a byte array with actual values { 1, 2, 3 } - or even a single byte with the three values embedded. This could be much smaller than the human-readable equivalent.
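To make the comparison concrete, here is a small sketch of the two encodings of the values 1, 2, 3. The function names encode_as_text and encode_as_binary are made up for this example, and the binary form assumes each value fits in a single byte.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Text encoding: each value is written as decimal digits followed by '\n'.
std::string encode_as_text(const std::vector<std::uint8_t>& values) {
    std::string out;
    for (std::uint8_t v : values) {
        out += std::to_string(static_cast<unsigned>(v));
        out += '\n';
    }
    return out;
}

// Binary encoding: each value is stored directly as one byte.
std::string encode_as_binary(const std::vector<std::uint8_t>& values) {
    return std::string(values.begin(), values.end());
}
```

For the input {1, 2, 3}, encode_as_text produces the six bytes 49, 10, 50, 10, 51, 10, while encode_as_binary produces the three bytes 1, 2, 3.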
Binary files store a sequence of bytes like all other files. You can store numeric values like integers per 4 bytes, characters per single byte, or even serialized class objects and anything you want.
When you know how to read a binary file (i.e. you know what is stored in it), you can extract all the information from it. Text files, by contrast, use text encodings such as UTF-8 or ANSI and are intended to be processed by text editors.
Binary files are meant to be interpreted by programs only, whereas a text file can also be opened and understood by a human.
So it depends whether you want your file to be readable by a human or not.
It depends on a lot of factors. I can think of two right now:
Do you require the file to be readable by humans?
Is compression a factor? A 10-digit number will take at least 10 bytes as text, but as few as four bytes as binary if it fits in a 32-bit unsigned integer (up to 4,294,967,295), and eight bytes otherwise.
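The size difference can be checked with a short sketch. The helper names text_size and binary_size are hypothetical, and the binary size assumes the number is stored in the smallest standard fixed-width unsigned integer that can hold it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed to store n as decimal text (one byte per digit in ASCII).
std::size_t text_size(std::uint64_t n) {
    return std::to_string(n).size();
}

// Size in bytes of the smallest fixed-width unsigned integer that can hold n.
std::size_t binary_size(std::uint64_t n) {
    if (n <= UINT8_MAX)  return 1;
    if (n <= UINT16_MAX) return 2;
    if (n <= UINT32_MAX) return 4;
    return 8;
}
```

For example, 1234567890 needs 10 bytes as text but fits in 4 bytes as binary, whereas 9999999999 exceeds the 32-bit range and needs 8. A two-byte integer tops out at 65535, so it cannot hold any ten-digit value.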
All data is binary. You always need a machine to interpret it for you. Whether the data is in a compact serialized format such as protocol buffers, Avro or Thrift, or is plain uncompressed text, it is still binary. If you want to read protocol buffers in Notepad, there is a two-step process: decode, then read. With text, the decoding step is not needed. The same goes for encrypted data: first decrypt, then read. Humans cannot read binary directly (as some commenters mention); we still need Notepad to interpret and display the binary that we call text.
All data stored in a text file consists of human-readable characters. Each line of data ends with a newline character.
In a binary file, data is typically stored in the same format as it is held in memory. There are no lines or newline characters, and on current systems the end of the file is determined by the file's length rather than by a special marker.
Moreover, binary files are often more space-efficient, because values are stored in their native representation rather than as digit characters.