Hi everybody, I've run into a problem reading character strings from a simple file without an explicit format: when the first / is found, everything after it is missed.
This is an example of the text I would like to read: after the first 18 fixed blocks (from #Mod to Flow[kW]) comes a list of chemical species' names, which are variable within the program I'm writing (in this case there are 5).
#Mod ID Mod Name Type C. #Coll MF[kg/s] Pres.[Pa] Pres.[bar] Temp.[K] Temp.[C] Ent[kJ/kg K] Power[kW] RPM[rad/s] Heat Flow[kW] METHANE ETHANE PROPANE NITROGEN H2O
After some formal checks, I would like to skip the first 18 blocks and then read the chemical species. To do the former, I created a character array of dimension 18, with each element 20 characters long.
character(20), dimension(18) :: chapp
Then I tried to read the 18 blocks into the character array:
read(1,*) (chapp(i),i=1,18)
...but this is the result: chapp(1) to chapp(7) hold the correct first 7 strings, but chapp(8) is
chapp(8) = 'MF[kg '
and from there on, everything is left blank!
How could I overcome this reading problem?
The problem is due to your using list-directed input (the * as the format). List-directed input is useful for quick and dirty input, but it has its limitations and quirks.
You stumbled across a quirk: A slash (/) in the input terminates assignment of values to the input list for the READ statement. This is exactly the behavior that you described above.
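A minimal sketch that reproduces the quirk without a data file, by reading from an internal file (a character variable used as the input record). The module and subroutine names are made up for illustration; the list-directed READ itself is the same kind of statement as in the question.

```fortran
! Illustrative sketch: names are hypothetical, not from the question.
module slash_demo
  implicit none
contains
  ! Reads size(fields) labels from LINE with list-directed input (the '*'
  ! format). Every element is preset to '<unset>' so that the items the
  ! READ never assigns are easy to spot afterwards.
  subroutine read_labels_list_directed(line, fields)
    character(len=*), intent(in)   :: line
    character(len=20), intent(out) :: fields(:)
    integer :: i
    fields = '<unset>'
    ! A '/' ends the current undelimited value and also ends the whole READ;
    ! the remaining list items are left unchanged.
    read (line, *) (fields(i), i = 1, size(fields))
  end subroutine read_labels_list_directed
end module slash_demo
```

Called with the line '#Mod ID MF[kg/s] Pres.[Pa]' and a character(len=20) :: f(4) array, it should leave f(3) equal to 'MF[kg' and f(4) still '<unset>', mirroring what happened to chapp(8) and the elements after it.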
This is not a choice of the compiler writer; it is mandated by all relevant Fortran standards.
The solution is to use formatted input. There are several options for this:
If you know that your labels will always be in the same columns, you can use a format string like '(1X,A4,2X,A2,1X,A3,2X)' (this is not complete) to read in the individual labels. This is error-prone, and it also breaks if the program that writes out the data changes its format for some reason, or if the labels are edited by hand.
If you can control the program that writes the labels, you can use tab characters to separate the individual labels (and, later, the data values). Read in the whole line, split it into tab-separated substrings using INDEX, and read in the individual fields using an (A) format. Don't use list-directed format, or you will get hit by the / quirk mentioned above. This has the advantage that your labels can also include spaces, and that the data can be exchanged with Excel rather easily. This is what I usually do in such cases.
Otherwise, you can read in the whole line and split it on runs of spaces. This is a bit more complicated than splitting on single tab characters, but it may be the best option if you cannot control the data source. In that case, labels cannot contain spaces.
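A minimal sketch of the two read-the-whole-line options, assuming the header has already been read into a long enough character variable with something like read (1, '(A)') line. The routine split_fields is a hypothetical helper, not a library procedure: with a blank separator it collapses runs of blanks (the last option); with any other single character, such as achar(9) for a tab, every occurrence separates two fields, so labels may contain spaces (the tab option).

```fortran
! Illustrative sketch: split_fields is a hypothetical helper.
module label_split
  implicit none
contains
  ! Splits LINE at SEP and stores the pieces in FIELDS.
  ! NFIELDS returns the number of fields found; fields beyond
  ! size(FIELDS) are counted but not stored.
  subroutine split_fields(line, sep, fields, nfields)
    character(len=*), intent(in)  :: line
    character(len=1), intent(in)  :: sep
    character(len=*), intent(out) :: fields(:)
    integer, intent(out) :: nfields
    integer :: start, pos, last

    fields = ''
    nfields = 0
    last = len_trim(line)        ! ignore trailing blanks of the buffer
    start = 1
    do while (start <= last)
      if (sep == ' ') then
        ! Blank separator: skip the whole run of blanks before a label.
        do while (start <= last)
          if (line(start:start) /= ' ') exit
          start = start + 1
        end do
        if (start > last) exit
      end if
      ! Offset of the next separator; 0 means this is the last field.
      pos = index(line(start:last), sep)
      if (pos == 0) pos = last - start + 2
      nfields = nfields + 1
      if (nfields <= size(fields)) fields(nfields) = line(start:start + pos - 2)
      start = start + pos        ! continue just after the separator
    end do
  end subroutine split_fields
end module label_split
```

For the header in the question, call split_fields(line, ' ', fields, n) with fields declared e.g. as character(len=20) :: fields(50) should put the 18 fixed labels in fields(1:18) (Ent[kJ/kg K] counts as two, as in the question's own count) and the species names in fields(19:n); MF[kg/s] stays intact because no list-directed READ is involved.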
So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days
no_days = get_no_days()
month_data = [fill array with some values]
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month: in a loop I fill the array with a certain number of values based on how many days there are in that month, then write out the results to a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines so that they don't exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...), but the line breaking screws up my other tooling. I've tried reading several Fortran manuals, but while some of them mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(Also, while we're at it, how do I control the number of decimal digits? I know these are all integer values, so I'd like to leave out the decimals altogether, but I can't change the data type to INTEGER in my code because of reasons.)
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, via a label to a FORMAT statement or via a character variable.
Use the F edit descriptor for real variables in decimal form. Its syntax is Fw.d, where w is the total width of the field (including any sign and the decimal point) and d is the number of digits after the decimal point. F6.0 therefore means a field 6 characters wide with no digits after the decimal point; the decimal point itself is still printed (e.g. "   10.").
Spaces can be added with the X control edit descriptor.
Repetitions of an edit descriptor can be indicated by writing the repeat count in front of it.
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
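To check these points without a data file, the sketch below writes to an internal file (a character variable) instead of unit 1000, assuming default REAL data as in the question; format_month is an illustrative name. It uses the format from the answer, with a repeat count of 31, and output simply stops after the last item, e.g. after 28 values for February.

```fortran
! Illustrative sketch: format_month is a hypothetical name.
module month_format
  implicit none
contains
  ! Returns the first NO_DAYS values of MONTH_DATA on a single line.
  ! Each value occupies 7 columns: F6.0 (6) plus 1X (1).
  function format_month(month_data, no_days) result(line)
    real, intent(in)    :: month_data(31)
    integer, intent(in) :: no_days
    character(len=31*7) :: line
    integer :: d
    ! The group is repeated up to 31 times, but formatting ends as soon
    ! as the next F6.0 finds no item left in the list.
    write (line, '(31(F6.0,1X))') (month_data(d), d = 1, no_days)
  end function format_month
end module month_format
```

With the values 0, 10, 20, ... the line should start with "    0.    10. ", and nothing is written after the 28th value when no_days is 28.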
Besides,
New lines could be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
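A sketch of the ten-per-row format, again writing to an internal file so the rows can be inspected: when the internal file is a character array, each / moves on to the next array element. The names are illustrative, and the 80-character row length is an assumption that leaves room for the 1X after the tenth value.

```fortran
! Illustrative sketch: rows_of_ten is a hypothetical name.
module month_rows
  implicit none
contains
  ! Writes the first NO_DAYS values ten per row; ROWS(k) receives the
  ! k-th output line.
  subroutine rows_of_ten(month_data, no_days, rows)
    real, intent(in)               :: month_data(31)
    integer, intent(in)            :: no_days
    character(len=80), intent(out) :: rows(4)
    integer :: d
    rows = ''                      ! start from a blank buffer
    ! ':' stops formatting right after the last item, so no trailing
    ! 1X or '/' (and hence no empty row) is produced.
    write (rows, '(4(10(F6.0,:,1X),/))') (month_data(d), d = 1, no_days)
  end subroutine rows_of_ten
end module month_rows
```

For a 31-day month this should give three full rows of ten values and a fourth row holding only the 31st value.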
If you want to completely remove the decimal symbol, you would need to rather print the nearest integers using the nint intrinsic and the Iw (integer) descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)
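If the aim is also to shrink the file, one option not used in the answer above is the minimal-width I0 descriptor combined with the Fortran 2008 unlimited repeat *( ), so every month length fits the same format and : avoids a trailing blank. A sketch, with write_compact a hypothetical name; it needs a compiler that supports Fortran 2008 format items.

```fortran
! Illustrative sketch: write_compact is a hypothetical name.
module month_compact
  implicit none
contains
  ! Writes the rounded values of MONTH_DATA(1:NO_DAYS) to UNIT as one
  ! record, separated by single blanks.
  subroutine write_compact(unit, month_data, no_days)
    integer, intent(in) :: unit, no_days
    real, intent(in)    :: month_data(:)
    integer :: d
    ! I0: as many digits as needed; *( ): repeat as often as required;
    ! ':' stops before the 1X that would follow the last value.
    write (unit, '(*(I0,:,1X))') (nint(month_data(d)), d = 1, no_days)
  end subroutine write_compact
end module month_compact
```

With the question's example values, call write_compact(1000, month_data, no_days) should produce a single line starting "0 10 20", with no padding and no line wrapping.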
I am trying to create an MDM file using HLM 7 Student version, but since I don't have access to SPSS I am trying to import my data using ASCII input. As part of this process I am required to input the data format Fortran style. Try as I might I have not been able to understand this step. Could someone familiar with Fortran (or even better HLM itself) explain to me how this works? Here is my current understanding
From the example EG3.DAT they give
(A4,1X,3F7.1)
I think
A4 signifies that the ID is 4 characters long.
1X means skip a space.
F.1 means that it should read 1 decimal place.
I am very confused about what 3F7 might mean.
EG3.DAT
2020 380.0 40.3 12.5
2040 502.0 83.1 18.6
2180 777.0 96.6 44.4
Below are examples from the help documents: [images: Rules for format statement; Format statement example; EG1, EG2 and EG3 data formats]
A similar question is Explaining Fortran Write Format. Unfortunately, it does not explicitly cover the F descriptor.
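3F7.1 means 3 floating-point numbers, each in a field of 7 characters with one digit after the decimal point. Unused leading positions are blanks.
When reading, the .1 only matters for a field that contains no decimal point: then the last digit is taken as the first decimal, so "   3805" is read as 380.5. A field with an explicit decimal point, like those in EG3.DAT, is read as written, so effectively you just read a floating-point number from those 7 characters.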
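A small sketch of how F7.1 behaves on input, reading from a 7-character internal file (read_f71 is an illustrative name): a field with an explicit decimal point is read as written, whereas a field without one gets the implied decimal from the .1.

```fortran
! Illustrative sketch: read_f71 is a hypothetical name.
module f_input_demo
  implicit none
contains
  ! Interprets one 7-character field with the F7.1 edit descriptor.
  function read_f71(field) result(x)
    character(len=7), intent(in) :: field
    real :: x
    read (field, '(F7.1)') x
  end function read_f71
end module f_input_demo
```

So read_f71('  380.0') and read_f71(' 380.25') should return 380.0 and 380.25 (the explicit point wins), while read_f71('   3805') should return 380.5.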
You guessed the meaning of A4 (string of four characters) and 1X (one blank) correctly.
In Fortran, so-called data edit descriptors (which format the input or output of data) may have repeat specifications.
In the format (A4,1X,3F7.1) the data edit descriptors are A4 and F7.1. Only F7.1 has a repeat specification (the number before the F). This simply means that the format is as though the descriptor appeared repeated: like F7.1, F7.1, F7.1. With a repeat specification of 1, or not given, there is just the single appearance.
The format of the question, then, is like
(A4,1X,F7.1,F7.1,F7.1)
This format is one that is covered by the rules provided in one of the images of the question. In particular, the aspect of repeat specification is given in rule 2 with the corresponding example of rule 3.
Further, in Fortran proper (Fortran 2008 and later), a repeat count may also be * as a special case: that's like an exceptionally large repeat count, so *(F7.1) would be like F7.1, F7.1, F7.1, .... I see no indication that this is supported by HLM, but if it is needed, a very large repeat count may be given instead.
In 1X the 1 isn't a repeat specification but an integral, and necessary, part of the position edit descriptor.
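A sketch that parses one EG3.DAT record with the format from the question, assuming the values sit right-aligned in 7-column fields as 3F7.1 implies (the sample lines as shown in the question do not line up with 7-column fields, presumably because their spacing was lost). parse_eg3 is a hypothetical name.

```fortran
! Illustrative sketch: parse_eg3 is a hypothetical name.
module eg3_reader
  implicit none
contains
  ! Columns 1-4: ID (A4); column 5: skipped (1X);
  ! columns 6-12, 13-19 and 20-26: three reals (3F7.1).
  subroutine parse_eg3(record, id, vals)
    character(len=*), intent(in)  :: record
    character(len=4), intent(out) :: id
    real, intent(out)             :: vals(3)
    read (record, '(A4,1X,3F7.1)') id, vals
  end subroutine parse_eg3
end module eg3_reader
```

Reading the same record with the expanded format (A4,1X,F7.1,F7.1,F7.1) should give identical values, which is what the repeat specification means.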
Procedure for making an MDM file from Excel for HLM:
-Make sure ALL the characters in ALL the columns line up:
Select a column, then right-click and select Format Cells.
Then click on 'Custom', go to the 'Type' box and enter the number of 0s you need to line everything up.
-Remove all the tabs from the document and replace them with spaces:
Open the document in Word and use Find and Replace.
-To save the document as .dat:
First save it as .txt.
Then open it in Notepad and save it as .dat.
To enter the data format (FORTRAN-style):
The program reads the data file column by column, so you have to specify the layout exactly for it to read the whole data set properly.
If anything is off, even by a single space, your descriptive statistics will be wrong compared to those you get in another program.
Enclose the format in parentheses ()
Separate the entries with commas ,
-You need an ID column for all levels:
Sort the ID column from smallest to largest.
Use A# with # being the number of characters in the ID.
Use 1X to move from the ID to the next column.
-You need to state how many characters each column takes:
Use F# (# = the number of characters needed for that column).
There need to be enough characters to leave one 'gap' space between columns.
There need to be enough characters to allow for the decimal point.
As part of the F descriptor you specify the number of decimal places, by adding a decimal point after the F number followed by the number of decimal places: F#.#
You can put a number in front of the F to 'repeat' it (#F#.#), although this is not necessary.
All in all, it should look something like this:
(A4,1X,F4.0,F5.1)
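A quick way to see which columns a format such as (A4,1X,F4.0,F5.1) expects is to write a sample record with it from a small Fortran program and compare the result with the .dat file. A sketch, with an illustrative function name and sample values borrowed from EG3.DAT.

```fortran
! Illustrative sketch: sample_record is a hypothetical name.
module hlm_layout
  implicit none
contains
  ! Formats one record exactly as (A4,1X,F4.0,F5.1) lays it out:
  ! ID in columns 1-4, a blank in column 5, then 4 + 5 columns.
  function sample_record(id, a, b) result(line)
    character(len=4), intent(in) :: id
    real, intent(in)             :: a, b
    character(len=14)            :: line
    write (line, '(A4,1X,F4.0,F5.1)') id, a, b
  end function sample_record
end module hlm_layout
```

sample_record('2020', 380.0, 40.3) should give "2020 380. 40.3": the F5.1 field has one spare leading column that serves as the gap before 40.3.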
Helpful links:
https://books.google.de/books?id=VdmVtz6Wtc0C&pg=PA78&lpg=PA78&dq=data+format+fortran+style+hlm&source=bl&ots=kURJ6USN5e&sig=fdtsmTGSKFxn04wkxvRc2Vw1l5Q&hl=en&sa=X&ved=0ahUKEwi_yPurjYrYAhWIJuwKHa0uCuAQ6AEIPzAC#v=onepage&q&f=false
http://www.ssicentral.com/hlm/help6/error/Problems_creating_MDM_files.pdf
http://www.ssicentral.com/hlm/help7/faq/FAQ_Format_specifications_for_ASCII_data.pdf
The program runs, but I am not sure how to use open() to save the data to an external file named output.txt. My questions are marked in the code below; please have a look and help.
program start
implicit none
integer ::n
real(kind=8)::x,h,k
real(kind=8),external:: taylorq
x=1.0
n=20
h=exp(x)
k=taylorq(x,n)
open(10, file='output.txt')   ! question 1: when should I put this open statement?
write(*,*)"The exact value=",h
write(*,*)"The approximate value=",k
write(*,*)"The error=",h-k
end program start
function taylorq(x,n)
implicit none
integer::n,i
real(kind=8):: x,taylor,taylor2,taylorq,h
h=exp(x)
taylor=1.
taylor2=taylor
write(*,*)"i exact appro error"   ! question 2: I want a table with the headings i, exact, appro and error over the columns; is there a nice way to arrange them, like %5s in C?
do i=1,n
taylor=taylor*x/i
taylor2=taylor2+taylor
write(10,*)i,h,taylor2,taylor2-h   ! question 3: I want to save the data written here to output.txt
end do
close(10)
taylorq=taylor2
end function taylorq
1. where to open
You should put open(10,...) where it executes before any write(10,...), or read(10,...) if this were input.
Since your writes occur in the function taylorq, you should open() before the statement that calls taylorq.
For programs that do very large computations, which Fortran is suited to and famous for, it is often best to do all file OPENs very near the beginning of the program, so that if there is a problem opening any file, it is caught and fixed without wasting hours or days of work. But your program is much simpler than that.
2. formatting
Yes, Fortran can do formatted output, and also formatted input. Instead of a text string with interpolated specifiers (like C and the C part of C++, and Java, and awk and perl and shell), it uses specifiers with optionally interpolated text values, and the specifiers are written with the format letter on the left followed by the width (almost always) and other parameters (sometimes).
You can either put the format directly in the WRITE (or READ) statement, or in a separate FORMAT statement referred to by its label in the I/O statement.
write (10, '(I4,F10.2,F10.2,F10.2)' ) i,h,taylor2,taylor2-h
or
write (10, 900) i,h,taylor2,taylor2-h
! this next line can be anywhere in the same program-unit
900 format (I4,F10.2,F10.2,F10.2)
Unlike C-family languages, Fortran always outputs exactly the specified width; if the value doesn't fit, it prints asterisks ***** instead of widening the field (and thus misaligning the output) or truncating the value as COBOL does! Your series grows fast enough that you might want to use scientific notation like E10.3.
(The format letters can be in either case, but I find them easier to read in upper. YMMV.)
There are many, MANY, more options. Any textbook or your compiler manual should cover this.
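Putting the three answers together, here is a sketch of the question's function with the file opened by the caller before taylorq runs, and a header whose widths match the data rows. Assumptions beyond the question: the unit number is passed in as an argument instead of being hard-coded as 10, real(kind=8) is replaced by the more portable kind(1.0d0), and ES16.8 is used so that the tiny errors stay visible.

```fortran
! Illustrative sketch: the extra IUNIT argument is an assumption.
module taylor_table
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Sums the first N+1 terms of the Taylor series of exp(X) and writes
  ! one table row per term to IUNIT, which must already be open.
  function taylorq(x, n, iunit) result(approx)
    real(dp), intent(in) :: x
    integer, intent(in)  :: n, iunit
    real(dp) :: approx, term, exact
    integer :: i
    exact = exp(x)
    term = 1.0_dp
    approx = term
    ! Header and rows use the same widths (4 and 16), so the
    ! right-justified titles sit over their columns.
    write (iunit, '(A4,3A16)') 'i', 'exact', 'appro', 'error'
    do i = 1, n
      term = term * x / i
      approx = approx + term
      write (iunit, '(I4,3ES16.8)') i, exact, approx, approx - exact
    end do
  end function taylorq
end module taylor_table
```

A main program would then do something like open(newunit=u, file='output.txt', status='replace'), k = taylorq(1.0_dp, 20, u) and close(u), and afterwards print the exact value, the approximation and the error to the screen as before.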
I am writing some simple output in Fortran, but I want whitespace delimiters. If I use the following statement, however:
format(A20,ES18.8,A12,ES18.8)
I get output like this:
p001t0000 3.49141273E+01obsgp_oden 1.00000000E+00
I would prefer this:
p001t0000 3.49141273E+01 obsgp_oden 1.00000000E+00
I tried using negative values for width (like in Python) but no dice. So, is there a way to left-justify the numbers?
Many thanks in advance!
There's not a particularly beautiful way. However, using an internal WRITE statement to convert the number to a text string (formerly done with an ENCODE statement), and then manipulating the text may do what you need.
Quoting http://rsusu1.rnd.runnet.ru/develop/fortran/prof77/node168.html
An internal file WRITE is typically used to convert a numerical value to a character string by using a suitable format specification, for example:
CHARACTER*8 CVAL
RVALUE = 98.6
WRITE(CVAL, '(SP, F7.2)') RVALUE
The WRITE statement will fill the character variable CVAL with the characters ' +98.60 ' (note that there is one blank at each end of the number, the first because the number is right-justified in the field of 7 characters, the second because the record is padded out to the declared length of 8 characters).
Once a number has been turned into a character-string it can be processed further in the various ways described in section 7. This makes it possible, for example, to write numbers left-justified in a field, ...
This is easier since Fortran 90/95, but still not trivial. Write the number or other item to a string with a WRITE statement (as in the first answer), then use the intrinsic ADJUSTL to left-adjust the non-blank characters of the string.
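A sketch of the internal-WRITE-plus-ADJUSTL approach applied to the record in the question. It assumes double-precision values (so the nine significant digits of ES18.8 are meaningful); es_left and record_line are illustrative names, and TRIM removes the trailing blanks that ADJUSTL leaves behind.

```fortran
! Illustrative sketch: es_left and record_line are hypothetical names.
module ljust
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Converts X to text with ES18.8, then moves it to the left.
  function es_left(x) result(s)
    real(dp), intent(in) :: x
    character(len=18)    :: s
    write (s, '(ES18.8)') x
    s = adjustl(s)
  end function es_left

  ! Joins the four fields with exactly one blank between them.
  function record_line(name1, v1, name2, v2) result(line)
    character(len=*), intent(in) :: name1, name2
    real(dp), intent(in)         :: v1, v2
    character(len=100)           :: line
    line = trim(name1) // ' ' // trim(es_left(v1)) // ' ' // &
           trim(name2) // ' ' // trim(es_left(v2))
  end function record_line
end module ljust
```

Writing trim(record_line('p001t0000', 34.9141273d0, 'obsgp_oden', 1.0d0)) with an (A) format should give the desired "p001t0000 3.49141273E+01 obsgp_oden 1.00000000E+00".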
Really inelegant is my own method (I program like a cave woman): after writing with the simple Fortran format (which is not left-justified), I use a combination of Excel (CSV) and UltraEdit to remove the spaces, effectively getting the desired left-justified values followed directly by commas (which I need for my specific import format into another program).
If what you really want is whitespace between output fields rather than left-justified numbers to leave whitespace you could simply use the X edit descriptor. For example
format(A20,4X,ES18.8,4X,A12,4X,ES18.8)
will insert 4 spaces between each field and the next. Note that the standard requires 1X for one space; some current compilers also accept a bare X as a non-standard extension.
! For a left-justified real with 1 decimal place: the number after the decimal point is how many decimals are printed. WRITE rounds to that many decimal places rather than truncating.
write(*, ['(f0.1)']) RValue !or
write(*, '(f0.1)') RValue
! For left-justified integers:
write(*, ['(i0)']) intValue !or
write(*, '(i0)') intValue
*After feedback from Vladimir, retesting showed that the statements work with or without the array brackets.
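A small sketch checking the behaviour described above with internal writes (f0_text and i0_text are illustrative names): F0.1 and I0 use the minimum width, so the text starts in column 1, and F0.1 rounds rather than truncates.

```fortran
! Illustrative sketch: f0_text and i0_text are hypothetical names.
module minimal_width
  implicit none
contains
  ! Real to text with one decimal, minimal width (left-justified).
  function f0_text(x) result(s)
    real, intent(in)  :: x
    character(len=32) :: s
    write (s, '(F0.1)') x
  end function f0_text

  ! Integer to text, minimal width (left-justified).
  function i0_text(k) result(s)
    integer, intent(in) :: k
    character(len=32)   :: s
    write (s, '(I0)') k
  end function i0_text
end module minimal_width
```

For example, trim(f0_text(2.76)) should be "2.8" (rounded, not "2.7"), and trim(i0_text(-42)) should be "-42".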
I am trying to parse a table in the form of a text file using ifstream, and evaluating/manipulating each entry. However, I'm having trouble figuring out how to approach this because of omissions of particular items. Consider the following table:
NEW VER ID NAME
1 2a 4 "ITEM ONE" (2001)
1 7 "2 ITEM" (2002) {OCT}
1.1 10 "SOME ITEM 3" (2003)
1 12 "DIFFERENT ITEM 4" (2004)
1 a4 16 "ITEM5" (2005) {DEC}
As you can see, sometimes the "NEW" column has nothing in it. What I want to do is take note of the ID, the name, the year (in brackets), and note whether there are braces or not afterwards.
When I started doing this, I looked for a "split" function, but I realized that it would be more complicated because of the missing items mentioned above and because the titles, which contain spaces, would be split apart.
The one approach I can think of is reading each line word by word, keeping track of the latest number I saw. Once I hit a quotation mark, I note that the latest number I saw was the ID (with something like a split, that is the array element right before the quotation mark), then record everything up to the next quote (the title), and finally look for brackets and braces for the remaining information. However, this seems really primitive, and I'm looking for a better way to do this.
I'm doing this to sharpen my C++ skills and to work with larger, existing datasets, so I'd like to use C++ if possible; but if another language (I'm looking at Perl or Python) makes this trivially easy, I could just learn how to interface a different language with C++. What I'm doing now is just sifting data anyway, which will eventually become objects in C++, so I will still have chances to improve my C++ skills.
EDIT: I also realize that this is possible to complete using only regex, but I'd like to try using different methods of file/string manipulation if possible.
If the column offsets are truly fixed (no tabs, just true space characters, i.e. 0x20), I would read the file a line at a time (std::getline) and break each line down at the fixed offsets into a set of four strings (std::string::substr).
Then postprocess each 4-tuple of strings as required.
I would not hard-code the offsets, store them in a separate input file that describes the format of the input - like a table description in SQL Server or other DB.
Something like this:
Read the first line, find "ID", and store the index.
Read each data line using std::getline().
Create a substring of the data line, starting at the index where you found "ID" in the header line, and use it to initialize a std::istringstream.
Read the ID using iss >> an_int.
Find the first ". Find the second ". Find the ( and remember its index. Find the ) and remember that index, too. Create a substring from the characters between those two indexes, use it to initialize another std::istringstream, and read the number from this stream.
Look for the braces.