I wonder how Fortran's I/O is expected to behave in case of a NULL character ACHAR(0).
The actual task is to fill an ASCII file by blocks of precisely eight characters. The strings are read from a binary and may contain non-printing characters.
I tried with gfortran 4.8, 8.1 and f2c. If there is a NULL character in the string the format specifier FORMAT(A8) does not write eight characters.
Give the following F77 code a try:
c Print a string of eight character surrounded by dashes
100 FORMAT('-',A8,'-')
c Works fine if empty or any other combination of printing chars
write(*,100) ''
c In case of a short sting blanks are padded
write(*,100) '345678'
c A NULL character does something I did not expect
write(*,100) '123'//ACHAR(0)//'4567'
c Not even position editing helps
101 FORMAT('-',A8,T10,'x')
write(*,101) '123'//ACHAR(0)//'4567'
end
My output is:
- -
- 345678-
-1234567-
-1234567x
Is this expected behavior? Any idea how to get the output eight characters wide in any case?
When using an edit descriptor A8 the field width is eight. For output, eight characters will be written.
In the case of the example, it isn't the writing of the characters that is contrary to your expectations, but how they are displayed by your terminal.
You can examine the output further with tools like hexdump or you can write to an internal file and look at arbitrary substrings.
Yes, that is expected, if there is a null character, the printing of the string on the screen can stop there. The characters will still be sent, but the string does not have to be printed on the screen.
Note that C uses NULL to delimit strings and the OS may interpret the strings it receives with the same conventions. The allows the non-printable characters to be interpreted in processor specific ways by the processor and the processor includes the whole complex of the compiler, the executing environment (OS and programs in the OS) and the hardware.
Related
If I have a text file where lines contains some non-blank characters followed by spaces, how do I read those lines into a character variable without excess spaces?
character (len=1000) :: text
open (unit=20,file="foo.txt",action="read")
read (20,"(a)") text
will read the first 1000 characters of a line into variable text, which will be padded with spaces at the end if there are fewer than 1000 characters in the line. But if the line length is 100 you have 900 extraneous spaces, and the program does not "know" how long the line read actually was.
Fortran strings are blank-padded. There is simply no chance to distinguish any significant blank-padding in your strings with constant-length Fortran strings.
If every whitespace character is important, I suggest to treat the file as a stream-access file instead (formated or unformatted as needed), read individual characters to some array buffer and allocate a deferred-length string only after you know the length you actually need.
character (len=1000) :: text
integer :: s, ios
open (unit=20,file="foo.txt",action="read")
read (20,"(a)", size=s, advance='no', iostat=ios) text
After that last line, s contains the number of characters read, including trailing spaces, which I think is what you wanted.
Notes:
With a size tag, you must also have an advance tag set to 'no' otherwise you get a compilation error. Since the format is "(a)", the whole line is read so the next read statement will advance to the next line despite the 'no'. That's fine.
ios stores a negative integer when attempting to read past the end of the line. This will always happen if the line is shorter than length of text. That's fine.
When attempting to read past the end of the file, ios will store a different negative integer. What those two negative integers are is not set by the standard I think so you may have to experiment a bit. In my case, with the gfortran compiler, ios was -1 when attempting to read past the end of the file and -2 otherwise.
I am trying to write a JSON string parser in JFlex, so far I have
string = \"((\\(\"|\\|\/|b|f|n|r|t|u[0-9a-fA-F]{4})) | [^\"\\])*\"
which I thought captured the specs (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf).
I have tested it on the control characters and standard characters and symbols, but for some reason it does not accept £ or ( or ) or ¬. Please can someone let me know what is causing this behaviour?
Perhaps you are running in JLex compatability mode? If so, please see the following from the official JFlex User's Manual. It seems that it will use 7bit character codes for input by default, whereas what you want is 16bit (unicode).
You can fix this by adding the line %unicode after the first %%.
Input Character sets
%7bit
Causes the generated scanner to use an 7 bit input character set (character codes 0-127). If an input character with a code greater than 127 is encountered in an input at runtime, the scanner will throw an ArrayIndexOutofBoundsException. Not only because of this, you should consider using the %unicode directive. See also Encodings for information about character encodings. This is the default in JLex compatibility mode.
%full
%8bit
Both options cause the generated scanner to use an 8 bit input character set (character codes 0-255). If an input character with a code greater than 255 is encountered in an input at runtime, the scanner will throw an ArrayIndexOutofBoundsException. Note that even if your platform uses only one byte per character, the Unicode value of a character may still be greater than 255. If you are scanning text files, you should consider using the %unicode directive. See also section Econdings for more information about character encodings.
%unicode
%16bit
Both options cause the generated scanner to use the full Unicode input character set, including supplementary code points: 0-0x10FFFF. %unicode does not mean that the scanner will read two bytes at a time. What is read and what constitutes a character depends on the runtime platform. See also section Encodings for more information about character encodings. This is the default unless the JLex compatibility mode is used (command line option --jlex).
I am reading an ASCII text file. It is defined by the size of each field, in bytes. E.g. Each row consists of a 10 bytes for some string, 8 bytes for a floating point value, 5 bytes for an integer and so on.
My problem is reading the newline character, which has a variable size depending on the OS (usually 2 bytes for windows and 1 byte for linux I believe).
How can I get the size of the EOL character in C++?
For example, in python I can do:
len(os.linesep)
The time honored way to do this is to read a line.
Now, the last char should be \n. Strip it. Then, look at the previous character. It will either be \r or something else. If it's \r, strip it.
For Windows [ascii] text files, there aren't any other possibilities.
This works even if the file is mixed (e.g. some lines are \r\n and some are just \n).
You can tentatively do this on few lines, just to be sure you're not dealing with something weird.
After that, you now know what to expect for most of the file. But, the strip method is the general reliable way. On Windows, you could have a file imported from Unix (or vice versa).
I'm not sure that the translation occurs where you think it is. Look at the following code:
ostringstream buf;
buf<< std::endl;
string s = buf.str();
int i = strlen(s.c_str());
After this, running on Windows, i == 1. So the end of line definition in std is 1 character. As others have commented, this is the "\n" character.
In my c++ textbook, there is an "ASCII Table of Printable Characters."
I noticed a few odd things that I would appreciate some clarification on:
Why do the values start with 32? I tested out a simple program and it has the following lines of code: char ch = 1; std::cout << ch << "\n"; of code and nothing printed out. So I am kind of curious as to why the values start at 32.
I noticed the last value, 127, was "Delete." What is this for, and what does it do?
I thought char can store 256 values, why is there only 127? (Please let me know if I have this wrong.)
Thanks in advance!
The printable characters start at 32. Below 32 there are non-printable characters (or control characters), such as BELL, TAB, NEWLINE etc.
DEL is a non-printable character that is equivalent to delete.
char can indeed store 256 values, but its signed-ness is implementation defined. If you need to store values from 0 to 255 then you need to explicitly specify unsigned char. Similarly from -128 to 127, have to specify signed char.
EDIT
The so called extended ASCII characters with codes >127 are not part of the ASCII standard. Their representation depends on the so called "code page" chosen by the operating system. For example, MS-DOS used to use such extended ASCII characters for drawing directory trees, window borders etc. If you changed the code page, you could have also used to display non-English characters etc.
It's a mapping between integers and characters plus other "control" "characters" like space, line feed and carriage return interpreted by display devices (possibly virtual). As such it is arbitrary, but they are organized by binary values.
32 is a power of 2 and an alphabet starts there.
Delete is the signal from your keyboard delete key.
At the time the code was designed only 7 bits were standard. Not all bytes (parts words) were 8 bits.
I am writing some simple output in fortran, but I want whitespace delimiters. If use the following statement, however:
format(A20,ES18.8,A12,ES18.8)
I get output like this:
p001t0000 3.49141273E+01obsgp_oden 1.00000000E+00
I would prefer this:
p001t0000 3.49141273E+01 obsgp_oden 1.00000000E+00
I tried using negative values for width (like in Python) but no dice. So, is there a way to left-justify the numbers?
Many thanks in advance!
There's not a particularly beautiful way. However, using an internal WRITE statement to convert the number to a text string (formerly done with an ENCODE statement), and then manipulating the text may do what you need.
Quoting http://rsusu1.rnd.runnet.ru/develop/fortran/prof77/node168.html
An internal file WRITE is typically
used to convert a numerical value to a
character string by using a suitable
format specification, for example:
CHARACTER*8 CVAL
RVALUE = 98.6
WRITE(CVAL, '(SP, F7.2)') RVALUE
The WRITE statement will fill the
character variable CVAL with the
characters ' +98.60 ' (note that there
is one blank at each end of the
number, the first because the number
is right-justified in the field of 7
characters, the second because the
record is padded out to the declared
length of 8 characters).
Once a number has been turned into a
character-string it can be processed
further in the various ways described
in section 7. This makes it possible,
for example, to write numbers
left-justified in a field, ...
This is easier with Fortran 95, but still not trivial. Write the number or other item to a string with a write statement (as in the first answer). Then use the Fortran 95 intrinsic "ADJUSTL" to left adjust the non-blank characters of the string.
And really un-elegant is my method (I program like a cave woman), after writing the simple Fortran write format (which is not LJ), I use a combination of Excel (csv) and ultraedit to remove the spaces effectively getting the desired LJ followed directly by commas (which I need for my specific import format to another software). BF
If what you really want is whitespace between output fields rather than left-justified numbers to leave whitespace you could simply use the X edit descriptor. For example
format(A20,4X,ES18.8,4X,A12,4X,ES18.8)
will insert 4 spaces between each field and the next. Note that the standard requires 1X for one space, some of the current compilers accept the non-standard X too.
!for left-justified float with 1 decimal.. the number to the right of the decimal is how many decimals are required. Write rounds to the desired decimal space automatically, rather than truncating.
write(*, ['(f0.1)']) RValue !or
write(*, '(f0.1)') RValue
!for left-justified integers..
write(*, ['(i0)']) intValue !or
write(*, '(i0)') RValue
*after feedback from Vladimir, retesting proved the command works with or without the array brackets