I am working on some old code, or more precisely, on understanding what that code does. It is written in Fortran 95.
The bit I am interested in is this:
1 read(09,'(52x,i6,i4,12i2)',end=2)hj,fix(1),(fix(k),k=4,15)
! ... some data wrangling ...
goto 1
2 continue
What I think this does is: one line of the input is read, and the values are assigned to these temporary objects, the scalar hj and the array elements fix(1) and fix(4:15). It then does some data wrangling, which is not important here, and goes back via the goto statement to read the next line, until it hits end of file (EOF), at which point control jumps to the statement labelled 2.
My question is twofold:
Is this understanding correct?
Is there any utility in this from a performance standpoint?
Related
When I run certain programs written in FORTRAN 77 with the GNU gfortran compiler, I have come across the same problem several times, and I'm hoping someone has insight. The value I want should be ~1, but it is written out at the end of the program as something well over 10^100. For me this problem is generally restricted to arrays. The improper value often goes away when I write out the value of the variable at some earlier stage in the program (which inevitably happens while trying to troubleshoot the issue).
I have tried initializing arrays explicitly and have tried some array bound checking as well as some internal logical checks in my program. The issue, to me and my research supervisor, seems to be pathological.
WRITE(*,*)"OK9999", INV,INVV
vs
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
The former gives incorrect results for the variables INV and INVV; the latter gives the values I expect. This is the newest example of this problem, which has affected me on and off for about a year.
The greater context of these lines is:
WRITE(*,*)"AFTER ENERGY",I1STOR(1),I2STOR(1)
DO 167 NP=1!,NV
IF(I1STOR(NP).NE.0) THEN
INV = I1STOR(NP)
INVV = I2STOR(NP)
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
PAUSE
ENDIF
I1STOR(1) and I2STOR(1) are written correctly in the first case "AFTER ENERGY" above. If I write out the value of NP after the DO 167 line, this will also remedy the situation.
My expectation would be that writing out a variable wouldn't affect its value. Oftentimes I am doing large, time-intensive calculations where the final value is way off, and in many cases the problem has traced back to this same situation, where writing the value out (to screen or file) magically alleviates it. Any help would be sincerely appreciated.
I want to load a map from a text file (if you can come up with any other way to load a map into an array, I'm open to anything new).
What's written in the text file is something like this, but on a larger scale.
6 6 10 (never mind what the number "10" is; the other two are the map size.)
1 1 1 1 1 1
1 0 2 0 0 1
1 0 0 0 2 1
1 2 2 0 0 1
1 0 0 0 0 1
1 1 1 1 1 1
Where 1 is border, 0 is empty, 2 is wall.
Now I want to read this text file, but I'm not sure which way would be best.
What I have in mind so far is:
Reading the whole text file at once into a stringstream via rdbuf(), converting it to a string later, and then splitting the string and putting it into the array.
Reading it number by number via getline().
Reading it number by number using the >> operator.
My question is which of the mentioned ways (or any other, if available) is better in terms of RAM use and speed.
Note: whether or not using rdbuf() is a good way, I'd also appreciate a good comparison between different ways of splitting a string, for example splitting text into words at whitespace.
Where 1 is border, 0 is empty, 2 is wall. Now I want to read this text file, but I'm not sure which way would be best. What I have in mind so far is:
You don't have enough data to make a significant impact on performance by any of the means you mentioned. In other words, concentrate on correctness and robustness of your program then come back and optimize the parts that are slow.
Reading the whole text file at once into a stringstream via rdbuf(), converting it to a string later, and then splitting the string and putting it into the array.
The best method for inputting data is to keep the input stream flowing. This usually means reading large chunks of data per transaction versus many small transactions of small quantities. Memory is a lot faster to search and process than an input stream.
I suggest using istream::read before using rdbuf. For either one, I recommend reading into a preallocated area of memory, that is, either an array or, if using std::string, a string with a large space reserved when it is constructed. You don't want the reallocation of std::string data to slow your program.
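For instance, a minimal sketch of that idea might look like the following (the file name map.txt is just an assumed example):
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream in("map.txt", std::ios::binary);    // "map.txt" is a placeholder name
    in.seekg(0, std::ios::end);                       // find the file size...
    const std::streamsize size = in.tellg();
    in.seekg(0, std::ios::beg);                       // ...and rewind

    std::string buffer(static_cast<std::size_t>(size), '\0');  // preallocated, no later reallocation
    in.read(&buffer[0], size);                        // one large read into memory

    std::cout << "read " << in.gcount() << " bytes\n";
}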
Reading it number by number via getline().
Since your data is line oriented, this could be beneficial. You read one row and process that row. It's a good technique to start with: a bit more complicated than the one below, but simpler than the previous method.
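As a rough sketch of the line-oriented approach (again assuming a file named map.txt with the header line shown in the question):
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("map.txt");             // placeholder file name
    std::string line;

    std::getline(in, line);                  // header line: width, height and the extra number
    std::istringstream header(line);
    int width = 0, height = 0, extra = 0;
    header >> width >> height >> extra;

    std::vector<std::vector<int>> map;
    while (std::getline(in, line)) {         // read one row at a time...
        std::istringstream row(line);
        std::vector<int> cells;
        int value = 0;
        while (row >> value)                 // ...and split it into numbers
            cells.push_back(value);
        if (!cells.empty())
            map.push_back(cells);
    }
}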
Reading it number by number using the >> operator.
IMO, this is the technique you should be using. It is simple and easy to get working, enabling you to move on to the remainder of your project.
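A minimal sketch of this approach, assuming the file layout from the question (the file name map.txt is illustrative):
#include <fstream>
#include <vector>

int main() {
    std::ifstream in("map.txt");                        // placeholder file name
    int width = 0, height = 0, extra = 0;
    in >> width >> height >> extra;                     // header line: 6 6 10

    std::vector<std::vector<int>> map(height, std::vector<int>(width));
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c)
            in >> map[r][c];                            // operator>> skips all whitespace, including newlines
}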
Changing the Data Format
If you want to make the input faster, you can change the format of the data. Binary data, i.e. data that doesn't need translation, is the fastest format to read. It bypasses the translation from textual format to internal representation; the binary data is the internal representation.
One of the caveats of binary data is that it is hard to read and modify by hand.
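As an illustration only (the file name map.bin and the fixed-width integer layout are assumptions, not a standard format), writing and reading the map in binary could look like this:
#include <cstdint>
#include <fstream>
#include <vector>

int main() {
    std::int32_t width = 6, height = 6;
    std::vector<std::int32_t> cells(width * height, 0);    // flattened 6x6 map, all zeros here

    // write: the two dimensions first, then the raw cell values
    std::ofstream out("map.bin", std::ios::binary);         // "map.bin" is a placeholder name
    out.write(reinterpret_cast<const char*>(&width), sizeof width);
    out.write(reinterpret_cast<const char*>(&height), sizeof height);
    out.write(reinterpret_cast<const char*>(cells.data()), cells.size() * sizeof(std::int32_t));
    out.close();

    // read it back the same way -- no text-to-number translation involved
    std::ifstream in("map.bin", std::ios::binary);
    in.read(reinterpret_cast<char*>(&width), sizeof width);
    in.read(reinterpret_cast<char*>(&height), sizeof height);
    cells.resize(width * height);
    in.read(reinterpret_cast<char*>(cells.data()), cells.size() * sizeof(std::int32_t));
}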
Optimizing
Don't. Focus on finishing the project: correctly and robustly.
Don't. Usually, the time you gain is wasted waiting for I/O or for the user. Development time is costly; unnecessary optimization is a waste of development time and thus a waste of money.
Profile your executable. Optimize the parts that occupy the most execution time.
Reduce requirements / Features before changing code.
Optimize the design or architecture before changing the code.
Change compiler optimization settings before changing the code.
Change data structures & alignment for cache optimization.
Optimize I/O if your program is I/O bound.
Reduce branches / jumps / changes in execution flow.
I need to write a MIPS assembler in C/C++. Before I just start writing some code, I think I should take some time and do some planning first. There are about 15 MIPS instructions I need to account for, including J, but not JR. The program needs to take in a file that has .text, .data, and .word sections along with labels, then output a file whose first line gives, in decimal, the number of instructions and the number of words of data. The rest is the machine code encoded in hex. The final set of lines consists of hexadecimal values representing the initial values of the words in the data segment. I know I'll need to do two passes, to first parse the labels and the jump instructions. Basically I'm just looking for advice on how to set up the data structures. Should I do an array of strings that holds the OPCODE, the RS, RT, RD, etc., then convert that to hex somehow? Or is there a better way to do this, from someone who has any advice/experience? Thanks for your help/suggestions!
I actually did this a long time ago for something related to a class project! You're right about having to do two passes. However, don't use an array of strings for the registers. In fact, you don't need to use strings at all. You can put the opcodes in one enum and the registers in another. For 15 instructions, you can easily do most of the work by hand-coding switch-case and if-else statements rather than designing a fully generalized solution. It might be tempting to use regular expressions, but for your problem it's not worth the effort (though you should definitely use any opportunity you get to learn regex if you have the time!). Then use hashmap-like structures to map the registers and opcodes to their hex values, and use those. You can do any address calculations directly in code. This is just a suggestion; you should definitely experiment. My main point is that if you are reading a string, you shouldn't store it in the same form when you can process it first and store something (read: an object) more meaningful.
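To make that concrete, here is a tiny, hypothetical sketch of the enum/hashmap idea; the instruction subset, register subset, and encodeAdd helper are purely illustrative, not a complete design:
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical, partial tables -- only a handful of instructions and registers shown.
enum class Opcode { ADD, ADDI, LW, SW, BEQ, J };

const std::unordered_map<std::string, Opcode> opcodes = {
    {"add", Opcode::ADD}, {"addi", Opcode::ADDI}, {"lw", Opcode::LW},
    {"sw", Opcode::SW},   {"beq", Opcode::BEQ},   {"j", Opcode::J},
};

const std::unordered_map<std::string, std::uint32_t> registers = {
    {"$zero", 0}, {"$t0", 8}, {"$t1", 9}, {"$t2", 10}, {"$sp", 29}, {"$ra", 31},
};

// R-type "add": opcode 0, shamt 0, funct 0x20.
std::uint32_t encodeAdd(std::uint32_t rd, std::uint32_t rs, std::uint32_t rt) {
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x20u;
}

int main() {
    // add $t2, $t0, $t1  ->  rd = $t2, rs = $t0, rt = $t1
    std::uint32_t word = encodeAdd(registers.at("$t2"), registers.at("$t0"), registers.at("$t1"));
    std::cout << std::hex << std::setw(8) << std::setfill('0') << word << "\n";
}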
Basically, you only need the first pass for the labels etc. You can do everything else in the second pass. If you look at the basic typical compiler/assembler flow chart in any O/S textbook, you can easily emulate each step - that's what I did.
Hope this helps!
I don't understand the format of unformatted files in Fortran.
For example:
open (3,file=filename,form="unformatted",access="sequential")
write(3) matrix(i,:)
outputs a column of a matrix into a file. I've discovered that it pads the file with 4 bytes on either end, however I don't really understand why, or how to control this behavior. Is there a way to remove the padding?
For unformatted IO, Fortran compilers typically write the length of the record at the beginning and end of the record. Most, but not all, compilers use four bytes. This aids in reading records; e.g., the length at the end assists with a backspace operation. You can suppress these markers with the new stream IO mode of Fortran 2003, which was added for compatibility with other languages: use access='stream' in your open statement.
I never used sequential access with unformatted output for this exact reason. However, it depends on the application, and sometimes it is convenient to have a record length indicator (especially for unstructured data). As suggested by steabert in Looking at binary output from fortran on gnuplot, you can avoid the markers by using the keyword argument ACCESS='DIRECT', in which case you need to specify the record length. This method is convenient for efficient storage of large multi-dimensional structured data (constant record length). The following example writes an unformatted file whose size equals the size of the array:
REAL(KIND=4),DIMENSION(10) :: a = 3.141
INTEGER :: reclen
INQUIRE(iolength=reclen)a
OPEN(UNIT=10,FILE='direct.out',FORM='UNFORMATTED',&
ACCESS='DIRECT',RECL=reclen)
WRITE(UNIT=10,REC=1)a
CLOSE(UNIT=10)
END
Note that this is not the ideal approach in terms of portability. In an unformatted file written with direct access, there is no information about the size of each element. A readme text file that describes the data size does the job fine for me, and I prefer this method over the padding of sequential mode.
Fortran IO is record based, not stream based. Every time you write something through write() you are writing not only the data, but also beginning and end markers for that record. Both record markers hold the size of that record. This is the reason why writing a bunch of reals in a single write (one record: one begin marker, the bunch of reals, one end marker) produces a different file size than writing each real in a separate write (multiple records, each with one begin marker, one real, and one end marker). This is extremely important if you are writing large matrices, as you could balloon the file size if the data is written improperly.
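If you want to see those markers for yourself, here is a small C++ sketch that walks through a sequential unformatted file, assuming 4-byte native-endian record markers (the gfortran default); the file name matrix.unf is just a placeholder:
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    // "matrix.unf" stands in for a file written with form="unformatted", access="sequential"
    // by a compiler that uses 4-byte record markers (e.g. gfortran).
    std::ifstream in("matrix.unf", std::ios::binary);

    std::int32_t lead = 0, trail = 0;
    while (in.read(reinterpret_cast<char*>(&lead), sizeof lead)) {
        std::vector<char> payload(lead);                          // the record's data
        in.read(payload.data(), lead);
        in.read(reinterpret_cast<char*>(&trail), sizeof trail);   // trailing marker repeats the length
        std::cout << "record of " << lead << " bytes (trailer " << trail << ")\n";
    }
}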
Regarding Fortran unformatted IO: I am quite familiar with the differing outputs of the Intel and GNU compilers. Fortunately, my vast experience dating back to 1970s IBM systems allowed me to decode things. GNU pads records with 4-byte integer counters giving the record length. Intel uses a 1-byte counter and a number of embedded coding values to signify a continuation record or the end of a count. One can still have very long record lengths even though only 1 byte is used.
I have software compiled with the GNU compiler that I had to modify so it could read an unformatted file generated by either compiler, so it has to detect which format it finds. Reading an unformatted file generated by the Intel compiler (which follows the "old" IBM convention) takes "forever" using GNU's fgetc or opening the file in stream mode. Converting the file to what GNU expects makes it up to 100 times faster. Whether you want to bother with detection and conversion depends on your file size. I reduced my program's startup time (it opens a large unformatted file) from 5 minutes down to 10 seconds. I had to add options to convert back again if the user wants to take the file back to an Intel-compiled program. It's all a pain, but there you go.
I'm currently trying to write a .bmp file in C++, and for the most part it works; there is, however, just one issue. When I start trying to save images with different widths and heights everything goes askew, and I'm struggling to solve it. So is there any way to force something to be written to a specific byte (adding padding between it and the last thing written)?
There are several sort of obvious answers, such as keeping your data in memory in a buffer, then putting the desired value in as bufr[offset]=mydata;. I presume you want something a little fancier than that, because you are, for example, doing this in a streaming sort of application where you can't have the whole object in memory at the same time.
In that case, what you're looking for is the magic offered by fseek(3) and ftell(3) (see the man pages). Seek positions the file at a specific offset; tell gets the file's current offset. If it's a constant offset of 18, then you simply finish up with the file and do
fseek(fp, 18L, SEEK_SET)
where fp is the file pointer, SEEK_SET (seek relative to the start of the file) is a constant declared in stdio.h, and 18 is the target offset.
Update
By the way, this is based on the system call lseek(2). Something that confuses people (read: "me"; I never remember this until I've been searching) is that there is no matching "ltell(2)" system call. Instead, to get the current file offset, you use
off_t offset;
offset = lseek(fd, 0L, SEEK_CUR);  /* fd is the file descriptor */
because lseek returns the offset after its operation. The example code above gives us the offset after moving 0 bytes from the current offset, which is of course the current offset.
Update
Aha, C++. You said C. For C++, there are member functions for seek and tell; see the fstream man page.
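For completeness, here is a hedged sketch with std::ofstream, where seekp moves the put position and tellp reports it; the file name, the offset 18, and the width value are all just placeholders:
#include <cstdint>
#include <fstream>

int main() {
    std::ofstream out("image.bmp", std::ios::binary);     // placeholder file name
    // ... header bytes written here ...
    std::streampos resume = out.tellp();                  // remember where we are
    out.seekp(18, std::ios::beg);                          // jump to absolute byte 18
    std::int32_t width = 640;                              // hypothetical value to patch in
    out.write(reinterpret_cast<const char*>(&width), sizeof width);
    out.seekp(resume);                                     // continue where we left off
}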
Count how many bytes have been written. Write zeroes until the count hits 18. Then resume writing your real data.
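A sketch of that idea (the 'BM' signature bytes and the target offset 18 are only examples):
#include <fstream>

int main() {
    std::ofstream out("image.bmp", std::ios::binary);     // placeholder file name
    long written = 0;                                      // running count of bytes written so far

    const char magic[2] = {'B', 'M'};                      // example: the 2-byte BMP signature
    out.write(magic, sizeof magic);
    written += sizeof magic;

    while (written < 18) {                                 // pad with zeros up to byte 18
        out.put('\0');
        ++written;
    }
    // ... resume writing the real data at offset 18 ...
}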
If you are on Windows, everything comes down to writing out the predefined structures: "Bitmap storage".
Also there is an example that shows how they should be filled: "Storing an Image".
If you are writing not-just-for-Windows code, then you can mimic these structs and follow the guide.
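As a sketch of what "mimicking these structs" could look like (the field names here are my own; the layout follows the documented BITMAPFILEHEADER/BITMAPINFOHEADER fields, and the packing pragma keeps the file header at its on-disk 14 bytes):
#include <cstdint>

#pragma pack(push, 1)                 // the on-disk headers are byte-packed
struct BmpFileHeader {                // mirrors BITMAPFILEHEADER (14 bytes)
    std::uint16_t type;               // the two signature bytes 'B', 'M'
    std::uint32_t size;               // total file size in bytes
    std::uint16_t reserved1;
    std::uint16_t reserved2;
    std::uint32_t offBits;            // offset from the file start to the pixel data
};
struct BmpInfoHeader {                // mirrors BITMAPINFOHEADER (40 bytes)
    std::uint32_t size;               // size of this header, i.e. 40
    std::int32_t  width;
    std::int32_t  height;
    std::uint16_t planes;             // must be 1
    std::uint16_t bitCount;           // bits per pixel
    std::uint32_t compression;
    std::uint32_t sizeImage;
    std::int32_t  xPelsPerMeter;
    std::int32_t  yPelsPerMeter;
    std::uint32_t clrUsed;
    std::uint32_t clrImportant;
};
#pragma pack(pop)

int main() { static_assert(sizeof(BmpFileHeader) == 14 && sizeof(BmpInfoHeader) == 40, "packed layout"); }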