So, I created a binary file with fortran, using something similar to this:
open (3,file=filename,form="unformatted",access="sequential")
write(3) matrix(i,:)
The way I understand it, Fortran pads the output with 4 bytes on either end, and the rest is just the data that I want (in this case, a list of 1000 doubles).
I want to read this in with gnuplot, however, I don't know how to get gnuplot to skip the first and last 4 bytes, and read the rest in as doubles. The documentation isn't very helpful in this regard.
Thanks
Andrew: I see no reason to make gnuplot handle those extra bytes before/after your data. Either Fortran does not do this padding, or it does and gnuplot handles it without a hassle.
I've had a similar problem, and Google searches always brought me back here. I figured I'd better post my solution in case the same happens to other people.
I've been trying to make a 2D colormap plot using gnuplot's "plot 'file.dat' matrix with image" command. My ASCII output files were too big, so I wanted to use binary files instead. What I did was something like the following:
in fortran:
implicit none
real, dimension(128,128) :: array
integer :: irec
! ... initialize array ...
inquire( iolength=irec ) array
open( 36, file='out.dat', form='unformatted', access='direct', recl=irec )
write( 36, rec=1 ) array
close( 36, status='keep' )
in gnuplot:
plot 'out.dat' binary array=128x128 format="%float" with image
Notes:
By default, gnuplot assumes single precision in binary files. If your
fortran program outputs in double precision, simply change "%float"
to "%double".
My program used double-precision data in the array, but the output
files were too big. Since images rendered from double- or
single-precision data are indistinguishable to the eye, I converted
my data to single precision before writing it to a file.
You may have to adapt the gnuplot command depending on what
you want to do with the matrix, but this loads it in and plots it
well. This did what I needed it to do, and I hope it helps anyone
else who has a similar problem.
Note that this works without skipping anything because the file is
opened with access='direct': with common compilers such as gfortran,
direct-access files carry no record markers before or after the data,
so gnuplot sees only the raw array values.
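To make the layout concrete, here is a minimal Python sketch of what such a direct-access file is assumed to contain: the array elements in column-major order as 4-byte floats, with nothing before or after them. The 4x3 size, the little-endian byte order, and the helper names are assumptions for illustration, not something taken from the Fortran program above.

```python
import os
import struct
import tempfile

NX, NY = 4, 3  # small stand-in for the 128x128 array (assumed sizes)

def write_raw_float32(path, columns):
    """Write columns back to back as little-endian float32, with no record markers."""
    with open(path, "wb") as f:
        for col in columns:
            f.write(struct.pack(f"<{len(col)}f", *col))

def read_element(path, i, j, nx):
    """Read element (i, j), 0-based, from a column-major float32 file."""
    with open(path, "rb") as f:
        f.seek((j * nx + i) * 4)  # 4 bytes per single-precision value
        return struct.unpack("<f", f.read(4))[0]

# columns[j][i] holds 10*j + i so each value encodes its own position
columns = [[float(10 * j + i) for i in range(NX)] for j in range(NY)]
path = os.path.join(tempfile.mkdtemp(), "out.dat")
write_raw_float32(path, columns)
file_size = os.path.getsize(path)
```

If the file size equals the number of elements times 4 (or times 8 for doubles), there are no extra bytes to skip.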
It might be easier to use direct I/O instead of sequential:
inquire (iolength = irec) matrix(1,:) !total record length for a row
open (3, file=filename, form="unformatted", access="direct", recl=irec)
write(3, rec=1) matrix(i,:)
The inquire statement gives you the length of the output list in 'recl' units. As such, the whole list fits in one record of length irec.
For writing a matrix to file column-wise you can then do:
inquire (iolength = irec) matrix(:,1)
open (3, file=filename, form="unformatted", access="direct", recl=irec)
do i=1,ncol
write(3, rec=i) matrix(:,i)
end do
or row-wise:
inquire (iolength = irec) matrix(1,:)
open (3, file=filename, form="unformatted", access="direct", recl=irec)
do i=1,nrow
write(3, rec=i) matrix(i,:)
end do
or element-wise:
inquire (iolength = irec) matrix(1,1)
open (3, file=filename, form="unformatted", access="direct", recl=irec)
do j=1,ncol
do i=1,nrow
write(3, rec=j+(i-1)*ncol) matrix(i,j)
end do
end do
or dump the entire matrix:
inquire (iolength = irec) matrix
open (3, file=filename, form="unformatted", access="direct", recl=irec)
write(3, rec=1) matrix
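When writing element by element, the record number decides where each value lands in the file. The following Python sketch shows two consistent choices of 1-based record number and the resulting byte offset. The function names are hypothetical, and it assumes one element per record and a record length given in bytes.

```python
def rec_row_major(i, j, ncol):
    """1-based record number when elements are stored row by row."""
    return j + (i - 1) * ncol

def rec_column_major(i, j, nrow):
    """1-based record number when elements are stored column by column."""
    return i + (j - 1) * nrow

def byte_offset(rec, recl_bytes):
    """Byte offset of a 1-based record in a direct-access file."""
    return (rec - 1) * recl_bytes

nrow, ncol = 3, 4
row_major = [rec_row_major(i, j, ncol)
             for i in range(1, nrow + 1) for j in range(1, ncol + 1)]
column_major = [rec_column_major(i, j, nrow)
                for j in range(1, ncol + 1) for i in range(1, nrow + 1)]
```

Either mapping visits every record from 1 to nrow*ncol exactly once, which is a quick sanity check for any index formula.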
Testing with gnuplot 5.0, the following fortran unformatted data write of a double array x of size N,
open(FID,file='binaryfile',form='unformatted')
do k = 1, N
write(FID) x(k)
end do
close(FID)
can be understood by gnuplot with the following:
plot 'binaryfile' binary format="%*1int%double%*1int"
The %*1int means "skip one four-byte integer", which skips the header and footer that Fortran wraps around each record.
For more information, and for extending this to more complicated data, see the gnuplot 5.0 docs on binary, and check the sizes of the formats with show datafile binary datasizes. Note that multi-column data (i.e. N doubles per write) can be read with the same format, with %double replaced by %Ndouble, where N is an integer. Then with using 1:3, for example, one would plot the first column against the third.
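The same skip-read-skip pattern can be sketched in Python, which may help when checking what a file really contains. This assumes 4-byte little-endian record markers holding the payload length in bytes, as gfortran writes by default on common platforms. Other compilers or options may differ, and the helper names are made up for the example.

```python
import struct

def make_record(payload):
    """Wrap a payload in leading and trailing 4-byte length markers."""
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker

def read_doubles(blob):
    """Parse records that each hold one double, skipping both markers."""
    values = []
    # "<idi" mirrors the gnuplot format %*1int%double%*1int
    for head, value, tail in struct.iter_unpack("<idi", blob):
        if head != 8 or tail != 8:
            raise ValueError("unexpected record marker")
        values.append(value)
    return values

# Emulate three write(FID) x(k) statements
blob = b"".join(make_record(struct.pack("<d", x)) for x in (1.5, -2.0, 3.25))
values = read_doubles(blob)
```

Each one-double record occupies 16 bytes here: 4 for the header, 8 for the value, and 4 for the footer.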
I'm attempting to write Fortran code that reads a raw binary file [like a .png for example] in chunks of 64 bytes, performs some operations on it and then maybe writes it to another file. The code I have written so far is as follows, written with the help of this SO answer:
use, intrinsic :: iso_fortran_env, only: int8, iostat_end
integer(kind = int8), dimension(0:63) :: file_read_buffer
integer :: input_file_handle
integer :: io_error
open(newunit=input_file_handle, file=input_file_name, status="old", access="stream", action="read", iostat=io_error)
if(io_error /= 0) then
! file didn't open properly
end if
do
read(unit = input_file_handle, iostat = io_error) file_read_buffer
select case(io_error)
case(0)
! consume buffer normally
case(iostat_end)
! do EOF processing
case default
! error!
end select
end do
If EOF is reached before the array is completely filled, is there any way to know how much of it was filled before EOF was reached? Also, if EOF is raised once, will further calls to read() also return EOF?
I'm using gfortran at the moment if that helps.
You cannot determine how much was read by an input statement if an end-of-file condition is caused by that input statement.
That is less of a loss than it sounds, though. Even if you knew the size, you could not process just the part of the buffer that was filled: when an end-of-file condition is triggered, your entire buffer becomes undefined.
Instead, you should throw away the entire buffer and reprocess the input. You have two options:
if you reach the end of file, reposition before it and read less data
determine how much data remains, and read less if less is available (avoiding the end-of-file condition)
For the first, if you keep track of your "successful" position, you can reposition on failure:
! Remember where the last successful read ended
inquire (unit, pos=file_pos)
! Read full-sized chunks
do
read(unit, iostat=iostat) buffer
if (iostat==iostat_end) exit
inquire (unit, pos=file_pos)
end do
! Reposition to the end of the last full chunk
read (unit, pos=file_pos)
! Read smaller chunks
do
read (unit, iostat=iostat) buffer(1)
if (iostat==iostat_end) exit
end do
(Processing goes in the obvious places.) This is similar to the idea present in this other answer for a related problem.
For the second, using the file position and its size we can see whether there are sufficient "file storage units" to fill our buffer:
inquire (unit, size=file_size)
do
inquire (unit, pos=file_pos)
remaining = (file_size-file_pos+1)*FILE_STORAGE_SIZE/STORAGE_SIZE(buffer)
if (remaining<SIZE(buffer)) exit
read (unit) buffer
end do
read (unit) buffer(:remaining)
FILE_STORAGE_SIZE tells us how many bits make up a file storage unit, and STORAGE_SIZE how many bits to store (in memory) an element of the array.
The second option is possibly fine, but isn't really safe in general: we can't be sure that an element of storage size 16-bits corresponds to 16 bits of file storage units. This may, though, be good enough for your purposes. And you can always create a test file to see how many file storage units your buffer occupies.
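The idea behind the second option (compare the bytes remaining with the buffer size before each read) is language independent. Here is a rough Python sketch of that logic, not a translation of the Fortran. The 64-byte chunk size comes from the question; the function name and the in-memory test stream are assumptions for illustration.

```python
import io
import os

CHUNK = 64  # buffer size from the question

def read_chunks(stream, chunk=CHUNK):
    """Yield full chunks, then one final shorter chunk if bytes remain."""
    size = stream.seek(0, os.SEEK_END)  # total size, like inquire(size=...)
    stream.seek(0)
    while size - stream.tell() >= chunk:
        yield stream.read(chunk)
    remaining = size - stream.tell()
    if remaining:
        # Read only what is left, so end-of-file is never hit mid-read
        yield stream.read(remaining)

data = bytes(range(150))
chunks = list(read_chunks(io.BytesIO(data)))
```

For a 150-byte input this gives two full chunks and a final chunk of 22 bytes.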
I searched for a long time before asking. I need to output a lot of unformatted files from Fortran for Ensight. I want to name them geo.000000, geo.000001, ..., geo.001000. Here is how I handle the numbered suffix:
character(54) :: filename, temp
character(80) :: buffer
write(temp,'(i6.6)') step
filename = '/Users/jiecheng/Documents/SolidResults/solid.geo'//trim(temp)
open(10,file=filename,form='UNFORMATTED')
buffer = 'Fortran Binary'
write(10) buffer
buffer = 'Ensight Model Geometry File'
write(10) buffer
write(10,'(i10)') nn
write(10,'(i10)') node_id
do i=1,3
write(10,'(E12.5)') sngl(coords1(i,:))
end do
Then I have
Fortran runtime error: Format present for UNFORMATTED data transfer
Could anybody tell me how to solve this?
For a unit connected to a file for unformatted I/O it is illegal to specify a format as you do in
write(10,'(i10)') nn
A write to an unformatted file transfers the value in its machine memory (binary) representation (some conversion may happen), not as human-readable text. A format specification therefore makes no sense here. Drop it and use write(10) nn instead.
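To illustrate the difference between the two kinds of output, this small Python sketch contrasts the text that an I10 edit descriptor would produce with the raw bytes of the same integer. It assumes a 32-bit little-endian integer, which is typical but not guaranteed for a default Fortran integer.

```python
import struct

nn = 1234

# Formatted output: ten characters of text, right-justified, as I10 would give
formatted = f"{nn:10d}"

# Unformatted output: the four raw bytes of the value (assumed 32-bit, little-endian)
unformatted = struct.pack("<i", nn)
```

The formatted version is readable in a text editor. The unformatted one is not, but it is what a binary reader such as Ensight's expects.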
In my parallel program, there was a big matrix. Each process computed and stored a part of it. Then the program wrote the matrix to a file by letting each process write its own part of the matrix in the correct order. The output file is in "unformatted" form. But when I tried to read the file in a serial code (I have the correct size of the big matrix allocated), I got an error which I don't understand.
My question is: in an MPI program, how do you get a binary file as the serial version output for a big matrix which is stored by different processes?
Here is my attempt:
if(ThisProcs == RootProcs) then
open(unit = file_restart%unit, file = file_restart%file, form = 'unformatted')
write(file_restart%unit)psi
close(file_restart%unit)
endif
#ifdef USEMPI
call mpi_barrier(mpi_comm_world,MPIerr)
#endif
do i = 1, NProcs - 1
if(ThisProcs == i) then
open(unit = file_restart%unit, file = file_restart%file, form = 'unformatted', status = 'old', position = 'append')
write(file_restart%unit)psi
close(file_restart%unit)
endif
#ifdef USEMPI
call mpi_barrier(mpi_comm_world,MPIerr)
#endif
enddo
Psi is the big matrix, it is allocated as:
Psi(N_lattice_points, NPsiStart:NPsiEnd)
But when I tried to load the file in a serial code:
open(2,file=File1,form="unformatted")
read(2)psi
This fails with the following error (I am using MSVS 2012 + Intel Fortran 2013):
forrtl: severe (67): input statement requires too much data, unit 2
How can I fix the parallel part to make the binary file readable for the serial code? Of course one can combine them into one big matrix in the MPI program, but is there an easier way?
Edit 1
The two answers are really nice. I'll use access = "stream" to solve my problem. And I just figured I can use inquire to check whether the file is "sequential" or "stream".
This isn't a problem specific to MPI, but would also happen in a serial program which took the same approach of writing out chunks piecemeal.
Ignore the opening and closing for each process and look at the overall connection and transfer statements. Your connection is an unformatted file using sequential access. It's unformatted because you explicitly asked for that, and sequential because you didn't ask for anything else.
Sequential file access is based on records. Each of your write statements transfers out a record consisting of a chunk of the matrix. Conversely, your input statement attempts to read from a single record.
Your problem is that while you try to read the entire matrix from the first record of the file that record doesn't contain the whole matrix. It doesn't contain anything like the correct amount of data. End result: "input statement requires too much data".
So, you need to either read in the data based on the same record structure, or move away from record files.
The latter is simple: use stream access, both when writing and when reading
open(unit = file_restart%unit, file = file_restart%file, &
form = 'unformatted', access='stream')
Alternatively, read with a similar loop structure:
do i=1, NPROCS
! read statement with a slice
end do
This of course requires understanding the correct slicing.
Alternatively, one can consider using MPI-IO for output, which is very similar to using stream output. Read this back in with stream access. You can find out more about this concept elsewhere on SO.
Fortran unformatted sequential writes in record files are not quite completely raw data. Each write will have data before and after the record in a processor dependent form. The size of your reads cannot exceed the record size of your writes. This means if psi is written in two writes, you will need to read it back in two reads, you cannot read it in at once.
Perhaps the most straightforward option is to use stream access instead of sequential. A stream file is indexed by bytes (generally) and does not contain record start and end information. Using this access method you can split the write but read all at once. Stream access is a feature of Fortran 2003.
If you stick with sequential access, you'll need to know how many MPI ranks wrote the file and loop over properly sized records to read the data as it was written. You could make the user specify the number of ranks or store that as the first record in the file and read that first to determine how to read the rest of the data.
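If you only need to inspect or convert an existing record file, the record structure can also be walked from outside Fortran. The Python sketch below reads records one by one and joins their payloads. It assumes 4-byte little-endian length markers around each record (the gfortran and Intel default for records of this size, as far as I know), and the helper names are hypothetical.

```python
import io
import struct

def write_record(stream, payload):
    """Emulate one unformatted sequential write: marker, payload, marker."""
    marker = struct.pack("<i", len(payload))
    stream.write(marker + payload + marker)

def read_all_records(stream):
    """Return the payload of every record, whatever their number and sizes."""
    payloads = []
    while True:
        head = stream.read(4)
        if not head:
            break  # clean end of file
        (n,) = struct.unpack("<i", head)
        payload = stream.read(n)
        (tail,) = struct.unpack("<i", stream.read(4))
        if tail != n:
            raise ValueError("leading and trailing markers disagree")
        payloads.append(payload)
    return payloads

# Two "processes" each write their part of psi as a separate record
parts = [struct.pack("<3d", 1.0, 2.0, 3.0), struct.pack("<2d", 4.0, 5.0)]
buf = io.BytesIO()
for part in parts:
    write_record(buf, part)

buf.seek(0)
records = read_all_records(buf)
psi = struct.unpack("<5d", b"".join(records))
```

The file holds two records, so a single read of all five values fails in Fortran, while reading record by record recovers the whole matrix.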
If you are writing MPI, why not MPI-IO? Each process will call MPI_File_set_view to set a subarray view of the file, then each process can collectively write the data with MPI_FILE_WRITE_ALL . This approach is likely to scale really well on big machines (though your approach will be fine up to oh, maybe 100 processors.)
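The offset arithmetic behind such a file view can be sketched without MPI. The Python example below is a serial emulation, not MPI-IO: each "rank" writes its block of columns at the byte offset that block has in the full column-major matrix, in any order. The matrix size, the split of columns between ranks, and the function names are all assumptions for illustration.

```python
import os
import struct
import tempfile

N_POINTS = 4  # rows of the full matrix (hypothetical size)
ITEM = 8      # bytes per double-precision value

def write_slab(path, first_col, columns):
    """Write a block of columns at its offset in the full matrix (first_col is 0-based)."""
    with open(path, "r+b") as f:
        f.seek(first_col * N_POINTS * ITEM)
        for col in columns:
            f.write(struct.pack(f"<{N_POINTS}d", *col))

def column(j):
    """Test data: each value encodes its column and row."""
    return [10.0 * j + i for i in range(N_POINTS)]

path = os.path.join(tempfile.mkdtemp(), "restart.bin")
open(path, "wb").close()  # create an empty file

# "Rank 1" owns columns 2..4 and happens to write first; "rank 0" owns 0..1
write_slab(path, 2, [column(j) for j in (2, 3, 4)])
write_slab(path, 0, [column(j) for j in (0, 1)])

with open(path, "rb") as f:
    full = list(struct.unpack("<20d", f.read()))
```

The result is a marker-free file in the same order a serial stream write of the whole matrix would produce, independent of which rank wrote first.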
I have to export matrices from Matlab to OpenCV. I use the YAML format, read the file in OpenCV with cv::FileStorage modelFile, and store the data in cv::Mat variables. For normal 2D matrices this works fine. But for one of my big 4D matrices, I get errors saying that the string is too long. The matrix has size 16x16x70409x8.
Does someone know a good way to store it? Maybe it is only a format problem.
The code is:
for i = 1:matrixSize(1)
for j=1:matrixSize(2)
fprintf( file, ' - [');
for a = 1:matrixSize(3)
for b = 1:matrixSize(4)
fprintf( file, '%.6f', A(i,j,a,b));
if ~(a == matrixSize(3) && b == matrixSize(4)) % comma after every element except the last
fprintf( file, ',');
end
end
end
fprintf( file, ']\n');
end
end
Thanks
My solution is to save the model in binary format instead of YAML, and then read it with the normal fread functions.
Of course, you have to know the size of each matrix.
fileID = fopen(BinModel,'w');
fwrite(fileID,[size(model.nSegs),0,0],'uint32'); % size of the matrix
fwrite(fileID,model.nSegs,'uint8'); % matrix data
fclose(fileID);
The file shrinks from 1.4 GB to 200 MB.
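For reference, reading such a file back amounts to parsing the size header and then the raw bytes. The Python sketch below shows the idea for a 2D matrix. It assumes a header of four little-endian uint32 values (rows, columns, and two zeros) followed by uint8 data in column-major order, which is how MATLAB's fwrite lays out a matrix. The function names are hypothetical.

```python
import io
import struct

def write_model(stream, rows, cols, column_major_bytes):
    """Write a 4 x uint32 size header followed by the uint8 matrix data."""
    stream.write(struct.pack("<4I", rows, cols, 0, 0))
    stream.write(column_major_bytes)

def read_model(stream):
    """Read the header, then rebuild the matrix as a list of rows."""
    rows, cols, _, _ = struct.unpack("<4I", stream.read(16))
    data = stream.read(rows * cols)
    # Element (r, c) sits at index c*rows + r in column-major storage
    return [[data[c * rows + r] for c in range(cols)] for r in range(rows)]

buf = io.BytesIO()
write_model(buf, 2, 3, bytes([1, 2, 3, 4, 5, 6]))
buf.seek(0)
matrix = read_model(buf)
```

The same index arithmetic applies on the C++ side when filling a cv::Mat from the bytes returned by fread.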
Regards
I have to write numerical data to binary files. Since some of the data vectors I deal with can be several gigabytes in size, I have learned not to use C++ iostreams. Instead I want to use C FILE*. I'm running into a problem right off the bat: I need to write some metadata to the front of the binary file. Since some of the metadata is not known at first, I need to write it to the appropriate offsets in the file as I get it.
For example, let's say I have to enter a uint16_t representation for the year, month, and day, but first I need to skip the first entry (a uint32_t value for precision).
I don't know what I'm doing wrong, but I can't seem to append to the file with "ab".
Here's an example of what I wrote:
#include <cstdio>
#include <cstdint>
uint16_t year = 2001;
uint16_t month = 8;
uint16_t day = 23;
uint16_t dateArray[]={year , month, day};
FILE * fileStream;
fileStream = fopen("/Users/mmmmmm/Desktop/test.bin" , "wb");
if(fileStream){
// skip the first 4 bytes
fseek ( fileStream , 4 , SEEK_SET );
fwrite(dateArray, sizeof(dateArray[0]) ,( sizeof(dateArray) / sizeof(dateArray[0]) ), fileStream);
fclose(fileStream);
}
// loops and other code to prepare and gather other parameters
// now append the front of the file with the precision.
uint32_t precision = 32;
FILE *fileStream2;
fileStream2 = fopen("/Users/mmmmmm/Desktop/test.bin" , "ab");
if(fileStream2){
// Get to the top of the file
rewind(fileStream2);
fwrite(&precision, sizeof(precision) , 1 , fileStream2);
fclose(fileStream2);
}
The appended data does not write. If I change it to "wb", then the file overwrites. I was able to get it to work with "r+b", but I don't understand why. I thought "ab" would be proper. Also , should I be using buffers or is this a sufficient approach?
Thanks for the advice
BTW this is on MacOSX
Due to the way that hard drives and filesystems work, inserting bytes into the middle of a file is really slow and should be avoided, especially when dealing with multi-gigabyte files. If your metadata is stored in a fixed-size header, just make sure that there's enough space for it before you start with the other data. If the header is variably sized, split it into chunks. Put 1k of header space at the beginning, with 8 bytes reserved to hold the offset of the next header chunk, or 0 if there is none. Then when that chunk fills up, add another chunk at the end of the file and write its offset into the previous chunk.
As for the technical IO, use the fopen() modes r+b, w+b, or a+b, depending on your need. All three allow both reading and writing, but they differ in important ways. r+b opens an existing file, starting at the first byte, and fails if the file doesn't exist. w+b creates the file if it doesn't exist and truncates it to zero length if it does, so any existing contents are lost. a+b creates the file if needed and keeps existing contents, but every write goes to the end of the file regardless of where you seek.
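Python's binary file modes follow the same conventions as fopen() on common platforms, so the behaviour can be tried quickly without compiling anything. The sketch below is an illustration under that assumption (checked against Linux and macOS semantics); the file name and contents are made up.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.bin")

# "wb": create or truncate, then write a 4-byte placeholder plus some data
with open(path, "wb") as f:
    f.write(b"\x00\x00\x00\x00DATA")

# "ab": the seek is accepted, but the write still lands at the end of the file
with open(path, "ab") as f:
    f.seek(0)
    f.write(b"XX")
with open(path, "rb") as f:
    after_append = f.read()

# "r+b": existing contents are kept and the write happens at the seek position
with open(path, "r+b") as f:
    f.seek(0)
    f.write(b"HEAD")
with open(path, "rb") as f:
    after_update = f.read()
```

This matches what the question observed: the append mode never touches the first four bytes, while the update mode overwrites them in place.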
You can navigate the file with fseek() and rewind(). rewind() moves the file pointer back to the beginning of the file. fseek() moves the file pointer to a specified location. You can read more about it here.
"r+b" means you can read and write to any position in the file. In your second code block the rewind() call set the current position to byte 0 and the write is done at this position.
If you use "a+b", this also means read and write access, but all writes go to the end of the file, so you cannot write at byte 0 unless the file is new and empty.
To re-access the file at a specific byte, just use fseek().
fseek ( fileStream , 0 , SEEK_SET ); - positions to the precision value
fseek ( fileStream , 4 , SEEK_SET ); - positions to the year value
fseek ( fileStream , 6 , SEEK_SET ); - positions to the month value
fseek ( fileStream , 8 , SEEK_SET ); - positions to the day value
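The byte offsets follow directly from the field sizes: a 4-byte uint32_t precision followed by three 2-byte uint16_t values puts the fields at offsets 0, 4, 6 and 8. A small Python sketch of the reserve-then-patch approach, assuming little-endian values packed without padding (the in-memory buffer stands in for the file):

```python
import io
import struct

# precision (uint32), then year, month, day (uint16 each), no padding
HEADER = struct.Struct("<IHHH")

buf = io.BytesIO()
buf.write(b"\x00" * 4)                       # reserve space for the precision
buf.write(struct.pack("<3H", 2001, 8, 23))   # year, month, day

# Later, once the precision is known, go back and fill it in
buf.seek(0)
buf.write(struct.pack("<I", 32))

precision, year, month, day = HEADER.unpack(buf.getvalue())

# Reading a single field back: the month lives at byte offset 6
buf.seek(6)
month_only = struct.unpack("<H", buf.read(2))[0]
```

With a real file, the patch step corresponds to opening with "r+b" and seeking to offset 0 before writing.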
With such large files, it would be highly inefficient to rewrite gigs just to prepend a few bytes.
It would be much better to create a small companion file of the metadata you need for each file, and only add the metadata fields to the beginning of the files if they are to be rewritten anyway as part of an edit.
This is because prepending to a file is so expensive on most file systems.
NTFS supports alternate data streams: a file can carry additional named streams that go unseen by almost all programs apart from MS internals such as file managers and security scanning programs. You could easily cook up a program to add your metadata to such a stream without needing to overwrite gigs on disk every single time.