Retrospectively closing a NetCDF file created with Fortran - fortran

I'm running a distributed model stripped to its bare minimum below:
integer, parameter :: &
    nx = 1200, &  ! Number of columns in grid
    ny = 1200, &  ! Number of rows in grid
    nt = 6000     ! Number of timesteps
integer :: it     ! Loop counter
real :: var1(nx,ny), var2(nx,ny), var3(nx,ny), etc(nx,ny)
! Create netcdf to write model output
call check( nf90_create(path="out.nc",cmode=nf90_clobber, ncid=nc_out_id) )
! Loop over time
do it = 1,nt
    ! Calculate a lot of variables
    ...
    ! Write some variables in out.nc at each timestep
    CALL check( nf90_put_var(ncid=nc_out_id, varid=var1_varid, values=var1, &
                start = (/ 1, 1, it /), count = (/ nx, ny, 1 /)) )
    ! Close the netcdf otherwise it is not readable:
    if (it == nt) call check( nf90_close(nc_out_id) )
enddo
I'm in the development stage of the model, so it inevitably crashes at unexpected points (usually at the "Calculate a lot of variables" stage). If the model crashes at timestep it = 3000, then 2999 timesteps will have been written to the netCDF output file, but I will not be able to read the file because it was never closed. Still, the data have been written: I currently have a 2 GB out.nc file that I can't read. When I ncdump the file it shows
netcdf out.nc {
dimensions:
x = 1400 ;
y = 1200 ;
time = UNLIMITED ; // (0 currently)
variables:
float var1 (time, y, x) ;
data:
}
My questions are: (1) Is there a way to close the file retrospectively, even outside Fortran, to be able to read the data that have already been written? (2) Alternatively, is there another way to write the file in Fortran that would make the file readable even without closing it?

When nf90_close is called, buffered output is written to disk and the file ID is relinquished so it can be reused. The problem is most likely due to buffered output not having been written to the disk when the program terminates due to a crash, meaning that only the changes you made in "define mode" are present in the file (as shown by ncdump).
You therefore need to force the data to be written to the disk more often. There are three ways of doing this (as far as I am aware).
1. nf90_sync, which synchronises the buffered data to disk when called. This gives you the most control over when to output data (every loop step, or every n loop steps, for example), which lets you trade speed against robustness, but introduces more programming and checking overhead for you.
2. Creating or opening the file using the nf90_share flag (thanks to @RussF for this idea). This is the recommended approach if the netCDF file is intended to be used by multiple readers/writers simultaneously. It is essentially the same as an automatic implementation of nf90_sync for writing data. It gives less control, but also less programming overhead. Note that this only applies to netCDF-3 classic or 64-bit offset files.
3. Closing and reopening the file. This is an option I wouldn't recommend, but am including for completeness (I guess there may be situations where it is the best option, although none spring to mind). I don't recommend it because it will slow down your program and adds more opportunities for errors.
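As a rough illustration of the nf90_sync approach, here is a minimal sketch. The grid size, the sync interval sync_every, the dimension names and the local check routine are all placeholders chosen for this example, not taken from the original model; it assumes the netCDF Fortran 90 library is installed and linked.

```fortran
program sync_sketch
  use netcdf
  implicit none

  ! Hypothetical small sizes so the sketch runs quickly
  integer, parameter :: nx = 4, ny = 3, nt = 10
  integer, parameter :: sync_every = 2   ! assumed interval, tune for speed vs robustness
  integer :: ncid, x_dimid, y_dimid, t_dimid, varid, it
  real    :: var1(nx, ny)

  ! Define mode: create the file, its dimensions and one variable
  call check( nf90_create("out.nc", nf90_clobber, ncid) )
  call check( nf90_def_dim(ncid, "x", nx, x_dimid) )
  call check( nf90_def_dim(ncid, "y", ny, y_dimid) )
  call check( nf90_def_dim(ncid, "time", nf90_unlimited, t_dimid) )
  call check( nf90_def_var(ncid, "var1", nf90_float, &
                           (/ x_dimid, y_dimid, t_dimid /), varid) )
  call check( nf90_enddef(ncid) )

  do it = 1, nt
    var1 = real(it)   ! placeholder for the real model calculation
    call check( nf90_put_var(ncid, varid, var1, &
                             start = (/ 1, 1, it /), count = (/ nx, ny, 1 /)) )
    ! Push buffered data to disk every sync_every steps, so that a crash
    ! in a later step should lose at most the steps since the last sync
    if (mod(it, sync_every) == 0) call check( nf90_sync(ncid) )
  end do

  call check( nf90_close(ncid) )

contains

  ! Stop with the library's error message if a netCDF call failed
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program sync_sketch
```

For the nf90_share variant, the only change would be the creation mode, for example cmode = ior(nf90_clobber, nf90_share), with the nf90_sync line removed.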

Related

Strange behavior while calling properties from REFPROP FORTRAN files

I am trying to use REFPROP's HSFLSH subroutine to compute properties for steam.
When the same state point is calculated over multiple iterations (fixed enthalpy and entropy: enthalpy = 50000 J/mol, entropy = 125 J/(mol K)), the time taken by HSFLSH on every 4th or 5th iteration increases to about 0.15 ms, against a negligible amount of time for the other iterations. This is becoming a problem because my program calls this subroutine several thousand times, which leads to abnormally long run times.
The program used to generate the above log is here:
C     refprop check
      program time_check
      parameter(ncmax=20)
      dimension x(ncmax)
      real hkj,skj
      character hrf*3, herr*255
      character*255 hf(ncmax),hfmix
C
C     SETUP FOR WATER
C
      nc=1                 !Number of components
      hf(1)='water.fld'    !Fluid name
      hfmix='hmx.bnc'      !Mixture file name
      hrf='DEF'            !Reference state (DEF means default)
      call setup(nc,hf,hfmix,hrf,ierr,herr)
      if (ierr.ne.0) write (*,*) herr
      call INFO(1,wm,ttp,tnbp,tc,pc,dc,zc,acf,dip,rgas)
      write(*,*) 'Mol weight ', wm
      h = 50000.0
      s = 125.0
C
C
      DO I=1,NCMAX
         x(I) = 0
      END DO
C     ******************************************************
C     THIS IS THE ACTUAL CALL PLACE
C     ******************************************************
      do I=1,100
         call cpu_time(tstrt)
         CALL HSFLSH(h,s,x,T_TEMP,P_TEMP,RHO_TEMP,dl,dv,xliq,xvap,
     &               WET_TEMP,e,
     &               cv,cp,VS_TEMP,ierr,herr)
         call cpu_time(tstop)
         write(*,*) I,' time taken to run hsflsh routine= ',tstop-tstrt
      end do
      stop
      end
(Of course you will need the Fortran source files, which unfortunately I cannot share since REFPROP isn't open source.)
Can someone help me figure out why this is happening?
P.S.: The above code was compiled using gfortran -fdefault-real-8
UPDATE
I tried using system_clock to time my computations as suggested by @Ross below. The results are uniform across the loop (image below). I will have to find alternate ways to improve computation speed, I guess (sigh!)
I don't have a concrete answer, but this sort of behaviour looks like what I would expect if all calls really took around 3 ms, but your call to CPU_TIME doesn't register anything below around 15 ms. Do you see any output with time taken less than, say 10 ms? Of particular interest to me is the approximately even spacing between calls that return nonzero time - it's about even at 5.
CPU timing can be a tricky business. I recommended in a comment that you try system_clock, which can be higher precision than CPU_TIME. You said it doesn't work, but I'm unconvinced. Did you pass a long integer to system_clock? What was the count_rate for your system? Were all the times still either 15 or 0 ms?
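A minimal sketch of the timing comparison suggested above, using 64-bit integers for system_clock. The square-root loop is only a placeholder standing in for the real HSFLSH calls, and the variable names are made up for this example. Timing a batch of many calls rather than a single call is one way to get above the clock's resolution.

```fortran
program clock_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none

  integer(int64) :: count_start, count_stop, count_rate
  real(real64)   :: cpu_start, cpu_stop, elapsed, acc
  integer        :: i

  ! With 64-bit arguments, many compilers report a much finer tick
  call system_clock(count_rate = count_rate)
  print *, 'system_clock ticks per second: ', count_rate

  call cpu_time(cpu_start)
  call system_clock(count_start)

  ! Placeholder workload (hypothetical); replace with the calls to be timed
  acc = 0.0_real64
  do i = 1, 1000000
    acc = acc + sqrt(real(i, real64))
  end do

  call system_clock(count_stop)
  call cpu_time(cpu_stop)

  elapsed = real(count_stop - count_start, real64) / real(count_rate, real64)
  print *, 'wall time from system_clock [s]: ', elapsed
  print *, 'CPU time from cpu_time      [s]: ', cpu_stop - cpu_start
  print *, 'checksum (keeps the loop from being optimised away): ', acc
end program clock_sketch
```

If count_rate prints as something small like 1000 or less, the clock itself may be too coarse to resolve individual calls, which would be consistent with the 0-or-15 ms pattern described above.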

GNURadio issues with timing

I am having trouble getting a custom block to operate at high frequency.
The block I would like to use is going to take in data from an external radio.
I am using an Ettus USRP block to stream data in from this radio, and I can display this on the QT Scope. I can set this block's sample rate to 15 MHz, and with the scope this seems to work ok.
Problem:
I have tried making a simple block with the GNU Radio gr_modtool which takes in 2 floats as input and has 0 outputs. The block has private members "timer", a time_t, and "count", an int. In the "work" function, my code simply does this at the moment:
const float *in_i = (const float *) input_items[0];
const float *in_q = (const float *) input_items[1];
if (count == 0){
    if (*in_i > 0.5){
        timer = clock();
        count = 30000;
    }
}else{
    count --;
    if(count == 0){
        timer = clock()-timer;
        printf("Count took %d clicks, or %f seconds\n",timer,(float)timer/CLOCKS_PER_SEC);
    }
}
// Tell runtime system how many output items we produced.
return 0;
However, when I run this code, it takes longer than the expected time.
For 30000 cycles, it takes 0.872970 seconds to complete, instead of the desired 0.002 seconds. Since the standard GNU Radio block generated with gr_modtool is a sync block, and the input stream to the block is coming from the 15 MHz USRP, I would have expected this block to run at that same frequency. This is not currently the case.
Eventually my goal is to be able to store data streaming in over a period of time, and write it to file with certain formatting. (A block already exists to do this, but there is some sort of bug that is preventing that block and the USRP block from working at the same time, so I am attempting to write my own.) However, unless I can keep up with the sample rate of 15 MHz, I will lose data. Since this block is fairly simple, I would have hoped it would be able to run quickly enough to keep up. The input stream block is able to pull data from the radio and output at 15 MHz, so I know my computer is capable of it.
How can I make this custom block operate more quickly and keep up with the 15 MHz sample rate? (Or, how can I make this sync block operate at the input stream rate, since it currently does not?)
Your block is not consuming any samples. I presume you're writing a sync_block (work function, not general_work), so your number of produced items is identical to the number of consumed items. But as your source code says:
// Tell runtime system how many output items we produced.
return 0;
In other words, your block tells GNU Radio that it didn't use any of the input GNU Radio offered, and produced no output. That means GNU Radio can't do anything. You must return the number of items you've produced, and for sync blocks that's the number of items you consumed, even if you're a sink with zero output streams!

Slowdown when reading big file randomly with C++

I've run into some trouble when reading chunks of data at random locations all over a big file (>4GB).
The task is to save a 3D datacube to a file and transpose the axes while not loading the whole dataset into RAM.
The storage format is as follows:
There are 3 integers at the beginning of the file, storing the dimensions (nX, nY, nZ).
After that follows the data as lines of length nX.
These lines are repeated nY times, which results in a page, and the pages are repeated nZ times.
Meaning:
A line has nX bytes
A page has nX * nY bytes
The file has nX * nY * nZ + 12 bytes
To transpose the dataset I execute the following loop:
for (int i = 0; i < nY; i++)
{
    for (int j = 0; j < nZ; j++)
    {
        read(pBuf, i*nX + j*nY*nX); // read nX bytes from offset i*nX + j*nX*nY
        writeNext(pBuf);
    }
}
When using fopen, _fseeki64 and fread, after approx. 30% of the overall reads every 6th read or so takes up to 7 s. Since there are multiple millions of those reads, I can't accept these delays.
Thus I implemented the same algorithm with memory-mapped files (CreateFile, CreateFileMapping and MapViewOfFile), but now every 6th read takes about 2 s.
Is there a way to increase the read speed?
EDIT1:
I've added some code at http://pastebin.com/MejiTKj0
EDIT2:
Some may notice an inconsistency regarding the offset in the read function. To simplify matters I didn't mention all the variables saved in the file header, so the offset of 15 bytes is okay.
If the files are stored on an HDD, you should know that seek times dominate heavily when performing random access. You may find you're better off reading the entire file sequentially into memory (a relatively quick operation compared to seeking) and then processing the data in memory instead. This may be quicker even if you need only a relatively small percentage of the overall file data.
In your loop, Z (nZ) should be the outermost loop and Y the inner loop. That would save seek time if the file stores the nZ pages one after another.
The code as shown has nZ in the inner loop, which is no good. The current arrangement of loops is analogous to reading a book by reading the first line of every page, then the second line of every page, and so on.
Thank you all very much for your input.
Actually, the first thing I should have checked was at fault: the HDD, which wasn't able to provide the needed data rate.
I'm now thinking about switching to an SSD.

MPI write to file sequentially

I am writing a parallel VTK file (pvti) from my Fortran CFD solver. The file is really just a list of all the individual files for each piece of the data. Running MPI, if I have each process write the name of its individual file to standard output
print *, name
then I get a nice list of each file, ie
block0.vti
block1.vti
block2.vti
This is exactly the sort of list I want. But if I write to a file
write(9,*) name
then I only get one output in the file. Is there a simple way to replicate the standard output version of this without transferring data?
You could try adapting the following which uses MPI-IO, which is really the only way to ensure ordered files from multiple processes. It does assume an end of line character and that all the lines are the same length (padded with blanks if required) but I think that's about it.
Program ascii_mpiio
  ! simple example to show MPI-IO "emulating" Fortran
  ! formatted direct access files. Note can not use the latter
  ! in parallel with multiple processes writing to one file
  ! as the behaviour is not defined (and DOES go wrong on certain
  ! machines)
  Use mpi
  Implicit None
  ! All the "lines" in the file will be this length
  Integer, Parameter :: max_line_length = 30
  ! We also need to explicitly write a line feed.
  ! here I am assuming ASCII
  Character, Parameter :: lf = Achar( 10 )
  ! Buffer to hold a line
  Character( Len = max_line_length + 1 ) :: line
  Integer :: me, nproc
  Integer :: fh
  Integer :: record
  Integer :: error
  Integer :: i
  ! Initialise MPI
  Call mpi_init( error )
  Call mpi_comm_rank( mpi_comm_world, me   , error )
  Call mpi_comm_size( mpi_comm_world, nproc, error )
  ! Create a MPI derived type that will contain a line of the final
  ! output just before we write it using MPI-IO. Note this also
  ! includes the line feed at the end of the line.
  Call mpi_type_contiguous( max_line_length + 1, mpi_character, record, error )
  Call mpi_type_commit( record, error )
  ! Open the file. prob want to change the path and name
  Call mpi_file_open( mpi_comm_world, '/home/ian/test/mpiio/stuff.dat', &
       mpi_mode_wronly + mpi_mode_create, &
       mpi_info_null, fh, error )
  ! Set the view for the file. Note the etype and ftype are both RECORD,
  ! the derived type used to represent a whole line, and the displacement
  ! is zero. Thus
  ! a) Each process can "see" all of the file
  ! b) The unit of displacement in subsequent calls is a line.
  !    Thus if we have a displacement of zero we write to the first line,
  !    1 means we write to the second line, and in general i means
  !    we write to the (i+1)th line
  Call mpi_file_set_view( fh, 0_mpi_offset_kind, record, record, &
       'native', mpi_info_null, error )
  ! Make each process write to a different part of the file
  Do i = me, 50, nproc
     ! Use an internal write to transfer the data into the
     ! character buffer
     Write( line, '( "This is line ", i0, " from ", i0 )' ) i, me
     ! Remember the line feed at the end of the line
     line( Len( line ):Len( line ) ) = lf
     ! Write with a displacement of i, and thus to line i+1
     ! in the file
     Call mpi_file_write_at( fh, Int( i, mpi_offset_kind ), &
          line, 1, record, mpi_status_ignore, error )
  End Do
  ! Close the file
  Call mpi_file_close( fh, error )
  ! Tidy up
  Call mpi_type_free( record, error )
  Call mpi_finalize( error )
End Program ascii_mpiio
Also please note that you're just getting lucky with your standard output "solution"; you're not guaranteed to get it all nicely sorted.
Apart from having the writes from different ranks well mixed, your problem is that the Fortran OPEN statement probably truncates the file to zero length, thus obliterating the previous content instead of appending to it. I'm with Vladimir F on this and would write this file only in rank 0. There are several possible cases, some of which are listed here:
- Each rank writes a separate VTK file and the order follows the ranks, or the actual order is not significant. In that case you could simply use a DO loop in rank 0 from 0 to #ranks-1 to generate the whole list.
- Each rank writes a separate VTK file, but the order does not follow the ranks, e.g. rank 0 writes block3.vti, rank 1 writes block12.vti, etc. In that case you can use MPI_GATHER to collect the block number from each process into an array at rank 0 and then loop over the elements of the array.
- Some ranks write a VTK file, some don't, and the block order does not follow the ranks. This is similar to the previous case: just have the ranks that do not write a block send a negative block number, and rank 0 then skips the negative array elements.
- Block numbering follows rank order, but not all ranks write a block. In that case you can use MPI_GATHER to collect one LOGICAL value from each rank that indicates whether it has written a block or not.
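The MPI_GATHER cases above could be sketched roughly as follows. The block numbering (block number equal to rank), the output file name blocks.txt and all variable names are assumptions made for this illustration; a real solver would fill my_block from its own decomposition and use a negative value for ranks that write no block.

```fortran
program gather_blocks_sketch
  use mpi
  implicit none

  integer :: me, nproc, ierr, i, my_block, u
  integer, allocatable :: blocks(:)

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, me, ierr)
  call mpi_comm_size(mpi_comm_world, nproc, ierr)

  ! Hypothetical numbering: each rank owns the block with its own rank number.
  ! A rank that writes no block would set my_block to a negative value.
  my_block = me

  ! The receive buffer only matters on rank 0, but allocating it
  ! everywhere keeps the sketch simple
  allocate(blocks(nproc))
  call mpi_gather(my_block, 1, mpi_integer, &
                  blocks,   1, mpi_integer, &
                  0, mpi_comm_world, ierr)

  ! Only rank 0 touches the list file, so nothing gets truncated or interleaved
  if (me == 0) then
    open(newunit=u, file='blocks.txt', status='replace', action='write')
    do i = 1, nproc
      if (blocks(i) >= 0) write(u, '("block", i0, ".vti")') blocks(i)
    end do
    close(u)
  end if

  deallocate(blocks)
  call mpi_finalize(ierr)
end program gather_blocks_sketch
```

Only one integer per rank is transferred, so the communication cost is negligible next to writing the data files themselves.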
If you are not in a hurry, you can force the output from different tasks to be in order:
! Loop over processes in order
DO n = 0,numProcesses-1
  ! Write to file if it is my turn
  ! (nproc is assumed to hold this process's rank here)
  IF(nproc == n)THEN
    ! Write output here
  ENDIF
  ! This call ensures that all processes wait for each other
#ifdef MPI
  CALL MPI_Barrier(mpi_comm_world,ierr)
#endif
ENDDO
This solution is simple, but not efficient for very large output. This does not seem to be your case. Make sure you flush the output buffer after each write. If using this method, make sure to do tests before implementing, as success is not guaranteed on all architectures. This method works for me for outputting large NetCDF files without the need to pass the data around.
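A fuller sketch of this turn-taking idea, under the assumption that all ranks see the same file system and that closing the file makes a rank's output visible to the next one (which, as noted, is not guaranteed everywhere). The file name list.txt and the block naming are placeholders for this example.

```fortran
program ordered_append_sketch
  use mpi
  implicit none

  integer :: me, nproc, ierr, n, u
  character(len=32) :: name

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, me, ierr)
  call mpi_comm_size(mpi_comm_world, nproc, ierr)

  ! Hypothetical file name for this rank's piece
  write(name, '("block", i0, ".vti")') me

  ! Rank 0 creates (or truncates) the list file exactly once
  if (me == 0) then
    open(newunit=u, file='list.txt', status='replace', action='write')
    close(u)
  end if
  call mpi_barrier(mpi_comm_world, ierr)

  ! Each rank appends its line when its turn comes
  do n = 0, nproc - 1
    if (me == n) then
      open(newunit=u, file='list.txt', status='old', &
           position='append', action='write')
      write(u, '(a)') trim(name)
      flush(u)    ! hand the record to the OS before the next rank opens the file
      close(u)
    end if
    ! Everyone waits until the current writer has finished
    call mpi_barrier(mpi_comm_world, ierr)
  end do

  call mpi_finalize(ierr)
end program ordered_append_sketch
```

Opening with position='append' is what avoids the truncation that makes only one line survive in the original attempt.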

Retrieve data from file written in FORTRAN during program run

I am trying to write a series of time values (reals) to a .dat file in FORTRAN. This is part of an MPI code that runs for a long time, so I would like to write data to a file at every time step and be able to read that file at any time while the program is running. Currently, the problem I am facing is that the time values are not written to the file until the program ends. I have put the open statement before the do loop and the close statement after the end of the do loop.
The parts of my code look like:
open(unit=57,file='inst.dat')
do loop starts
.
.
.
write(57,*) time
.
.
.
end do
close(57)
Try call flush(unit). Check your compiler docs, as I think this is an extension.
You mention MPI: for parallel codes I think you need to give each process its own file/unit, or take other measures to avoid conflicts.
From Gfortran manual:
Beginning with the Fortran 2003 standard, there is a FLUSH statement that should be preferred over the FLUSH intrinsic.
The FLUSH intrinsic and the Fortran 2003 FLUSH statement have identical effect: they flush the runtime library's I/O buffer so that the data becomes visible to other processes. This does not guarantee that the data is committed to disk.
On POSIX systems, you can request that all data is transferred to the storage device by calling the fsync function, with the POSIX file descriptor of the I/O unit as argument (retrieved with GNU intrinsic FNUM). The following example shows how:
  ! Declare the interface for POSIX fsync function
  interface
    function fsync (fd) bind(c,name="fsync")
    use iso_c_binding, only: c_int
      integer(c_int), value :: fd
      integer(c_int) :: fsync
    end function fsync
  end interface
  ! Variable declaration
  integer :: ret
  ! Opening unit 10
  open (10,file="foo")
  ! ...
  ! Perform I/O on unit 10
  ! ...
  ! Flush and sync
  flush(10)
  ret = fsync(fnum(10))
  ! Handle possible error
  if (ret /= 0) stop "Error calling FSYNC"
How about closing the file after every time step (assuming a reasonable amount of time elapses between time steps)?
do loop starts
.
.
!Note: an if statement should wrap the following so that it is
!only called by one processor.
!position='append' keeps earlier time steps instead of overwriting them.
open(unit=57,file='inst.dat',position='append')
write(57,*) time
close(57)
.
.
end do
Alternatively if the time between time steps is short, writing the data after blocks of 10, 100, ... iterations may be more efficient.
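A minimal sketch of the block-wise idea using the Fortran 2003 FLUSH statement instead of repeated open/close. The step count, block size and time increment are placeholder values for this illustration; in an MPI code the I/O would additionally be restricted to one rank.

```fortran
program block_flush_sketch
  implicit none

  integer, parameter :: nsteps = 100       ! hypothetical number of time steps
  integer, parameter :: block_size = 10    ! hypothetical flush interval
  integer :: it, u
  real    :: time

  open(newunit=u, file='inst.dat', status='replace', action='write')

  time = 0.0
  do it = 1, nsteps
    time = time + 0.5        ! placeholder for the real time stepping
    write(u, *) time
    ! Once per block, empty the runtime library's buffer so that
    ! another process reading inst.dat can see the new records
    if (mod(it, block_size) == 0) flush(u)
  end do

  close(u)
end program block_flush_sketch
```

Setting block_size to 1 flushes at every step; larger values trade freshness of the file for fewer flush calls.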