Convert CSV to Gridded Binary

Convert CSV to Gridded Binary - fortran

I am trying to convert a CSV text file with three columns and 572 rows to a gridded binary file (.bin) using gfortran.
I have two Fortran programs that I have written to achieve this.
The issue is that my binary file size is ending up way too large (9.6GB) by the end, which is not correct.
I have a sneaking suspicion that my nx and ny values in ascii2grd.90 are not correct and that is leading to the bad .bin file being created. With such a small list (only 572 rows), I am expecting the final .bin to be more in KBs, not GBs.
temp.90
!PROGRAM TO CONVERT ASCII TO GRD
program gridded
real lon(572),lat(572),temp(572)
open(2,file='/home/weather/data/file')
open(3,file='/home/weather/out.dat')
do 20 i=1,572
read(2,*)lat(i),lon(i),temp(i)
write(3,*)temp(i)
20 continue
stop
end
ascii2grd.f90
!PROGRAM TO CONVERT ASCII TO GRD
program ascii2grd
parameter(nx=26,ny=22,np=1)
real u(nx,ny,np),temp1(nx,ny)
integer :: reclen
inquire(iolength=reclen)a
open(12,file='/home/weather/test.bin',&
form='unformatted',access='direct',recl=nx*ny*reclen)
open(11,file='/home/weather/out.dat')
do k=1,np
read(11,*)((u(j,i,k),j=1,nx),i=1,ny)
10 continue
enddo
rec=1
do kk=1,np
write(12,rec=irec)((u(j,i,kk),j=1,nx),i=1,ny)
write(*,*)'Processing...'
irec=irec+1
enddo
write(*,*)'Finished'
stop
end
Sample from out.dat
6.90000010
15.1999998
21.2999992
999.000000
6.50000000
10.1000004
999.000000
18.0000000
999.000000
20.1000004
15.6000004
8.30000019
9.89999962
999.000000
Sample from file
-69.93500 43.90028 6.9
-69.79722 44.32056 15.2
-69.71076 43.96401 21.3
-69.68333 44.53333 999.00000
-69.55380 45.46462 6.5
-69.53333 46.61667 10.1
-69.1 44.06667 999.00000
-68.81861 44.79722 18.0
-68.69194 45.64778 999.00000
-68.36667 44.45 20.1
-68.30722 47.28500 15.6
-68.05 46.68333 8.3
-68.01333 46.86722 9.9
-67.79194 46.12306 999.00000

I would suggest a general strategy like the following:
Read the CSV with python/pandas (it could be many other things, although using python will be nice for step 2, as you'll see). But the important thing is that many other languages are more convenient than fortran for reading a CSV, and that will allow you to check that that step 1 is working before moving on.
Output to binary with numpy's tofile(). Also note that numpy will default to 'c' order for arrays so you may need to specify 'f' (fortran) order.
I have a utility at github called dataset2binary that automates this and may be of interest to you, or you could refer to the code at this answer. That is probably overkill though, because you seem to just be reading one big array of the same datatype. Nevertheless, the code you'd want will be similar, just simpler.

Related

Equivalent of -ffree-line-length-none in mpiifort? [duplicate]

I've spent hours scouring the internet for a solution to this problem and can't find anything. I have been trying to write unformatted output to a CSV output file with multiple very long lines of varying length and multiple data types. I'm trying to first write a long header that indicates the variables that will be written below, separated by commas. Then on the lines below that, I am writing the values specified in the header. However, with sequential access, the long output lines are broken into multiple shorter lines, which is not what I was hoping for. I tried controlling the line length using recl in the open statement, but that only added a bunch of garble text and symbol after the output with the same problem still occurring. I also tried using direct access but the lines are not the same length so that would not work either. I've read about using stream i/o in Fortran2003 but I'm using Fortran90, so that won't work either. I am using Fortran 90 with the Plato IDE which uses the FTN95 compiler. I included an example program similar to what I want to do below, using an array and some dummy text, and I've included the output below that illustrating the problem. Anyone know how I can just one line per write statement? Any help would be greatly appreciated.
module types
integer, parameter :: dp=selected_real_kind(15)
end module types
program blah
use types
use inputoutput
implicit none
integer :: i
character(50)::fileNm
integer :: unitout2=20
real(dp), dimension(100) :: bigArray
fileNm='predictout2.csv'
open(unit=unitout2,file=fileNm,status="replace")
do i=1,100
bigArray(i)=i
end do
write(unitout2,*)"word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
&word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
&word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word"
write(unitout2,*)bigArray
close(unitout2)
end program
Here's the output for the program above (without recl):
word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word
,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,wo
rd,word,word,word,word,word
1.00000000000 2.00000000000 3.00000000000 4.00000000000
5.00000000000 6.00000000000 7.00000000000 8.00000000000
9.00000000000 10.0000000000 11.0000000000 12.0000000000
13.0000000000 14.0000000000 15.0000000000 16.0000000000
17.0000000000 18.0000000000 19.0000000000 20.0000000000
21.0000000000 22.0000000000 23.0000000000 24.0000000000
25.0000000000 26.0000000000 27.0000000000 28.0000000000
29.0000000000 30.0000000000 31.0000000000 32.0000000000
33.0000000000 34.0000000000 35.0000000000 36.0000000000
37.0000000000 38.0000000000 39.0000000000 40.0000000000
41.0000000000 42.0000000000 43.0000000000 44.0000000000
45.0000000000 46.0000000000 47.0000000000 48.0000000000
49.0000000000 50.0000000000 51.0000000000 52.0000000000
53.0000000000 54.0000000000 55.0000000000 56.0000000000
57.0000000000 58.0000000000 59.0000000000 60.0000000000
61.0000000000 62.0000000000 63.0000000000 64.0000000000
65.0000000000 66.0000000000 67.0000000000 68.0000000000
69.0000000000 70.0000000000 71.0000000000 72.0000000000
73.0000000000 74.0000000000 75.0000000000 76.0000000000
77.0000000000 78.0000000000 79.0000000000 80.0000000000
81.0000000000 82.0000000000 83.0000000000 84.0000000000
85.0000000000 86.0000000000 87.0000000000 88.0000000000
89.0000000000 90.0000000000 91.0000000000 92.0000000000
93.0000000000 94.0000000000 95.0000000000 96.0000000000
97.0000000000 98.0000000000 99.0000000000 100.000000000

This isn't a problem with the ACCESS used for the file (stream, sequential or direct) - it is a consequence of the format specification that you are using.
Note that you are not doing unformatted output. Formatted versus unformatted is a question of whether the output is intended to be human readable.
The star in the second specifier of the WRITE statement is a specification of list directed formatting. This means that the format used for the output is based on the list of things to be output. Beyond that and a small set of rules in the language for list directed output, you are pretty much leaving the appearance of things up to the Fortran processor (the compiler).
With list directed formatted output the processor is specifically allowed to insert as many records as it sees fit between items. It does that here, quite reasonably, in order to make it easier for people to read the file.
If you want more control over the appearance of your output, then use an explicit format. For example, something like:
write(unitout2,"(9999(G12.5,:,','))") bigArray
might be more appropriate.
(Technically when a sequential file is opened there is a processor defined maximum record length (in the absence of a programmer specified maximum length) that should not be exceeded. Practically, given the way sequential formatted files are stored on disk by nearly all current Fortran compilers, that technicality doesn't cause any problems.)

Fortran "write" column limit? [duplicate]

I've spent hours scouring the internet for a solution to this problem and can't find anything. I have been trying to write unformatted output to a CSV output file with multiple very long lines of varying length and multiple data types. I'm trying to first write a long header that indicates the variables that will be written below, separated by commas. Then on the lines below that, I am writing the values specified in the header. However, with sequential access, the long output lines are broken into multiple shorter lines, which is not what I was hoping for. I tried controlling the line length using recl in the open statement, but that only added a bunch of garble text and symbol after the output with the same problem still occurring. I also tried using direct access but the lines are not the same length so that would not work either. I've read about using stream i/o in Fortran2003 but I'm using Fortran90, so that won't work either. I am using Fortran 90 with the Plato IDE which uses the FTN95 compiler. I included an example program similar to what I want to do below, using an array and some dummy text, and I've included the output below that illustrating the problem. Anyone know how I can just one line per write statement? Any help would be greatly appreciated.
module types
integer, parameter :: dp=selected_real_kind(15)
end module types
program blah
use types
use inputoutput
implicit none
integer :: i
character(50)::fileNm
integer :: unitout2=20
real(dp), dimension(100) :: bigArray
fileNm='predictout2.csv'
open(unit=unitout2,file=fileNm,status="replace")
do i=1,100
bigArray(i)=i
end do
write(unitout2,*)"word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
&word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
&word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word"
write(unitout2,*)bigArray
close(unitout2)
end program
Here's the output for the program above (without recl):
word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word
,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,wo
rd,word,word,word,word,word
1.00000000000 2.00000000000 3.00000000000 4.00000000000
5.00000000000 6.00000000000 7.00000000000 8.00000000000
9.00000000000 10.0000000000 11.0000000000 12.0000000000
13.0000000000 14.0000000000 15.0000000000 16.0000000000
17.0000000000 18.0000000000 19.0000000000 20.0000000000
21.0000000000 22.0000000000 23.0000000000 24.0000000000
25.0000000000 26.0000000000 27.0000000000 28.0000000000
29.0000000000 30.0000000000 31.0000000000 32.0000000000
33.0000000000 34.0000000000 35.0000000000 36.0000000000
37.0000000000 38.0000000000 39.0000000000 40.0000000000
41.0000000000 42.0000000000 43.0000000000 44.0000000000
45.0000000000 46.0000000000 47.0000000000 48.0000000000
49.0000000000 50.0000000000 51.0000000000 52.0000000000
53.0000000000 54.0000000000 55.0000000000 56.0000000000
57.0000000000 58.0000000000 59.0000000000 60.0000000000
61.0000000000 62.0000000000 63.0000000000 64.0000000000
65.0000000000 66.0000000000 67.0000000000 68.0000000000
69.0000000000 70.0000000000 71.0000000000 72.0000000000
73.0000000000 74.0000000000 75.0000000000 76.0000000000
77.0000000000 78.0000000000 79.0000000000 80.0000000000
81.0000000000 82.0000000000 83.0000000000 84.0000000000
85.0000000000 86.0000000000 87.0000000000 88.0000000000
89.0000000000 90.0000000000 91.0000000000 92.0000000000
93.0000000000 94.0000000000 95.0000000000 96.0000000000
97.0000000000 98.0000000000 99.0000000000 100.000000000

This isn't a problem with the ACCESS used for the file (stream, sequential or direct) - it is a consequence of the format specification that you are using.
Note that you are not doing unformatted output. Formatted versus unformatted is a question of whether the output is intended to be human readable.
The star in the second specifier of the WRITE statement is a specification of list directed formatting. This means that the format used for the output is based on the list of things to be output. Beyond that and a small set of rules in the language for list directed output, you are pretty much leaving the appearance of things up to the Fortran processor (the compiler).
With list directed formatted output the processor is specifically allowed to insert as many records as it sees fit between items. It does that here, quite reasonably, in order to make it easier for people to read the file.
If you want more control over the appearance of your output, then use an explicit format. For example, something like:
write(unitout2,"(9999(G12.5,:,','))") bigArray
might be more appropriate.
(Technically when a sequential file is opened there is a processor defined maximum record length (in the absence of a programmer specified maximum length) that should not be exceeded. Practically, given the way sequential formatted files are stored on disk by nearly all current Fortran compilers, that technicality doesn't cause any problems.)

compressing HDF5 files in Mathematica

I am working with Mathematica 9 and exporting huge lists (a typical list will have dimensions of 182500,4,8,42). Each file has about 6 lists of this size (all integers, not sure if this makes a difference in lists, I know it do in other array types, anyway). Saving them is HDF5 format successfully, however, the size of the files is relatively large (1.5 GB).
Therefore, I am trying to compress the files with GZIP from within Mathematica, since they claim it is an option in the export function, which has a lot of bugs by the way.
Couldn't find any help the net after all the attempts following the documentation didn't pan out. I was wondering if one of our Mathematica enthusiasts can way in with some tips.

The compression happens automatically if the filename ends with ".gz"
So instead of
Export["file.h5", data]
Use
Export["file.h5.gz", data]
List of available formats and their extension

Fortran 90 how to write very long output lines of different length

I've spent hours scouring the internet for a solution to this problem and can't find anything. I have been trying to write unformatted output to a CSV output file with multiple very long lines of varying length and multiple data types. I'm trying to first write a long header that indicates the variables that will be written below, separated by commas. Then on the lines below that, I am writing the values specified in the header. However, with sequential access, the long output lines are broken into multiple shorter lines, which is not what I was hoping for. I tried controlling the line length using recl in the open statement, but that only added a bunch of garble text and symbol after the output with the same problem still occurring. I also tried using direct access but the lines are not the same length so that would not work either. I've read about using stream i/o in Fortran2003 but I'm using Fortran90, so that won't work either. I am using Fortran 90 with the Plato IDE which uses the FTN95 compiler. I included an example program similar to what I want to do below, using an array and some dummy text, and I've included the output below that illustrating the problem. Anyone know how I can just one line per write statement? Any help would be greatly appreciated.
module types
integer, parameter :: dp=selected_real_kind(15)
end module types
program blah
use types
use inputoutput
implicit none
integer :: i
character(50)::fileNm
integer :: unitout2=20
real(dp), dimension(100) :: bigArray
fileNm='predictout2.csv'
open(unit=unitout2,file=fileNm,status="replace")
do i=1,100
bigArray(i)=i
end do
write(unitout2,*)"word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
&word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,&
&word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word"
write(unitout2,*)bigArray
close(unitout2)
end program
Here's the output for the program above (without recl):
word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word
,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,word,wo
rd,word,word,word,word,word
1.00000000000 2.00000000000 3.00000000000 4.00000000000
5.00000000000 6.00000000000 7.00000000000 8.00000000000
9.00000000000 10.0000000000 11.0000000000 12.0000000000
13.0000000000 14.0000000000 15.0000000000 16.0000000000
17.0000000000 18.0000000000 19.0000000000 20.0000000000
21.0000000000 22.0000000000 23.0000000000 24.0000000000
25.0000000000 26.0000000000 27.0000000000 28.0000000000
29.0000000000 30.0000000000 31.0000000000 32.0000000000
33.0000000000 34.0000000000 35.0000000000 36.0000000000
37.0000000000 38.0000000000 39.0000000000 40.0000000000
41.0000000000 42.0000000000 43.0000000000 44.0000000000
45.0000000000 46.0000000000 47.0000000000 48.0000000000
49.0000000000 50.0000000000 51.0000000000 52.0000000000
53.0000000000 54.0000000000 55.0000000000 56.0000000000
57.0000000000 58.0000000000 59.0000000000 60.0000000000
61.0000000000 62.0000000000 63.0000000000 64.0000000000
65.0000000000 66.0000000000 67.0000000000 68.0000000000
69.0000000000 70.0000000000 71.0000000000 72.0000000000
73.0000000000 74.0000000000 75.0000000000 76.0000000000
77.0000000000 78.0000000000 79.0000000000 80.0000000000
81.0000000000 82.0000000000 83.0000000000 84.0000000000
85.0000000000 86.0000000000 87.0000000000 88.0000000000
89.0000000000 90.0000000000 91.0000000000 92.0000000000
93.0000000000 94.0000000000 95.0000000000 96.0000000000
97.0000000000 98.0000000000 99.0000000000 100.000000000

This isn't a problem with the ACCESS used for the file (stream, sequential or direct) - it is a consequence of the format specification that you are using.
Note that you are not doing unformatted output. Formatted versus unformatted is a question of whether the output is intended to be human readable.
The star in the second specifier of the WRITE statement is a specification of list directed formatting. This means that the format used for the output is based on the list of things to be output. Beyond that and a small set of rules in the language for list directed output, you are pretty much leaving the appearance of things up to the Fortran processor (the compiler).
With list directed formatted output the processor is specifically allowed to insert as many records as it sees fit between items. It does that here, quite reasonably, in order to make it easier for people to read the file.
If you want more control over the appearance of your output, then use an explicit format. For example, something like:
write(unitout2,"(9999(G12.5,:,','))") bigArray
might be more appropriate.
(Technically when a sequential file is opened there is a processor defined maximum record length (in the absence of a programmer specified maximum length) that should not be exceeded. Practically, given the way sequential formatted files are stored on disk by nearly all current Fortran compilers, that technicality doesn't cause any problems.)

Fortran runtime error: Bad real number in item 1 of list input

I am getting run time error: Bad real number in item 1 of list input for this sample problem. Please, suggest the correct way.
implicit double precision (a-h,o-x)
parameter (ni=150)
dimension x(ni)
open(40,file='fortin')
do 80 i=1,5
read(40,*)x(i)
write(*,*)i,x(i)
80 continue
stop
end
The data in the fortin file arranged in column
1.0
5.0
3.0
5.0
7.0

Your code expects only numbers and it appears you have characters in the file. You can do one of two things to fix this:
Delete the words at the top of the fortin file
Add a single read(*,*) (no need for anything following it) before the loop

In my case, the problem lies in the data file, not the code.
My problem turn out to be the file is in Unicode format. When I view in vi, it's shown fine. But when I view in a viewer that does not support unicode, such as using midnight commander, it look like a mess. The one who sent me the file later told me that he save the file in UTF-16.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Convert CSV to Gridded Binary - fortran

Related

Equivalent of -ffree-line-length-none in mpiifort? [duplicate]

Fortran "write" column limit? [duplicate]

compressing HDF5 files in Mathematica

Fortran 90 how to write very long output lines of different length

Fortran runtime error: Bad real number in item 1 of list input

Categories

Resources