How to convert a GrADS ctl GRIB file to NetCDF? - cdo-climate

I am having problems converting a GrADS ctl GRIB data file to NetCDF format. From the Data Integration and Analysis System, I downloaded 6-hourly JRA-55 reanalysis data for an entire month. The data come as DAT files together with a supporting GrADS ctl file, which states that the data type is GRIB.
The contents of the anl_surf.ctl file are as follows:
dset ^anl_surf.%y4%m2%d2%h2
index ^anl_surf.idx
undef 9.999E+20
title anl_surf
* produced by grib2ctl v0.9.12.5p41
dtype grib 255
options template
ydef 320 levels
-89.570 -89.013 -88.453 -87.892 -87.331 -86.769 -86.208 -85.647 -85.085 -84.523
-83.962 -83.400 -82.839 -82.277 -81.716 -81.154 -80.592 -80.031 -79.469 -78.908
-78.346 -77.784 -77.223 -76.661 -76.100 -75.538 -74.976 -74.415 -73.853 -73.291
-72.730 -72.168 -71.607 -71.045 -70.483 -69.922 -69.360 -68.799 -68.237 -67.675
-67.114 -66.552 -65.990 -65.429 -64.867 -64.306 -63.744 -63.182 -62.621 -62.059
-61.498 -60.936 -60.374 -59.813 -59.251 -58.689 -58.128 -57.566 -57.005 -56.443
-55.881 -55.320 -54.758 -54.196 -53.635 -53.073 -52.512 -51.950 -51.388 -50.827
-50.265 -49.704 -49.142 -48.580 -48.019 -47.457 -46.895 -46.334 -45.772 -45.211
-44.649 -44.087 -43.526 -42.964 -42.402 -41.841 -41.279 -40.718 -40.156 -39.594
-39.033 -38.471 -37.909 -37.348 -36.786 -36.225 -35.663 -35.101 -34.540 -33.978
-33.416 -32.855 -32.293 -31.732 -31.170 -30.608 -30.047 -29.485 -28.924 -28.362
-27.800 -27.239 -26.677 -26.115 -25.554 -24.992 -24.431 -23.869 -23.307 -22.746
-22.184 -21.622 -21.061 -20.499 -19.938 -19.376 -18.814 -18.253 -17.691 -17.129
-16.568 -16.006 -15.445 -14.883 -14.321 -13.760 -13.198 -12.636 -12.075 -11.513
-10.952 -10.390 -9.828 -9.267 -8.705 -8.144 -7.582 -7.020 -6.459 -5.897
-5.335 -4.774 -4.212 -3.651 -3.089 -2.527 -1.966 -1.404 -0.842 -0.281
0.281 0.842 1.404 1.966 2.527 3.089 3.651 4.212 4.774 5.335
5.897 6.459 7.020 7.582 8.144 8.705 9.267 9.828 10.390 10.952
11.513 12.075 12.636 13.198 13.760 14.321 14.883 15.445 16.006 16.568
17.129 17.691 18.253 18.814 19.376 19.938 20.499 21.061 21.622 22.184
22.746 23.307 23.869 24.431 24.992 25.554 26.115 26.677 27.239 27.800
28.362 28.924 29.485 30.047 30.608 31.170 31.732 32.293 32.855 33.416
33.978 34.540 35.101 35.663 36.225 36.786 37.348 37.909 38.471 39.033
39.594 40.156 40.718 41.279 41.841 42.402 42.964 43.526 44.087 44.649
45.211 45.772 46.334 46.895 47.457 48.019 48.580 49.142 49.704 50.265
50.827 51.388 51.950 52.512 53.073 53.635 54.196 54.758 55.320 55.881
56.443 57.005 57.566 58.128 58.689 59.251 59.813 60.374 60.936 61.498
62.059 62.621 63.182 63.744 64.306 64.867 65.429 65.990 66.552 67.114
67.675 68.237 68.799 69.360 69.922 70.483 71.045 71.607 72.168 72.730
73.291 73.853 74.415 74.976 75.538 76.100 76.661 77.223 77.784 78.346
78.908 79.469 80.031 80.592 81.154 81.716 82.277 82.839 83.400 83.962
84.523 85.085 85.647 86.208 86.769 87.331 87.892 88.453 89.013 89.570
xdef 640 linear 0.000000 0.562500
pdef 157792 1 file 1 stream binary-big ^TL319.pdef
tdef 120 linear 00Z01Apr1958 6hr
zdef 1 linear 1 1
vars 7
POTsfc 0 13,1,0 ** surface Potential temperature [K]
PRESsfc 0 1,1,0 ** surface Pressure [Pa]
RH2m 0 52,105,2 ** 2 m above ground Relative humidity [%]
SPFH2m 0 51,105,2 ** 2 m above ground Specific humidity [kg/kg]
TMP2m 0 11,105,2 ** 2 m above ground Temperature [K]
UGRD10m 0 33,105,10 ** 10 m above ground u-component of wind [m/s]
VGRD10m 0 34,105,10 ** 10 m above ground v-component of wind [m/s]
ENDVARS
Following an answer to a similar question, I used the command:
cdo -f nc import_binary anl_surf.ctl anl_surf.nc
But I receive the following error:
Open Error: Unknown keyword in description file
--> The invalid description file record is:
--> index ^anl_surf.idx
The data file was not opened.
cdo import_binary (Abort): Open failed!
I found out that this error occurs because the index keyword is not supported by CDO, and the import_binary operator does not support the GRIB format in any case.
Does anyone know of an operator that can convert a GrADS ctl file with GRIB data to NetCDF? Unfortunately, I cannot download this data directly in GRIB format, only as DAT files. Any help is appreciated, thank you!
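Not an authoritative answer, but one thing that may be worth trying: the ctl file declares dtype grib, so the data files referenced by the dset template (anl_surf.%y4%m2%d2%h2, i.e. one file per 6-hourly time step) should themselves be GRIB records even if they carry a DAT-style name, and CDO can usually read GRIB directly, bypassing the ctl/idx pair entirely. A minimal sketch, where the exact file names are assumptions derived from the dset template and the tdef line; substitute your actual file names:

# convert one 6-hourly GRIB record to NetCDF
cdo -f nc copy anl_surf.1958040100 anl_surf_1958040100.nc

# or merge all records of the month along the time axis into a single NetCDF file
cdo -f nc mergetime anl_surf.195804???? anl_surf_195804.nc

If CDO complains about the grid (the pdef line points to a reduced Gaussian TL319 grid), adding the -R option, which interpolates reduced to regular Gaussian grids for GRIB1 input, may help, e.g. cdo -R -f nc copy anl_surf.1958040100 out.nc.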

Related

Test dataset existence in HDF5/C++ and handle the error

I am reading a series of *.hdf5 files with the HDF5 library in C++. The files have the same datasets (same keys, but different content), but occasionally a dataset is missing from a file (e.g. 100 files have the dataset apple and 3 files do not), and in those cases the following exception is thrown:
HDF5-DIAG: Error detected in HDF5 (1.10.7) thread 0:
#000: H5D.c line 298 in H5Dopen2(): unable to open dataset
major: Dataset
minor: Can't open object
[...]
#005: H5Gloc.c line 376 in H5G__loc_find_cb(): object 'apple' doesn't exist
major: Symbol table
minor: Object not found
terminate called after throwing an instance of 'H5::GroupIException'
I would like to handle this exception, for example by creating an empty apple dataset for that file when the error occurs.
Here is the chunk of code where I read the file, then the group, then the dataset. When handling the error, I would still like to create an empty GoldenApples vector, even when the dataset apple doesn't exist.
std::string FileName = "fruit." + std::to_string(cutID) + ".hdf5";
fruitFile = H5::H5File(FileName, H5F_ACC_RDONLY );
H5::Group group = fruitFile.openGroup("fruit");
H5::DataSet dataset = group.openDataSet("apple");
H5::DataSpace dataspace = dataset.getSpace();
hsize_t naxes[2];
dataspace.getSimpleExtentDims(naxes, NULL);
AppleType = Eigen::MatrixXd::Zero(naxes[1], naxes[0]);
dataset.read(AppleType.data(), H5::PredType::NATIVE_DOUBLE);
GoldenApples = std::vector<int>(naxes[0], 0.);
//need golden apples, which are in pos (4,i) in matrix AppleType
for (int i = 0; i < naxes[0]; i++) {
    GoldenApples[i] = AppleType(4, i);
}
fruitFile.close();
If you are not bound to a particular library, take a look at HDFql as it's easy to check the existence of an HDF5 dataset with it. Using HDFql in C++, your use-case could be solved as follows:
// check if dataset 'apple' exists in HDF5 file 'fruit.h5'
if (HDFql::execute("SHOW DATASET fruit.h5 apple") == HDFql::Success)
{
    std::cout << "Dataset apple exists!" << std::endl;
}
else
{
    std::cout << "Dataset apple does not exist!" << std::endl;
}
For additional information about HDFql, please check its reference manual as well as some examples.
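If you would rather stay with the HDF5 C++ API already used in the question, the exception itself can also be caught. Below is a minimal, untested sketch; the file name is a placeholder and HDF5 1.10.x is assumed:

#include <H5Cpp.h>
#include <iostream>
#include <vector>

int main()
{
    // suppress the automatic HDF5 error-stack printout shown in the question
    H5::Exception::dontPrint();

    H5::H5File fruitFile("fruit.0.hdf5", H5F_ACC_RDONLY);
    H5::Group group = fruitFile.openGroup("fruit");

    std::vector<int> GoldenApples;
    try
    {
        H5::DataSet dataset = group.openDataSet("apple");
        H5::DataSpace dataspace = dataset.getSpace();
        hsize_t naxes[2];
        dataspace.getSimpleExtentDims(naxes, NULL);
        GoldenApples.assign(naxes[0], 0);
        // ... read the dataset and fill GoldenApples as in the question ...
    }
    catch (const H5::Exception&)
    {
        // dataset 'apple' is missing in this file: fall back to an empty vector
        GoldenApples.clear();
        std::cout << "Dataset apple does not exist, using empty GoldenApples" << std::endl;
    }

    fruitFile.close();
    return 0;
}

H5::Exception::dontPrint() disables the diagnostic dump, and since H5::GroupIException derives from H5::Exception, the catch above covers exactly the failure reported in the question.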

Creating Parquet Table in Apache Drill

I am currently running Apache Drill on a 20-node cluster and have run into some errors that I hope you can help me with.
I am attempting to run the following query to create a Parquet table in a new S3 bucket from another table that is in TSV format:
create table s3_output.tmp.`<output file>` as select
columns[0], columns[1], columns[2], columns[3], columns[4], columns[5], columns[6], columns[7], columns[8], columns[9],
columns[10], columns[11], columns[12], columns[13], columns[14], columns[15], columns[16], columns[17], columns[18], columns[19],
columns[20], columns[21], columns[22], columns[23], columns[24], columns[25], columns[26], columns[27], columns[28], columns[29],
columns[30], columns[31], columns[32], columns[33], columns[34], columns[35], columns[36], columns[37], columns[38], columns[39],
columns[40], columns[41], columns[42], columns[43], columns[44], columns[45], columns[46], columns[47], columns[48], columns[49],
columns[50], columns[51], columns[52], columns[53], columns[54], columns[55], columns[56], columns[57], columns[58], columns[59],
columns[60], columns[61], columns[62], columns[63], columns[64], columns[65], columns[66], columns[67], columns[68], columns[69],
columns[70], columns[71], columns[72], columns[73], columns[74], columns[75], columns[76], columns[77], columns[78], columns[79],
columns[80], columns[81], columns[82], columns[83], columns[84], columns[85], columns[86], columns[87], columns[88], columns[89],
columns[90], columns[91], columns[92], columns[93], columns[94], columns[95], columns[96], columns[97], columns[98], columns[99],
columns[100], columns[101], columns[102], columns[103], columns[104], columns[105], columns[106], columns[107], columns[108], columns[109],
columns[110], columns[111], columns[112], columns[113], columns[114], columns[115], columns[116], columns[117], columns[118], columns[119],
columns[120], columns[121], columns[122], columns[123], columns[124], columns[125], columns[126], columns[127], columns[128], columns[129],
columns[130], columns[131], columns[132], columns[133], columns[134], columns[135], columns[136], columns[137], columns[138], columns[139],
columns[140], columns[141], columns[142], columns[143], columns[144], columns[145], columns[146], columns[147], columns[148], columns[149],
columns[150], columns[151], columns[152], columns[153], columns[154], columns[155], columns[156], columns[157], columns[158], columns[159],
columns[160], columns[161], columns[162], columns[163], columns[164], columns[165], columns[166], columns[167], columns[168], columns[169],
columns[170], columns[171], columns[172], columns[173] from s3input.`<input path>*.gz`;
This is the error output I get while running this query.
Error: DATA_READ ERROR: Error processing input: , line=2026, char=2449781. Content parsed: [ ]
Failure while reading file s3a://.gz. Happened at or shortly before byte position 329719.
Fragment 1:19
[Error Id: fe289e19-c7b7-4739-9960-c15b8a62af3b on :31010] (state=,code=0)
Do you have any idea how I can go about trying to solve this issue?

How to write or convert float-type data to LevelDB in Caffe

I am making a LevelDB database to train the Caffe framework, so I am using convert_imageset.cpp. However, this cpp file writes only char-type data to LevelDB.
I have float data that I need to write to LevelDB: it is pre-processed image data, so it is float-type. Each sample is a vector with 4096 dimensions.
How can I write or convert this float data to LevelDB? Alternatively, how can I convert it to HDF5Data? Please help me.
HDF5 stands for Hierarchical Data Format. You can manipulate this format with, for example, R (see the RHDF5 documentation). Other software that can process HDF5 includes MATLAB and Mathematica.
EDIT
A new set of tools called HDFql has recently been released to simplify "managing HDF files through a high-level language like C/C++". You can check it out here.
import glob
import os
import shutil

import caffe
import cv2 as cv
import lmdb
import numpy as np


def del_and_create(dname):
    # remove any stale database directory and start from scratch
    if os.path.exists(dname):
        shutil.rmtree(dname)
    os.makedirs(dname)


def get_img_datum(image_fn):
    # load the image as BGR and reorder to channels x height x width for Caffe
    img = cv.imread(image_fn, cv.IMREAD_COLOR)
    img = img.swapaxes(0, 2).swapaxes(1, 2)
    datum = caffe.io.array_to_datum(img, 0)
    return datum


def get_jnt_datum(joint_fn):
    # float labels go into Datum.float_data (this is the float-type path)
    joint = np.load(joint_fn)
    datum = caffe.io.caffe_pb2.Datum()
    datum.channels = len(joint)
    datum.height = 1
    datum.width = 1
    datum.float_data.extend(joint.tolist())
    return datum


def create_dataset():
    img_db_fn = 'img.lmdb'
    del_and_create(img_db_fn)
    img_env = lmdb.Environment(img_db_fn, map_size=1099511627776)
    img_txn = img_env.begin(write=True, buffers=True)

    jnt_db_fn = 'joint.lmdb'
    del_and_create(jnt_db_fn)
    jnt_env = lmdb.Environment(jnt_db_fn, map_size=1099511627776)
    jnt_txn = jnt_env.begin(write=True, buffers=True)

    img_fns = glob.glob('imageData/*.jpg')
    fileCount = len(img_fns)
    print 'A total of ', fileCount, ' images.'

    jnt_fns = glob.glob('jointData/*.npy')
    jointCount = len(jnt_fns)
    if fileCount != jointCount:
        print 'The file counts do not match'
        exit()

    keys = np.arange(fileCount)
    np.random.shuffle(keys)

    for i, (img_fn, jnt_fn) in enumerate(zip(sorted(img_fns), sorted(jnt_fns))):
        img_datum = get_img_datum(img_fn)
        jnt_datum = get_jnt_datum(jnt_fn)
        key = '%010d' % keys[i]

        img_txn.put(key, img_datum.SerializeToString())
        jnt_txn.put(key, jnt_datum.SerializeToString())

        # commit in batches to keep the transactions from growing too large
        if i % 10000 == 0:
            img_txn.commit()
            jnt_txn.commit()
            jnt_txn = jnt_env.begin(write=True, buffers=True)
            img_txn = img_env.begin(write=True, buffers=True)

        print '%d' % (i), os.path.basename(img_fn), os.path.basename(jnt_fn)

    img_txn.commit()
    jnt_txn.commit()
    img_env.close()
    jnt_env.close()
The above code expects images at a given path, and the label of each image as a .npy file.
Credits: https://github.com/mitmul/deeppose/blob/caffe/scripts/dataset.py
Note: I had seen Shai's answer to a question claiming that LMDB does not support float-type data. However, it works for me with the latest versions of Caffe and LMDB using this code snippet. As his answer is quite old, it is likely that older versions did not support float-type data.
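If you prefer the HDF5Data route mentioned in the question and in the answer above, here is a minimal sketch using h5py; the file names and the data/label dataset names are illustrative and only need to match the source list and the top names in your network definition:

import h5py
import numpy as np

# hypothetical example: N feature vectors of 4096 floats plus one float label each
N = 100
features = np.random.rand(N, 4096).astype(np.float32)
labels = np.zeros(N, dtype=np.float32)

# Caffe's HDF5Data layer reads float datasets directly, so no Datum conversion is needed
with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=features)
    f.create_dataset('label', data=labels)

# the HDF5Data layer's 'source' parameter expects a text file listing the .h5 files
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')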

opencv blur() vs. fspecial('average')

My filterSize is 2, and in MATLAB I'm using:
h = fspecial('average', filterSize);
imageData = imfilter(imageData, h, 'replicate');
meaning my kernel is a 2x2 matrix with every element equal to 1/4.
In OpenCV I'm using:
cv::blur(dst, dst, cv::Size(filterSize, filterSize));
so I should get the same results.
However, the OpenCV result has a duplicated line, probably due to circular or some other kind of border padding: line 0 and line 1 are the same.
Can you suggest how to get a result without that duplicated line?
OpenCV results:
0.027027026, 0.027027026, 0.022727273, 0.018427517, 0.017199017, 0.017813267,
0.027027026, 0.027027026, 0.022727273, 0.018427517, 0.017199017, 0.017813267,
0.02948403, 0.02948403, 0.027641278, 0.028255528, 0.03194103, 0.03194103,
0.054054055, 0.054054055, 0.055896807, 0.064496316, 0.077395573, 0.079852581,
0.11240786, 0.11240786, 0.11855037, 0.13513513, 0.14864865, 0.1566339,
0.16646191, 0.16646191, 0.17383292, 0.18611793, 0.18673219, 0.19471744,
0.18243243, 0.18243243, 0.19471744, 0.19594595, 0.18796068, 0.1928747,
MATLAB results:
0.0270270 0.022727 0.018427 0.017199 0.017813 0.0251842
0.029484 0.027641 0.028255 0.031941 0.031941 0.0350122
0.0540540 0.055896 0.064496 0.077395 0.079852 0.0847665
0.1124078 0.118550 0.135135 0.148648 0.156633 0.1683046
0.1664619 0.173832 0.186117 0.186732 0.194717 0.2057739
0.1824324 0.194717 0.195945 0.187960 0.192874 0.1947174
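For what it is worth, here is an untested sketch based on my reading of the two APIs rather than on anything in the posts above: for an even-sized kernel, MATLAB's imfilter anchors the window at the kernel's top-left element, while cv::blur's default anchor is the bottom-right one, so the OpenCV output is shifted by one row and one column and, together with the border padding, line 0 ends up duplicating line 1. Passing an explicit anchor and cv::BORDER_REPLICATE (the analogue of imfilter's 'replicate') may line the two results up:

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// hypothetical helper mimicking imfilter(imageData, fspecial('average', 2), 'replicate')
void blurLikeMatlab(const cv::Mat& src, cv::Mat& dst, int filterSize)
{
    // anchor (0, 0): each output pixel is the mean of the filterSize x filterSize block
    // whose top-left corner sits on that pixel, matching MATLAB's placement for even kernels;
    // BORDER_REPLICATE matches the 'replicate' padding option of imfilter
    cv::blur(src, dst, cv::Size(filterSize, filterSize),
             cv::Point(0, 0), cv::BORDER_REPLICATE);
}

Usage would be blurLikeMatlab(dst, dst, filterSize); in place of the original cv::blur call.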

Which compression algorithm is better suited to compress protocol buffers output?

Given:
a DB table with numeric statistics only
binary dump of the table - record after record
protocol buffers dump of the table - using proto schema created from the table schema
As I expected, the dump produced by protocol buffers is smaller than the naive binary dump of the same table:
PS Z:\dev\internal\vedtool> dir net_endshapes_stat.data | % { "$($_.Name): $($_.Length)B" }
net_endshapes_stat.data: 2941331B
net_endshapes_stat.data: 4311042B
However, when I compressed both files with 7z using the Ultra level and the LZMA method, I discovered that the larger file compresses to a smaller size:
PS Z:\dev\internal\vedtool> dir net_endshapes_stat.7z | % { "$($_.Name): $($_.Length)B" }
net_endshapes_stat.7z: 1206186B
net_endshapes_stat.7z: 1055901B
Now, I understand that this is completely fine; still, I am wondering whether there is a compression algorithm better tuned to protocol buffers output.
EDIT
Here is the proto schema:
message net_endshapes_stat {
    optional fixed32 timestamp = 1;
    optional sint32 shape_id = 2;
    optional sint64 bps_in = 3;
    optional sint64 bps_out = 4;
    optional sint64 total_in = 5;
    optional sint64 total_out = 6;
}