How to concatenate packed netCDF files using ncrcat or cdo merge - nco

I have ERA5 files that I am trying to concatenate into monthly files. It appears the files have been packed to reduce their size, making the data type within the file a short. When I try ncrcat, it warns about encountering the packing attribute "add_offset" and then concatenates all the files together, but the data values in the result are messed up. I tried using ncpdq -U to unpack the files and then ncrcat to concatenate, which works, but the resulting files are too large to be useful, and when I try ncpdq to repack the concatenated file I receive a malloc() failure, which seems to be a memory/RAM issue.
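Schematically, the workflow I tried looks like this (filenames are placeholders):
ncpdq -U era5_day01.nc era5_day01_upk.nc   # unpack each daily file (short plus scale_factor/add_offset back to float)
ncpdq -U era5_day02.nc era5_day02_upk.nc   # ...and so on for every day of the month
ncrcat era5_day*_upk.nc era5_month.nc      # concatenate the unpacked files; works, but the output is huge
ncpdq era5_month.nc era5_month_pck.nc      # repacking the big file is where the malloc() failure occurs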
I've also tried cdo merge, which strangely works perfectly for most of the concatenations, but a few of the files fail with this error: "Error (cdf_put_vara_double): NetCDF: Numeric conversion not representable".
So is there any way to concatenate these files while they are still packed, or at least a way to repack the large files once they are concatenated?

Instead of repacking the large files once they are concatenated, you could try netCDF4 compression, e.g.,
ncpdq -U -7 -L 1 inN.nc in_upk_cmpN.nc # Loop over N
ncrcat in_upk_cmp*.nc out.nc
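The "Loop over N" comment could be expanded into a small shell loop, e.g. (a sketch; adjust the list of N to your filenames):
for n in 1 2 3; do                                   # loop over N, one iteration per packed input file
  ncpdq -U -7 -L 1 "in${n}.nc" "in_upk_cmp${n}.nc"   # unpack, write netCDF-4 classic, deflate level 1
done
followed by the ncrcat step above.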
Good luck!

When data is packed, CDO can throw this error because converting the merged values back to the packed data type loses too much precision. Forcing 32-bit output with
cdo -b 32 mergetime in*.nc out.nc
should do the trick and avoid the error. If you then want to compress the files, you can try this:
cdo -z zip_9 copy out.nc out_compressed.nc
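If you prefer a single step, the options can usually be combined (a sketch; -f nc4 is added because zip compression requires netCDF-4 output):
cdo -f nc4 -b 32 -z zip_9 mergetime in*.nc out_compressed.nc   # merge and write deflated netCDF-4 in one pass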

Related

Is double compression less effective?

Let's say we have multiple packages stored as .tar.gz files and we want to combine them into one bundle. Everything I know about lossless file compression is that it attempts to find patterns in the data. From that, my intuition is that it would be able to find more patterns, and therefore produce a smaller bundle, if I first decompress the packages into .tar files and then combine them into one bundle.tar.gz. Is my intuition correct? Or is it not worth the hassle, and would creating the bundle directly from the .tar.gz files produce similar results?
I tested it with a random collection of txts (RFC 1-500 from https://www.rfc-editor.org/retrieve/bulk/): compressing each of them individually and then creating the final .tar.gz from the compressed files yields a roughly 15% bigger result, which supports my intuition, though maybe not to the extent I expected.
total size of txts: 5.6M
total size of individually compressed txts: 2.7M
size of .tar.gz from txts: 1.4M
size of .tar.gz from compressed txts: 1.6M
I would like to understand more how it behaves in general.
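For reference, the test can be reproduced roughly like this (the rfcs/ directory and the use of gzip -k are placeholders for my actual setup):
tar -czf bundle_from_txts.tar.gz rfcs/                 # compress the raw texts together in one pass
gzip -k rfcs/*.txt                                     # compress each text individually, keeping the originals
tar -czf bundle_from_gz.tar.gz rfcs/*.txt.gz           # bundle the already-compressed files
du -h bundle_from_txts.tar.gz bundle_from_gz.tar.gz    # compare the two bundle sizes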
Compressing something with gzip that is already compressed will generally expand the data, but only by a very small amount, multiplying the size by about 1.0003.
The fact that you are getting a 15% benefit from decompressing the pieces and recompressing the bundle means that your pieces must be relatively small in order for gzip's 32K byte matching distance to find more matches and increase the compression by that much. (You did not say how many of these individually compressed texts there were.)
By the way, it is easy to combine several .tar files into a single .tar file. Each .tar file is terminated with 1024 zero bytes. Strip that from every .tar file except the last one, and concatenate them. Then you have one .tar file to compress.
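A minimal sketch of that, assuming GNU tar (whose --concatenate operation strips and rewrites the end-of-archive padding for you) and archives named pkg1.tar, pkg2.tar, pkg3.tar:
cp pkg1.tar bundle.tar                                  # start the bundle from the first archive
tar --concatenate --file=bundle.tar pkg2.tar pkg3.tar   # append the rest; tar handles the trailing zero blocks
gzip -9 bundle.tar                                      # compress once at the end, producing bundle.tar.gz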

Warning (cdfInqContents): Coordinates variable XTIME can't be assigned

I have a large number of daily WRF outputs, each one consisting of 24 time steps, one for every hour of the day. I would like to combine these individual output files into one file that covers the entire time period using cdo mergetime. I have done this before with other output files in another context and it worked well.
When I apply this command for example:
cdo mergetime wrf_file1.nc wrf_file2.nc output_file.nc
I get the following message many times: Warning (cdfInqContents): Coordinates variable XTIME can't be assigned!
Since it is only a warning and not an error, the process continues. But it takes far too much time and the resulting output file is far too big: for example, when the two input files are about 6 GB, the output file is over 40 GB, which does not make sense at all.
Anybody with an idea how to solve this?
The merged file is probably large because CDO does not compress the output file by default, whereas the WRF input files probably are compressed.
You can modify your call to compress the output as follows:
cdo -z zip -mergetime wrf_file1.nc wrf_file2.nc output_file.nc
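If CDO then complains that zip compression is unsupported for the output format, request netCDF-4 output explicitly; you can also check with ncdump whether the WRF inputs really are deflated (a sketch, using the same filenames):
cdo -f nc4 -z zip mergetime wrf_file1.nc wrf_file2.nc output_file.nc   # force netCDF-4 output so deflation can be applied
ncdump -hs wrf_file1.nc | grep -i deflate                              # a _DeflateLevel attribute > 0 means the input is compressed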

Concatenate monthly MODIS data

I downloaded daily MODIS Level 3 data for a few months from https://disc.gsfc.nasa.gov/datasets. The filenames are of the form MCD06COSP_M3_MODIS.A2006001.061.2020181145945, but the files do not contain a time dimension. Hence, when I use ncecat to concatenate the files, the date information is missing from the resulting file. I want to know how to add the time information to the combined dataset.
Your commands look correct. Good job crafting them. Not sure why it's not working. Possibly the input files are HDF4 format (do they have a .hdf suffix?) and your NCO is not HDF4-enabled. Try downloading the files in netCDF3 or netCDF4 format; your commands above should then work. If that's not what's wrong, examine the output files at each step of your procedure, identify which step produces the unintended results, and then narrow your question. Good luck.
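If the files do read cleanly, one common NCO pattern for adding the missing time coordinate is sketched below; the glob, dimension name, coordinate values, and units are illustrative placeholders to adapt to the actual dates:
ncecat -u time MCD06COSP_M3_MODIS.A2006*.nc combined.nc   # concatenate and name the new record dimension "time"
ncap2 -s 'time[$time]=array(0,1,$time); time@units="days since 2006-01-01"' combined.nc combined_time.nc   # add placeholder time values and units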

Why does zipping an HDF5 file still give a good amount of compression even if all datasets inside the file are compressed?

I am using the HDF5 file format in my desktop application and have applied GZIP level 5 compression to all the datasets inside the file.
But when I zip the HDF5 file using 7zip, it still shrinks to around one half to one third of its size!
The process I am following is:
Generating the HDF5 file.
Importing data into the file.
Freeing up unaccounted space, if any, using the h5repack utility.
Zipping the file to .zip using 7zip.
How is this possible?
Where is the scope for more compression?
How can I generate an even smaller HDF5 file? Any suggestions about which properties (H5P) to use?
I thought 7zip might simply be compressing my file more aggressively with GZIP level 9, but I tried GZIP level 9 in my HDF5 file and the zipped result is still about half the size of the original.
You are applying compression to only the dataset elements in the HDF5 file. Other components of the HDF5 file (internal metadata and objects such as groups) aren't compressed. So, when you compress the entire file, those other components compress, and the already compressed dataset elements could compress some more also.
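To see where the remaining bytes go, and to squeeze the dataset elements a little harder, something like this may help (a sketch; it assumes the HDF5 command-line tools are installed, and the filenames and shuffle filter are suggestions):
h5stat -S myfile.h5                                 # summary of file space: raw data versus metadata overhead
h5repack -f SHUF -f GZIP=9 myfile.h5 smaller.h5     # shuffle bytes before deflating; often helps numeric datasets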
gzip has a maximum compression ratio of about 1000:1. If the data is more compressible than that, then you can compress it a second time to get more compression (the second time could be gzip again). You can do a simple experiment with a file consisting of only zeros:
% dd ibs=1 count=1000000 < /dev/zero > zeros
% wc -c zeros
1000000
% gzip < zeros | wc -c
1003
% gzip < zeros | gzip | wc -c
64
So what was the compression ratio of your first compression?

Compressing HDF5 files in Mathematica

I am working with Mathematica 9 and exporting huge lists (a typical list has dimensions {182500, 4, 8, 42}). Each file holds about 6 lists of this size (all integers; not sure if this makes a difference for lists, though I know it does for other array types). Saving them in HDF5 format works, but the resulting files are relatively large (1.5 GB).
Therefore, I am trying to compress the files with GZIP from within Mathematica, since the documentation claims it is an option of the Export function (which has a lot of bugs, by the way).
I couldn't find any help on the net after all my attempts at following the documentation didn't pan out. I was wondering if one of the Mathematica enthusiasts here could weigh in with some tips.
The compression happens automatically if the filename ends with ".gz"
So instead of
Export["file.h5", data]
Use
Export["file.h5.gz", data]
See the Mathematica documentation for the list of available formats and their extensions.