How to extract all timesteps except the specified timesteps from a netcdf file using cdo - cdo-climate

We usually use 'cdo seltimestep' to select the days we want from a netCDF file and write them to an output file that contains only those days:
cdo seltimestep,6,10,11 2000-03.nc 03-days.nc
Now we want to do the opposite: select all the days except the ones specified. Is there a way to do that in cdo?

You can use the delete operator, which writes every timestep except the listed ones to the output file:
cdo delete,timestep=6,10,11 ifile ofile

Related

Extracting CRU precipitation data from netcdf into separate GeoTIFF files

I would like to extract CRU precipitation data in netCDF format into separate GeoTIFF files. When the netCDF file has only the variables lon, lat, time and pre,
I can extract it with the script below:
for t in `cdo showdate input.nc`; do
cdo seldate,$t input.nc dummy.nc
gdal_translate -of GTiff -a_ullr <top_left_lon> <top_left_lat> <bottom_right_lon> <bottom_right_lat> -a_srs EPSG:4326 dummy.nc $t.tif
done
The CRU precipitation data has the variables lon, lat, time, pre and stn.
I can't use the script above because the file contains 2 subdatasets. I get this message from CDO: Input file contains subdatasets. Please, select one of them for reading.
How can I select the pre variable in CDO and apply that in the script above?
If you mean that the files have more than one variable, you can select the variable "pre" with the selvar operator and chain its output into seldate:
cdo seldate,$t -selvar,pre input.nc dummy.nc

Skip top N lines in snowflake load

The actual data in my CSV extracts starts at line 10. How can I skip the top few lines in a Snowflake load using COPY or any other utility? Is there anything similar to SKIP_HEADER?
My files are on S3, which is my stage. I will create a Snowpipe on this data source later.
Yes, there is a skip_header option for CSV that lets you skip a specified number of rows when you define a file format. Please have a look here:
https://docs.snowflake.net/manuals/sql-reference/sql/create-file-format.html#type-csv
So you create a file format for the CSV files you have in mind and then use it when calling the COPY commands.

PDI - Multiple file input based on date in filename

I'm working on a project using Kettle (PDI).
I have to read multiple .csv or .xls files and insert them into a DB.
The file names follow the pattern AAMMDDBBBB, where AA is the city code, BBBB is the shop code and MMDD is the date (month and day). For example: LA0326F5CA.csv.
The regexps I use in the input file steps look like LA.*\.csv or DT.*\.xls, which return all the files to insert into the DB.
Can you tell me how to select only yesterday's files (based on the MMDD in the file name)?
As you need some "complex" logic in your selection, you cannot filter with a regexp alone. I suggest you first read all the filenames, then filter them by their "age", then read the files whose names were selected.
In detail:
Use the Get File Names step with the same regexp you currently use (LA.*\.csv or DT.*\.xls). You may be more restrictive at that stage with a regexp like LA\d\d\d\d....\.csv, to ensure MM and DD are digits and BBBB is exactly 4 characters.
Filter based on the date. You can do this with a Java Filter, but it is an order of magnitude easier to use a JavaScript step (Modified Java Script Value) to compute the "age" of each file and then a Filter rows step to keep only yesterday's files.
To compute the age of a file, extract MM and DD from its name. One way (other methods are available):
var match = filename.match(/^..(\d\d)(\d\d)/);
if (match) {
  // The file name carries no year, so 2018 is hard-coded here.
  // JavaScript months are 0-based, hence match[1] - 1.
  var fileDate = new Date(2018, match[1] - 1, match[2]);
  // The date difference is in milliseconds; convert it to days.
  var age = (new Date() - fileDate) / 1000 / 60 / 60 / 24;
}
If you are not familiar with JavaScript regular expressions: match tests the filename against the pattern and returns the values captured by the parentheses in an array. If the test succeeds (which you must check explicitly to avoid a runtime failure), use the captured values to build the corresponding date and subtract it from today's date to get the age. The difference is in milliseconds, which the code converts to days. A file from yesterday then has an age between 1 and 2 days.
Use the Text File Input and Excel Input steps with the option Accept file from previous step. Note that CSV Input does not have this option, but the more powerful Text File Input does.
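The age computation in the filter step can be sketched as a standalone function. This is only a minimal sketch, not the answer's exact code: the names fileAgeInDays and isFromYesterday are hypothetical, the reference date is passed in so the logic can be checked outside PDI, and it assumes that a file belongs to the most recent year in which its MMDD is not after the reference date (the file name itself carries no year). It compares calendar days rather than timestamps, so yesterday's file has an age of exactly 1.

```javascript
// Hypothetical helper: whole calendar days between the MMDD date encoded in a
// file name such as "LA0326F5CA.csv" and the reference date `now`.
// Returns null when the name does not follow the AAMMDDBBBB pattern.
function fileAgeInDays(filename, now) {
  var m = /^..(\d\d)(\d\d)/.exec(filename);
  if (!m) {
    return null;
  }
  var month = Number(m[1]) - 1; // JavaScript months are 0-based
  var day = Number(m[2]);
  if (month < 0 || month > 11 || day < 1 || day > 31) {
    return null; // digits that cannot be a month and a day
  }
  // Work on calendar days in UTC so time of day and DST shifts do not matter.
  var today = Date.UTC(now.getFullYear(), now.getMonth(), now.getDate());
  var fileDay = Date.UTC(now.getFullYear(), month, day);
  // Assumption: a date after `now` belongs to the previous year
  // (for example a 1231 file processed on January 1st).
  if (fileDay > today) {
    fileDay = Date.UTC(now.getFullYear() - 1, month, day);
  }
  return Math.round((today - fileDay) / 86400000);
}

// Hypothetical helper: true only for files dated the day before `now`.
function isFromYesterday(filename, now) {
  return fileAgeInDays(filename, now) === 1;
}

// Example with a fixed reference date of 27 March 2018, 10:30 local time.
var reference = new Date(2018, 2, 27, 10, 30);
console.log(fileAgeInDays("LA0326F5CA.csv", reference));   // 1
console.log(isFromYesterday("LA0327F5CA.csv", reference)); // false
```

In a Modified Java Script Value step the name would presumably come from a field produced by Get File Names (for instance short_filename), and new Date() would replace the fixed reference date. The Filter rows step can then keep the rows whose age equals 1.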
Well, I replaced the Java Filter with a Modified Java Script Value step and it works fine now.
Another question: how can I improve the performance of my current transformation (I now have 2 transformations for 2 cities)? The Insert/Update step makes the transformation slow: it needs almost 1 hour 30 minutes to process 500k rows with a lot of fields (300 MB). This is only part of my data. If it runs faster and my company decides to use it, I will process about 10 TB of data per year, which means many transformations and rows. I need suggestions.

Armadillo: Save multiple datasets in one hdf5 file

I am trying to save multiple datasets into a single hdf5 file using armadillo's new feature to give custom names to datasets (using armadillo version 8.100.1).
However, only the last saved dataset will end up in the file. Is there any way to append to an existing hdf5 file with armadillo instead of replacing it?
Here is my example code:
#define ARMA_USE_HDF5
#include <armadillo>
int main(){
arma::mat A(2,2, arma::fill::randu);
arma::mat B(3,3, arma::fill::eye);
A.save(arma::hdf5_name("multi-hdf5.mat", "dataset1"), arma::hdf5_binary);
B.save(arma::hdf5_name("multi-hdf5.mat", "dataset2"), arma::hdf5_binary);
return 0;
}
The hdf5 file is read out using the h5dump utility.
Unfortunately, I don't think you can do that. I'm an HDF5 developer, not an armadillo developer, but I took a peek at their source for you.
The save functions look like they are designed to dump a single matrix to a single file. In the function save_hdf5_binary() (diskio_meat.hpp:1255 for one version) they call H5Fcreate() with the H5F_ACC_TRUNC flag, which will clobber any existing file. There's no 'open if file exists' or clobber/non-clobber option. The only H5Fopen() calls are in the hdf5_binary_load() functions and those don't keep the file open for later writing.
This clobbering is what is happening in your case, btw. A.save() creates a file containing dataset1, then B.save() clobbers that file with a new file containing dataset2.
Also, for what it's worth, 'appending to an HDF5 file' is not really the right way to think about that. HDF5 files are not byte/character streams like a text file. Appending to a dataset, yes. Files, no. Think of it like a relational database: You might append data to a table, but you probably wouldn't say that you were appending data to the database.
Newer versions of Armadillo already cover this case.
Pass hdf5_opts::append to hdf5_name in the save call. For example, to save a matrix A without replacing the existing file:
A.save(arma::hdf5_name(filename, dataset, arma::hdf5_opts::append));

How to check max number of file open-write-close operations per second in ubuntu

I have some files on disk. Each file has fixed-size lines in the following format:
98969547,1236548896,1236547899,0a234505,1478889565
where 0a234505 is an IP address in hex format.
I have to open a file, read one line and find the IP address in it, then create a directory on disk (if it does not exist) named after the IP address and create a file under that directory that holds the line.
The file name is today's date, e.g. 2017-02-09. If the directory and its file already exist, I simply append the line to the end of the file.
My files contain many lines, e.g. 100000 or more, so these steps must be repeated for every line.
My requirement is to process one file of 100000 lines in one second.
So what I need to know is: what is the maximum number of file open-write-close operations per second on Ubuntu 16.04?
If the answer does not satisfy my requirement, how should I do this properly?
In other words, if the OS does not allow such a large number of open-write-close operations, is there another way to do this?
Programming language: c++
OS: ubuntu-16.04 4.4.0-62-generic