Extracting CRU precipitation data from netCDF into separate GeoTIFF files - cdo-climate

I would like to extract CRU precipitation data in netCDF format into separate GeoTIFF files. Usually, if the netCDF file only has the variables lon, lat, time and pre,
I can extract it using the script below:
for t in `cdo showdate input.nc`; do
  cdo seldate,$t input.nc dummy.nc
  gdal_translate -of GTiff -a_ullr <top_left_lon> <top_left_lat> <bottom_right_lon> <bottom_right_lat> -a_srs EPSG:4326 dummy.nc $t.tif
done
The CRU precipitation data has the variables lon, lat, time, pre and stn.
I can't use the above script because the file contains two subdatasets; CDO reports: Input file contains subdatasets. Please, select one of them for reading.
How can I select the pre variable in CDO and apply it in the above script?

If you mean that the files have more than one variable, then you can select the variable "pre" using the operator selvar, which you can then pipe to seldate:
cdo seldate,$t -selvar,pre input.nc dummy.nc
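For instance, the original loop could be adapted as below (a sketch that reuses the question's input.nc and placeholder coordinates; only the CDO call changes):
for t in `cdo showdate input.nc`; do
  # select the "pre" variable first, then the date, in one chained CDO call
  cdo seldate,$t -selvar,pre input.nc dummy.nc
  gdal_translate -of GTiff -a_ullr <top_left_lon> <top_left_lat> <bottom_right_lon> <bottom_right_lat> -a_srs EPSG:4326 dummy.nc $t.tif
done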

Related

How to extract all timesteps except the specified timesteps from a netCDF file using CDO

We usually use 'cdo seltimestep' to select the days we want to extract from a netcdf file into an output file that includes only the days we specified.
cdo seltimestep,6,10,11 2000-03.nc 03-days.nc
but now we want to do the opposite: select all the days except the ones specified. Is there a way to do that in CDO?
You can use the operator delete in the following way:
cdo delete,timestep=6,10,11 ifile ofile
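Applied to the question's example, the inverse selection would look like this (the output file name is just an illustration):
cdo delete,timestep=6,10,11 2000-03.nc 03-otherdays.nc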

How can I read 7z files directly from SAS?

I'm a beginner SAS programmer with a lot of files compressed in 7z format. Because of a lack of space on the server I work on, I need to open the files directly from their compressed form. I found the following SAS paper about reading compressed text files:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/155-31.pdf
However, I do not obtain any result using the next code, for example:
FileName Com7zipa Pipe '7za e "rie_mbco_matriz_07.7z" "rie_mbco_matriz_07.sas7bdat" -y -so';

Data DataSet07;
    infile Com7zipa;
    Input NRO_DOC;
run;
I hope you can help me.
Best regards,
Jean Pierre
As far as I am aware, gzip is supported in 9.4M5 or above, but 7z is not. Although server space is limited, you will likely have at least a fair amount of SAS WORK directory space allocated. You can use X commands to extract the archive to the WORK directory instead and read it from there. Note that you will need to enable X commands first to use this method.
Let's assume a file named myfile.7z contains a CSV file named myfile.csv.
/* Create a macro variable holding the location of the WORK directory */
%let workdir = %sysfunc(getoption(work));

x "7za e /my/dir/myfile.7z -o&workdir.";

proc import
    file = "&workdir./myfile.csv"
    out  = myfile
    dbms = csv
    replace;
run;
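Related to the gzip remark above: if you can re-compress the data as gzip instead of 7z, 9.4M5+ can read it directly with the ZIP filename engine. A minimal sketch, assuming a hypothetical gzipped CSV at /my/dir/myfile.csv.gz that contains the NRO_DOC column:
/* GZIP option of the ZIP filename engine requires SAS 9.4M5 or later */
filename gzin zip "/my/dir/myfile.csv.gz" gzip;

data dataset07;
    infile gzin dsd;
    input NRO_DOC;
run;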

BigQuery export to CSV file via SQL

I want to create a CSV file which contains the results of a query.
This CSV file will live in Google Cloud Storage. (The query result is around 15 GB.) I need it to be a single file. Is that possible, and if so, how?
CREATE OR REPLACE TABLE `your-project.your-dataset.chicago_taxitrips_mod` AS (
  WITH taxitrips AS (
    SELECT
      trip_start_timestamp,
      trip_end_timestamp,
      trip_seconds,
      trip_miles,
      pickup_census_tract,
      dropoff_census_tract,
      pickup_community_area,
      dropoff_community_area,
      fare,
      tolls,
      extras,
      trip_total,
      payment_type,
      company,
      pickup_longitude,
      pickup_latitude,
      dropoff_longitude,
      dropoff_latitude,
      IF((tips/fare >= 0.2), 1, 0) AS tip_bin
    FROM
      `bigquery-public-data.chicago_taxi_trips.taxi_trips`
    WHERE
      trip_miles > 0
      AND fare > 0)
  SELECT
    trip_start_timestamp,
    trip_end_timestamp,
    trip_seconds,
    trip_miles,
    pickup_census_tract,
    dropoff_census_tract,
    pickup_community_area,
    dropoff_community_area,
    fare,
    tolls,
    extras,
    trip_total,
    payment_type,
    company,
    tip_bin,
    ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickup_longitude, pickup_latitude), 0.1)) AS pickup_grid,
    ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropoff_longitude, dropoff_latitude), 0.1)) AS dropoff_grid,
    ST_Distance(ST_GeogPoint(pickup_longitude, pickup_latitude),
                ST_GeogPoint(dropoff_longitude, dropoff_latitude)) AS euclidean,
    CONCAT(ST_AsText(ST_SnapToGrid(ST_GeogPoint(pickup_longitude, pickup_latitude), 0.1)),
           ST_AsText(ST_SnapToGrid(ST_GeogPoint(dropoff_longitude, dropoff_latitude), 0.1))) AS loc_cross
  FROM
    taxitrips
  LIMIT
    100000000
)
If BigQuery needs to output multiple files, you can then concatenate them into a single one with a gsutil operation for files in GCS:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
https://cloud.google.com/storage/docs/gsutil/commands/compose
Note that there is a limit (currently 32) to the number of components that can be composed in a single operation.
Exporting 15 GB to a single CSV file is not possible (exporting to multiple files is). I tried the same query (15.66 GB of bytes processed) and then tried to export it to a single CSV file in GCS, but it failed with this error:
Table gs://[my_bucket]/bq_export/test.csv too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data.
The BigQuery documentation only allows you to export up to 1 GB of table data to a single file. Since the table exceeds 1 GB, you have to use a wildcard URI like:
gs://your-bucket-name/csvfilename*.csv
I'm not sure why you would want the exported CSV to be a single file, but IMHO it's too large for that. Writing it to multiple files will also be a lot faster, since BigQuery can use its parallelism to write the output with multiple workers.
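For reference, here is a sketch of the wildcard-export-then-compose sequence using the bq and gsutil command-line tools (project, dataset, bucket and file names are placeholders; --print_header=false keeps the shards concatenation-friendly, at the cost of having no header row in the final file):
# export to sharded CSVs; BigQuery replaces * with a 12-digit shard number
bq extract --destination_format CSV --print_header=false \
  'your-project:your-dataset.chicago_taxitrips_mod' \
  gs://your-bucket-name/csvfilename*.csv

# stitch the shards together (at most 32 objects per compose call)
gsutil compose gs://your-bucket-name/csvfilename000000000000.csv \
  gs://your-bucket-name/csvfilename000000000001.csv \
  gs://your-bucket-name/composite.csv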

PDI - Multiple file input based on date in filename

I'm working with a project using Kettle (PDI).
I have to input multiple .csv or .xls files and insert them into a DB.
The file names are AAMMDDBBBB, where AA is a city code and BBBB is a shop code. MMDD is the date in MM-DD format. For example, LA0326F5CA.csv.
The regexps I use in the file input steps look like LA.*\.csv or DT.*\.xls, which return all the files to insert into the DB.
Can you tell me how to select just the files for yesterday (based on the MMDD in the file name)?
As you need some "complex" logic in your selection, you cannot filter based only on a regexp. I suggest you first read all the filenames, then filter them based on their "age", then read the files based on the selected filenames.
In detail:
Use the Get File Names step with the same regexp you currently use (LA.*\.csv or DT.*\.xls). You may be more restrictive at that stage with a regexp like LA\d\d\d\d.....csv, to ensure MM and DD are numbers and BBBB is exactly 4 characters.
Filter based on the date. You could do this with a Java Filter, but it is an order of magnitude easier to use a JavaScript step to compute the "age" of your file and then a Filter rows step to keep only the files of yesterday.
To compute the age of the file, extract the MM and DD, you can use (other methods are available):
var regexp = filename.match(/..(\d\d)(\d\d).*/);
if (regexp) {
    // JavaScript months are 0-based, so subtract 1 from the MM capture
    var age = new Date() - new Date(2018, parseInt(regexp[1], 10) - 1, parseInt(regexp[2], 10));
    age = age / 1000 / 60 / 60 / 24; // milliseconds to days
}
If you are not familiar with JavaScript regexps: match tests the filename against the regexp and keeps the values of the parentheses in an array. If the test succeeds (which you must explicitly check to avoid a run-time failure), use the values of the match to compute the corresponding date, and subtract it from today's date to get the age. This age is in milliseconds, which is then converted to days.
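To keep only yesterday's files, the Filter rows step (or a flag computed at the end of the same script) can then test that age. A small sketch, assuming the age is measured from today as above, so yesterday's files fall between 1 and 2 days:
// flag for the Filter rows step: true only for files dated yesterday
var is_yesterday = (age >= 1 && age < 2);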
Use the Text File Input and Excel Input steps with the option Accept file from previous step. Note that CSV Input does not have this option, but the more powerful Text File Input does.
Well, I replaced the Java Filter with a Modified Java Script Value step and it works fine now.
Another question: how can I increase the performance and speed of my current transformations (I now have 2 transformations for 2 cities)? My Insert/Update step makes the transformation slow: it needs almost 1 hour and 30 minutes to process 500k rows of data with a lot of fields (300 MB), and that is not all of my data. If it runs faster and my company likes it, I'm going to use it on 10 TB of data per year, which is a lot of transformations and rows. I need suggestions about this.

Do-file export command

I was wondering if there is a way to export the data I create in a Stata do-file to the Stata .dta file format. I'm using the following code to create the dataset.
use data1, clear
foreach num of numlist 2/30 {
    append using data`num'
}
To specify a name and directory as part of your Stata do-file, save "C:\Users\user\Desktop\name.dta" should do the job.
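Putting it together, a minimal sketch of the full do-file (the Desktop path is just an example; it assumes data1.dta through data30.dta sit in the current working directory):
* append data1 through data30 into one dataset in memory
use data1, clear
foreach num of numlist 2/30 {
    append using data`num'
}

* save the combined dataset; replace overwrites an existing file of the same name
save "C:\Users\user\Desktop\name.dta", replace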