High-resolution Reanalysis data - python-2.7

When I extract data from a Reanalysis netCDF file (variable: sea-level pressure (SLP), 01/01/2014), the data is very high resolution (9 km grid), which makes the resulting image quite noisy. I would like to put the data onto a lower-resolution grid (e.g. 1 degree). I'm trying to use the functions meshgrid and griddata but, being inexperienced, I am unable to make it work. Does anyone know how to solve this? Thank you.
from netCDF4 import Dataset
import numpy as np
from scipy.interpolate import griddata
file = Dataset('slp_2014_01_01.nc', 'r')
# Printing variables
print ' '
print ' '
print '----------------------------------------------------------'
for i,variable in enumerate(file.variables):
    print ' '+str(i),variable
    if i == 2:
        current_variable = variable
print ' '
print 'Variable: ', current_variable.upper()
print 'File name: ', file_name
lat = file.variables['lat'][:]
lon = file.variables['lon'][:]
slp = file.variables['slp'][:]
lon_i = np.linspace(lon[0], lon[len(lon)-1], num=len(lon)*2, endpoint=True, retstep=False)
lat_i = np.linspace(lat[0], lat[len(lat)-1], num=len(lat)*2, endpoint=True, retstep=False)
lon_grid, lat_grid = np.meshgrid(lon_i,lat_i)
temp_slp = np.asarray(slp).squeeze()
new_slp = temp_slp.reshape(temp_slp.size)
slp_grid = griddata((lon, lat), new_slp, (lon_grid, lat_grid),method='cubic')
As I mentioned, I tried to use the meshgrid and griddata functions, but they produced the following error:
Traceback (most recent call last):
  File "REANALYSIS_LOCAL.py", line 346, in <module>
    lon,lat,time,var,variavel_atual=netCDF_builder_local(caminho_netcdf_local,nome_arquivo,dt)
  File "REANALYSIS_LOCAL.py", line 143, in netCDF_builder_local
    slp_grid = griddata((lon, lat), new_slp, (lon_grid, lat_grid),method='cubic')
  File "/home/carlos/anaconda/lib/python2.7/site-packages/scipy/interpolate/ndgriddata.py", line 182, in griddata
    points = _ndim_coords_from_arrays(points)
  File "interpnd.pyx", line 176, in scipy.interpolate.interpnd._ndim_coords_from_arrays (scipy/interpolate/interpnd.c:4064)
  File "/home/carlos/anaconda/lib/python2.7/site-packages/numpy/lib/stride_tricks.py", line 101, in broadcast_arrays
    "incompatible dimensions on axis %r." % (axis,))
ValueError: shape mismatch: two or more arrays have incompatible dimensions on axis 0.
The dimensions of variables are:
lon: (144,)
lat: (73,)
lon_i: (288,)
lat_i: (146,)
lon_grid: (146, 288)
lat_grid: (146, 288)
new_slp: (10512,)
The values in new_slp are:
new_slp: [ 102485. 102485. 102485. ..., 100710. 100710. 100710.]
The purpose is to increase the number of values in the variables (lon, lat and slp), so that the Reanalysis data has a higher resolution and the result is more detailed (more points).
For example, the variable lat has the points:
Original dimension variable lat: (73,)
lat: [ 90. 87.5 85. 82.5 80. 77.5 75. 72.5 70. 67.5 65. 62.5
60. 57.5 55. 52.5 50. 47.5 45. 42.5 40. 37.5 35. 32.5
30. 27.5 25. 22.5 20. 17.5 15. 12.5 10. 7.5 5. 2.5
0. -2.5 -5. -7.5 -10. -12.5 -15. -17.5 -20. -22.5 -25. -27.5
-30. -32.5 -35. -37.5 -40. -42.5 -45. -47.5 -50. -52.5 -55. -57.5
-60. -62.5 -65. -67.5 -70. -72.5 -75. -77.5 -80. -82.5 -85. -87.5
-90. ]
With the line lat_i = np.linspace(lat[0], lat[len(lat)-1], num=len(lat)*2, endpoint=True, retstep=False) I doubled the number of values of the lat variable, giving lat_i with shape (146,):
lat_i: [ 90. 88.75862069 87.51724138 86.27586207 85.03448276 83.79310345 82.55172414 81.31034483 80.06896552 78.82758621 77.5862069
...
-78.82758621 -80.06896552 -81.31034483 -82.55172414 -83.79310345 -85.03448276 -86.27586207 -87.51724138 -88.75862069 -90. ]
What I need is the same idea as in the code below, where x is lon, y is lat and z is slp.
from scipy.interpolate import griddata
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(1.,10.,20)
y=np.linspace(1.,10.,20)
z = np.random.random(20)
xi=np.linspace(1.,10.,40)
yi=np.linspace(1.,10.,40)
X,Y= np.meshgrid(xi,yi)
Z = griddata((x, y), z, (X, Y),method='nearest')
plt.contourf(X,Y,Z)

Depending on your final purpose, you may use cdo to regrid the whole file:
cdo remapbil,r360x180 infile outfile
or just plot every second or third value from original file like this:
plt.pcolormesh(lon[::2,::2],lat[::2,::2],var1[::2,::2])
The error message you show just says that the dimensions do not match. Print the shape of your variables just before the error appears and work from there.
Why does your code not work?
Your chosen method requires the input coordinates as (lon, lat) pairs, one per data point, not as separate axis or mesh coordinates. If you have 10000 data points, your coordinates must have the shape (10000, 2), not (100, 100).
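To make the shape requirement concrete, here is a minimal sketch using the shapes reported in the question (144 longitudes, 73 latitudes). The coordinate values and the zero-filled slp array are placeholders assumed for illustration, not the real data.

```python
import numpy as np

# Placeholder axes with the shapes reported in the question (assumed values).
lon = np.linspace(0.0, 357.5, 144)   # shape (144,)
lat = np.linspace(90.0, -90.0, 73)   # shape (73,)
slp = np.zeros((lat.size, lon.size)) # placeholder field, shape (73, 144)

# Expand the two axes into one coordinate per grid cell.
lon2d, lat2d = np.meshgrid(lon, lat)  # both have shape (73, 144)

# One (lon, lat) pair per data value: shape (10512, 2).
points = np.column_stack([lon2d.ravel(), lat2d.ravel()])

# Flatten the field in the same row-major order: shape (10512,).
values = slp.ravel()
```

With arrays shaped like this, a call such as griddata(points, values, (lon_grid, lat_grid)) receives one coordinate pair per data value, which is what the shape-mismatch error was about.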
However, as griddata is meant for unstructured data, it will not be efficient for your purpose. I suggest using something like scipy.interpolate.RegularGridInterpolator.
In any case, if you need to use the interpolated data more than once, I suggest creating new netCDF files with cdo and processing those, instead of interpolating the data each time you run your script.
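As a rough sketch of the RegularGridInterpolator suggestion, the example below refines a synthetic field on a 2.5-degree grid. The field itself, the variable names and the choice of linear interpolation are assumptions for illustration; the real slp array would be read from the netCDF file instead.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Synthetic stand-in for the 2.5-degree grid from the question (assumed values).
lat = np.linspace(90.0, -90.0, 73)   # descending, as in the file
lon = np.linspace(0.0, 357.5, 144)
slp = 101325.0 + 1000.0 * (np.cos(np.radians(lat))[:, None]
                           * np.sin(np.radians(lon))[None, :])

# Flip latitude so that both axes are strictly increasing.
lat_asc = lat[::-1]
slp_asc = slp[::-1, :]

interp = RegularGridInterpolator((lat_asc, lon), slp_asc, method='linear')

# Target grid: twice as many points per axis, inside the original bounds.
lat_i = np.linspace(lat_asc[0], lat_asc[-1], 2 * len(lat))
lon_i = np.linspace(lon[0], lon[-1], 2 * len(lon))
lat_grid, lon_grid = np.meshgrid(lat_i, lon_i, indexing='ij')

# Query points need shape (n_points, 2), in the same (lat, lon) axis order.
query = np.column_stack([lat_grid.ravel(), lon_grid.ravel()])
slp_i = interp(query).reshape(lat_grid.shape)
```

Because the interpolator exploits the regular grid structure, it avoids the triangulation step that griddata performs for scattered points.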

Thanks for your help. My problem really was about dimensions. I'm learning to work with oceanographic data, and I solved the problem with this code:
lonbounds = [25,59]
latbounds = [-10,-33]
#longitude lower and upper index
lonli = np.argmin(np.abs(lon - lonbounds[0]))
lonui = np.argmin(np.abs(lon - lonbounds[1]))
#latitude lower and upper index
latli = np.argmin(np.abs(lat - latbounds[0]))
latui = np.argmin(np.abs(lat - latbounds[1]))
#limiting of the interest region/data
lon_f = file.variables['lon'][lonli:lonui]
lat_f = file.variables['lat'][latli:latui]
slp_f = file.variables['slp'][0,latli:latui,lonli:lonui]
#creating a matrix with the filtered data (area to be searched) for use in gridData function of python
lon_f_grid, lat_f_grid = np.meshgrid(lon_f,lat_f)
#adjusting the data (size 1) for use in gridData function of python
lon_f1 = lon_f_grid.reshape(lon_f_grid.size)
lat_f1 = lat_f_grid.reshape(lat_f_grid.size)
slp_f1 = slp_f.reshape(slp_f.size)
#increasing the resolution of data (1000 points) of longitude and latitude for the data to be more refined
lon_r = np.linspace(lon_f[0], lon_f[len(lon_f)-1], num=1000, endpoint=True, retstep=False)
lat_r = np.linspace(lat_f[0], lat_f[len(lat_f)-1], num=1000, endpoint=True, retstep=False)
#creating a matrix with the filtered data (area to be searched) and higher resolution for use in gridData function of python
lon_r_grid, lat_r_grid = np.meshgrid(lon_r,lat_r)
#applying gridata that can be generated since pressure (SLP) with higher resolution.
slp_r = griddata((lon_f1,lat_f1),slp_f1,(lon_r_grid,lat_r_grid),method='cubic')
Hugs,
Carlos.

Related

How do I run a Pchip or interp1d in python for a cubic function?

I can't get this code to work and it's driving me crazy. I have some data (time vs. current) that follows a roughly logarithmic curve. I want to use PCHIP interpolation, or at least a cubic one, between the points, because the data are real measurements and simply assuming a log relationship gives too much error.
import os
import numpy as np
import pandas as pd
import sys
import xlrd
from scipy.interpolate import interp1d
df = pd.read_excel("re.xlsx")
array = df.values
ans = []
maxy = array.shape[0]
x = []
y = []
count = 0
while count < maxy:
    x.append(float(array[count,0]))
    y.append(float(array[count,1]))
    count = count +1
print(x)
time = 0.001
while(time < 5):
    f = interp1d(x, y, kind='cubic', copy=True, bounds_error=True, fill_value=np.nan)
    print(time,f(time))
    ans.append(f(time))
    time = (time *1.1)
Here is the output. I don't understand the error; the array is correct, is it not? This is the cubic interpolation attempt, but when I tried pchip I got a similar error saying x wasn't necessarily ascending.
[0.0837, 0.0841, 0.0845, 0.0853, 0.0856, 0.0866, 0.0881, 0.0882, 0.09,
0.0921, 0.0921, 0.0947, 0.0973, 0.0977, 0.1042, 0.1122, 0.1202, 0.1233,
0.1304, 0.1365, 0.1415, 0.1432, 0.147, 0.1531, 0.1595, 0.1598, 0.1689,
0.1772, 0.1792, 0.191, 0.1999, 0.206, 0.2239, 0.2274, 0.2533, 0.2539, 0.2934,
0.294, 0.346, 0.3462, 0.4201, 0.4428, 0.5215, 0.5947, 0.6346, 0.7889, 0.8605,
0.9382, 0.9846, 1.128, 1.261, 1.4086, 1.5932, 1.6089, 1.8511, 2.0602, 2.167,
2.56, 2.6284, 3.228, 3.2321, 4.0363, 4.0959, 5.1183]
Traceback (most recent call last):
  File "C:/Python27/projects/interpolationFromExcel.py", line 25, in <module>
    f = interp1d(x, y, kind='cubic', copy=True, bounds_error=True, fill_value=np.nan)
  File "C:\Python27\lib\site-packages\scipy\interpolate\interpolate.py", line 535, in __init__
    check_finite=False)
  File "C:\Python27\lib\site-packages\scipy\interpolate\_bsplines.py", line 777, in make_interp_spline
    raise ValueError("Expect x to be a 1-D sorted array_like.")
ValueError: Expect x to be a 1-D sorted array_like.
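One possible cause, judging from the printed list, is that x contains a repeated value (0.0921 appears twice), so it is sorted but not strictly increasing. The sketch below shows one way to drop repeated x values before building a PCHIP interpolant. The sample numbers are hypothetical, and keeping the first y for each repeated x is an assumption; averaging the duplicates may suit real measurements better.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical measurements with a repeated x value, as in the printed list.
x = np.array([0.0837, 0.0841, 0.0921, 0.0921, 0.0947, 0.1042])
y = np.array([10.0, 9.5, 7.0, 7.2, 6.5, 5.0])

# np.unique sorts x and returns the index of the first occurrence of each value.
x_unique, first_idx = np.unique(x, return_index=True)
y_unique = y[first_idx]

# PCHIP needs strictly increasing x, which x_unique now satisfies.
f = PchipInterpolator(x_unique, y_unique)

# Evaluate only inside the measured range to avoid extrapolation.
t = np.linspace(x_unique[0], x_unique[-1], 5)
values = f(t)
```

Note also that the loop in the question starts at time = 0.001, which lies below the smallest x in the printed list, so with bounds_error=True the first evaluation would likely fail even after the duplicates are removed.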

Plotting Elevation in Python

I'm trying to create a map of Malawi with altitude shown. Something like this, but of Malawi of course:
I have downloaded some elevation data from here: http://research.jisao.washington.edu/data_sets/elevation/
This is a print of that data after I created a cube:
meters, from 5-min data / (unknown) (time: 1; latitude: 360; longitude: 720)
Dimension coordinates:
time x - -
latitude - x -
longitude - - x
Attributes:
history:
Elevations calculated from the TBASE 5-minute
latitude-longitude resolution...
invalid_units: meters, from 5-min data
I started with importing my data, forming a cube, removing the extra variables (time and history) and limiting my data to the latitudes and longitudes for Malawi.
import matplotlib.pyplot as plt
import matplotlib.cm as mpl_cm
import numpy as np
import iris
import cartopy
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
import iris.analysis.cartography
def main():
    #bring in altitude data
    Elev = '/exports/csce/datastore/geos/users/s0899345/Climate_Modelling/Actual_Data/elev.0.5-deg.nc'
    Elev= iris.load_cube(Elev)
    #remove variable for time
    del Elev.attributes['history']
    Elev = Elev.collapsed('time', iris.analysis.MEAN)
    Malawi = iris.Constraint(longitude=lambda v: 32.0 <= v <= 36., latitude=lambda v: -17. <= v <= -8.)
    Elev = Elev.extract(Malawi)
    print 'Elevation'
    print Elev.data
    print 'latitude'
    print Elev.coord('latitude')
    print 'longitude'
    print Elev.coord('longitude')
This works well and the output is as follows:
Elevation
[[ 978. 1000. 1408. 1324. 1080. 1370. 1857. 1584.]
[ 1297. 1193. 1452. 1611. 1354. 1480. 1350. 627.]
[ 1418. 1490. 1625. 1486. 1977. 1802. 1226. 482.]
[ 1336. 1326. 1405. 728. 1105. 1559. 1139. 789.]
[ 1368. 1301. 1463. 1389. 671. 942. 947. 970.]
[ 1279. 1116. 1323. 1587. 839. 1014. 1071. 1003.]
[ 1096. 969. 1179. 1246. 855. 979. 927. 638.]
[ 911. 982. 1235. 1324. 681. 813. 814. 707.]
[ 749. 957. 1220. 1198. 613. 688. 832. 858.]
[ 707. 1049. 1037. 907. 624. 771. 1142. 1104.]
[ 836. 1044. 1124. 1120. 682. 711. 1126. 922.]
[ 1050. 1204. 1199. 1161. 777. 569. 999. 828.]
[ 1006. 869. 1183. 1230. 1354. 616. 762. 784.]
[ 838. 607. 883. 1181. 1174. 927. 591. 856.]
[ 561. 402. 626. 775. 1053. 726. 828. 733.]
[ 370. 388. 363. 422. 508. 471. 906. 1104.]
[ 504. 326. 298. 208. 246. 160. 458. 682.]
[ 658. 512. 334. 309. 156. 162. 123. 340.]]
latitude
DimCoord(array([ -8.25, -8.75, -9.25, -9.75, -10.25, -10.75, -11.25, -11.75,
-12.25, -12.75, -13.25, -13.75, -14.25, -14.75, -15.25, -15.75,
-16.25, -16.75], dtype=float32), standard_name='latitude', units=Unit('degrees'), var_name='lat', attributes={'title': 'Latitude'})
longitude
DimCoord(array([ 32.25, 32.75, 33.25, 33.75, 34.25, 34.75, 35.25, 35.75], dtype=float32), standard_name='longitude', units=Unit('degrees'), var_name='lon', attributes={'title': 'Longitude'})
However when I try to plot it, it doesn't work... this is what I did:
    #plot map with physical features
    ax = plt.axes(projection=cartopy.crs.PlateCarree())
    ax.add_feature(cartopy.feature.COASTLINE)
    ax.add_feature(cartopy.feature.BORDERS)
    ax.add_feature(cartopy.feature.LAKES, alpha=0.5)
    ax.add_feature(cartopy.feature.RIVERS)
    #plot altitude data
    plot=ax.plot(Elev, cmap=mpl_cm.get_cmap('YlGn'), levels=np.arange(0,2000,150), extend='both')
    #add colour bar index and a label
    plt.colorbar(plot, label='meters above sea level')
    #set map boundary
    ax.set_extent([32., 36., -8, -17])
    #set axis tick marks
    ax.set_xticks([33, 34, 35])
    ax.set_yticks([-10, -12, -14, -16])
    lon_formatter = LongitudeFormatter(zero_direction_label=True)
    lat_formatter = LatitudeFormatter()
    ax.xaxis.set_major_formatter(lon_formatter)
    ax.yaxis.set_major_formatter(lat_formatter)
    #save the image of the graph and include full legend
    plt.savefig('Map_data_boundary', bbox_inches='tight')
    plt.show()
The error I get is 'Attribute Error: Unknown property type cmap' and the following map of the whole world...
Any ideas?
I'll prepare the data the same as you, except to remove the time dimension I'll use iris.util.squeeze, which removes any length-1 dimension.
import iris
elev = iris.load_cube('elev.0.5-deg.nc')
elev = iris.util.squeeze(elev)
malawi = iris.Constraint(longitude=lambda v: 32.0 <= v <= 36.,
latitude=lambda v: -17. <= v <= -8.)
elev = elev.extract(malawi)
As @ImportanceOfBeingErnest says, you want a contour plot. When unsure what plotting function to use, I recommend browsing the matplotlib gallery to find something that looks similar to what you want to produce. Click on an image and it shows you the code.
So, to make the contour plot you can use the matplotlib.pyplot.contourf function, but you have to get the relevant data from the cube in the form of numpy arrays:
import matplotlib.pyplot as plt
import matplotlib.cm as mpl_cm
import numpy as np
import cartopy
cmap = mpl_cm.get_cmap('YlGn')
levels = np.arange(0,2000,150)
extend = 'max'
ax = plt.axes(projection=cartopy.crs.PlateCarree())
plt.contourf(elev.coord('longitude').points, elev.coord('latitude').points,
elev.data, cmap=cmap, levels=levels, extend=extend)
However, iris provides a shortcut to the matplotlib.pyplot functions in the form of iris.plot. This automatically sets up an axes instance with the right projection, and passes the data from the cube through to matplotlib.pyplot. So the last two lines can simply become:
import iris.plot as iplt
iplt.contourf(elev, cmap=cmap, levels=levels, extend=extend)
There is also iris.quickplot, which is basically the same as iris.plot, except that it automatically adds a colorbar and labels where appropriate:
import iris.quickplot as qplt
qplt.contourf(elev, cmap=cmap, levels=levels, extend=extend)
Once plotted, you can get hold of the axes instance and add your other items (for which I simply copied your code):
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
qplt.contourf(elev, cmap=cmap, levels=levels, extend=extend)
ax = plt.gca()
ax.add_feature(cartopy.feature.COASTLINE)
ax.add_feature(cartopy.feature.BORDERS)
ax.add_feature(cartopy.feature.LAKES, alpha=0.5)
ax.add_feature(cartopy.feature.RIVERS)
ax.set_xticks([33, 34, 35])
ax.set_yticks([-10, -12, -14, -16])
lon_formatter = LongitudeFormatter(zero_direction_label=True)
lat_formatter = LatitudeFormatter()
ax.xaxis.set_major_formatter(lon_formatter)
ax.yaxis.set_major_formatter(lat_formatter)
It seems you want something like a contour plot. So instead of
plot = ax.plot(...)
you probably want to use
plot = ax.contourf(...)
Most probably you also want to give latitude and longitude as arguments to contourf,
plot = ax.contourf(longitude, latitude, Elev, ...)
You can try to add this:
import matplotlib.colors as colors
color = plt.get_cmap('YlGn') # and change cmap=mpl_cm.get_cmap('YlGn') to cmap=color
And also try to update your matplotlib:
pip install --upgrade matplotlib

printing output as a table in python terminal and saving output as a .txt with proper headings

I have written code to approximate the sum of an exponential series. It should iterate up to N-1 terms and return the iteration number, sum, absolute error and relative error at each iteration step.
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import math
N = input ("Please enter an integer at which term you want to truncate your summation")
x = input ("please enter a number for which you want to run the exponential summation e^{x}")
function= math.exp(x)
exp_sum = 0.0
abs_err = 0.0
rel_err = 0.0
for n in range (0, N):
    factorial = math.factorial(n)
    power = x**n
    nth_term = power/factorial
    exp_sum = exp_sum + nth_term
    abs_err = abs(function - exp_sum)
    rel_err = abs(abs_err)/abs(function)
    print "The exponential function which has %d-term expansion, returns the approximated sum to be %.16f." % (n, exp_sum)
    print "This approximated sum has an absolute error to be %.25f" % abs_err
    print "and a relative error to be %.25f" % rel_err
Right now it looks silly printing values at each iteration, and it only looks good for the first few iterations. My plan is to get the output as a table with proper column headings (iteration, sum, abs err, rel err) in the terminal after I execute the .py file.
I would also like to save the output to a .txt file. If anyone has an idea how to do that in Python, I would very much appreciate the help. Thanks.
You might use a pretty_table() function in order to pretty print tabular data, like this:
def pretty_table(rows, column_count, column_spacing=4):
    aligned_columns = []
    for column in range(column_count):
        column_data = list(map(lambda row: row[column], rows))
        aligned_columns.append((max(map(len, column_data)) + column_spacing, column_data))
    for row in range(len(rows)):
        aligned_row = map(lambda x: (x[0], x[1][row]), aligned_columns)
        yield ''.join(map(lambda x: x[1] + ' ' * (x[0] - len(x[1])), aligned_row))
This little function, given a list of rows and the number of columns, will yield pretty-formatted table data, line by line. You can even adjust the spacing between columns if you wish.
In your particular code, you may do the following:
# At first, contains just the header columns.
rows = [['Term', 'Exponential sum', 'Absolute error', 'Relative error']]
for n in range (0, N):
    factorial = math.factorial(n)
    power = x**n
    nth_term = power/factorial
    exp_sum = exp_sum + nth_term
    abs_err = abs(function - exp_sum)
    rel_err = abs(abs_err)/abs(function)
    rows.append((str(n), str(exp_sum), str(abs_err), str(rel_err)))
for line in pretty_table(rows, 4):
    print(line)
For an input of N = 10, X = 5, this code outputs:
Term    Exponential sum    Absolute error    Relative error
0       1.0                147.413159103     0.993262053001
1       6.0                142.413159103     0.959572318005
2       18.5               129.913159103     0.875347980517
3       39.3333333333      109.079825769     0.734974084703
4       65.375             83.0381591026     0.559506714935
5       91.4166666667      56.9964924359     0.384039345167
6       113.118055556      35.295103547      0.237816537027
7       128.619047619      19.7941114835     0.13337167407
8       138.307167659      10.1059914438     0.0680936347218
9       143.68945657       4.72370253291     0.0318280573062
If you want to redirect it into a file, do this instead of the last for loop:
with open('my_file.txt', 'w') as output:
    for line in pretty_table(rows, 4):
        print >> output, line
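As an alternative sketch, fixed-width columns can also be produced with str.format, which gives control over the number of digits in each column. The helper names exp_series_rows and format_table, the column widths and the output file name are all hypothetical choices made for this illustration.

```python
import math
import os
import tempfile

def exp_series_rows(x, n_terms):
    """Return (n, partial_sum, abs_err, rel_err) for the first n_terms terms of e**x."""
    exact = math.exp(x)
    partial = 0.0
    rows = []
    for n in range(n_terms):
        partial += x ** n / float(math.factorial(n))
        abs_err = abs(exact - partial)
        rows.append((n, partial, abs_err, abs_err / abs(exact)))
    return rows

def format_table(rows):
    """Format the rows as right-aligned, fixed-width text lines with a header."""
    header = '{:>4}  {:>22}  {:>14}  {:>14}'.format(
        'Term', 'Exponential sum', 'Absolute error', 'Relative error')
    lines = [header]
    for n, total, abs_err, rel_err in rows:
        lines.append('{:>4d}  {:>22.16f}  {:>14.6e}  {:>14.6e}'.format(
            n, total, abs_err, rel_err))
    return lines

rows = exp_series_rows(5.0, 10)
lines = format_table(rows)

# Print the table to the terminal.
for line in lines:
    print(line)

# Save the same lines to a text file (hypothetical file name in the temp directory).
path = os.path.join(tempfile.gettempdir(), 'exp_table.txt')
with open(path, 'w') as output:
    output.write('\n'.join(lines) + '\n')
```

Formatting the numbers rather than converting them with str() keeps every row the same width regardless of how many digits a value happens to have.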

Reshape a 2D matrix into a 3D one? (x, y) -> (x/72, 72, y)

I have a text file from which I load the original matrix.
The text file has comments marked with # and basically contains multiple matrices of 72*44.
I would like to read this file and store each of these matrices separately.
import os
import sys
import numpy as np
from numpy import zeros, newaxis
import io
#read the txt file and store all vaules into a np.array
f = open('/path/to/akiyo_cif.txt','r')
x = np.loadtxt(f,dtype=np.uint8,comments='#',delimiter='\t')
nof = x.shape[0]/72
print ("shape after reading the file is "+ str(x.shape) )
#example program that works
newmat =np.zeros((nof+1,72,44))
for i in range(0,nof+1):
    newmat[i,:,:]= x[i*72 : (i*72)+72 , :]
print ("Shape after resizing the file is "+ str(newmat.shape) )
Output :-Shape after reading the file is (21240, 44)
Shape after resizing the file is (274, 72, 44)
If I run this
newmat=x.reshape((nof,72,44))
newmat = x.reshape((nof,72,44))
ValueError: total size of new array must be unchanged
I would like to resize this matrix to (21240/72,72,44),
where the first 72 lines correspond to newmat[0,:,:] and the next 72 lines to newmat[1,:,:].
Use x.reshape(-1, 72, 44):
In [146]: x = np.loadtxt('data' ,dtype=np.uint8, comments='#', delimiter='\t')
In [147]: x = x.reshape(-1, 72, 44)
In [148]: x.shape
Out[148]: (34, 72, 44)
When you specify one of the dimensions as -1, np.reshape replaces the -1 with a value inferred from the length of the array and the remaining dimensions.
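A small self-contained sketch of the -1 inference follows, using a synthetic array in place of the loaded file. The count of 6 frames is an arbitrary assumption for the illustration.

```python
import numpy as np

# Hypothetical stand-in for the loaded file: 6 frames of 72 rows x 44 columns,
# stacked vertically into a (432, 44) array.
n_frames, rows_per_frame, n_cols = 6, 72, 44
x = np.arange(n_frames * rows_per_frame * n_cols).reshape(
    n_frames * rows_per_frame, n_cols)

# -1 lets NumPy infer the number of frames from the total size.
frames = x.reshape(-1, rows_per_frame, n_cols)

# If the row count is not a multiple of 72, the reshape raises ValueError.
try:
    x[:-1].reshape(-1, rows_per_frame, n_cols)
    reshape_failed = False
except ValueError:
    reshape_failed = True
```

Here frames[k] holds rows k*72 through k*72+71 of the original array, so frames[1] equals x[72:144].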

tuple index error while doing regression fit

I'm writing code to do single-variable linear regression analysis of data using numpy. I know that the fit() function uses np.array(), but the program is throwing a tuple index error and I'm at my wit's end. Here is my code:
def linear_model_main(X_parameter, Y_parameter, prediction_value):
    regression = linear_model.LinearRegression()
    regression.fit(X_parameter, Y_parameter, prediction_value)
    prediction_outcome = regression.predict(prediction_value)
    predictions = {}
    predictions['intercept'] = regression.intercept_
    predictions['coefficient'] = regression.coef_
    predictions['predicted_value'] = prediction_outcome
    return predictions
X, Y = get_data(filename)
Xarr = np.array(X)
Yarr = np.array(Y)
predictionvalue = 70
result = linear_model_main(Xarr, Yarr, predictionvalue)
Xarr and Yarr are np.arrays of separate columns of data taken from a csv file and are basically the X and Y coordinate values in the regression. When printed, they look like this:
[ 7. 73. 49. ..., 56. 56. 56.]
[ 5863. 5860. 5860. ..., 5860. 5860. 5860.]
It is a huge dataset (about 130,000 rows and 35 columns).
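A tuple index error in code like this is often caused by passing a 1-D array where a 2-D array of shape (n_samples, n_features) is expected, assuming linear_model here is scikit-learn's module. The sketch below uses hypothetical sample values to show the failing shape lookup and the reshape that avoids it. np.polyfit is used only as a NumPy stand-in so the block is self-contained; it is not the LinearRegression API.

```python
import numpy as np

# Hypothetical 1-D columns standing in for Xarr and Yarr.
Xarr = np.array([7.0, 73.0, 49.0, 56.0])
Yarr = 2.0 * Xarr + 5.0

# A 1-D array has a one-element shape tuple, so asking for shape[1] fails.
try:
    Xarr.shape[1]
    index_error = False
except IndexError:
    index_error = True

# Reshape to one row per sample and one column per feature: (n_samples, 1).
X2d = Xarr.reshape(-1, 1)

# Stand-in for the regression: a degree-1 least-squares fit with NumPy.
slope, intercept = np.polyfit(Xarr, Yarr, 1)
predicted = slope * 70 + intercept
```

With scikit-learn, the corresponding calls would presumably be regression.fit(X2d, Yarr) followed by regression.predict([[70]]). The prediction value should probably not be passed to fit(), since the third positional argument there is interpreted as sample weights.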