I am building a prediction model with four variables. When I add the variables, it fails with:
ValueError: shapes (1,4) and (5,4) not aligned: 4 (dim 1) != 5 (dim 0)
Code:
import pandas as pd
from pandas import DataFrame
from sklearn import linear_model
import tkinter as tk
import statsmodels.api as sm
# Approach 1: Import the data into Python
Stock_Market = pd.read_csv(r'Training_Nis_New2.csv')
df = DataFrame(Stock_Market, columns=['Month 1','Month 2','Month 3','Month 4',
    'Month 5','Month 6','Month 7','Month 8',
    'Month 9','Month 10','Month 11','Month 12',
    'FSUTX','MMUKX','FUFRX','RYUIX','Interest R','Housing Sale',
    'Unemployement Rate','Conus Average Temperature Rank',
    '30FSUTX','30MMUKX','30FUFRX','30RYUIX'])
X = df[['Month 1','Interest R','Housing Sale','Unemployement Rate','Conus Average Temperature Rank']]
# here we have 5 variables for multiple regression. For simple linear regression with a single variable, use X = df[['Interest R']] for example. Alternatively, you may add more variables within the brackets
Y = df[['30FSUTX','30MMUKX','30FUFRX','30RYUIX']]
# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
# prediction with sklearn
HS=5.5
UR=6.7
CATR=8.9
New_Interest_R = 4.6
print('Predicted Stock Index Price: \n', regr.predict([[UR ,HS ,CATR
,New_Interest_R]]))
# with statsmodel
X = df[['Month 1','Interest R','Housing Sale','Unemployement Rate','Conus Average Temperature Rank']]
Y = df['30FSUTX']
print('\n\n*** Fund = FSUTX')
X = sm.add_constant(X) # adding a constant
model = sm.OLS(Y, X).fit()
predictions = model.predict(X)
print_model = model.summary()
print(print_model)
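A note on the error message itself: X has five columns, so the fitted coefficients have one entry per column for each of the four targets, while predict is called with only four values. The following is a minimal sketch of the same mismatch using NumPy only. The coefficient values, the helper name predict and the placeholder for 'Month 1' are made up for illustration and are not taken from the CSV.

```python
import numpy as np

# Hypothetical stand-ins: 5 input features and 4 targets, as in X and Y above.
n_features, n_targets = 5, 4
coef = np.ones((n_targets, n_features))   # same layout as regr.coef_
intercept = np.zeros(n_targets)

def predict(rows):
    """Linear prediction: rows @ coef.T + intercept."""
    rows = np.asarray(rows, dtype=float)
    return rows @ coef.T + intercept

# Four values for a five-feature model: shapes (1, 4) and (5, 4) do not align.
try:
    predict([[6.7, 5.5, 8.9, 4.6]])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Five values, one per column of X, in the same order as the training columns.
# The first value (for 'Month 1') is a placeholder chosen for illustration.
ok = predict([[1.0, 4.6, 5.5, 6.7, 8.9]])
print(mismatch_raised, ok.shape)
```

In the question's code this would mean passing one value for each of 'Month 1', 'Interest R', 'Housing Sale', 'Unemployement Rate' and 'Conus Average Temperature Rank', in that order.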
I'm trying to use PVLIB to estimate output power for a PV System installed in the west of my country.
As an example I've got 2 days of hourly GHI, 2m Temperature and 10m wind speed from MERRA2 reanalysis.
I want to estimate how much power a fixed PV system or a single-axis tracking system would generate, using the aforementioned dataset and the ModelChain class from PVLIB. I first estimate DNI from the GHI data using the DISC model, and then compute DHI as the difference between GHI and DNI*cos(Z), where Z is the solar zenith angle.
a) I am not sure whether the first behaviour is correct. Here is the plot of GHI, DNI, DHI, T2m and wind speed. DNI appears to be shifted, with its maximum occurring 1 hour before the GHI maximum.
Weather Figure
After preparing the irradiance data I calculated the AC output using ModelChain, specifying the fixed PV system and the single-axis tracking system.
The thing is that I don't trust the AC output for the single-axis tracking system. I expected a plateau-shaped AC output and instead found some odd behaviour.
Here are the power generation values I expected to see:
Expectation
And here is the estimated output by PVLIB
Reality
I hope someone can help me find the error in my procedure.
Here is the code:
# =============================================================================
# Example of using MERRA2 data and PVLIB
# =============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pvlib
from pvlib.pvsystem import PVSystem
from pvlib.location import Location
from pvlib.modelchain import ModelChain
# =============================================================================
# 1) Create small data set extracted from MERRA
# =============================================================================
GHI = np.array([0,0,0,0,0,0,0,0,0,10.8,148.8,361,583,791.5,998.5,1105.5,1146.5,1118.5,1023.5,
860.2,650.2,377.1,165.1,16,0,0,0,0,0,0,0,0,0,11.3,166.2,395.8,624.5,827,986,
1065.5,1079,1025.5,941.5,777,581.5,378.9,156.2,20.6,0,0,0,0])
temp_air = np.array([21.5,20.5,19.7,19.6,18.8,17.9,17.1,16.5,16.2,16.2,17,21.3,24.7,26.9,28.8,30.5,
31.6,32.4,33,33.3,32.9,32,30.6,28.7,25.4,23.9,22.6,21.2,20.3,19.9,19.5,19.1,18.4,
17.7,18.3,23,25.1,27.3,29.5,31.2,32.1,32.6,32.6,32.5,31.8,30.7,29.6,28.1,24.6,22.9,
22.3,23.2])
wind_speed = np.array([3.1,2.7,2.5,2.6,2.8,3,3,3,2.8,2.5,2.1,1,2.2,3.7,4.8,5.6,6.1,6.4,6.5,6.6,6.3,5.8,5.3,
3.7,3.9,4,3.6,3.4,3.4,3,2.6,2.3,2.1,2,2.2,2.7,3.2,4.3,5.1,5.6,5.7,5.8,5.8,5.7,5.4,4.8,
4.4,3.1,2.7,2.3,1.1,0.6])
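# pd.date_range builds the hourly index; the start/end/freq arguments of pd.DatetimeIndex were removed from pandas
local_timestamp = pd.date_range(start='1979-12-31 21:00', end='1980-01-03 00:00', freq='1h', tz='America/Argentina/Buenos_Aires')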
d = {'ghi':GHI,'temp_air':temp_air,'wind_speed':wind_speed}
data = pd.DataFrame(data=d)
data.index = local_timestamp
lat = -31.983
lon = -68.530
location = Location(latitude = lat,
longitude = lon,
tz = 'America/Argentina/Buenos_Aires',
altitude = 601)
# =============================================================================
# 2) SOLAR POSITION AND ATMOSPHERIC MODELING
# =============================================================================
solpos = pvlib.solarposition.get_solarposition(time = local_timestamp,
latitude = lat,
longitude = lon,
altitude = 601)
# DNI and DHI calculation from GHI data
DNI = pvlib.irradiance.disc(ghi = data.ghi,
solar_zenith = solpos.zenith,
datetime_or_doy = local_timestamp)
DHI = data.ghi - DNI.dni*np.cos(np.radians(solpos.zenith.values))
d = {'ghi': data.ghi,'dni': DNI.dni,'dhi': DHI,'temp_air':data.temp_air,'wind_speed':data.wind_speed }
weather = pd.DataFrame(data=d)
plt.plot(weather)
# =============================================================================
# 3) SYSTEM SPECIFICATIONS
# =============================================================================
# load some module and inverter specifications
sandia_modules = pvlib.pvsystem.retrieve_sam('SandiaMod')
cec_inverters = pvlib.pvsystem.retrieve_sam('cecinverter')
sandia_module = sandia_modules['Canadian_Solar_CS5P_220M___2009_']
cec_inverter = cec_inverters['Power_Electronics__FS2400CU15__645V__645V__CEC_2018_']
# Fixed system with tilt=abs(lat)-10
f_system = PVSystem( surface_tilt = abs(lat)-10,
surface_azimuth = 0,
module = sandia_module,
inverter = cec_inverter,
module_parameters = sandia_module,
inverter_parameters = cec_inverter,
albedo = 0.20,
modules_per_string = 100,
strings_per_inverter = 100)
# 1 axis tracking system
t_system = pvlib.tracking.SingleAxisTracker(axis_tilt = 0, #abs(-33.5)-10
axis_azimuth = 0,
max_angle = 52,
backtrack = True,
module = sandia_module,
inverter = cec_inverter,
module_parameters = sandia_module,
inverter_parameters = cec_inverter,
name = 'tracking',
gcr = .3,
modules_per_string = 100,
strings_per_inverter = 100)
# =============================================================================
# 4) MODEL CHAIN USING ALL THE SPECIFICATIONS for a fixed and 1 axis tracking systems
# =============================================================================
mc_f = ModelChain(f_system, location)
mc_t = ModelChain(t_system, location)
# Next, we run a model with some simple weather data.
mc_f.run_model(times=weather.index, weather=weather)
mc_t.run_model(times=weather.index, weather=weather)
# =============================================================================
# 5) Get only the AC output from the fixed and 1 axis tracking systems and assign
# 0 values to each NaN
# =============================================================================
d = {'fixed':mc_f.ac,'tracking':mc_t.ac}
AC = pd.DataFrame(data=d)
# replace NaN with 0 in both columns (avoids chained assignment on AC.tracking / AC.fixed)
AC = AC.fillna(0)
plt.plot(AC)
I hope someone can help me interpret the results and debug the code.
Thanks a lot!
I suspect your issue is due to the way the hourly data is treated. Be sure that you're consistent with the interval labeling (beginning/end) and treatment of instantaneous vs. average data. One likely cause is using hourly average GHI data to derive DNI data. pvlib.solarposition.get_solarposition returns the solar position at the instants in time that are passed to it. So you're mixing up hourly average GHI values with instantaneous solar position values when you use pvlib.irradiance.disc to calculate DNI and when you calculate DHI. Shifting your time index by 30 minutes will reduce, but not eliminate, the error. Another approach is to resample the input data to be of 1-5 minute resolution.
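As a rough illustration of the 30-minute shift, here is a minimal sketch using pandas only. It assumes that each hourly value is the mean over the hour that ends at its timestamp; that labeling is an assumption, not something stated in the question, and if the labels mark the start of the hour the shift goes the other way.

```python
import pandas as pd

# A few hourly labels; the time zone is the one used in the question's code.
index = pd.date_range(start='1980-01-01 00:00', periods=4, freq='1h',
                      tz='America/Argentina/Buenos_Aires')

# Assumption: each value is the mean over the hour that ENDS at its label,
# so the representative instant is 30 minutes earlier.
# If the labels mark the START of the hour, add 30 minutes instead.
midpoints = index - pd.Timedelta(minutes=30)

print(midpoints[0])
```

The shifted index could then be passed as the time argument of pvlib.solarposition.get_solarposition, and the resulting frame relabeled with the original index before it is combined with the GHI values.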
I'm trying to plot and compare the frequency spectra of two .wav files. I wrote the following in Python for that:
import pylab
import time
from numpy import arange, linspace
from numpy.fft import fft  # scipy no longer exposes fft/arange as top-level functions
from scipy.io.wavfile import read
import gc
import sys
params = {'figure.figsize': (20, 15)}
pylab.rcParams.update(params)
def plotSpec(y, Fs):
    n = len(y)  # signal length
    k = arange(n)
    T = n / float(Fs)  # signal duration in seconds
    frq = k / T  # two-sided frequency range
    frq = frq[range(n // 2)]  # one-sided frequency range
    ff_valu = fft(y) / n  # FFT computation and normalization
    ff_valu = ff_valu[range(n // 2)]
    pylab.plot(frq, abs(ff_valu), 'r')  # plot the spectrum
    pylab.tick_params(axis='x', labelsize=8)
    pylab.tick_params(axis='y', labelsize=8)
    pylab.tick_params()
    pylab.xticks(rotation=45)
    pylab.xlabel('Frequency')
    pylab.ylabel('Power')
    del frq, ff_valu, n, k, T, y
    gc.collect()
    return
def graph_plot(in_file, graph_loc, output_folder, count, func_type):
    graph_loc = int(graph_loc)
    rate = 0
    data = 0
    rate, data = read(in_file)
    dlen = len(data)
    print("dlen=", dlen)
    lungime = dlen
    timp = dlen / float(rate)  # duration in seconds
    print("timp=", timp)
    t = linspace(0, timp, dlen)
    # row 1: waveform
    pylab.subplot(3, 2, graph_loc)
    pylab.plot(t, data)
    fl = in_file.split('/')
    file_name = fl[len(fl) - 1]
    pylab.title(file_name)
    pylab.tick_params(axis='x', labelsize=8)
    pylab.tick_params(axis='y', labelsize=8)
    pylab.xticks(rotation=45)
    pylab.xlabel('Time')
    pylab.ylabel('Numerical level')
    # row 2: spectrum
    pylab.subplot(3, 2, graph_loc + 2)
    plotSpec(data, rate)
    # row 3: spectrogram
    pylab.subplot(3, 2, graph_loc + 4)
    if rate == 16000:
        frq = 16
    else:
        frq = 8
    pylab.specgram(data, NFFT=128, noverlap=0, Fs=frq)
    pylab.tick_params(axis='x', labelsize=8)
    pylab.tick_params(axis='y', labelsize=8)
    pylab.xticks(rotation=45)
    pylab.xlabel('Time')
    pylab.ylabel('Frequency')
    # save the figure only once the second (recorded) file has been drawn
    if graph_loc == 2:
        name = in_file.split("/")
        lnth = len(name)
        name = in_file.split("/")[lnth - 1].split(".")[0]
        print("File=", name)
        if func_type == 'a':
            save_file = output_folder + 'RESULT_' + name + '.png'
        else:
            save_file = output_folder + 'RESULT_graph.png'
        pylab.savefig(save_file)
        pylab.gcf()
        pylab.gca()
        pylab.close('all')
    del in_file, graph_loc, output_folder, count, t, rate, data, dlen, timp
    gc.get_referrers()
    gc.collect()
def result_plot(orig_file, rec_file, output_folder, seq):
    graph_loc = 1
    graph_plot(orig_file, graph_loc, output_folder, seq, 'a')
    graph_loc = 2
    graph_plot(rec_file, graph_loc, output_folder, seq, 'a')
    sys.exit()
save_file="~/Documents/Output/"
o_file='~/Documents/audio/orig_8sec.wav'
#o_file='~/Documents/audio/orig_4sec.wav'
r_file='~/Documents/audio/rec_8sec.wav'
#r_file='~/Documents/audio/rec_4sec.wav'
print(10*"#"+"Start"+10*"#")
result_plot(o_file, r_file,save_file, 'a')
print(10*"#"+"End"+10*"#")
pylab.close('all')
With the above code, the y-axis scales of the two subplots are different:
The scale is clearly assigned automatically. As a result, any amplification or attenuation with respect to the original file is hard to see unless the reader looks up the values.
Since I cannot predict the maximum amplitude in either file when I use multiple samples, how can I set the y-axis of both subplots to the larger of the two maxima, so that the scale is the same and the amplification is clearer?
I am adding the explanation you asked for in the comments as an answer. The idea is to selectively modify the x-axis limits of particular subplots:
import numpy as np
import matplotlib.pyplot as plt
fig, axes = plt.subplots(2,3,figsize=(16,8))
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)
for i, row in enumerate(axes):
    for j, col in enumerate(row):
        col.plot(x, y)
        col.set_title("Title here", fontsize=18)
        # only the last two subplots of the second row get a shorter x-range
        if i == 1 and (j == 1 or j == 2):
            col.set_xlim(0, np.pi)
plt.tight_layout()
An alternative to setting the limits yourself is to create the figure and axes first using
fig, axes = plt.subplots(3, 2)
This has an optional argument sharex. From the docs
sharex, sharey : bool or {'none', 'all', 'row', 'col'}, default: False
Controls sharing of properties among x (sharex) or y (sharey) axes:
True or 'all': x- or y-axis will be shared among all subplots.
False or 'none': each subplot x- or y-axis will be independent.
'row': each subplot row will share an x- or y-axis.
'col': each subplot column will share an x- or y-axis.
Therefore, we can make sure the rows share the same x axis values as each other by using the argument sharex="row":
fig, axes = plt.subplots(3, 2, sharex="row")
If you want the y-axis to be shared, you can use sharey="row" instead or as well.
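If the axes are created one at a time (for example with pylab.subplot, as in the question) and the limits have to be set by hand, one option is to compute a common range from both signals first. This is only a sketch: the helper name common_ylim, the pad argument and the sample values are made up for illustration.

```python
def common_ylim(*series, pad=0.05):
    """Return (low, high) limits that cover every series, with a small margin."""
    low = min(min(s) for s in series)
    high = max(max(s) for s in series)
    margin = (high - low) * pad
    return low - margin, high + margin

# Made-up amplitude spectra standing in for the original and recorded files.
orig_spectrum = [0.0, 2.0, 10.0, 3.0]
rec_spectrum = [0.0, 1.0, 4.0, 1.5]

low, high = common_ylim(orig_spectrum, rec_spectrum)
print(low, high)
```

With Matplotlib, the same pair would then be applied to both subplots through set_ylim(low, high) on each axes object, so that an attenuated recording shows up as a visibly smaller curve.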
Taking cues from other answers, I happened to make it work the following way:
import matplotlib.pyplot as pl
import time
from numpy import arange, linspace
from numpy.fft import fft  # scipy no longer exposes fft/arange as top-level functions
from scipy.io.wavfile import read
import gc
import sys
def plotWavAmplLev(in_file, sub_graph):
    print("Printing Signal graph (amplitude vs seconds)....")
    rate, data = read(in_file)
    dlen = len(data)
    timp = dlen / float(rate)  # duration in seconds
    t = linspace(0,timp,dlen)
    sub_graph.plot(t, data)
    fl = in_file.split('/')
    file_name = fl[len(fl) - 1]
    sub_graph.set_title(file_name)
    sub_graph.tick_params(axis='x', labelsize=10)
    sub_graph.tick_params(axis='y', labelsize=10)
    sub_graph.set_xlabel('Time')
    sub_graph.set_ylabel('Numerical level')
def plotSpectralDensity(y, fs, sub_graph):
    print("Printing Power Spectral Density (dB vs Hz)....")
    n = len(y)  # signal length
    k = arange(n)
    T = n / float(fs)  # signal duration in seconds
    frq = k / T  # two-sided frequency range
    frq = frq[range(n // 2)]  # one-sided frequency range
    ff_valu = fft(y) / n  # FFT computation and normalization
    ff_valu = ff_valu[range(n // 2)]
    sub_graph.plot(frq, abs(ff_valu), 'r')  # plot the spectrum
    sub_graph.tick_params(axis='x', labelsize=10)
    sub_graph.tick_params(axis='y', labelsize=10)
    sub_graph.tick_params()
    sub_graph.set_xlabel('Frequency')
    sub_graph.set_ylabel('Power')
    del frq, ff_valu, n, k, T, y
    gc.collect()
    return
def plotSpectrogram(rate, data, sub_graph):
    print("Plotting Spectrogram (kHz vs seconds)....")
    if rate == 16000:
        frq = 16
    else:
        frq = 8
    sub_graph.specgram(data, NFFT=128, noverlap=0, Fs=frq)
    sub_graph.tick_params(axis='x', labelsize=10)
    sub_graph.tick_params(axis='y', labelsize=10)
    sub_graph.set_xlabel('Time')
    sub_graph.set_ylabel('Frequency')
def graph_plot(in_file_list, output_folder, func_type):
    orig_file = in_file_list[0]
    rec_file = in_file_list[1]
    g_index = 1
    g_rows = 3
    g_cols = 2
    # sharey="row" gives both plots in a row the same y-axis scale
    fig, axes = pl.subplots(g_rows, g_cols, figsize=(20,15), sharex="row", sharey="row")
    for i, row in enumerate(axes):
        for j, col in enumerate(row):
            if i == 0 :
                if j == 0:
                    print("Source file waveform is being plotted....")
                    rate, data = read(orig_file)
                    plotWavAmplLev(orig_file, col)
                    continue
                elif j == 1:
                    print("Recorded file waveform is being plotted....")
                    rate, data = read(rec_file)
                    plotWavAmplLev(rec_file, col)
                    continue
            elif i == 1:
                if j == 0:
                    print("Source file PSD is being plotted....")
                    rate, data = read(orig_file)
                    plotSpectralDensity(data, rate, col)
                    continue
                elif j == 1:
                    print("Recorded file PSD is being plotted....")
                    rate, data = read(rec_file)
                    plotSpectralDensity(data, rate, col)
                    continue
            elif i == 2:
                if j == 0:
                    print("Source file Spectrogram is being plotted....")
                    rate, data = read(orig_file)
                    plotSpectrogram(rate, data, col)
                    continue
                elif j == 1:
                    print("Recorded file Spectrogram is being plotted....")
                    rate, data = read(rec_file)
                    plotSpectrogram(rate, data, col)
                    continue
    pl.tight_layout()
    name = in_file_list[1].split("/")
    lnth = len(name)
    name = in_file_list[1].split("/")[lnth - 1].split(".")[0]
    print("File=", name)
    if func_type == 'a':
        save_file = output_folder + 'RESULT_' + name + '.png'
    else:
        save_file = output_folder + 'RESULT_graph.png'
    pl.savefig(save_file)
    pl.gcf()
    pl.gca()
    pl.close('all')
    del in_file_list, output_folder, rate, data
    gc.get_referrers()
    gc.collect()
def result_plot(orig_file, rec_file, output_folder, seq):
    flist = [orig_file, rec_file]
    graph_plot(flist, output_folder, 'a')
s_file="/<path>/Output/"
#o_file='/<path>/short_orig.wav'
o_file='/<path>/orig.wav'
#r_file='/<path>/short_rec.wav'
r_file='/<path>/rec.wav'
print(10*"#"+"Start"+10*"#")
result_plot(o_file, r_file,s_file, 'a')
print(10*"#"+"End"+10*"#")
pl.close('all')
Now, I got the y-axis scales fixed and get the output as follows:
This makes comparison a lot easier now.
I am trying to extract data from an Excel spreadsheet and then compute the percent change between adjacent rows. The columns I would like to manipulate are columns 1 and 4. I would then like to graph these percent changes in two separate bar charts using subplots, with column 0 as the x-axis.
I am able to do everything except compute the percent change between adjacent rows of the extracted data. The formula for the percent change is current/previous - 1, or (r,0)/(r-1,0) - 1 in (row, column) notation. Below is my current script:
import xlrd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
import matplotlib.dates as mdates
import datetime
from matplotlib import rc
rc('mathtext', default='regular')
file_location = "/Users/adampatel/Desktop/psw01.xls"
workbook = xlrd.open_workbook(file_location, on_demand = False)
worksheet = workbook.sheet_by_name('Data 1')
x = [worksheet.cell_value(i+1699, 0) for i in range(worksheet.nrows-1699)]
y1 = [worksheet.cell_value(i+1699, 1) for i in range(worksheet.nrows-1699)]
y2 = [worksheet.cell_value(i+1699, 4) for i in range(worksheet.nrows-1699)]
fig = plt.figure()
ax1 = fig.add_subplot(211)
ax2 = fig.add_subplot(212, sharex = ax1)
start_date = datetime.date(1899, 12, 30)
dates=[start_date + datetime.timedelta(xval) for xval in x]
ax1.xaxis.set_major_locator(mdates.MonthLocator((), bymonthday=1, interval=2))
ax1.xaxis.set_minor_locator(mdates.MonthLocator((), bymonthday=1, interval=1))
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b'%y"))
ly1 = ax1.bar(dates, y1, 0.9)
ly2 = ax2.bar(dates, y2, 0.9)
ax1.grid()
ax2.grid()
ax1.set_ylim(-3,3)
ax2.set_ylim(-3,3)
fig.text(0.5, 0.04, 'Inventory Weekly Percent Change', ha='center', va='center', size = '14')
fig.text(0.06, 0.5, 'Weekly Percent Change', ha='center', va='center', size = '14', rotation='vertical')
ax1.set_title('Oil', size = '12')
ax2.set_title('Gasoline', size = '12')
plt.savefig('Gasoline Inventories Weekly Percent Change.png', bbox_inches='tight', dpi=300)
plt.show()
Given list of values:
y1 = [1000,1010,950,1050,1100,1030]
Pure python solution:
Use the zip function to pair each value (numerator) with its predecessor (denominator). Then use a list comprehension to get a list of the percent changes.
pct_chg = [1.0*num / den - 1 for num, den in zip(y1[1:], y1)]
Numpy solution:
Convert list to numpy array, then perform computation using array slices.
a1 = np.array(y1, dtype=float) # float dtype so the division is never integer division
pct_chg = np.divide(a1[1:],a1[:-1])-1
Pandas package solution:
Convert list to Pandas series and use the built-in percent change function
s1 = pd.Series(y1)
pct_chg = s1.pct_change()
Now, pct_chg is a series too. You can get its values in a numpy array via pct_chg.values. Matplotlib should accept numpy arrays as containers in most cases.
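One detail matters for the bar chart: the list and NumPy versions return one value fewer than the input, while s1.pct_change() keeps the original length and puts NaN in the first position. A small sketch of aligning the changes with the x-axis values follows; the labels in x are hypothetical stand-ins for the dates in column 0.

```python
y1 = [1000, 1010, 950, 1050, 1100, 1030]
x = ['w1', 'w2', 'w3', 'w4', 'w5', 'w6']   # hypothetical x-axis labels

# current / previous - 1 for each adjacent pair
pct_chg = [float(cur) / prev - 1 for prev, cur in zip(y1, y1[1:])]

# The first row has no previous value, so drop its label to keep lengths equal.
x_chg = x[1:]

for label, change in zip(x_chg, pct_chg):
    print(label, round(100 * change, 2))
```

In the question's script the equivalent would be to plot against dates[1:] rather than dates.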
I have a point cloud in 4 dimensions, where each point in the cloud has a location and a value (x,y,z,Value). In addition, I have a 'special' point, S0, within the 3d point cloud; I've used this example to find the closest 10 points in the cloud, relative to S0. Now, I have a numpy array for each of the 10 closest points and their values. How can I interpolate these 10 points, to find the interpolated value at point S0? Example code is shown below:
import numpy as np
import matplotlib.pyplot as plt
numpoints = 20
linexs = 320
lineys = 40
linezs = 60
linexe = 20
lineye = 20
lineze = 0
# Create vectors of points
xpts = np.linspace(linexs, linexe, numpoints)
ypts = np.linspace(lineys, lineye, numpoints)
zpts = np.linspace(linezs, lineze, numpoints)
lin = np.dstack((xpts,ypts,zpts))
# Image line of points
fig = plt.figure()
ax = fig.add_subplot(211, projection='3d')
ax.set_xlim(0,365); ax.set_ylim(-85, 85); ax.set_zlim(0, 100)
ax.plot_wireframe(xpts, ypts, zpts)
ax.view_init(elev=12, azim=78)
def randrange(n, vmin, vmax):
    return (vmax - vmin)*np.random.rand(n) + vmin
n = 10
for n in range(21):
    xs = randrange(n, 0, 350)
    ys = randrange(n, -75, 75)
    zs = randrange(n, 0, 100)
    ax.scatter(xs, ys, zs)
dat = np.dstack((xs,ys,zs))
ax.set_xlabel('X Label')
ax.set_xlim(0,350)
ax.set_ylabel('Y Label')
ax.set_ylim(-75,75)
ax.set_zlabel('Z Label')
ax.set_zlim(0,100)
ax = fig.add_subplot(212, projection='3d')
ax.set_xlim(0,365); ax.set_ylim(-85, 85); ax.set_zlim(0, 100)
ax.plot_wireframe(xpts,ypts,zpts)
ax.view_init(elev=12, azim=78)
plt.show()
dist = []
# Calculate distance from first point to all other points in cloud
for l in range(len(xpts)):
    aaa = lin[0][0]-dat
    dist.append(np.sqrt(aaa[0][l][0]**2+aaa[0][l][1]**2+aaa[0][l][2]**2))
full = np.dstack((dat,dist))
aaa = full[0][full[0][:,3].argsort()]
print(aaa[0:10])
A basic example. Note that the meshgrid is not needed for the interpolation, but only to make a fast ufunc to generate an example function A=f(x,y,z), here A=x+y+z.
from scipy.interpolate import interpn
import numpy as np
#make up a regular 3d grid
X=np.linspace(-5,5,11)
Y=np.linspace(-5,5,11)
Z=np.linspace(-5,5,11)
xv,yv,zv = np.meshgrid(X,Y,Z, indexing='ij') # 'ij' keeps the axis order (x,y,z) that interpn expects
# make up a function
# see http://docs.scipy.org/doc/numpy/reference/ufuncs.html
A = np.add(xv,np.add(yv,zv))
#this one is easy enough for us to know what to expect at (.5,.5,.5)
# usage : interpn(points, values, xi, method='linear', bounds_error=True, fill_value=nan)
interpn((X,Y,Z),A,[0.5,0.5,0.5])
Output:
array([ 1.5])
If you pass in an array of points of interest, it will give you multiple answers.
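For instance, a sketch of the multi-point call on the same grid could look like this. The two query points are arbitrary values inside the grid, chosen only for illustration.

```python
import numpy as np
from scipy.interpolate import interpn

# Same regular grid and test function A = x + y + z as above.
X = np.linspace(-5, 5, 11)
Y = np.linspace(-5, 5, 11)
Z = np.linspace(-5, 5, 11)
xv, yv, zv = np.meshgrid(X, Y, Z, indexing='ij')
A = xv + yv + zv

# One row per point of interest, columns in (x, y, z) order.
points = np.array([[0.5, 0.5, 0.5],
                   [1.0, -2.0, 3.5]])
values = interpn((X, Y, Z), A, points)
print(values)
```

Because A is linear in x, y and z, the default linear method should reproduce x + y + z at each query point, which makes the result easy to check by hand. Note that interpn works on a regular grid, so scattered points such as the 10 nearest neighbours would first have to be put on such a grid or handled with a scattered-data method.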