ModelChain example with pvlib - can't trust the 1-axis tracking AC output

I'm trying to use pvlib to estimate the output power of a PV system installed in the west of my country.
As an example, I have 2 days of hourly GHI, 2 m temperature and 10 m wind speed from the MERRA-2 reanalysis.
I want to estimate how much power a fixed PV system or a 1-axis tracking system would generate from this dataset, using pvlib's ModelChain. I first estimate DNI from GHI with the DISC model, and then compute DHI as the difference between GHI and DNI*cos(Z).
a) The first behaviour I'm not completely sure about: here is the plot of GHI, DNI, DHI, T2m and wind speed. DNI seems to be shifted, with its maximum occurring 1 hour before the GHI maximum.
Weather Figure
After preparing the irradiance data I calculated the AC output with ModelChain, specifying a fixed PV system and a 1-axis tracking system.
The thing is that I don't trust the AC output for the 1-axis tracking system: I expected a plateau-shaped AC curve and instead got rather odd behaviour.
Here are the output power values I expected to see:
Expectation
And here is the estimated output by PVLIB
Reality
I hope someone can help me find the error in my procedure.
Here is the code:
# =============================================================================
# Example of using MERRA2 data and PVLIB
# =============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pvlib
from pvlib.pvsystem import PVSystem
from pvlib.location import Location
from pvlib.modelchain import ModelChain
# =============================================================================
# 1) Create small data set extracted from MERRA
# =============================================================================
GHI = np.array([0,0,0,0,0,0,0,0,0,10.8,148.8,361,583,791.5,998.5,1105.5,1146.5,1118.5,1023.5,
860.2,650.2,377.1,165.1,16,0,0,0,0,0,0,0,0,0,11.3,166.2,395.8,624.5,827,986,
1065.5,1079,1025.5,941.5,777,581.5,378.9,156.2,20.6,0,0,0,0])
temp_air = np.array([21.5,20.5,19.7,19.6,18.8,17.9,17.1,16.5,16.2,16.2,17,21.3,24.7,26.9,28.8,30.5,
31.6,32.4,33,33.3,32.9,32,30.6,28.7,25.4,23.9,22.6,21.2,20.3,19.9,19.5,19.1,18.4,
17.7,18.3,23,25.1,27.3,29.5,31.2,32.1,32.6,32.6,32.5,31.8,30.7,29.6,28.1,24.6,22.9,
22.3,23.2])
wind_speed = np.array([3.1,2.7,2.5,2.6,2.8,3,3,3,2.8,2.5,2.1,1,2.2,3.7,4.8,5.6,6.1,6.4,6.5,6.6,6.3,5.8,5.3,
3.7,3.9,4,3.6,3.4,3.4,3,2.6,2.3,2.1,2,2.2,2.7,3.2,4.3,5.1,5.6,5.7,5.8,5.8,5.7,5.4,4.8,
4.4,3.1,2.7,2.3,1.1,0.6])
local_timestamp = pd.date_range(start='1979-12-31 21:00', end='1980-01-03 00:00',
                                freq='1h', tz='America/Argentina/Buenos_Aires')
d = {'ghi':GHI,'temp_air':temp_air,'wind_speed':wind_speed}
data = pd.DataFrame(data=d)
data.index = local_timestamp
lat = -31.983
lon = -68.530
location = Location(latitude=lat,
                    longitude=lon,
                    tz='America/Argentina/Buenos_Aires',
                    altitude=601)
# =============================================================================
# 2) SOLAR POSITION AND ATMOSPHERIC MODELING
# =============================================================================
solpos = pvlib.solarposition.get_solarposition(time=local_timestamp,
                                               latitude=lat,
                                               longitude=lon,
                                               altitude=601)
# DNI and DHI calculation from GHI data
DNI = pvlib.irradiance.disc(ghi=data.ghi,
                            solar_zenith=solpos.zenith,
                            datetime_or_doy=local_timestamp)
DHI = data.ghi - DNI.dni*np.cos(np.radians(solpos.zenith.values))
d = {'ghi': data.ghi,'dni': DNI.dni,'dhi': DHI,'temp_air':data.temp_air,'wind_speed':data.wind_speed }
weather = pd.DataFrame(data=d)
plt.plot(weather)
# =============================================================================
# 3) SYSTEM SPECIFICATIONS
# =============================================================================
# load some module and inverter specifications
sandia_modules = pvlib.pvsystem.retrieve_sam('SandiaMod')
cec_inverters = pvlib.pvsystem.retrieve_sam('cecinverter')
sandia_module = sandia_modules['Canadian_Solar_CS5P_220M___2009_']
cec_inverter = cec_inverters['Power_Electronics__FS2400CU15__645V__645V__CEC_2018_']
# Fixed system with tilt=abs(lat)-10
f_system = PVSystem(surface_tilt=abs(lat)-10,
                    surface_azimuth=0,
                    module=sandia_module,
                    inverter=cec_inverter,
                    module_parameters=sandia_module,
                    inverter_parameters=cec_inverter,
                    albedo=0.20,
                    modules_per_string=100,
                    strings_per_inverter=100)
# 1 axis tracking system
t_system = pvlib.tracking.SingleAxisTracker(axis_tilt=0,  # abs(-33.5)-10
                                            axis_azimuth=0,
                                            max_angle=52,
                                            backtrack=True,
                                            module=sandia_module,
                                            inverter=cec_inverter,
                                            module_parameters=sandia_module,
                                            inverter_parameters=cec_inverter,
                                            name='tracking',
                                            gcr=.3,
                                            modules_per_string=100,
                                            strings_per_inverter=100)
# =============================================================================
# 4) MODEL CHAIN USING ALL THE SPECIFICATIONS for a fixed and 1 axis tracking systems
# =============================================================================
mc_f = ModelChain(f_system, location)
mc_t = ModelChain(t_system, location)
# Next, we run a model with some simple weather data.
mc_f.run_model(times=weather.index, weather=weather)
mc_t.run_model(times=weather.index, weather=weather)
# =============================================================================
# 5) Get only the AC output from the fixed and 1-axis tracking systems and
# assign 0 values to each NaN
# =============================================================================
d = {'fixed': mc_f.ac, 'tracking': mc_t.ac}
AC = pd.DataFrame(data=d)
AC = AC.fillna(0)  # replace NaN values with 0
plt.plot(AC)
I hope someone can help me with the interpretation of the results and with debugging the code.
Thanks a lot!

I suspect your issue is due to the way the hourly data is treated. Be sure that you're consistent with the interval labeling (beginning/end) and treatment of instantaneous vs. average data. One likely cause is using hourly average GHI data to derive DNI data. pvlib.solarposition.get_solarposition returns the solar position at the instants in time that are passed to it. So you're mixing up hourly average GHI values with instantaneous solar position values when you use pvlib.irradiance.disc to calculate DNI and when you calculate DHI. Shifting your time index by 30 minutes will reduce, but not eliminate, the error. Another approach is to resample the input data to be of 1-5 minute resolution.
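For illustration, here is a minimal sketch of the 30-minute shift (a sketch only, not a definitive fix; it assumes the hourly MERRA-2 means are labelled at the end of each averaging interval, which should be checked against the MERRA-2 documentation). The solar position is evaluated at the interval midpoints, re-aligned with the hourly index, and then used in the DISC/DHI step:
solpos_mid = pvlib.solarposition.get_solarposition(
    time=local_timestamp - pd.Timedelta(minutes=30),  # interval midpoints, assuming hour-ending labels
    latitude=lat, longitude=lon, altitude=601)
solpos_mid.index = local_timestamp  # re-align with the hourly-average GHI
DNI_mid = pvlib.irradiance.disc(ghi=data.ghi,
                                solar_zenith=solpos_mid.zenith,
                                datetime_or_doy=local_timestamp)
DHI_mid = data.ghi - DNI_mid.dni*np.cos(np.radians(solpos_mid.zenith))
The resulting weather DataFrame can then be passed to ModelChain exactly as before; resampling the inputs to a few-minute resolution, as suggested above, is the more accurate alternative.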

Related

Why do loss values increase after some epochs with sampled_softmax_loss

I'm using Tensorflow to train a word2vec skip gram model. The computation graph is in the code below:
# training data
self.dataset = tf.data.experimental.make_csv_dataset(file_name, batch_size=self.batch_size, column_names=['input', 'output'], header=False, num_epochs=self.epochs)
self.datum = self.dataset.make_one_shot_iterator().get_next()
self.inputs, self.labels = self.datum['input'], self.datum['output']
# embedding layer
self.embedding_g = tf.Variable(tf.random_uniform((self.n_vocab, self.n_embedding), -1, 1))
self.embed = tf.nn.embedding_lookup(self.embedding_g, self.inputs)
# softmax layer
self.softmax_w_g = tf.Variable(tf.truncated_normal((self.n_context, self.n_embedding)))
self.softmax_b_g = tf.Variable(tf.zeros(self.n_context))
# Calculate the loss using negative sampling
self.labels = tf.reshape(self.labels, [-1, 1])
self.loss = tf.nn.sampled_softmax_loss(
    weights=self.softmax_w_g,
    biases=self.softmax_b_g,
    labels=self.labels,
    inputs=self.embed,
    num_sampled=self.n_sampled,
    num_classes=self.n_context)
self.cost = tf.reduce_mean(self.loss)
self.optimizer = tf.train.AdamOptimizer().minimize(self.cost)
But after 25 epochs, loss values begin to increase. Is there any reason for this?

My neural network takes too much time to train one epoch

I am training a neural network to classify traffic signs, but it takes too much time to train a single epoch, maybe 30+ minutes. The batch size is set to 64 and the learning rate to 0.002; the input is 20x20 pixels with 3 channels, and the model summary shows 173,931 trainable parameters. Is that too many, or is it reasonable?
Here is the network architecture
import torch.nn as nn
import torch.nn.functional as F
from torchsummary import summary
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # Max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # Linear fully connected layers
        self.fc1 = nn.Linear(32*5*5, 200)
        self.fc2 = nn.Linear(200, 43)
        # Dropout
        self.dropout = nn.Dropout(p=0.25)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 32*5*5)
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
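As a quick sanity check of the quoted parameter count (a small sketch, assuming the Network class defined just above):
net = Network()
n_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(n_params)  # 448 (conv1) + 4,640 (conv2) + 160,200 (fc1) + 8,643 (fc2) = 173,931
By CNN standards this is a very small model.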
Here is the optimizer instance
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optim = optim.SGD(model.parameters(),lr = 0.002)
Here is the training code
epochs = 20
valid_loss_min = np.Inf
print("Training the network")
for epoch in range(1, epochs+1):
    train_loss = 0
    valid_loss = 0
    model.train()
    for data, target in train_data:
        if gpu_available:
            data, target = data.cuda(), target.cuda()
        optim.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optim.step()
        train_loss += loss.item()*data.size(0)
    #########################
    ###### Validate #########
    model.eval()
    for data, target in valid_data:
        if gpu_available:
            data, target = data.cuda(), target.cuda()
        output = model(data)
        loss = criterion(output, target)
        valid_loss += loss.item()*data.size(0)
    train_loss = train_loss/len(train_data.dataset)
    valid_loss = valid_loss/len(valid_data.dataset)
    print("Epoch {}.....Train Loss = {:.6f}....Valid Loss = {:.6f}".format(epoch, train_loss, valid_loss))
    if valid_loss < valid_loss_min:
        torch.save(model.state_dict(), 'model_traffic.pt')
        print("Valid Loss min {:.6f} >>> {:.6f}".format(valid_loss_min, valid_loss))
        valid_loss_min = valid_loss  # remember the best validation loss seen so far
I am using a GPU through Google Colab.
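Since the question is about where the 30+ minutes go, a short measurement can separate data loading from GPU work. This is a hedged sketch of my own; it reuses the train_data loader, gpu_available flag, model, criterion and optim from the code above:
import time
import torch

data_time, compute_time = 0.0, 0.0
t_prev = time.time()
for data, target in train_data:
    data_time += time.time() - t_prev          # time spent fetching/augmenting the batch
    t_step = time.time()
    if gpu_available:
        data, target = data.cuda(), target.cuda()
    optim.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optim.step()
    if gpu_available:
        torch.cuda.synchronize()               # wait for the GPU so the timing is meaningful
    compute_time += time.time() - t_step       # transfer + forward/backward/step time
    t_prev = time.time()
print("data loading: {:.1f}s, compute: {:.1f}s".format(data_time, compute_time))
If most of the time turns out to be in data loading, the DataLoader settings (for example num_workers) are the usual place to look.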

QuantLib (Python) ZeroCouponBond. Appropriate yield curve

I want to find the NPV of a ZeroCouponBond in QuantLib. I am adapting the code from https://quant.stackexchange.com/q/32539 for FixedRateBonds. The code below runs and gives an NPV of 82.03, but I am not sure which compoundingFrequency to set for the term structure in the case of a zero-coupon bond.
The only thing that makes sense to me is to set the discount factors to annual compounding. Or is there something particular about using ZeroCouponBond together with ZeroCurve that I am overlooking?
from QuantLib import *
# Construct yield curve
calc_date = Date(1, 1, 2017)
Settings.instance().evaluationDate = calc_date
spot_dates = [Date(1,1,2017), Date(1,1,2018), Date(1,1,2027)]
spot_rates = [0.04, 0.04, 0.04]
day_count = SimpleDayCounter()
calendar = NullCalendar()
interpolation = Linear()
compounding = Compounded
compounding_frequency = Annual
spot_curve = ZeroCurve(spot_dates, spot_rates, day_count, calendar,
                       interpolation, compounding, compounding_frequency)
spot_curve_handle = YieldTermStructureHandle(spot_curve)
# Construct bond schedule
issue_date = Date(1, 1, 2017)
maturity_date = Date(1, 1, 2022)
settlement_days = 0
face_value = 100
bond = ZeroCouponBond(settlement_days,
                      # calendar
                      calendar,
                      # face amount
                      face_value,
                      # maturity date
                      maturity_date,
                      # payment convention
                      Following,
                      # redemption
                      face_value,
                      # issue date
                      issue_date)
# Set Valuation engine
bond_engine = DiscountingBondEngine(spot_curve_handle)
bond.setPricingEngine(bond_engine)
# Calculate present value
value = bond.NPV()
The frequency doesn't depend on the fact that the bond is a zero-coupon bond; it depends on how the rates you're using were calculated or quoted. If the 4% was calculated or quoted as an annually compounded rate, that's what you should use; otherwise, you'll have to determine what "4%" means.
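As a small check (a sketch of my own, using the spot_curve, day_count and maturity_date defined in the question), you can ask the curve for the discount factor it implies at maturity, compare it with a hand-computed annually compounded factor, and re-express the 4% under another compounding convention:
df = spot_curve.discount(maturity_date)
print(df, 1 / 1.04**5)  # these agree when the curve treats the 4% as annually compounded
# the same zero rate re-quoted with continuous compounding, for comparison
print(spot_curve.zeroRate(maturity_date, day_count, Continuous).rate())
If the printed discount factor is not what you expect, the rate was most likely quoted under a different convention, which is exactly the point above.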

Joining of curve fitting models

I have these 7 quasi-Lorentzian curves fitted to my data,
and I would like to join them to make one connected curve. Do you have any ideas how to do this? I've read about composite models in the lmfit documentation, but it's not clear to me how to do it.
Here is a sample of my code with two fitted curves.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
from lmfit import Model

for dataset in [Bxfft]:
    dataset = np.asarray(dataset)
    freqs, psd = signal.welch(dataset, fs=266336/300, window='hamming',
                              nperseg=16192, scaling='spectrum')
    plt.semilogy(freqs[0:-7000], psd[0:-7000]/dataset.size**0, color='r', label='Bx')

    x = freqs[100:-7900]
    y = psd[100:-7900]

    # 8 Hz
    model = Model(lorentzian)
    params = model.make_params(amp=6, cen=5, sig=1, e=0)
    result = model.fit(y, params, x=x)
    final_fit = result.best_fit
    print("8 Hz mode")
    print(result.fit_report(min_correl=0.25))
    plt.plot(x, final_fit, 'k-', linewidth=2)

    # 14 Hz
    x2 = freqs[220:-7780]
    y2 = psd[220:-7780]
    model2 = Model(lorentzian)
    pars2 = model2.make_params(amp=6, cen=10, sig=3, e=0)
    pars2['amp'].value = 6
    result2 = model2.fit(y2, pars2, x=x2)
    final_fit2 = result2.best_fit
    print("14 Hz mode")
    print(result2.fit_report(min_correl=0.25))
    plt.plot(x2, final_fit2, 'k-', linewidth=2)
UPDATE!!!
I've used some hints from user #MNewville, who posted an answer, and using his code I got this:
So my code is similar to his, but extended to each peak. What I'm struggling with now is replacing the ready-made LorentzianModel with my own model.
The problem is that when I do this, the code gives me an error like this:
C:\Python27\lib\site-packages\lmfit\printfuncs.py:153: RuntimeWarning: invalid value encountered in double_scalars
  spercent = '({0:.2%})'.format(abs(par.stderr/par.value))
About my own model:
def lorentzian(x, amp, cen, sig, e):
    return (amp*(1-e)) / ((pow((1.0*x - cen), 2)) + (pow(sig, 2)))

peak1 = Model(lorentzian, prefix='p1_')
peak2 = Model(lorentzian, prefix='p2_')
peak3 = Model(lorentzian, prefix='p3_')

# make composite by adding (or multiplying, etc) components
model = peak1 + peak2 + peak3

# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amp=6, p1_cen=8, p1_sig=1, p1_e=0,
                           p2_ampe=16, p2_cen=14, p2_sig=3, p2_e=0,
                           p3_amp=16, p3_cen=21, p3_sig=3, p3_e=0)
The rest of the code is similar to #MNewville's.
A composite model for 3 Lorentzians would look like this:
from lmfit.models import LorentzianModel

peak1 = LorentzianModel(prefix='p1_')
peak2 = LorentzianModel(prefix='p2_')
peak3 = LorentzianModel(prefix='p3_')

# make composite by adding (or multiplying, etc) components
model = peak1 + peak2 + peak3

# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amplitude=10, p1_center=8, p1_sigma=3,
                           p2_amplitude=10, p2_center=15, p2_sigma=3,
                           p3_amplitude=10, p3_center=20, p3_sigma=3)
# perhaps set bounds to prevent peaks from swapping or crazy values
params['p1_amplitude'].min = 0
params['p2_amplitude'].min = 0
params['p3_amplitude'].min = 0
params['p1_sigma'].min = 0
params['p2_sigma'].min = 0
params['p3_sigma'].min = 0
params['p1_center'].min = 2
params['p1_center'].max = 11
params['p2_center'].min = 10
params['p2_center'].max = 18
params['p3_center'].min = 17
params['p3_center'].max = 25
# then do a fit over the full data range
result = model.fit(y, params, x=x)
I think the key parts you were missing were: a) just add models together, and b) use prefix to avoid name collisions of parameters.
I hope that is enough to get you started...
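As a follow-up sketch (it assumes the result, x, y and plt objects from the code above): the composite best_fit already is the joined curve, and the individual peaks can be overlaid via eval_components:
comps = result.eval_components(x=x)            # dict keyed by prefix: 'p1_', 'p2_', 'p3_'
plt.plot(x, y, 'b.', label='data')
plt.plot(x, result.best_fit, 'k-', label='sum of 3 Lorentzians')
for name, comp in comps.items():
    plt.plot(x, comp, '--', label=name)
plt.legend()
plt.show()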

Recursively select elements in numpy array

I have a text file containing latitude and temperature values for points around the globe. I would like to take the average of all the temperature points between a specified latitude interval (i.e. every degree, from the South to the North Pole). This is the code I have so far:
import numpy as np
import matplotlib.pyplot as plt
from numpy import genfromtxt

data_in = genfromtxt('temperatures.txt', usecols=(0, 1))
lat = data_in[:, 0]
temp = data_in[:, 1]

in_1N = np.where((lat >= 0) & (lat <= 1))  # indexes of all latitudes between 0° and 1° North
temp_1N = temp[in_1N]                      # temperature values between 0° and 1° North
avg_1N = np.nanmean(temp_1N)               # average temperature between 0° and 1° North
plt.scatter(1, avg_1N)                     # plot the average temperature against the latitude interval
plt.show()
How could I improve this code so that it can be applied 180 times, covering the Earth from 90°S to 90°N?
Thanks
You could use np.histogram to put the latitudes into bins. Usually, np.histogram would merely count the number of latitudes in each bin. But if you weight the latitudes by the associated temp value, then instead of a count you get the sum of the temps. If you divide the sum of temps by the bin count, you get the average temp in each bin:
import numpy as np
import matplotlib.pyplot as plt
# N = 100
# lat = np.linspace(-90, 90, N)
# temp = 50*(1-np.cos(np.linspace(0, 2*np.pi, N)))
# temp[::5] = np.nan
# np.savetxt(filename, np.column_stack([lat, temp]))
lat, temp = np.genfromtxt('temperatures.txt', usecols = (0,1), unpack=True)
valid = np.isfinite(temp)
lat = lat[valid]
temp = temp[valid]
grid = np.linspace(-90, 90, 40)
count, bin_edges = np.histogram(lat, bins=grid)
temp_sum, bin_edges = np.histogram(lat, bins=grid, weights=temp)
temp_avg = temp_sum / count
plt.plot(bin_edges[1:], temp_avg, 'o')
plt.show()
Note that if you have scipy installed, then you could replace
the two calls to np.histogram:
count, bin_edges = np.histogram(lat, bins=grid)
temp_sum, bin_edges = np.histogram(lat, bins=grid, weights=temp)
with one call to stats.binned_statistic:
import scipy.stats as stats
temp_avg, bin_edges, binnumber = stats.binned_statistic(
    x=lat, values=temp, statistic='mean', bins=grid)
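For the one-degree bands the question asks for, the same call only needs 181 bin edges (a small usage sketch, reusing the lat and temp arrays from above):
grid_1deg = np.arange(-90, 91, 1)              # 181 edges -> 180 one-degree bins
temp_avg_1deg, edges_1deg, _ = stats.binned_statistic(
    x=lat, values=temp, statistic='mean', bins=grid_1deg)
plt.plot(edges_1deg[:-1] + 0.5, temp_avg_1deg, 'o')   # plot at the bin centres
plt.show()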