I recently ran into an issue with calculating the clearness index and the extraterrestrial irradiance using the PVLIB functions. Basically, the numbers do not tally up.
This is my raw data that I ran the function for (my Datetime is already timezone aware):
I then ran the below code to get the clearness index:
latitude = -22.24
longitude = 114.09
times = pd.DatetimeIndex(test_data['Datetime']) #Converting to Datetime Index
solpos = solarposition.get_solarposition(times, latitude, longitude)
test_data['apparent_elevation'] = solpos.apparent_elevation.to_numpy()
test_data['zenith'] = solpos.zenith.to_numpy()
# data_file = data_file[data_file.apparent_elevation > 0] #For now, not removing data during night time
test_data.reset_index(drop = True, inplace = True)
extr_irr = pvlib.irradiance.get_extra_radiation(pd.DatetimeIndex(test_data['Datetime']))
clearness_index = pvlib.irradiance.clearness_index(ghi = np.array(test_data['GHI']), solar_zenith = np.array(test_data['zenith']), extra_radiation = np.array(extr_irr), min_cos_zenith=0.065, max_clearness_index=2.0)
test_data = pd.DataFrame(test_data)
test_data['clearness_index'] = clearness_index
test_data['GHI_ET'] = extr_irr.to_numpy()
Now analysing the calculated data:
Here we can see that for example in the first row,
GHI/GHI_ET does not equal the clearness_index value.
Could anyone point out what I am missing or doing wrong here?
Thanks
The problem is in this line:
test_data['GHI_ET'] = extr_irr.to_numpy()
You need to multiply the extraterrestrial irradiance by the cos(zenith) in order to obtain the value for the horizontal plane. If you look inside the clearness_index function you'll see it's done there as well.
Related
I'm trying to practice using pymc3 on the kinds of data I come across in my research, but I'm having trouble thinking through how to fit the model when each person gives me multiple data points, and each person comes from a different group (so trying a hierarchical model).
Here's the practice scenario I'm using: Suppose we have 2 groups of people, N = 30 in each group. All 60 people go through a 10 question survey, where each person can response ("1") or not respond ("0") to each question. So, for each person, I have an array of length 10 with 1's and 0's.
To model these data, I assume each person has some latent trait "theta", and each item has a "discrimination" a and a "difficulty" b (this is just a basic item response model), and the probability of responding ("1") is given by: (1 + exp(-a(theta - b)))^(-1). (Logistic applied to a(theta - b) .)
Here is how I tried to fit it using pymc3:
traces = {}
for grp in range(2):
group = prac_data["Group ID"] == grp
data = prac_data[group]["Response"]
with pm.Model() as irt:
# Priors
a_tmp = pm.Normal('a_tmp',mu=0, sd = 1, shape = 10)
a = pm.Deterministic('a', np.exp(a_tmp))
# We do this transformation since we must have a >= 0
b = pm.Normal('b', mu = 0, sd = 1, shape = 10)
# Now for the hyperpriors on the groups:
theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1)
theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0)
theta = pm.Normal('theta', mu = theta_mu,
sd = theta_sigma, shape = N)
p = getProbs(Disc, Diff, theta, N)
y = pm.Bernoulli('y', p = p, observed = data)
traces[grp] = pm.sample(1000)
The function "getProbs" is supposed to give me an array of probabilities for the Bernoulli random variable, as the probability of responding 1 changes across trials/survey questions for each person. But this method gives me an error because it says to "specify one of p or logit_p", but I thought I did with the function?
Here's the code for "getProbs" in case it's helpful:
def getProbs(Disc, Diff, THETA, Nprt):
# Get a large array of probabilities for the bernoulli random variable
n = len(Disc)
m = Nprt
probs = np.array([])
for th in range(m):
for t in range(n):
p = item(Disc[t], Diff[t], THETA[th])
probs = np.append(probs, p)
return probs
I added the Nprt parameter because if I tried to get the length of THETA, it would give me an error since it is a FreeRV object. I know I can try and vectorize the "item" function, which is just the logistic function I put above, instead of doing it this way, but that also got me an error when I tried to run it.
I think I can do something with pm.Data to fix this, but the documentation isn't exactly clear to me.
Basically, I'm used to building models in JAGS, where you loop through each data point, but pymc3 doesn't seem to work like that. I'm confused about how to build/index my random variables in the model to make sure that the probabilities change how I'd like them to from trial-to-trial, and to make sure that the parameters I'm estimating correspond to the right person in the right group.
Thanks in advance for any help. I'm pretty new to pymc3 and trying to get the hang of it, and wanted to try something different from JAGS.
EDIT: I was able to solve this by first building the array I needed by looping through the trials, then transforming the array using:
p = theano.tensor.stack(p, axis = 0)
I then put this new variable in the "p" argument of the Bernoulli instance and it worked! Here's the updated full model: (below, I imported theano.tensor as T)
group = group.astype('int')
data = prac_data["Response"]
with pm.Model() as irt:
# Priors
# Item parameters:
a = pm.Gamma('a', alpha = 1, beta = 1, shape = 10) # Discrimination
b = pm.Normal('b', mu = 0, sd = 1, shape = 10) # Difficulty
# Now for the hyperpriors on the groups: shape = 2 as there are 2 groups
theta_mu = pm.Normal('theta_mu', mu = 0, sd = 1, shape = 2)
theta_sigma = pm.Uniform('theta_sigma', upper = 2, lower = 0, shape = 2)
# Individual-level person parameters:
# group is a 2*N array that lets the model know which
# theta_mu to use for each theta to estimate
theta = pm.Normal('theta', mu = theta_mu[group],
sd = theta_sigma[group], shape = 2*N)
# Here, we're building an array of the probabilities we need for
# each trial:
p = np.array([])
for n in range(2*N):
for t in range(10):
x = -a[t]*(theta[n] - b[t])
p = np.append(p, x)
# Here, we turn p into a tensor object to put as an argument to the
# Bernoulli random variable
p = T.stack(p, axis = 0)
y = pm.Bernoulli('y', logit_p = p, observed = data)
# On my computer, this took about 5 minutes to run.
traces = pm.sample(1000, cores = 1)
print(az.summary(traces)) # Summary of parameter distributions
I have this 7 quasi-lorentzian curves which are fitted to my data.
and I would like to join them, to make one connected curved line. Do You have any ideas how to do this? I've read about ComposingModel at lmfit documentation, but it's not clear how to do this.
Here is a sample of my code of two fitted curves.
for dataset in [Bxfft]:
dataset = np.asarray(dataset)
freqs, psd = signal.welch(dataset, fs=266336/300, window='hamming', nperseg=16192, scaling='spectrum')
plt.semilogy(freqs[0:-7000], psd[0:-7000]/dataset.size**0, color='r', label='Bx')
x = freqs[100:-7900]
y = psd[100:-7900]
# 8 Hz
model = Model(lorentzian)
params = model.make_params(amp=6, cen=5, sig=1, e=0)
result = model.fit(y, params, x=x)
final_fit = result.best_fit
print "8 Hz mode"
print(result.fit_report(min_correl=0.25))
plt.plot(x, final_fit, 'k-', linewidth=2)
# 14 Hz
x2 = freqs[220:-7780]
y2 = psd[220:-7780]
model2 = Model(lorentzian)
pars2 = model2.make_params(amp=6, cen=10, sig=3, e=0)
pars2['amp'].value = 6
result2 = model2.fit(y2, pars2, x=x2)
final_fit2 = result2.best_fit
print "14 Hz mode"
print(result2.fit_report(min_correl=0.25))
plt.plot(x2, final_fit2, 'k-', linewidth=2)
UPDATE!!!
I've used some hints from user #MNewville, who posted an answer and using his code I got this:
So my code is similar to his, but extended with each peak. What I'm struggling now is replacing ready LorentzModel with my own.
The problem is when I do this, the code gives me an error like this.
C:\Python27\lib\site-packages\lmfit\printfuncs.py:153: RuntimeWarning:
invalid value encountered in double_scalars [[Model]] spercent =
'({0:.2%})'.format(abs(par.stderr/par.value))
About my own model:
def lorentzian(x, amp, cen, sig, e):
return (amp*(1-e)) / ((pow((1.0 * x - cen), 2)) + (pow(sig, 2)))
peak1 = Model(lorentzian, prefix='p1_')
peak2 = Model(lorentzian, prefix='p2_')
peak3 = Model(lorentzian, prefix='p3_')
# make composite by adding (or multiplying, etc) components
model = peak1 + peak2 + peak3
# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amp=6, p1_cen=8, p1_sig=1, p1_e=0,
p2_ampe=16, p2_cen=14, p2_sig=3, p2_e=0,
p3_amp=16, p3_cen=21, p3_sig=3, p3_e=0,)
rest of the code is similar like at #MNewville
[![enter image description here][3]][3]
A composite model for 3 Lorentzians would look like this:
from lmfit import Model, LorentzianModel
peak1 = LorentzianModel(prefix='p1_')
peak2 = LorentzianModel(prefix='p2_')
peak3 = LorentzianModel(prefix='p3_')
# make composite by adding (or multiplying, etc) components
model = peak1 + peaks2 + peak3
# make parameters for the full model, setting initial values
# using the prefixes
params = model.make_params(p1_amplitude=10, p1_center=8, p1_sigma=3,
p2_amplitude=10, p2_center=15, p2_sigma=3,
p3_amplitude=10, p3_center=20, p3_sigma=3)
# perhaps set bounds to prevent peaks from swapping or crazy values
params['p1_amplitude'].min = 0
params['p2_amplitude'].min = 0
params['p3_amplitude'].min = 0
params['p1_sigma'].min = 0
params['p2_sigma'].min = 0
params['p3_sigma'].min = 0
params['p1_center'].min = 2
params['p1_center'].max = 11
params['p2_center'].min = 10
params['p2_center'].max = 18
params['p3_center'].min = 17
params['p3_center'].max = 25
# then do a fit over the full data range
result = model.fit(y, params, x=x)
I think the key parts you were missing were: a) just add models together, and b) use prefix to avoid name collisions of parameters.
I hope that is enough to get you started...
I am just starting with Pyomo and I have a big problem.
I want to run an Abstract Model without using the terminal. I can do it with a concrete model but I have serious problems to do it in with the abstract one.
I just want to use F5 and run the code.
This ismy program:
import pyomo
from pyomo.environ import *
#
# Model
#
model = AbstractModel()
#Set: Indices
model.Unit = Set()
model.Block = Set()
model.DemBlock = Set()
#Parameters
model.EnergyBid = Param(model.Unit, model.Block)
model.PriceBid = Param(model.Unit, model.Block)
model.EnergyDem = Param(model.DemBlock)
model.PriceDem = Param(model.DemBlock)
model.Pmin = Param(model.Unit)
model.Pmax = Param(model.Unit)
#Variables definition
model.PD = Var(model.DemBlock, within=NonNegativeReals)
model.PG = Var(model.Unit,model.Block, within=NonNegativeReals)
#Binary variable
model.U = Var(model.Unit, within = Binary)
#Objective
def SocialWellfare(model):
SocialWellfare = sum([model.PriceDem[i]*model.PD[i] for i in model.DemBlock]) - sum([model.PriceBid[j,k]*model.PG[j,k] for j in model.Unit for k in model.Block ])
return SocialWellfare
model.SocialWellfare = Objective(rule=SocialWellfare, sense=maximize)
#Constraints
#Max and min Power generated
def PDmax_constraint(model,p):
return ((model.PD[p] - model.EnergyDem[p])) <= 0
model.PDmax = Constraint(model.DemBlock, rule=PDmax_constraint)
def PGmax_constraint(model,n,m):
return ((model.PG[n,m] - model.EnergyBid[n,m])) <= 0
model.PGmax = Constraint(model.Unit, model.Block,rule = PGmax_constraint)
def Power_constraintDW(model,i):
return ((sum(model.PG[i,k] for k in model.Block))-(model.Pmin[i] * model.U[i]) ) >= 0
model.LimDemandDw = Constraint(model.Unit, rule=Power_constraintDW)
def Power_constraintUP(model,i):
return ((sum(model.PG[i,k] for k in model.Block) - (model.Pmax[i])*model.U[i])) <= 0
model.LimDemandaUp = Constraint(model.Unit, rule=Power_constraintUP)
def PowerBalance_constraint(model):
return (sum(model.PD[i] for i in model.DemBlock) - sum(model.PG[j,k] for j in model.Unit for k in model.Block)) == 0
model.PowBalance = Constraint(rule = PowerBalance_constraint)
model.pprint()
instance = model.create('datos_transporte.dat')
## Create the ipopt solver plugin using the ASL interface
solver = 'ipopt'
solver_io = 'nl'
opt = SolverFactory(solver,solver_io=solver_io)
results = opt.solve(instance)
results.write()
Any help with the last part??
Thanks anyway,
I think your example is actually working. Starting in Pyomo 4.1, the solution returned from the solver is stored directly into the instance that was solved, and is not returned in the solver results object. This change was made because generating the representation of the solution in the results object was rather expensive and was not easily parsed by people. Working with the model instance directly is more natural.
It is unfortunate that the results object reports the number of solutions: 0, although this is technically correct: the results object holds no solutions ... but the Solver section should indicate that a solution was returned and stored into the model instance.
If you want to see the result returned by the solver, you can print out the current status of the model using:
instance.display()
after the call to solve(). That will report the current Var values that were returned from the solver. You will want to pay attention to the stale column:
False indicates the value is not "stale" ... that is, it was either set by the user (before the call to solve()), or it was returned from the solver (after the call to solve()).
True indicates the solver did not return a value for that variable. This is usually because the variable was not referenced by the objective or any enabled constraints, so Pyomo never sent the variable to the solver.
[Note: display() serves a slightly different role than pprint(): pprint() outputs the model structure, whereas display() outputs the model state. So, for example, where pprint() will output the constraint expressions, display() will output the numerical value of the expression (using the current variable/param values).]
Edited to expand the discussion of display() & pprint()
i want to get a python code to implement Fayad and Irani's Entropy based discretization.
please help me.
Three pythonic ways in which continuous variables/features can be discretized using a supervised method - MDLP by Fayyad, U.; Irani, K.
Assumptions
Input Data : X
target : y
categorical_features: Categorical columns that need not be discretized
Orange
import Orange
continuous_features = [*set(X.columns)-set(categorical_features)]
X[categorical_features] = X[categorical_features].astype('category')
data_df = pd.concat([X[continuous_features ], y.astype('category')], axis=1)
orange_table = table_from_frame(data_df, class_name=data_df.colum[-1])
disc = Orange.preprocess.Discretize()
disc.method = Orange.preprocess.discretize.EntropyMDL()
orange_table_discrete = disc(orange_table)
X[continuous_features ] = orange_table_discrete.X
Pypi package
from mdlp.discretization import MDLP
transformer = MDLP(continuous_features=[True if x not in categorical_features else False for x in X.columns])
X = pd.DataFrame(transformer.fit_transform(X, y), columns=X.columns)
Github repo
from entropymdlp import MDLP
mdlp = MDLP()
for col in X[continous_fetures].columns:
bins = mdlp.cut_points(X[col].values, y.values)
X[col] = mdlp.discretize_feature(X[col], bins)
X[col] = np.where(X[col].isnull(), np.nan, X[col])
To check discretization count of each column/feature
print(X.T.apply(lambda x: x.nunique(), axis=1))
Using this method in orange ML platform.
It seems to be based on Fayyad and Irani.
I think there is one here: https://github.com/navicto/Discretization-MDLPC
I haven't check it in detail, though.
This is much like a traveling salesman problem. I have a Listbox with college names in it(backed with coordinates I grabbed from the Facebook Graph). I have the selection mode set to multiple. I need to know the code that will allow me to use the colleges they selected so i can put them through a distance method. I only need to know the code to see what they selected. I tried using curselection() but I still do not understand it.
Here is some code:
self.listbox = Listbox(self.mid_frame,width = 42,selectmode ="multiple",
highlightcolor = "orange",
highlightthickness = "10",bd = "5")
coordinates = []
collegelist = []
f = open(sys.argv[1],'r')
# grab the college's lat and long from facebook graph
for identity in f:
urlquery='https://graph.facebook.com/'+identity
obj = json.load(urllib2.urlopen(urlquery))
college = obj["name"]
latitude = obj["location"]["latitude"]
longitude = obj["location"]["longitude"]
coordinates.append((college,latitude, longitude))
collegelist.append(college)
#sort the colleges so they appear alphabetical order
sortcollege = sorted(collegelist)
#fill Listbox with the College names imported from a text file
for college in sortcollege:
self.listbox.insert(END, college)
self.listbox.pack(side = LEFT)
#The label where I would put the total distance
self.output_totaldist_label = Label(self.mid_frame,
width = 11,
textvariable = self.totaldistance)
self.totaldistance = StringVar()
self.output_label = Label(self.mid_frame,
textvariable = self.totaldistance)
self.output_totaldist_label.pack(side = LEFT)
self.output_label.pack(side = LEFT)
It would have been nice to see how you tried curselection to see what went wrong.
Something like:
for idx in self.listbox.curselection():
selitem = self.listbox.get(idx)
should do the trick. Have you tried that?