Genetic algorithm (model fitting) - Python 2.7

I have run a genetic algorithm model on antimicrobial resistance (AMR), but it raises an error: amr() takes exactly 3 arguments (2 given). Hopefully someone can help me trace the error, as I desperately need to solve this within two days. Thank you. My code is embedded below:
# First code to run
# imports relevant modules
# then defines functions for the Levenberg-Marquardt algorithm
import numpy as np
import matplotlib.pyplot as plt
#from scipy.optimize import leastsq
%matplotlib inline

time = np.arange(0.0, 3000.1, 1.0)
pop = np.array([2,27,43,36,39,32,27,22,10,14,14,4,4,7,3,3,1])

def amr(pars, t):
    beta, gamma, sigma_A, A, H, MIC_S, MIC_R, H, r = pars
    E_S = 1 - Emax*A**H/(MIC_S**H + A**H)
    E_R = 1 - Emax*A**H/(MIC_R**H + A**H)
    derivs = [r*(1-(R+S)/Nmax)*E_S*S - sigma_S*S - beta*S*R/(R+S),
              r*(1-gamma)*(1-(R+S)/Nmax)*E_R*R - sigma_R*R + beta*S*R/(R+S),
              -sigma_A*A]
    return derivs
def amr_resid(pars, t, data):
    return amr(pars, t) - data

# code for the genetic algorithm. Relies on data set up above
# define a sum of squares function that we will use as fitness
def amr_ss(pars, t, data):
    return sum(amr_resid(pars, t, data)**2)
# Parameter values
npars = 3
popsize = 60 # this needs to be a multiple of 3
elitism = popsize/3
# strength of mutation: the higher the number the larger the mutations can be, and vice versa
psigma = 0.08
ngenerations = 100
#set up initial population with parameters at log normal distribution around (1,1,1)
population = 10**np.random.normal(0,psigma,size=(popsize,npars))
newpop = population
# Matrices into which we put results
best = np.zeros(shape=(ngenerations,npars))
bestfitness = np.zeros(ngenerations)
fitnesses = np.zeros(shape=(popsize,ngenerations))
# Genetic algorithm code
for j in range(ngenerations):
    # work out fitness
    for i in range(popsize):
        fitnesses[i,j] = amr_ss(population[i,:], time, pop)
    # find the best and copy them into the next generation: we use the top 1/3rd
    newpop[range(elitism),:] = population[np.argsort(fitnesses[:,j])[range(elitism)],:]
    best[j,:] = newpop[0,:]
    bestfitness[j] = np.sort(fitnesses[:,j])[0]
    # create some mutants
    for i in range(elitism):
        # mutants have multiplicative change so that the change is a fixed proportion of the parameter value
        newpop[elitism+i,:] = newpop[i,:] * 10**np.random.normal(0, psigma, npars)
    # now create some recombinants: the gene values also mutate
    for i in range(elitism):
        parents = np.random.choice(elitism, 2, replace=False)
        # first gene from first parent
        newpop[2*elitism+i,0] = newpop[parents[0],0] * 10**np.random.normal(0, psigma)
        # second gene at random from first or second parent: depends on recombination position
        if np.random.rand() < 0.5:
            newpop[2*elitism+i,1] = newpop[parents[0],1] * 10**np.random.normal(0, psigma)
        else:
            newpop[2*elitism+i,1] = newpop[parents[1],1] * 10**np.random.normal(0, psigma)
        # third gene from second parent
        newpop[2*elitism+i,2] = newpop[parents[1],2] * 10**np.random.normal(0, psigma)
    # update population
    population = newpop

plt.boxplot(fitnesses)


How can I use the pvlib modules in pvmismatch?

Is it possible to use the CEC modules from pvlib in pvmismatch?
I tried to make a cell using the CEC module parameter values:
cell = pvcell.PVcell(
    Rs=parametersw.R_s/parameters.N_s,
    Rsh=parametersw.R_sh_ref*parameters.N_s,
    Isat1_T0=parametersw.I_o_ref,
    Isat2_T0=0,
    Isc0_T0=parametersw.I_L_ref,
    alpha_Isc=parametersw.alpha_sc,
    #aRBD=0,
    #bRBD=0,
    #nRBD=0,
    Eg=1.121,
    Tcell=273.15
)
and when I put these cells in a mismatch module, the values I receive (Voc, Isc, Vmp, Imp) are 3 V lower than expected.
The short answer is no, it is not possible. pvmismatch uses the two-diode model, and you cannot simply put in the corresponding values from the one-diode model and omit the others.
Having said that, using the same (scaled) resistances might be OK, and you could try tweaking Isat1 and Isat2 to get closer to what you were expecting. This may or may not be good enough for your application.
If you are willing to do a little extra work, you can generate 2-diode model coefficients from an IV curve at STC using the gen_coeffs package in pvmismatch.contrib. See this discussion for an example using a Canadian Solar module from the Sandia array performance model library. All you need are the principal characteristics at STC, a little luck, and some elbow grease.
For example, if using a CEC module:
"""
Making 2-diode modules from CEC in PVMismatch
"""
from matplotlib import pyplot as plt
import numpy as np
import pvlib
from pvmismatch import *
from pvmismatch.contrib import gen_coeffs
cecmod = pvlib.pvsystem.retrieve_sam('CECMod')
csmods = cecmod.columns[cecmod.T.index.str.startswith('Canadian')]
len(csmods) # 409 modules in CEC library!
cs6x_300m = cecmod[csmods[264]] # CS6X-300M Canadian Solar 300W mono-Si module
args = (
cs6x_300m.I_sc_ref,
cs6x_300m.V_oc_ref,
cs6x_300m.I_mp_ref,
cs6x_300m.V_mp_ref,
cs6x_300m.N_s, # number of series cells
1, # number of parallel sub-strings
25.0) # cell temperature
# try to solve using default coeffs
x, sol = gen_coeffs.gen_two_diode(*args)
# solver fails, so get the last guess before it quit
def last_guess(sol):
isat1 = np.exp(sol.x[0])
isat2 = np.exp(sol.x[1])
rs = sol.x[2] ** 2.0
rsh = sol.x[3] ** 2.0
return isat1, isat2, rs, rsh
x = last_guess(sol)
# the series and shunt resistance solver guess are way off, so reset them
# with something reasonable for a 2-diode model
x, sol = gen_coeffs.gen_two_diode(*args, x0=(x[0], x[1], 0.005, 10.0))
# Hooray, it worked! Note that the 1-diode and 2-diode parametres are so
# different! Anyway, let's make a cell and a module to check the solution.
pvc = pvcell.PVcell(
Isat1_T0=x[0],
Isat2_T0=x[1],
Rs=x[2],
Rsh=x[3],
Isc0_T0=cs6x_300m.I_sc_ref,
alpha_Isc=cs6x_300m.alpha_sc)
np.isclose(pvc.Isc, cs6x_300m.I_sc_ref) # ha-ha, this exact b/c we used it
# open circuit voltage within 1E-3: (45.01267251639085, 45.0)
np.isclose(pvc.Voc*cs6x_300m.N_s, cs6x_300m.V_oc_ref, rtol=1e-3, atol=1e-3)
# get index max power point
mpp = np.argmax(pvc.Pcell)
# max power voltage within 1E-3: (36.50580418834946, 36.5)
np.isclose(pvc.Vcell[mpp][0]*cs6x_300m.N_s, cs6x_300m.V_mp_ref, rtol=1e-3, atol=1e-3)
# max power current within 1E-3: (8.218687568902466, 8.22)
np.isclose(pvc.Icell[mpp][0], cs6x_300m.I_mp_ref, rtol=1e-3, atol=1e-3)
# use pvlib to get the full IV curve using CEC model
params1stc = pvlib.pvsystem.calcparams_cec(effective_irradiance=1000,
temp_cell=25.0, alpha_sc=cs6x_300m.alpha_sc, a_ref=cs6x_300m.a_ref,
I_L_ref=cs6x_300m.I_L_ref, I_o_ref=cs6x_300m.I_o_ref, R_sh_ref=cs6x_300m.R_sh_ref,
R_s=cs6x_300m.R_s, Adjust=cs6x_300m.Adjust)
iv_params1stc = pvlib.pvsystem.singlediode(*params1stc, ivcurve_pnts=100, method='newton')
# use pvmm to get full IV curve using 2-diode model parameters
pvm = pvmodule.PVmodule(cell_pos=pvmodule.STD72, pvcells=[pvc]*72)
# make some comparison plots
pvm.plotMod() # plot the pvmm module
plt.tight_layout()
# get axes for IV curve
f, ax = plt.gcf(), plt.gca()
ax0 = f.axes[0]
ax0.plot(iv_params1stc['v'], iv_params1stc['i'], '--')
ax0.plot(0, iv_params1stc['i_sc'], '|k')
ax0.plot(iv_params1stc['v_oc'], 0, '|k')
ax0.plot(iv_params1stc['v_mp'], iv_params1stc['i_mp'], '|k')
ax0.set_ylim([0, 10])
ax0.plot(0, pvm.Isc.mean(), '_k')
ax0.plot(pvm.Voc.sum(), 0, '|k')
mpp = np.argmax(pvm.Pmod)
ax0.plot(pvm.Vcell[mpp], mpp.Icell[mpp], '_k')
ax0.plot(pvm.Vmod[mpp], pvm.Imod[mpp], '_k')
iv_params1stc['p'] = iv_params1stc['v'] * iv_params1stc['i']
ax1.plot(iv_params1stc['v'], iv_params1stc['p'], '--')
ax1.plot(iv_params1stc['v_mp'], iv_params1stc['p_mp'], '|k')
ax1.plot(pvm.Vmod[mpp], pvm.Pmod[mpp], '_k')
With these comparison plots you can get very close agreement between the 1-diode and 2-diode models.
You cannot define the cell as follows:
Isat1_T0=parametersw.I_o_ref,
Isat2_T0=0,
The reason is that pvmismatch fixes the diode ideality factors at n1=1 and n2=2, whereas a one-diode model requires you to supply the ideality factor n yourself.
If you still want to use a one-diode model with pvmismatch, you need to rewrite the "PVcell" module in this package. I have tried to write it; you can use it as a reference.
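For reference, ignoring the reverse-breakdown term, the one-diode equation that the rewrite below implements is (a sketch; Vd is the diode voltage, Vt the thermal voltage, and Gamma the ideality factor):
# one-diode model with ideality factor Gamma (reverse breakdown ignored)
Icell = Igen - Isat1 * (np.exp(Vd / (Gamma * Vt)) - 1.0) - Vd / Rsh
Vcell = Vd - Icell * Rs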
"""
This module contains the :class:`~pvmismatch.pvmismatch_lib.pvcell.PVcell`
object which is used by modules, strings and systems.
"""
from __future__ import absolute_import
from future.utils import iteritems
from pvmismatch.pvmismatch_lib.pvconstants import PVconstants
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import newton
# Defaults
RS = 0.0038 # [ohm] series resistance
RSH = 500 # [ohm] shunt resistance
Gamma = 0.97
ISAT1_T0 = 2.76E-11 # [A] diode one saturation current
ISC0_T0 = 9.87 # [A] reference short circuit current
TCELL = 298.15 # [K] cell temperature
ARBD = 2E-3 # reverse breakdown coefficient 1
BRBD = 0. # reverse breakdown coefficient 2
VRBD_ = -20 # [V] reverse breakdown voltage
NRBD = 3.28 # reverse breakdown exponent
EG = 1.1 # [eV] band gap of cSi
ALPHA_ISC = 0.0005 # [1/K] short circuit current temperature coefficient
EPS = np.finfo(np.float64).eps
class PVcell(object):
"""
Class for PV cells.
:param Rs: series resistance [ohms]
:param Rsh: shunt resistance [ohms]
:param Isat1_T0: first saturation diode current at ref temp [A]
:param Isat2_T0: second saturation diode current [A]
:param Isc0_T0: short circuit current at ref temp [A]
:param aRBD: reverse breakdown coefficient 1
:param bRBD: reverse breakdown coefficient 2
:param VRBD: reverse breakdown voltage [V]
:param nRBD: reverse breakdown exponent
:param Eg: band gap [eV]
:param alpha_Isc: short circuit current temp coeff [1/K]
:param Tcell: cell temperature [K]
:param Ee: incident effective irradiance [suns]
:param pvconst: configuration constants object
:type pvconst: :class:`~pvmismatch.pvmismatch_lib.pvconstants.PVconstants`
"""
_calc_now = False #: if True ``calcCells()`` is called in ``__setattr__``
def __init__(self, Rs=RS, Rsh=RSH, Isat1_T0=ISAT1_T0, Gamma=Gamma,
Isc0_T0=ISC0_T0, aRBD=ARBD, bRBD=BRBD, VRBD=VRBD_,
nRBD=NRBD, Eg=EG, alpha_Isc=ALPHA_ISC,
Tcell=TCELL, Ee=1., pvconst=PVconstants()):
# user inputs
self.Rs = Rs #: [ohm] series resistance
self.Rsh = Rsh #: [ohm] shunt resistance
self.Isat1_T0 = Isat1_T0 #: [A] diode one sat. current at T0
self.Gamma = Gamma #: ideal factor
self.Isc0_T0 = Isc0_T0 #: [A] short circuit current at T0
self.aRBD = aRBD #: reverse breakdown coefficient 1
self.bRBD = bRBD #: reverse breakdown coefficient 2
self.VRBD = VRBD #: [V] reverse breakdown voltage
self.nRBD = nRBD #: reverse breakdown exponent
self.Eg = Eg #: [eV] band gap of cSi
self.alpha_Isc = alpha_Isc #: [1/K] short circuit temp. coeff.
self.Tcell = Tcell #: [K] cell temperature
self.Ee = Ee #: [suns] incident effective irradiance on cell
self.pvconst = pvconst #: configuration constants
self.Icell = None #: cell currents on IV curve [A]
self.Vcell = None #: cell voltages on IV curve [V]
self.Pcell = None #: cell power on IV curve [W]
self.VocSTC = self._VocSTC() #: estimated Voc at STC [V]
# set calculation flag
self._calc_now = True # overwrites the class attribute
def __str__(self):
fmt = '<PVcell(Ee=%g[suns], Tcell=%g[K], Isc=%g[A], Voc=%g[V])>'
return fmt % (self.Ee, self.Tcell, self.Isc, self.Voc)
def __repr__(self):
return str(self)
def __setattr__(self, key, value):
# check for floats
try:
value = np.float64(value)
except (TypeError, ValueError):
pass # fail silently if not float, eg: pvconst or _calc_now
super(PVcell, self).__setattr__(key, value)
# recalculate IV curve
if self._calc_now:
Icell, Vcell, Pcell = self.calcCell()
self.__dict__.update(Icell=Icell, Vcell=Vcell, Pcell=Pcell)
def update(self, **kwargs):
"""
Update user-defined constants.
"""
# turn off calculation flag until all attributes are updated
self._calc_now = False
# don't use __dict__.update() instead use setattr() to go through
# custom __setattr__() so that numbers are cast to floats
for k, v in iteritems(kwargs):
setattr(self, k, v)
self._calc_now = True # recalculate
#property
def Vt(self):
"""
Thermal voltage in volts.
"""
return self.pvconst.k * self.Tcell / self.pvconst.q
#property
def Isc(self):
return self.Ee * self.Isc0
#property
def Aph(self):
"""
Photogenerated current coefficient, non-dimensional.
"""
# Aph is undefined (0/0) if there is no irradiance
if self.Isc == 0: return np.nan
# short current (SC) conditions (Vcell = 0)
Vdiode_sc = self.Isc * self.Rs # diode voltage at SC
Idiode1_sc = self.Isat1 * (np.exp(Vdiode_sc / (self.Gamma*self.Vt)) - 1.)
Ishunt_sc = Vdiode_sc / self.Rsh # diode voltage at SC
# photogenerated current coefficient
return 1. + (Idiode1_sc + Ishunt_sc) / self.Isc
#property
def Isat1(self):
"""
Diode one saturation current at Tcell in amps.
"""
_Tstar = self.Tcell ** 3. / self.pvconst.T0 ** 3. # scaled temperature
_inv_delta_T = 1. / self.pvconst.T0 - 1. / self.Tcell # [1/K]
_expTstar = np.exp(
self.Eg * self.pvconst.q / (self.pvconst.k * self.Gamma) * _inv_delta_T
)
return self.Isat1_T0 * _Tstar * _expTstar # [A] Isat1(Tcell)
#property
def Isc0(self):
"""
Short circuit current at Tcell in amps.
"""
_delta_T = self.Tcell - self.pvconst.T0 # [K] temperature difference
return self.Isc0_T0 * (1. + self.alpha_Isc * _delta_T) # [A] Isc0
#property
def Voc(self):
"""
Estimate open circuit voltage of cells.
Returns Voc : numpy.ndarray of float, estimated open circuit voltage
"""
return self.Vt * self.Gamma * np.log((self.Aph * self.Isc)/self.Isat1_T0 + 1)
def _VocSTC(self):
"""
Estimate open circuit voltage of cells.
Returns Voc : numpy.ndarray of float, estimated open circuit voltage
"""
Vdiode_sc = self.Isc0_T0 * self.Rs # diode voltage at SC
Idiode1_sc = self.Isat1_T0 * (np.exp(Vdiode_sc / (self.Gamma*self.Vt)) - 1.)
Ishunt_sc = Vdiode_sc / self.Rsh # diode voltage at SC
# photogenerated current coefficient
Aph = 1. + (Idiode1_sc + Ishunt_sc) / self.Isc0_T0
# estimated Voc at STC
return self.Vt * self.Gamma * np.log((Aph * self.Isc0_T0)/self.Isat1_T0 + 1)
#property
def Igen(self):
"""
Photovoltaic generated light current (AKA IL or Iph)
Returns Igen : numpy.ndarray of float, PV generated light current [A]
Photovoltaic generated light current is zero if irradiance is zero.
"""
if self.Ee == 0: return 0
return self.Aph * self.Isc
def calcCell(self):
"""
Calculate cell I-V curves.
Returns (Icell, Vcell, Pcell) : tuple of numpy.ndarray of float
"""
Vreverse = self.VRBD * self.pvconst.negpts
Vff = self.Voc
delta_Voc = self.VocSTC - self.Voc
# to make sure that the max voltage is always in the 4th quadrant, add
# a third set of points log spaced with decreasing density, from Voc to
# Voc # STC unless Voc *is* Voc # STC, then use an arbitrary voltage at
# 80% of Voc as an estimate of Vmp assuming a fill factor of 80% and
# Isc close to Imp, or if Voc > Voc # STC, then use Voc as the max
if delta_Voc == 0:
Vff = 0.8 * self.Voc
delta_Voc = 0.2 * self.Voc
elif delta_Voc < 0:
Vff = self.VocSTC
delta_Voc = -delta_Voc
Vquad4 = Vff + delta_Voc * self.pvconst.Vmod_q4pts
Vforward = Vff * self.pvconst.pts
Vdiode = np.concatenate((Vreverse, Vforward, Vquad4), axis=0)
Idiode1 = self.Isat1 * (np.exp(Vdiode / (self.Gamma*self.Vt)) - 1.)
Ishunt = Vdiode / self.Rsh
fRBD = 1. - Vdiode / self.VRBD
# use epsilon = 2.2204460492503131e-16 to avoid "divide by zero"
fRBD[fRBD == 0] = EPS
Vdiode_norm = Vdiode / self.Rsh / self.Isc0_T0
fRBD = self.Isc0_T0 * fRBD ** (-self.nRBD)
IRBD = (self.aRBD * Vdiode_norm + self.bRBD * Vdiode_norm ** 2) * fRBD
Icell = self.Igen - Idiode1 - Ishunt - IRBD
Vcell = Vdiode - Icell * self.Rs
Pcell = Icell * Vcell
return Icell, Vcell, Pcell
# diode model
# *-->--*--->---*--Rs->-Icell--+
# ^ | | ^
# | | | |
# Igen Idiode Ishunt Vcell
# | | | |
# | v v v
# *--<--*---<---*--<-----------=
# http://en.wikipedia.org/wiki/Diode_modelling#Shockley_diode_model
# http://en.wikipedia.org/wiki/Diode#Shockley_diode_equation
# http://en.wikipedia.org/wiki/William_Shockley
#staticmethod
def f_Icell(Icell, Vcell, Igen, Rs, Vt, Isat1, Rsh):
"""
Objective function for Icell.
:param Icell: cell current [A]
:param Vcell: cell voltage [V]
:param Igen: photogenerated current at Tcell and Ee [A]
:param Rs: series resistance [ohms]
:param Vt: thermal voltage [V]
:param Isat1: first diode saturation current at Tcell [A]
:param Isat2: second diode saturation current [A]
:param Rsh: shunt resistance [ohms]
:return: residual = (Icell - Icell0) [A]
"""
# arbitrary current condition
Vdiode = Vcell + Icell * Rs # diode voltage
Idiode1 = Isat1 * (np.exp(Vdiode / (Gamma * Vt)) - 1.) # diode current
Ishunt = Vdiode / Rsh # shunt current
return Igen - Idiode1 - Ishunt - Icell
def calcIcell(self, Vcell):
"""
Calculate Icell as a function of Vcell.
:param Vcell: cell voltage [V]
:return: Icell
"""
args = (np.float64(Vcell), self.Igen, self.Rs, self.Vt,
self.Rsh)
return newton(self.f_Icell, x0=self.Isc, args=args)
#staticmethod
def f_Vcell(Vcell, Icell, Igen, Rs, Vt, Isat1, Rsh):
return PVcell.f_Icell(Icell, Vcell, Igen, Rs, Vt, Isat1, Rsh)
def calcVcell(self, Icell):
"""
Calculate Vcell as a function of Icell.
:param Icell: cell current [A]
:return: Vcell
"""
args = (np.float64(Icell), self.Igen, self.Rs, self.Vt,
self.Isat1, self.Rsh)
return newton(self.f_Vcell, x0=self.Voc, args=args)
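For example, a hypothetical usage sketch of the modified class (assuming parameters is a row of the pvlib CEC table; the ideality factor n is recovered from a_ref, which pvlib defines as n * N_s * kT/q, and the module resistances are scaled to per-cell values):
Vt_ref = 0.025693  # [V] thermal voltage at 25 degC
cell = PVcell(
    Rs=parameters.R_s / parameters.N_s,        # per-cell series resistance
    Rsh=parameters.R_sh_ref / parameters.N_s,  # per-cell shunt resistance
    Isat1_T0=parameters.I_o_ref,
    Gamma=parameters.a_ref / (parameters.N_s * Vt_ref),  # ideality factor n
    Isc0_T0=parameters.I_L_ref,
    alpha_Isc=parameters.alpha_sc)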

Working example of multi-stage model in Pyomo

This paper describes Pyomo's Differential and Algebraic Equations framework. It also mentions multi-stage problems; however, it does not show a complete example of such a problem. Does such an example exist somewhere?
The following demonstrates a complete minimal working example of a multi-stage optimization problem using Pyomo's DAE framework:
#!/usr/bin/env python3
# http://www.gpops2.com/Examples/OrbitRaising.html
from pyomo.environ import *
from pyomo.dae import *
from pyomo.opt import SolverStatus, TerminationCondition
import random
import matplotlib.pyplot as plt

T = 10      # Maximum time for each stage of the model
STAGES = 3  # Number of stages

m = ConcreteModel()                 # Model
m.t = ContinuousSet(bounds=(0, T))  # Time variable
m.stages = RangeSet(0, STAGES)      # Stages in the range [0,STAGES]. Can be thought of as an integer-valued set
m.a = Var(m.stages, m.t)            # State variable defined for all stages and times
m.da = DerivativeVar(m.a, wrt=m.t)  # First derivative of state variable with respect to time
m.u = Var(m.stages, m.t, bounds=(0, 1))  # Control variable defined for all stages and times. Bounded to range [0,1]

# Setting the value of the derivative.
def eq_da(m, stage, t):  # m argument supplied when function is called. `stage` and `t` are given values from m.stages and m.t (see below)
    return m.da[stage, t] == m.u[stage, t]  # Derivative is proportional to the control variable
m.eq_da = Constraint(m.stages, m.t, rule=eq_da)  # Call constraint function eq_da for each unique value of m.stages and m.t

# We need to connect the different stages together...
def eq_stage_continuity(m, stage):
    if stage == m.stages.last():  # The last stage doesn't connect to anything
        return Constraint.Skip    # So skip this constraint
    else:
        return m.a[stage, T] == m.a[stage+1, 0]  # Final time of each stage connects with the initial time of the following stage
m.eq_stage_continuity = Constraint(m.stages, rule=eq_stage_continuity)

# Boundary conditions
def _init(m):
    yield m.a[0, 0] == 0  # Initial value (at zeroth stage and zeroth time) of `a` is 0
    yield ConstraintList.End
m.con_boundary = ConstraintList(rule=_init)  # Repeatedly call `_init` until `ConstraintList.End` is returned

# Objective function: maximize `a` at the end of the final stage
m.obj = Objective(expr=m.a[STAGES, T], sense=maximize)

# Get a discretizer
discretizer = TransformationFactory('dae.collocation')

# Discretize the model
# nfe (number of finite elements)
# ncp (number of collocation points within finite element)
discretizer.apply_to(m, nfe=30, ncp=6, scheme='LAGRANGE-RADAU')

# Get a solver
solver = SolverFactory('ipopt', keepfiles=True, log_file='/z/log', soln_file='/z/sol')
solver.options['max_iter'] = 100000
solver.options['print_level'] = 1
solver.options['linear_solver'] = 'ma27'
solver.options['halt_on_ampl_error'] = 'yes'

# Solve the model
results = solver.solve(m, tee=True)
print(results.solver.status)
print(results.solver.termination_condition)

# Retrieve the results in a pleasant format
r_t = [t for s in sorted(m.stages) for t in sorted(m.t)]
r_a = [value(m.a[s, t]) for s in sorted(m.stages) for t in sorted(m.t)]
r_u = [value(m.u[s, t]) for s in sorted(m.stages) for t in sorted(m.t)]

plt.plot(r_t, r_a, label="r_a")
plt.plot(r_t, r_u, label="r_u")
plt.legend()
plt.show()

Counting matrix pairs using a threshold

I have a folder with hundreds of txt files I need to analyse for similarity. Below is an example of a script I use to run the similarity analysis. In the end I get an array (or a matrix) that I can plot, etc.
I would like to see how many pairs there are with cos_similarity > 0.5 (or any other threshold I decide to use), removing cos_similarity == 1 where I compare the same files, of course.
Secondly, I need a list of these pairs based on file names.
So the output for the example below would look like:
1
and
["doc1", "doc4"]
I will really appreciate your help, as I feel a bit lost and don't know which direction to go.
This is an example of my script to get the matrix:
doc1 = "Amazon's promise of next-day deliveries could be investigated amid customer complaints that it is failing to meet that pledge."
doc2 = "The BBC has been inundated with comments from Amazon Prime customers. Most reported problems with deliveries."
doc3 = "An Amazon spokesman told the BBC the ASA had confirmed to it there was no investigation at this time."
doc4 = "Amazon's promise of next-day deliveries could be investigated amid customer complaints..."
documents = [doc1, doc2, doc3, doc4]

# In my real script I iterate through a folder (path) with txt files like this:
#def read_text(path):
#    documents = []
#    for filename in glob.iglob(path+'*.txt'):
#        _file = open(filename, 'r')
#        text = _file.read()
#        documents.append(text)
#    return documents

import nltk, string, numpy
nltk.download('punkt')  # first-time use only
stemmer = nltk.stem.porter.PorterStemmer()
def StemTokens(tokens):
    return [stemmer.stem(token) for token in tokens]
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def StemNormalize(text):
    return StemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

nltk.download('wordnet')  # first-time use only
lemmer = nltk.stem.WordNetLemmatizer()
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

from sklearn.feature_extraction.text import CountVectorizer
LemVectorizer = CountVectorizer(tokenizer=LemNormalize, stop_words='english')
LemVectorizer.fit_transform(documents)
tf_matrix = LemVectorizer.transform(documents).toarray()

from sklearn.feature_extraction.text import TfidfTransformer
tfidfTran = TfidfTransformer(norm="l2")
tfidfTran.fit(tf_matrix)
tfidf_matrix = tfidfTran.transform(tf_matrix)
cos_similarity_matrix = (tfidf_matrix * tfidf_matrix.T).toarray()

from sklearn.feature_extraction.text import TfidfVectorizer
TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
def cos_similarity(textlist):
    tfidf = TfidfVec.fit_transform(textlist)
    return (tfidf * tfidf.T).toarray()

cos_similarity(documents)
Out:
array([[ 1.        ,  0.1459739 ,  0.03613371,  0.76357693],
       [ 0.1459739 ,  1.        ,  0.11459266,  0.19117117],
       [ 0.03613371,  0.11459266,  1.        ,  0.04732164],
       [ 0.76357693,  0.19117117,  0.04732164,  1.        ]])
As I understand your question, you want to create a function that reads the output numpy array and a certain value (threshold) in order to return two things:
how many doc pairs are greater than or equal to the given threshold, and
the names of these docs.
So, here I've made the following function, which takes three arguments:
the output numpy array from the cos_similarity() function,
a list of document names,
a certain number (threshold).
And here it is:
def get_docs(arr, docs_names, threshold):
    output_tuples = []
    for row in range(len(arr)):
        lst = [row+1+idx for idx, num in \
               enumerate(arr[row, row+1:]) if num >= threshold]
        for item in lst:
            output_tuples.append( (docs_names[row], docs_names[item]) )
    return len(output_tuples), output_tuples
Let's see it in action:
>>> docs_names = ["doc1", "doc2", "doc3", "doc4"]
>>> arr = cos_similarity(documents)
>>> arr
array([[ 1.        ,  0.1459739 ,  0.03613371,  0.76357693],
       [ 0.1459739 ,  1.        ,  0.11459266,  0.19117117],
       [ 0.03613371,  0.11459266,  1.        ,  0.04732164],
       [ 0.76357693,  0.19117117,  0.04732164,  1.        ]])
>>> threshold = 0.5
>>> get_docs(arr, docs_names, threshold)
(1, [('doc1', 'doc4')])
>>> get_docs(arr, docs_names, 1)
(0, [])
>>> get_docs(arr, docs_names, 0.13)
(3, [('doc1', 'doc2'), ('doc1', 'doc4'), ('doc2', 'doc4')])
Let's see how this function works:
First, I iterate over every row of the numpy array.
Second, I iterate over every item in the row whose index is bigger than the row's index, so we only traverse the upper triangle of the matrix. That's because each pair of documents appears twice in the full array; the two values arr[0][1] and arr[1][0] are the same. The diagonal items aren't included either, because we know for sure that they are 1, as every document is perfectly similar to itself :).
Finally, we get the items whose values are greater than or equal to the given threshold and return their indices. These indices are used later to get the document names.
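If your similarity matrices get large, a vectorized sketch of the same idea (assuming the same arr and docs_names as above) can replace the explicit Python loops with numpy.triu_indices:
import numpy as np

def get_docs_vectorized(arr, docs_names, threshold):
    # indices of the upper triangle, excluding the diagonal (k=1)
    rows, cols = np.triu_indices(len(arr), k=1)
    # boolean mask of the pairs at or above the threshold
    mask = arr[rows, cols] >= threshold
    pairs = [(docs_names[r], docs_names[c])
             for r, c in zip(rows[mask], cols[mask])]
    return len(pairs), pairs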

Code does not iterate correctly

I know there are many similar problems on this site, but unfortunately I was not able to fix my own with the existing answers. I hope you don't mind helping me.
I have some code and am trying to append values to a list while iterating. However, something goes wrong in the process and I end up with the value 0.
I am not very good at debugging yet, so I can't find the problem.
I made a simpler version of the code with fewer variables below:
import math
from pylab import *
import numpy as np

#---------------------------INITIALIZATION-----------------------
#initializing the parameters of the model
Dt = 1
time_step = 50
#initializing the connection weights
w1 = 1
#initializing parameter values for the alogistic function
steepness_SS_a=1
speed_SS_a=1
threshold_SS_a=1

#--------------------THE FUNCTIONS---------------------------
def initialize():
    global t, timesteps, WS_a, SS_a, new_WS_a, new_SS_a
    #initialize model states
    WS_a=1
    SS_a=0
    # initialize lists to update all states
    new_SS_a=np.array([SS_a])
    #initializing time list
    t=0.
    timesteps=[t]

def observe():
    global t, timesteps, WS_a, SS_a, new_WS_a, new_SS_a
    np.append(SS_a, new_SS_a)
    timesteps.append(t)

def update():
    global t, timesteps, WS_a, SS_a, new_WS_a, new_SS_a
    # for each state activation value, write how to update it in the form of new_WS(a)=function(WS(a))
    SS_a = SS_a + speed_SS_a * (((1/(1+math.exp(-steepness_SS_a * (w1 * WS_a - threshold_SS_a))))-(1/(1+math.exp(steepness_SS_a *threshold_SS_a))))* (1+math.exp(-steepness_SS_a*threshold_SS_a))-SS_a)
    # for each state activation value now move the state value to new value
    SS_a=new_SS_a
    #updating the timestep
    t = t + Dt

#--------------------THE PROGRAM-----------------
initialize()
while t<30.:
    update()
    observe()

#--------------------PLOTTING--------------------
print SS_a
plot(new_SS_a)
show()
How about this:
import math
import numpy as np
from pylab import plot, show

def model():
    Dt = 1
    time_step = 50
    #initializing the connection weights
    w1 = 1
    #initializing parameter values for the alogistic function
    steepness_SS_a=1
    speed_SS_a=1
    threshold_SS_a=1
    #initialize model states
    WS_a=1
    SS_a=0
    # initialize lists to update all states
    new_SS_a=np.array([SS_a])
    #initializing time list
    t=0.0
    timesteps=[t]
    while t<30.0:
        ## update()
        # for each state activation value, write how to update it in the form of new_WS(a)=function(WS(a))
        new = SS_a + speed_SS_a * (
            (
                (1/(1+math.exp(-steepness_SS_a * (w1 * WS_a - threshold_SS_a))))
                -
                (1/(1+math.exp(steepness_SS_a * threshold_SS_a)))
            )
            *
            (1+math.exp(-steepness_SS_a*threshold_SS_a))
            -
            SS_a
        )
        new_SS_a = np.append(new_SS_a, new)
        # for each state activation value now move the state value to new value
        SS_a = new
        #updating the timestep
        t = t + Dt
        ## observe()
        #np.append(SS_a, new_SS_a)
        timesteps.append(t)
    print( SS_a )
    print( new_SS_a )
    print( timesteps )
    plot(new_SS_a)
    show()

model()
The major change is around SS_a = SS_a + speed_SS_a * ...: first I save that value as new = SS_a + speed_SS_a * ..., then I append it to new_SS_a with new_SS_a = np.append(new_SS_a, new) (you need to capture the result of np.append, otherwise you lose it), and finally I overwrite the old value with SS_a = new.
The other thing is that I put everything together in one function, because using globals in this fashion is bad practice: globals should be treated as constants and not modified, and if you need to change one based on a function's result, make the function return the value and then assign it. Also, import * is another bad practice, because it pollutes your namespace with unneeded names, and some of those can silently override others that you need. It is even worse if you use several such imports, because it hides where each name comes from and forces extra lookups to figure out which name belongs to which library, making debugging harder. So always use explicit imports.
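For instance, a minimal contrast of the two import styles (illustrative only):
# wildcard import: it is unclear where `plot` comes from, and pylab's
# many names can shadow builtins or other imports
from pylab import *

# explicit import: the origin of each name is obvious
from pylab import plot, show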

Gradient of kriged function in Openmdao

I am currently coding a Multiple Gradient Descent algorithm, where I use kriged functions.
My problem is that I can't find how to obtain the gradient of the kriged function (I tried to use linearize, but I don't know how to make it work).
from __future__ import print_function
from six import moves
from random import shuffle
import sys
import numpy as np
from numpy import linalg as LA
import math
from openmdao.braninkm import F, G, DF, DG
from openmdao.api import Group, Component, IndepVarComp
from openmdao.api import MetaModel
from openmdao.api import KrigingSurrogate, FloatKrigingSurrogate

def rand_lhc(b, k):
    # Calculates a random Latin hypercube set of n points in k dimensions within [0,n-1]^k hypercube.
    arr = np.zeros((2*b, k))
    row = list(moves.xrange(-b, b))
    for i in moves.xrange(k):
        shuffle(row)
        arr[:, i] = row
    return arr/b*1.2

class TrigMM(Group):
    ''' FloatKriging gives responses as floats '''
    def __init__(self):
        super(TrigMM, self).__init__()
        # Create meta_model for f_x as the response
        F_mm = self.add("F_mm", MetaModel())
        F_mm.add_param('X', val=np.array([0., 0.]))
        F_mm.add_output('f_x:float', val=0., surrogate=FloatKrigingSurrogate())
        # F_mm.add_output('df_x:float', val=0., surrogate=KrigingSurrogate().linearize)
        #F_mm.linearize('X', 'f_x:float')
        #F_mm.add_output('g_x:float', val=0., surrogate=FloatKrigingSurrogate())
        print('init ok')
        self.add('p1', IndepVarComp('X', val=np.array([0., 0.])))
        self.connect('p1.X', 'F_mm.X')

        # Create meta_model for g_x as the response
        G_mm = self.add("G_mm", MetaModel())
        G_mm.add_param('X', val=np.array([0., 0.]))
        G_mm.add_output('g_x:float', val=0., surrogate=FloatKrigingSurrogate())
        #G_mm.add_output('df_x:float', val=0., surrogate=KrigingSurrogate().linearize)
        #G_mm.linearize('X', 'g_x:float')
        self.add('p2', IndepVarComp('X', val=np.array([0., 0.])))
        self.connect('p2.X', 'G_mm.X')

from openmdao.api import Problem
prob = Problem()
prob.root = TrigMM()
prob.setup()

u = 4
v = 3

# training with a Latin hypercube
prob['F_mm.train:X'] = rand_lhc(20, 2)
prob['G_mm.train:X'] = rand_lhc(20, 2)
#prob['F_mm.train:X'] = rand_lhc(10,2)
#prob['G_mm.train:X'] = rand_lhc(10,2)
#prob['F_mm.linearize:X'] = rand_lhc(10,2)
#prob['G_mm.linearize:X'] = rand_lhc(10,2)

datF = []
datG = []
datDF = []
datDG = []
for i in range(len(prob['F_mm.train:X'])):
    datF.append(F(np.array([prob['F_mm.train:X'][i]]), u))
    #datG.append(G(np.array([prob['F_mm.train:X'][i]]),v))
data_trainF = np.fromiter(datF, np.float)

for i in range(len(prob['G_mm.train:X'])):
    datG.append(G(np.array([prob['G_mm.train:X'][i]]), v))
data_trainG = np.fromiter(datG, np.float)

prob['F_mm.train:f_x:float'] = data_trainF
#prob['F_mm.train:g_x:float'] = data_trainG
prob['G_mm.train:g_x:float'] = data_trainG
Are you going to be writing a Multiple Gradient Descent driver? If so, then OpenMDAO calculates the gradient from a param to an output at the Problem level using the calc_gradient method.
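For example, a minimal sketch (assuming the OpenMDAO 1.x API and the variable names from your model above):
# gradient of the kriged output with respect to the design variable,
# evaluated at the current point
J = prob.calc_gradient(['p1.X'], ['F_mm.f_x:float'], return_format='array')
print(J)  # d(f_x)/d(X), here a 1 x 2 Jacobian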
If you take a look at the source code for the pyoptsparse driver:
https://github.com/OpenMDAO/OpenMDAO/blob/master/openmdao/drivers/pyoptsparse_driver.py
The _gradfunc method is a callback function that returns the gradient of the constraints and objectives with respect to the design variables. The Metamodel component has built-in analytic gradients for all (I think) of our surrogates, so you don't even have to declare any there.
If this isn't what you are trying to do, then I may need a little more information about your application.