I am going to use the ShuffleSplit() method on the California housing dataset (Source: https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html) to fit an SGD regression.
However, an 'n_splits' error occurs when the method is applied.
The code is as follows:
from sklearn import cross_validation, grid_search, linear_model, metrics
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale
from sklearn.cross_validation import ShuffleSplit
housing_data = pd.read_csv('cal_housing.csv', header = 0, sep = ',')
housing_data.fillna(housing_data.mean(), inplace=True)
df=pd.get_dummies(housing_data)
y_target = housing_data['median_house_value'].values
x_features = housing_data.drop(['median_house_value'], axis = 1)
from sklearn.cross_validation import train_test_split
from sklearn import model_selection
train_x, test_x, train_y, test_y = model_selection.train_test_split(x_features, y_target, test_size=0.2, random_state=4)
reg = linear_model.SGDRegressor(random_state=0)
cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
The error is below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-8f8760b04f8c> in <module>()
----> 1 cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
TypeError: __init__() got an unexpected keyword argument 'n_splits'
I updated scikit-learn to version 0.18.
Anaconda version: 4.5.8
Could you please advise on this issue?
You are mixing up two different modules.
Before 0.18, ShuffleSplit lived in cross_validation, and it had no n_splits parameter: n was the total number of samples and n_iter set the number of re-shuffling and splitting iterations.
But since you have now updated to 0.18, cross_validation and grid_search have been deprecated in favor of model_selection.
This is mentioned in the docs, and these modules will be removed in version 0.20.
So instead of this:
from sklearn.cross_validation import ShuffleSplit
from sklearn.cross_validation import train_test_split
Do this:
from sklearn.model_selection import ShuffleSplit
from sklearn.model_selection import train_test_split
Then you can use n_splits.
cv = ShuffleSplit(n_splits = 10, test_size = 0.2, random_state = 0)
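For completeness, here is a minimal sketch of the 0.18+ API used end to end. Since cal_housing.csv is not available here, it uses synthetic data from make_regression as a stand-in (an assumption, not the housing data), standardizes the features inside a pipeline because SGD is sensitive to feature scale, and passes the ShuffleSplit object to cross_val_score:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing features/target (assumption: the real
# data would come from cal_housing.csv as in the question).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

# Standardize inside the pipeline so the scaler is refit on each training split only.
reg = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))

# model_selection.ShuffleSplit takes n_splits
# (the old cross_validation version took n and n_iter instead).
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# Default scoring for regressors is R^2; one score per split.
scores = cross_val_score(reg, X, y, cv=cv)
print(scores.mean(), scores.std())
```

Putting the scaler in the pipeline (rather than calling scale() on the full dataset first) keeps test-split statistics out of the training step.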
I have the following Sympy code that works as expected:
import numpy as np
from sympy.utilities.lambdify import lambdify
from sympy.core import sympify
from sympy import factorial
ex = sympify('-x**2 / cos(x)')
flam = lambdify(['x'], ex, "numpy")
flam(np.array(range(5)))
This returns:
array([ 0. , -1.85081572, 9.61199185, 9.09097799, 24.4781705 ])
Now, what I need to know is how to do the same for factorials, that is, using factorial(x) instead of cos(x). The code:
ex = sympify('-x**2 / factorial(x)')
flam = lambdify(['x'], ex, "numpy")
flam(np.array(range(5)))
raises a NameError
NameError: global name 'factorial' is not defined
What string should I use so that it gets converted to a factorial that can be evaluated after lambdify?
Thanks in advance for any help!
By tinkering, I arrived at the following code. It seems that NumPy does not provide a factorial function that works with ndarrays, so the code maps the name factorial to scipy.special.factorial instead:
import numpy as np
from sympy.utilities.lambdify import lambdify
flam = lambdify(['x'], '-x**2 / cos(x)', "numpy")
flam(np.array(range(5)))
# >>> array([ 0. , -1.85081572, 9.61199185, 9.09097799, 24.4781705 ])
import scipy.special
flam = lambdify('x', 'factorial(x)', ['numpy', {'factorial':scipy.special.factorial}])
flam(np.array(range(5)))
# >>> array([ 1., 1., 2., 6., 24.])
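The same mapping also works when starting from a SymPy expression rather than a string, which is closer to the original question. A minimal sketch: the dictionary listed first in modules takes priority, so it supplies the callable for the printed name factorial (here scipy.special.factorial, which works elementwise on arrays and returns floats by default), while everything else falls back to NumPy:

```python
import numpy as np
import scipy.special
import sympy as sp

x = sp.Symbol('x')
ex = sp.sympify('-x**2 / factorial(x)')  # same expression as in the question

# Earlier entries in the modules list take priority: the dict provides
# 'factorial', and all other names resolve through NumPy.
flam = sp.lambdify(x, ex, modules=[{'factorial': scipy.special.factorial}, 'numpy'])

result = flam(np.arange(5))
print(result)
```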
I'm trying to implement simple Bayesian inference using an ODE model. I want to use the NUTS algorithm to sample, but it gives me an initialization error. I do not know much about PyMC3 as I'm new to it. Please take a look and tell me what is wrong.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import seaborn
import pymc3 as pm
import theano.tensor as T
from theano.compile.ops import as_op
#Actual Solution of the Differential Equation(Used to generate data)
def actual(a,b,x):
    Y = np.exp(-b*x)*(a*np.exp(b*x)*(b*x-1)+a+b**2)/b**2
    return Y
#Method For Solving the ODE
def lv(xdata, a=5.0, b=0.2):
    def dy_dx(y, x):
        return a*x - b*y
    y0 = 1.0
    # full_output=True also returns an info dict; avoid shadowing the builtin dict
    Y, infodict = odeint(dy_dx,y0,xdata,full_output=True)
    return Y
#Generating Data for Bayesian Inference
a0, b0 = 5, 0.2
xdata = np.linspace(0, 21, 100)
ydata = actual(a0,b0,xdata)
# Adding some error to the ydata points
yerror = 10*np.random.rand(len(xdata))
ydata += np.random.normal(0.0, np.sqrt(yerror))
ydata = np.ravel(ydata)
@as_op(itypes=[T.dscalar, T.dscalar], otypes=[T.dvector])
def func(al,be):
    Q = lv(xdata, a=al, b=be)
    return np.ravel(Q)
# Number of Samples and Initial Conditions
nsample = 5000
y0 = 1.0
# Model for Bayesian Inference
model = pm.Model()
with model:
    # Priors for unknown model parameters
    alpha = pm.Uniform('alpha', lower=a0/2, upper=a0+a0/2)
    beta = pm.Uniform('beta', lower=b0/2, upper=b0+b0/2)
    # Expected value of outcome
    mu = func(alpha,beta)
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=yerror, observed=ydata)
    trace = pm.sample(nsample, nchains=1)
pm.traceplot(trace)
plt.show()
The error that I get is
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Initializing NUTS failed. Falling back to elementwise auto-assignment.
Any help would be really appreciated
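Before debugging the sampler, it can help to confirm that the forward model is right. The function actual is the closed-form solution of dy/dx = a*x - b*y with y(0) = 1, namely y = a*x/b - a/b**2 + (1 + a/b**2)*exp(-b*x), which expands to the same expression as in actual. A minimal check, reusing the question's functions and parameter values, that the odeint-based lv agrees with it:

```python
import numpy as np
from scipy.integrate import odeint

def actual(a, b, x):
    # Closed-form solution of dy/dx = a*x - b*y with y(0) = 1
    return np.exp(-b*x)*(a*np.exp(b*x)*(b*x - 1) + a + b**2)/b**2

def lv(xdata, a=5.0, b=0.2):
    def dy_dx(y, x):
        return a*x - b*y
    y0 = 1.0
    # odeint returns shape (len(xdata), 1); ravel to a 1D vector
    return np.ravel(odeint(dy_dx, y0, xdata))

xdata = np.linspace(0, 21, 100)
numeric = lv(xdata, a=5.0, b=0.2)
exact = actual(5.0, 0.2, xdata)
print(np.max(np.abs(numeric - exact)))
```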
I am doing object detection, and I am using Keras on top of Theano to build a model. Here is my code:
from keras.preprocessing import image
from scipy.misc import toimage
from keras.optimizers import Adadelta,SGD
from matplotlib import pyplot as plt
from keras.models import Sequential,load_model
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
import pickle
import numpy as np
X=pickle.load(open('Xvalues.p','rb'))
y=pickle.load(open('yvalues.p','rb'))
X_train=X[:1100,:,:]
y_train=y[:1100]
X_test=X[1100:,:,:]
y_test=y[1100:]
X_train = X_train.reshape(X_train.shape[0], 50, 50,1).astype('float32')
X_test = X_test.reshape(X_test.shape[0],50, 50,1).astype('float32')
#X_train=X_train[:,:,:,np.newaxis]
#X_test=X_test[:,:,:,np.newaxis]
X_train = X_train / 255
X_test = X_test / 255
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
print type(num_classes)
print y_train.shape
print y_test.shape
opt=Adadelta()
model=Sequential()
model.add(Conv2D(48,(5,5),input_shape=(50,50,1),activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))
model.add(Conv2D(64,(5,5),activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(500, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(X_train,y_train,epochs=15, batch_size=32, verbose=1)
print model.summary()
model.save('train2.h5',overwrite=True)
scores = model.evaluate(X_test, y_test, verbose=1)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))
When I use colour images it works well, but when I use grayscale images it gives the following error:
File "D:/ML/classify.py", line 39, in <module>
model.add(Conv2D(48,(5,5),input_shape=(50,50,1),activation='relu'))
Exception: ('The following error happened while compiling the node', Elemwise{Composite{(i0 + (i1 * i2))}}[(0, 2)](TensorConstant{(1L, 1L, 1..-0.0699854}, TensorConstant{(1L, 1L, 1..f 0.139971}, mrg_uniform{TensorType(float32, 4D),inplace}.1), '\n', 'Compilation failed (return status=1): C:\\Users\\DELL\\Anaconda2\\libs/python27.lib: error adding symbols: File in wrong format\r. collect2.exe: error: ld returned 1 exit status\r. ', '[Elemwise{Composite{(i0 + (i1 * i2))}}[(0, 2)](TensorConstant{(1L, 1L, 1..-0.0699854}, TensorConstant{(1L, 1L, 1..f 0.139971}, <TensorType(float32, (False, False, True, False))>)]')
I am not able to understand what's wrong.
I am working on Python code to plot eddy kinetic energy. I am fairly new to Python and I'm confused about an error I have been getting. I'm not worried about plotting my data on a map just yet; I just want to see if I can get it to plot. Here is my code and the error:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from pylab import *
from netCDF4 import Dataset
from mpl_toolkits.basemap import Basemap
import matplotlib.cm as cm
from mpl_toolkits.basemap import shiftgrid
test = Dataset('p.34331101.atmos_daily.nc', 'r')
lat = test.variables['lat'][:]
lon = test.variables['lon'][:]
level = test.variables['level'][5]
time = test.variables['time'][:]
u = test.variables['ucomp'][:]
v = test.variables['vcomp'][:]
temp = test.variables['temp'][:]
print(lat.shape)
print(u.shape)
#uz = np.reshape(u, (30, 26, 90))
uzm = np.nanmean(u, axis=3)
#vz = np.reshape(v, (30, 26, 90))
vzm = np.nanmean(v, axis=3)
print(uzm.shape)
ustar = u-uzm[:,:,:,np.newaxis]
vstar = v-vzm[:,:,:,np.newaxis]
EKE = np.nanmean(.5*(ustar**2 + vstar**2), axis=3)
EKE1 = np.asarray(EKE)
%matplotlib inline
print(EKE.shape)
levels=[-10, -5, 0, 5, 10]
plt.contour(EKE[1,1,:])
#EKE is time, level, lat and the shape is (30, 26, 90)
TypeError: Input must be a 2D array.
Bret, you would probably get more help if you included a bit more information with your error. Did you not get a line number to look at?
I would hazard a guess that your problem is passing a 1D array to contour(). This sometimes seems counter-intuitive, but NumPy drops a dimension 'automatically' for each axis you index with a single integer, so EKE[1,1,:] has shape (90,) rather than a 2D shape.
For example, try:
print(EKE.shape)
print(EKE[1,1,:].shape)
print(EKE[1:2,1:2,:].shape)
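A small sketch of the shapes involved, using a placeholder array of zeros with the same shape as EKE, (time, level, lat) = (30, 26, 90), rather than the real data:

```python
import numpy as np

# Placeholder with the same shape as EKE in the question: (time, level, lat)
EKE = np.zeros((30, 26, 90))

# Each integer index removes one axis...
a = EKE[1, 1, :]        # shape (90,): 1D, which contour() rejects
# ...while a slice keeps the axis, even if it has length 1.
b = EKE[1:2, 1:2, :]    # shape (1, 1, 90)
# Fixing only the time index leaves a 2D (level, lat) field,
# the kind of input contour() expects.
c = EKE[1, :, :]        # shape (26, 90)

for name, arr in [("EKE[1,1,:]", a), ("EKE[1:2,1:2,:]", b), ("EKE[1,:,:]", c)]:
    print(name, arr.shape, arr.ndim)
```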
I am new to logistic regression. The following is from an example project for the pymc package, and its purpose is still unclear to me. More specifically, the variable n is [5, 5, 5, 5], which is used in the model: pymc.Binomial. I suppose binomial fitting should involve both 0 and 1 outcomes. Does n represent the '1' cases?
Could you explain the idea of this example? Thanks,
The example is from:
www.map.ox.ac.uk/media/PDF/Patil_et_al_2010.pdf
.........
import pymc
import numpy as np
n = 5*np.ones(4,dtype=int)
x = np.array([-.86,-.3,-.05,.73])
alpha = pymc.Normal('alpha',mu=0,tau=.01)
beta = pymc.Normal('beta',mu=0,tau=.01)
@pymc.deterministic
def theta(a=alpha, b=beta):
    """theta = logit^{-1}(a+b*x)"""
    return pymc.invlogit(a+b*x)
d = pymc.Binomial('d', n=n, p=theta, value=np.array([0.,1.,3.,5.]),\
observed=True)
.......
import pymc
import pymc.Matplot
import mymodel
S = pymc.MCMC(mymodel, db='pickle')
S.sample(iter=10000, burn=5000, thin=2)
pymc.Matplot.plot(S)
import matplotlib.pyplot as plt
plt.show()
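To make the role of n concrete: in a Binomial model, each entry of n is the number of binary (0/1) trials in one group, and the observed value d counts how many of those trials came out as 1, so d ranges from 0 to n. This appears to be the bioassay example from Gelman et al.'s Bayesian Data Analysis, where each of four dose levels x was given to 5 animals and d is the number that died. A minimal NumPy/SciPy sketch of the likelihood the model defines, evaluated at hypothetical values of alpha and beta chosen by hand (not fitted estimates):

```python
import numpy as np
from scipy.stats import binom

n = 5 * np.ones(4, dtype=int)           # trials per group (e.g. animals per dose)
x = np.array([-.86, -.3, -.05, .73])    # covariate (dose) per group
d = np.array([0, 1, 3, 5])              # observed count of 1s per group, 0 <= d <= n

def invlogit(z):
    # Inverse logit: maps the linear predictor to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameter values, for illustration only
alpha, beta = 0.8, 7.7
theta = invlogit(alpha + beta * x)      # probability of a 1 in each group

# Binomial log-likelihood summed over the four groups
loglik = binom.logpmf(d, n, theta).sum()
print(theta, loglik)

# A count larger than n is impossible under Binomial(n, p)
print(binom.logpmf(6, 5, 0.5))          # -inf
```

So n does not represent the '1' cases; it is the group size, and the 0 outcomes are implicit as n - d.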