Early stopping in Sklearn GradientBoostingRegressor - python-2.7

I am using a monitor-class as implemented here
from sklearn.ensemble._gradient_boosting import predict_stage  # private helper; module path as in older scikit-learn releases

class Monitor():
    """Monitor for early stopping in Gradient Boosting for classification.

    The monitor checks the validation loss between each training stage. When
    too many successive stages have increased the loss, the monitor will return
    true, stopping the training early.

    Parameters
    ----------
    X_valid : array-like, shape = [n_samples, n_features]
        Training vectors, where n_samples is the number of samples
        and n_features is the number of features.
    y_valid : array-like, shape = [n_samples]
        Target values (integers in classification, real numbers in
        regression)
        For classification, labels must correspond to classes.
    max_consecutive_decreases : int, optional (default=5)
        Early stopping criteria: when the number of consecutive iterations that
        result in a worse performance on the validation set exceeds this value,
        the training stops.
    """
    def __init__(self, X_valid, y_valid, max_consecutive_decreases=5):
        self.X_valid = X_valid
        self.y_valid = y_valid
        self.max_consecutive_decreases = max_consecutive_decreases
        self.losses = []

    def __call__(self, i, clf, args):
        if i == 0:
            self.consecutive_decreases_ = 0
            self.predictions = clf._init_decision_function(self.X_valid)
        # add the contribution of stage i to the running validation predictions
        predict_stage(clf.estimators_, i, self.X_valid, clf.learning_rate,
                      self.predictions)
        self.losses.append(clf.loss_(self.y_valid, self.predictions))
        if len(self.losses) >= 2 and self.losses[-1] > self.losses[-2]:
            self.consecutive_decreases_ += 1
        else:
            self.consecutive_decreases_ = 0
        if self.consecutive_decreases_ >= self.max_consecutive_decreases:
            print("f"
                  "({}): s {}.".format(self.consecutive_decreases_, i)),
            return True
        else:
            return False
params = { 'n_estimators': nEstimators,
'max_depth': maxDepth,
'min_samples_split': minSamplesSplit,
'min_samples_leaf': minSamplesLeaf,
'min_weight_fraction_leaf': minWeightFractionLeaf,
'min_impurity_decrease': minImpurityDecrease,
'learning_rate': 0.01,
'loss': 'quantile',
'alpha': alpha,
'verbose': 0
}
model = ensemble.GradientBoostingRegressor( **params )
model.fit( XTrain, yTrain, monitor = Monitor( XTest, yTest, 25 ) )
It works very well. However, it is not clear to me what model this line
model.fit( XTrain, yTrain, monitor = Monitor( XTest, yTest, 25 ) )
returns:
1) No model
2) The model trained before stopping
3) The model 25 iterations before ( note the parameter of the monitor )
If it is not (3), is it possible to make the estimator return (3)?
How can I do that?
It is worth mentioning that the xgboost library does that; however, it does not allow me to use the loss function that I need.

fit returns the model as trained up to the point where the "stopping rule" fired, which means your option (2) is the right one.
The problem with this monitor code is that the model you end up with includes the 25 extra iterations. The model you actually want is your option (3).
I think the easy (and crude) way to get it is to run the same model again (with a fixed seed, so the results are the same) but with the number of iterations set to (i - max_consecutive_decreases).
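A minimal sketch of that refit idea, under a few assumptions of mine: instead of the monitor it uses the public staged_predict method to score every stage on the validation set, picks the stage with the lowest loss rather than "i - 25", and refits with that many estimators. The helper names (pinball_loss, fit_with_best_iteration) and the synthetic data are hypothetical, and the refit only reproduces the first stages of the full model when random_state is fixed and subsample is left at 1.0.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def pinball_loss(y_true, y_pred, alpha):
    """Mean quantile (pinball) loss for quantile level alpha."""
    diff = y_true - y_pred
    return np.mean(np.maximum(alpha * diff, (alpha - 1.0) * diff))


def fit_with_best_iteration(params, X_train, y_train, X_valid, y_valid):
    """Fit once, find the stage with the lowest validation loss, then refit."""
    # A fixed seed makes the second fit repeat the first one stage by stage.
    params = dict(params, random_state=params.get("random_state", 0))
    full = GradientBoostingRegressor(**params).fit(X_train, y_train)
    losses = [pinball_loss(y_valid, pred, params["alpha"])
              for pred in full.staged_predict(X_valid)]
    best_n = int(np.argmin(losses)) + 1  # stage indices are 0-based
    best = GradientBoostingRegressor(**dict(params, n_estimators=best_n))
    return best.fit(X_train, y_train), best_n, losses


# Synthetic data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=400)
params = {"n_estimators": 200, "max_depth": 3, "learning_rate": 0.05,
          "loss": "quantile", "alpha": 0.5}
model, best_n, losses = fit_with_best_iteration(
    params, X[:300], y[:300], X[300:], y[300:])
print(best_n, model.n_estimators)
```

The cost is one extra fit; in exchange nothing depends on private attributes of the estimator.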

Related

Using gradient descent to solve a nonlinear system

I have the following code, which uses gradient descent to find the global minimum of y = (x+5)^2:
cur_x = 3 # the algorithm starts at x=3
rate = 0.01 # learning rate
precision = 0.000001 # this tells us when to stop the algorithm
previous_step_size = 1
max_iters = 10000 # maximum number of iterations
iters = 0 # iteration counter
df = lambda x: 2*(x+5) # gradient of our function
while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x # store current x value in prev_x
    cur_x = cur_x - rate * df(prev_x) # grad descent
    previous_step_size = abs(cur_x - prev_x) # change in x
    iters = iters+1 # iteration count
    print("Iteration",iters,"\nX value is",cur_x) # print iterations
print("The local minimum occurs at", cur_x)
The procedure is fairly simple, and among the most intuitive and brief for solving such a problem (at least, that I'm aware of).
I'd now like to apply this to solving a system of nonlinear equations. Namely, I want to use this to solve the Time Difference of Arrival problem in three dimensions. That is, given the coordinates of 4 observers (or, in general, n+1 observers for an n dimensional solution), the velocity v of some signal, and the time of arrival at each observer, I want to reconstruct the source (determine its coordinates [x,y,z]).
I've already accomplished this using approximation search (see this excellent post on the matter: ), and I'd now like to try doing so with gradient descent (really, just as an interesting exercise). I know that the problem in two dimensions can be described by the following non-linear system:
\sqrt{(x-x_1)^2+(y-y_1)^2} + s(t_2-t_1) = \sqrt{(x-x_2)^2+(y-y_2)^2}
\sqrt{(x-x_2)^2+(y-y_2)^2} + s(t_3-t_2) = \sqrt{(x-x_3)^2+(y-y_3)^2}
\sqrt{(x-x_3)^2+(y-y_3)^2} + s(t_1-t_3) = \sqrt{(x-x_1)^2+(y-y_1)^2}
I know that it can be done, however I cannot determine how.
How might I go about applying this to 3-dimensions, or some nonlinear system in general?
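One possible way to carry the one-dimensional loop over to a system, sketched under assumptions of mine: turn the equations into residuals (here the range differences relative to observer 0, with s the signal speed as in the equations above), minimize half the sum of their squares, and step against the gradient of that cost. The function names, the observer layout, and the backtracking step-size rule are hypothetical choices, not part of the question. Plain gradient descent on this cost can be slow and may settle in a local minimum, so the starting point matters.

```python
import numpy as np


def residuals(p, observers, times, s):
    """Range-difference residuals of point p, relative to observer 0."""
    d = np.linalg.norm(observers - p, axis=1)
    return (d[1:] - d[0]) - s * (times[1:] - times[0])


def cost(p, observers, times, s):
    r = residuals(p, observers, times, s)
    return 0.5 * r.dot(r)


def gradient(p, observers, times, s):
    diff = p - observers
    unit = diff / np.linalg.norm(diff, axis=1)[:, None]  # gradient of each distance
    jac = unit[1:] - unit[0]                              # Jacobian of the residuals
    return jac.T.dot(residuals(p, observers, times, s))


def descend(p0, observers, times, s, max_iters=5000, precision=1e-10):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iters):
        f = cost(p, observers, times, s)
        g = gradient(p, observers, times, s)
        rate = 1.0
        # Backtracking: halve the step until the cost drops enough.
        while not cost(p - rate * g, observers, times, s) <= f - 1e-4 * rate * g.dot(g):
            rate *= 0.5
            if rate < 1e-12:
                return p
        p = p - rate * g
        if rate * np.linalg.norm(g) < precision:
            break
    return p


observers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0],
                      [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]])
source = np.array([3.0, 4.0, 5.0])   # used only to synthesize arrival times
s = 1.0                              # signal speed, arbitrary units
times = np.linalg.norm(observers - source, axis=1) / s

start = np.array([1.0, 1.0, 1.0])
estimate = descend(start, observers, times, s)
print(estimate, cost(estimate, observers, times, s))
```

The same pattern applies to any nonlinear system F(p) = 0: the cost is 0.5 * ||F(p)||^2 and its gradient is J(p)^T F(p), where J is the Jacobian of F.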

How to Formulate a Piecewise Step Function in pyomo

I have a question regarding the correct formulation of a piecewise step function in pyomo. I want to include in my model a single piecewise function of the form:
        / 1 ,  0 <= X(t) <= 1
Z(X) = <
        \ 0 ,  1 <= X(t) <= 2
Where X is being fit to data taken over a time domain and Z acts like a binary variable. The most similar example in pyomo documentation is the step.py example using INC. However, when solving with this formulation I observe the problem of the domain variable x ‘sticking’ to the breakpoint at x=1. I assume this is because (as noted in the documentation) Z can solve to the entire vertical line if continuous or is doubly feasible at both 0 and 1 if binary. Other formulations offered via the piecewise function (i.e. dlog, dcc, log, etc.) experience similar issues (in fact, based on the output to GAMS I’m pretty sure they don’t support binary/integer variables at all).
Is there a ‘correct’ way to formulate a piecewise function in pyomo that avoids the multiple-feasibility issue at the breakpoint, thus avoiding the domain variable converging to the breakpoint? I am using BARON with solvers cplex and ipopt, however my gut tells me this formulation issue can’t be solved by simply changing solvers.
I can also send a document illustrating my observations on why the current pyomo piecewise formulations don’t support binary variables, if it would help.
Here's some sample code where we try to minimise the sum of the step function Z.
model = ConcreteModel()
model.A = Set(initialize=[1,2,3])
model.B = Set(initialize=['J', 'K'])
model.x = Var(model.A, model.B, bounds=(0, 2))
model.z = Var(model.A, model.B, domain = Binary)
DOMAIN_PTS = [0,1,1,2]
RANGE_PTS = [1,1,0,0]
model.z_constraint = Piecewise(
    model.A, model.B,
    model.z, model.x,
    pw_pts=DOMAIN_PTS,
    pw_repn='INC',
    pw_constr_type = 'EQ',
    f_rule = RANGE_PTS,
    unbounded_domain_var = True)

def objective_rule(model):
    return sum(model.z[a,b] for a in model.A for b in model.B)

model.objective = Objective(rule = objective_rule, sense=minimize)
If you set sense = minimize above, the program will solve and give x = 1 for each index value. If you set sense = maximize, the program will solve and give x = 0 for each index value. I'm not too sure what you mean by stickiness, but I don't think this program exhibits it, and it does implement the step function.
This assumes that your z is not also indexed by time. If so, I would need to edit this answer:
model.t = RangeSet(*time*)
model.x = Var(model.t, bounds=(0, 2))
model.z = Var(domain=Binary)
model.d = Disjunction(expr=[
    [0 <= model.x[t] for t in model.t] + [model.x[t] <= 1 for t in model.t],
    [1 <= model.x[t] for t in model.t] + [model.x[t] <= 2 for t in model.t]
])
TransformationFactory('gdp.bigm').apply_to(model)
SolverFactory('baron').solve(model)

Use Chi-Squared statistic in pymc3

I am trying to use PyMC3 to fit a model to some observed data. This model is based on external code (interfaced via theano.ops.as_op), and depends on multiple parameters that should be fit by the MCMC process. Since the gradient of the external code cannot be determined, I use the Metropolis-Hastings sampler.
I have established Uniform priors for my inputs, and generate a model using my custom code. However, I want to compare the simulated data to my observations (a 3D np.ndarray) using the chi-squared statistic (sum of the squares of data-model/sigma^2) to obtain a log-likelihood. When the MCMC samples are drawn, this should lead to the trace converging on the best values of each parameter.
My model is explained in the following semi-pseudocode (if that's even a word):
import pymc3 as pm
#Some stuff setting up the data, preparing some functions etc.
#theano.compile.ops.as_op(itypes=[input types],otypes = [output types])
def make_model(inputs):
    #Wrapper to external code to generate simulated data
    return simulated data
model = pm.Model()
with model:
    #priors for 13 input parameters
    simData = make_model(inputs)
I now want to obtain the chi-squared logLikelihood for this model versus the data, which I think can be done using pm.ChiSquared, however I do not see how to combine the data, model and this distribution together to cause the sampler to perform correctly. I would guess it might look something like:
chiSq = pm.ChiSquared(nu=data.size, observed = (data-simData)**2/err**2)
trace = pm.sample(1000)
Is this correct? In running previous tests, I have found the samples appear to be simply drawn from the priors.
Thanks in advance.
Taking aloctavodia's advice, I was able to get parameter estimates for some toy exponential data using a pm.Normal likelihood. Using a pm.ChiSquared likelihood as the OP suggested, the model converged to correct values, but the posteriors on the parameters were roughly three times as broad. Here's the code for the model; I first generated data and then fit with PyMC3.
# Draw `nPoints` observed data points `y_obs` from the function
# 3. + 18. * numpy.exp(-.2 * x)
# with the points evaluated at `x_obs`
# x_obs = numpy.linspace(0, 100, nPoints)
# Add Normal(mu=0,sd=`cov`) noise to each point in `y_obs`
# Then instantiate PyMC3 model for fit:
def YModel(x, c, a, l):
    # exponential model expected to describe the data
    mu = c + a * pm.math.exp(-l * x)
    return mu

def logp(y_mod, y_obs):
    # Normal distribution likelihood
    return pm.Normal.dist(mu = y_mod, sd = cov).logp(y_obs)
    # Chi squared likelihood (to use, comment preceding line & uncomment next 2 lines)
    #chi2 = pm.math.sum( ((y_mod - y_obs)/cov)**2 )
    #return pm.ChiSquared.dist(nu = nPoints).logp(chi2)

with pm.Model() as model:
    c = pm.Uniform('constant', lower = 0., upper = 10., testval = 5.)
    a = pm.Uniform('amplitude', lower = 0., upper = 50., testval = 25.)
    l = pm.Uniform('lambda', lower = 0., upper = 10., testval = 5.)
    y_mod = YModel(x_obs, c, a, l)
    L = pm.DensityDist('L', logp, observed = {'y_mod': y_mod, 'y_obs': y_obs}, testval = {'y_mod': y_mod, 'y_obs': y_obs})
    step = pm.Metropolis([c, a, l])
    trace = pm.sample(draws = 10000, step = step)
The above model converged, but I found that success was sensitive to the bounds on the priors and the initial guesses on those parameters.
        mean        sd  mc_error    hpd_2.5   hpd_97.5   n_eff      Rhat
c   3.184397  0.111933  0.002563   2.958383   3.397741  1834.0  1.000260
a  18.276887  0.747706  0.019857  16.882025  19.762849  1343.0  1.000411
l   0.200201  0.013486  0.000361   0.174800   0.226480  1282.0  0.999991
(Edited: I had forgotten to sum the squares of the normalized residuals for chi2)
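For what it's worth, the reason a Normal likelihood already amounts to a chi-squared fit is that, for independent Gaussian errors with known sigma, the log-likelihood equals -chi2/2 plus a constant that does not depend on the parameters. Feeding the chi2 statistic into a ChiSquared distribution instead evaluates the density of that statistic, which is a different function of the parameters. The sketch below checks the identity numerically with plain NumPy; the function names and the toy data are my own, loosely following the exponential example above.

```python
import numpy as np


def gaussian_loglike(data, model, sigma):
    """Sum of independent Normal log-densities."""
    return np.sum(-0.5 * ((data - model) / sigma) ** 2
                  - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))


def chi_squared(data, model, sigma):
    """Sum of squared, sigma-normalized residuals."""
    return np.sum(((data - model) / sigma) ** 2)


rng = np.random.RandomState(1)
x = np.linspace(0.0, 100.0, 50)
sigma = 0.5
truth = 3.0 + 18.0 * np.exp(-0.2 * x)
data = truth + rng.normal(scale=sigma, size=x.size)

# Parameter-independent normalization term of the Gaussian likelihood.
const = -x.size * (np.log(sigma) + 0.5 * np.log(2.0 * np.pi))
for lam in (0.1, 0.2, 0.3):
    model = 3.0 + 18.0 * np.exp(-lam * x)
    ll = gaussian_loglike(data, model, sigma)
    chi2 = chi_squared(data, model, sigma)
    print(lam, ll, -0.5 * chi2 + const)  # the last two columns agree
```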

Return progress status when drawing a large NetworkX graph

I have a large graph that I'm drawing that is taking a long time to process.
Is it possible to return a status, current_node, or percentage of the current status of the drawing?
I'm not looking to incrementally draw the network, as all I'm doing is saving it to a high dpi image.
Here's an example of the code I'm using:
path = nx.shortest_path(G, source=u'1234', target=u'98765')
path_edges = zip(path, path[1:])
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G,pos,nodelist=path,node_color='r')
nx.draw_networkx_edges(G,pos,edgelist=path_edges,edge_color='r',width=10)
plt.axis('equal')
plt.savefig('prototype_map.png', dpi=1000)
plt.show()
I believe the only way to do it is to modify the source code of the draw function to print something saying 10%, 20% complete, and so on. But when I checked the source code of draw_networkx_nodes and draw_networkx, I realized that it is not a straightforward task: the draw function stores the positions (nodes and edges) in a numpy array and sends it to the ax.scatter function of matplotlib, which is a bit hard to manipulate without messing something up. The only thing I can think of is to change:
xy = numpy.asarray([pos[v] for v in nodelist]) # In draw_networkx_nodes function
To
xy = []
count = 0
for v in nodelist:
    xy.append(pos[v])
    count += 1
    if count == len(nodelist) // 2:  # halfway through the node list
        print '50% of nodes completed'
print '100% of nodes completed'
xy = numpy.asarray(xy)
Do something similar when draw_networkx_edges is called, to indicate progress in drawing the edges. I am not sure how accurate this will be, because I do not know how much time is spent in the ax.scatter function. I also looked at the source code of the scatter function, but I could not pinpoint a loop or anything else where a progress message could be printed.
Some layout functions accept a pos argument, which lets them continue from an earlier layout. We can use this fact to split the computation into chunks and draw a progress bar using tqdm:
def plot_graph(g, iterations=50, pos=None, k_numerator=None, figsize=(10, 10)):
    if k_numerator is None:
        k = None
    else:
        k = k_numerator / np.sqrt(g.number_of_nodes())
    with tqdm(total=iterations) as pbar:
        step = 5
        iterations_done = 0
        while iterations_done < iterations:
            pos = nx.layout.fruchterman_reingold_layout(
                g, iterations=step, pos=pos, k=k
            )
            iterations_done += step
            pbar.update(step)
    fig = plt.figure(figsize=figsize, dpi=120)
    nx.draw_networkx(
        g,
        pos,
    )
    return fig, pos

Unit Testing probability

I have a method that creates one of two different instances (M, N): for a given fraction x of the time (math.random * x) the method creates object M, and the rest of the time object N.
I have written unit tests that mock the random number so I can make sure the method behaves as expected. However, I am not sure how (or whether) to test that the probability is accurate; for example, if x = 0.1 I expect 1 out of 10 cases to return instance M.
How do I test this functionality?
Split the test. The first test should allow you to define what the random number generator returns (I assume you already have that). This part of the test just satisfies the "do I get the expected result if the random number generator would return some value".
The second test should just run the random number generator using some statistical analysis function (like counting how often it returns each value).
I suggest wrapping the real generator in a wrapper that returns "create M" and "create N" (or possibly just 0 and 1). That way, you can separate the implementation from the place where it's used (the code which creates the two different instances shouldn't need to know how the generator is initialized or how you turn the real result into "create X").
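A small sketch of that split, written in Python for brevity and using only the standard library. All names here (choose_instance, FixedRandom) are hypothetical. The first test injects a fake generator to pin down each branch; the second runs a seeded real generator many times and checks that the count of M lies within a few standard deviations of trials * x, which is a normal-approximation tolerance rather than an exact binomial test.

```python
import math
import random


def choose_instance(x, rng=random):
    """Return 'M' with probability x, otherwise 'N'."""
    return 'M' if rng.random() < x else 'N'


class FixedRandom(object):
    """Stand-in generator that always returns the same value."""
    def __init__(self, value):
        self.value = value

    def random(self):
        return self.value


def test_deterministic_branches():
    # The fake generator decides the outcome, so each branch is checked exactly.
    assert choose_instance(0.1, FixedRandom(0.05)) == 'M'
    assert choose_instance(0.1, FixedRandom(0.5)) == 'N'


def test_frequency(x=0.1, trials=100000, sigmas=5.0):
    rng = random.Random(12345)  # fixed seed keeps the test repeatable
    hits = sum(choose_instance(x, rng) == 'M' for _ in range(trials))
    expected = trials * x
    tolerance = sigmas * math.sqrt(trials * x * (1.0 - x))
    assert abs(hits - expected) <= tolerance, hits


test_deterministic_branches()
test_frequency()
```

A wide tolerance with a fixed seed keeps the statistical test from failing at random while still catching an inverted comparison or a wrong probability.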
I'll do this in the form of Python.
First describe your functionality:
def binomial_process(x):
    '''
    given a probability, x, return M with that probability,
    else return N with probability 1-x
    maybe: return random.random() > x
    '''
Then test for this functionality:
import random
def binom(x):
    return random.random() > x
Then write your test functions, first a setup function to put together your data from an expensive process:
def setUp(x, n):
    counter = dict()
    for _ in range(n):
        result = binom(x)
        counter[result] = counter.get(result, 0) + 1
    return counter
Then the actual test:
import scipy.stats
trials = 1000000
def test_binomial_process():
    ps = (.01, .1, .33, .5, .66, .9, .99)
    x_01 = setUp(.01, trials)
    x_1 = setUp(.1, trials)
    x_33 = setUp(.1, trials)
    x_5 = setUp(.5, trials)
    x_66 = setUp(.9, trials)
    x_9 = setUp(.9, trials)
    x_99 = setUp(.99, trials)
    x_01_result = scipy.stats.binom_test(x_01.get(True, 0), trials, .01)
    x_1_result = scipy.stats.binom_test(x_1.get(True, 0), trials, .1)
    x_33_result = scipy.stats.binom_test(x_33.get(True, 0), trials, .33)
    x_5_result = scipy.stats.binom_test(x_5.get(True, 0), trials)
    x_66_result = scipy.stats.binom_test(x_66.get(True, 0), trials, .66)
    x_9_result = scipy.stats.binom_test(x_9.get(True, 0), trials, .9)
    x_99_result = scipy.stats.binom_test(x_99.get(True, 0), trials, .99)
    setups = (x_01, x_1, x_33, x_5, x_66, x_9, x_99)
    results = (x_01_result, x_1_result, x_33_result, x_5_result,
               x_66_result, x_9_result, x_99_result)
    print 'can reject the hypothesis that the following tests are NOT the'
    print 'results of a binomial process (with their given respective'
    print 'probabilities) with probability < .01, {0} trials each'.format(trials)
    for p, setup, result in zip(ps, setups, results):
        print 'p = {0}'.format(p), setup, result, 'reject null' if result < .01 else 'fail to reject'
Then write your function (ok, we already did):
def binom(x):
    return random.random() > x
And run your tests:
test_binomial_process()
Which on last output gives me:
can reject the hypothesis that the following tests are NOT the
results of a binomial process (with their given respective
probabilities) with probability < .01, 1000000 trials each
p = 0.01 {False: 10084, True: 989916} 4.94065645841e-324 reject null
p = 0.1 {False: 100524, True: 899476} 1.48219693752e-323 reject null
p = 0.33 {False: 100633, True: 899367} 2.96439387505e-323 reject null
p = 0.5 {False: 500369, True: 499631} 0.461122365668 fail to reject
p = 0.66 {False: 900144, True: 99856} 2.96439387505e-323 reject null
p = 0.9 {False: 899988, True: 100012} 1.48219693752e-323 reject null
p = 0.99 {False: 989950, True: 10050} 4.94065645841e-324 reject null
Why do we fail to reject on p=0.5? Let's look at the help on scipy.stats.binom_test:
Help on function binom_test in module scipy.stats.morestats:
binom_test(x, n=None, p=0.5, alternative='two-sided')
Perform a test that the probability of success is p.
This is an exact, two-sided test of the null hypothesis
that the probability of success in a Bernoulli experiment
is `p`.
Parameters
----------
x : integer or array_like
the number of successes, or if x has length 2, it is the
number of successes and the number of failures.
n : integer
the number of trials. This is ignored if x gives both the
number of successes and failures
p : float, optional
The hypothesized probability of success. 0 <= p <= 1. The
default value is p = 0.5
alternative : {'two-sided', 'greater', 'less'}, optional
Indicates the alternative hypothesis. The default value is
'two-sided'.
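So .5 is the default null hypothesis for the test. Note, though, that binom(x) as written returns True with probability 1 - x (it tests random.random() > x), so the True counts above match the hypothesized p only at p = 0.5, where x and 1 - x coincide; that is why only that row fails to reject. Returning random.random() < x instead makes True occur with probability x.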