ray.get() becomes very slow when I increase the number of epochs - ray

I use ray.get(current_weights) to get the final weights of my model after finishing all epochs. The problem is that with, for example, 2 epochs, ray.get(current_weights) takes 17 s, while with 100 epochs it takes 1000 s. I don't know why.
for epoch in range(n_epoch):
    start_epoch = time.time()
    for b in range(total_batch):
        gradients = [worker.compute_gradients.remote(current_weights) for worker in workers]
        current_weights = ps.apply_gradients.remote(*gradients)
    if (epoch + 1) % n_epoch == 0:
        weights = ray.get(current_weights)  # This line is so slow if I increase n_epoch
        model.set_weights(weights)
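A likely explanation (not stated in the original post, but consistent with how Ray works): .remote() calls return object refs immediately without executing anything, so both loops above only submit tasks. The single ray.get() at the end then blocks until the whole chain of n_epoch * total_batch dependent tasks has actually run, so the time being measured is the entire training computation, which naturally grows with the number of epochs. A minimal sketch that syncs once per epoch instead (assuming the same workers and ps actors as above), so each ray.get() only waits for one epoch's worth of tasks:

import ray

for epoch in range(n_epoch):
    for b in range(total_batch):
        gradients = [worker.compute_gradients.remote(current_weights)
                     for worker in workers]
        current_weights = ps.apply_gradients.remote(*gradients)
    # Block here every epoch; the backlog of pending tasks stays bounded,
    # so no single ray.get() has to wait on the whole task graph.
    weights = ray.get(current_weights)

model.set_weights(weights)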

Related

ValueError: Target size (torch.Size([10, 1])) must be the same as input size (torch.Size([10, 2]))

A binary classification problem with Batch Size = 10. Trying to use torch.nn.BCEWithLogitsLoss().
~\Anaconda3\envs\notebook\lib\site-packages\torch\nn\functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   2578
   2579     if not (target.size() == input.size()):
-> 2580         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   2581
   2582     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([1, 10])) must be the same as input size (torch.Size([10, 2]))
Here is my training code:
def train(epochs):
    print('Starting training..')
    for e in range(0, epochs):
        exp_lr_scheduler.step()
        print('='*20)
        print(f'Starting epoch {e + 1}/{epochs}')
        print('='*20)
        train_loss = 0.
        val_loss = 0.
        resnet18.train()  # set model to training phase
        for train_step, (images, labels) in enumerate(dl_train):
            optimizer.zero_grad()
            outputs = resnet18(images)
            outputs = outputs.float()
            loss = loss_fn(outputs, labels.unsqueeze(0))
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            if train_step % 20 == 0:
                print('Evaluating at step', train_step)
                accuracy = 0
                resnet18.eval()  # set model to eval phase
                for val_step, (images, labels) in enumerate(dl_val):
                    outputs = resnet18(images)
                    outputs = outputs.float()
                    loss = loss_fn(outputs, labels.unsqueeze(0))
                    val_loss += loss.item()
                    _, preds = torch.max(outputs, 1)
                    accuracy += sum((preds == labels).numpy())
                val_loss /= (val_step + 1)
                accuracy = accuracy / len(val_dataset)
                print(f'Validation Loss: {val_loss:.4f}, Accuracy: {accuracy:.4f}')
                show_preds()
                resnet18.train()  # set model back to training phase
                if accuracy >= 0.95:
                    print('Performance condition satisfied, stopping..')
                    return
        train_loss /= (train_step + 1)
        print(f'Training Loss: {train_loss:.4f}')
    print('Training complete..')

train(epochs=30)
Target size (torch.Size([1, 10])) must be the same as input size (torch.Size([10, 2]))
Seems to me you have two issues:

Your target size (i.e. the ground-truth tensor) should have the batch on the first axis, i.e. shape (10, 1); at the moment it is (1, 10) because of labels.unsqueeze(0).

From what you've described, you are dealing with a binary classification task, not a multi-label (2-class) classification task. Therefore the input size (i.e. the model's output) should have a shape of (10, 1), not (10, 2).

In a binary classification task you should have only a single logit coming out of your model, i.e. your last nn.Linear layer should have a single neuron. The output defines which class has been predicted. Since you are using nn.BCEWithLogitsLoss, the loss input should be the raw output (this loss applies the Sigmoid itself, cf. the documentation) and should have a shape matching (batch_size=10, 1). The target tensor should have the same shape, with contents that are 0s and 1s.
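A minimal sketch of the corrected shapes (the tensors here are stand-ins, not the original model or data):

import torch
import torch.nn as nn

batch_size = 10
logits = torch.randn(batch_size, 1)          # model output: last layer is nn.Linear(in_features, 1)
labels = torch.randint(0, 2, (batch_size,))  # 0/1 class labels, shape (10,)

loss_fn = nn.BCEWithLogitsLoss()
# unsqueeze(1) puts the batch on the first axis -> (10, 1), matching the
# logits; BCEWithLogitsLoss also expects float targets
loss = loss_fn(logits, labels.float().unsqueeze(1))
print(loss.item())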

Using gradient descent to solve a nonlinear system

I have the following code, which uses gradient descent to find the global minimum of y = (x+5)^2:
cur_x = 3             # the algorithm starts at x = 3
rate = 0.01           # learning rate
precision = 0.000001  # this tells us when to stop the algorithm
previous_step_size = 1
max_iters = 10000     # maximum number of iterations
iters = 0             # iteration counter
df = lambda x: 2*(x+5)  # gradient of our function

while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x                            # store current x value in prev_x
    cur_x = cur_x - rate * df(prev_x)         # gradient descent step
    previous_step_size = abs(cur_x - prev_x)  # change in x
    iters = iters + 1                         # iteration count
    print("Iteration", iters, "\nX value is", cur_x)  # print iterations

print("The local minimum occurs at", cur_x)
The procedure is fairly simple, and among the most intuitive and brief approaches to such a problem (at least, that I'm aware of).
I'd now like to apply this to solving a system of nonlinear equations. Namely, I want to use it to solve the Time Difference of Arrival (TDOA) problem in three dimensions. That is, given the coordinates of 4 observers (or, in general, n+1 observers for an n-dimensional solution), the speed s of some signal, and the time of arrival at each observer, I want to reconstruct the source, i.e. determine its coordinates [x, y, z].
I've already accomplished this using approximation search (see this excellent post on the matter), and I'd now like to try doing so with gradient descent (really, just as an interesting exercise). I know that the problem in two dimensions can be described by the following non-linear system:
sqrt((x - x_1)^2 + (y - y_1)^2) + s(t_2 - t_1) = sqrt((x - x_2)^2 + (y - y_2)^2)
sqrt((x - x_2)^2 + (y - y_2)^2) + s(t_3 - t_2) = sqrt((x - x_3)^2 + (y - y_3)^2)
sqrt((x - x_3)^2 + (y - y_3)^2) + s(t_1 - t_3) = sqrt((x - x_1)^2 + (y - y_1)^2)
I know that it can be done; however, I cannot determine how.
How might I go about applying this in three dimensions, or to a nonlinear system in general?
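One standard way to do this (a sketch, not from the original post): move each equation's right-hand side to the left to get residual functions f_i(p), then minimize the scalar objective F(p) = f_1(p)^2 + f_2(p)^2 + ... by gradient descent, exactly as in the 1-D example; F is zero precisely where the system is satisfied. The observer positions, speed s, and arrival times below are synthetic stand-ins, and the gradient is estimated numerically:

import numpy as np

# Synthetic setup (hypothetical values): 4 observers, signal speed s, and
# arrival times generated from a known source so we can check the answer.
obs = np.array([[0., 0., 0.], [10., 0., 0.], [0., 10., 0.], [0., 0., 10.]])
s = 1.0
true_src = np.array([2.0, 3.0, 4.0])
t = np.linalg.norm(obs - true_src, axis=1) / s  # simulated arrival times

def residuals(p):
    # One residual per equation: range difference minus s * time difference,
    # here taken pairwise against observer 0
    d = np.linalg.norm(obs - p, axis=1)
    return np.array([(d[i] - d[0]) - s * (t[i] - t[0]) for i in range(1, len(obs))])

def loss(p):
    # Scalar objective: sum of squared residuals; zero iff the system holds
    r = residuals(p)
    return r.dot(r)

def num_grad(p, h=1e-6):
    # Central-difference gradient, standing in for the hand-coded df above
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = h
        g[i] = (loss(p + e) - loss(p - e)) / (2 * h)
    return g

p = np.array([1.0, 1.0, 1.0])  # starting guess
rate, precision, max_iters = 0.01, 1e-10, 200000
for _ in range(max_iters):
    step = rate * num_grad(p)
    p = p - step
    if np.linalg.norm(step) < precision:
        break
print("Recovered source:", p)  # ideally approaches [2, 3, 4]

The objective is non-convex, so the recovered point depends on the starting guess; with noisy arrival times F no longer reaches zero, and the minimizer becomes a least-squares estimate of the source.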

Early stopping in Sklearn GradientBoostingRegressor

I am using a monitor class as implemented here:
# `predict_stage` is a private sklearn helper; the module path below is an
# assumption that depends on the sklearn version (older releases exposed it
# under sklearn.ensemble.gradient_boosting)
from sklearn.ensemble._gradient_boosting import predict_stage

class Monitor():
    """Monitor for early stopping in Gradient Boosting for classification.

    The monitor checks the validation loss between each training stage. When
    too many successive stages have increased the loss, the monitor will
    return true, stopping the training early.

    Parameters
    ----------
    X_valid : array-like, shape = [n_samples, n_features]
        Training vectors, where n_samples is the number of samples
        and n_features is the number of features.
    y_valid : array-like, shape = [n_samples]
        Target values (integers in classification, real numbers in
        regression). For classification, labels must correspond to classes.
    max_consecutive_decreases : int, optional (default=5)
        Early stopping criterion: when the number of consecutive iterations
        that result in a worse performance on the validation set exceeds
        this value, the training stops.
    """
    def __init__(self, X_valid, y_valid, max_consecutive_decreases=5):
        self.X_valid = X_valid
        self.y_valid = y_valid
        self.max_consecutive_decreases = max_consecutive_decreases
        self.losses = []

    def __call__(self, i, clf, args):
        if i == 0:
            self.consecutive_decreases_ = 0
            self.predictions = clf._init_decision_function(self.X_valid)

        predict_stage(clf.estimators_, i, self.X_valid, clf.learning_rate,
                      self.predictions)
        self.losses.append(clf.loss_(self.y_valid, self.predictions))

        if len(self.losses) >= 2 and self.losses[-1] > self.losses[-2]:
            self.consecutive_decreases_ += 1
        else:
            self.consecutive_decreases_ = 0

        if self.consecutive_decreases_ >= self.max_consecutive_decreases:
            print("Validation loss has increased for {} consecutive stages: "
                  "stopping early at iteration {}.".format(
                      self.consecutive_decreases_, i))
            return True
        else:
            return False
params = {'n_estimators': nEstimators,
          'max_depth': maxDepth,
          'min_samples_split': minSamplesSplit,
          'min_samples_leaf': minSamplesLeaf,
          'min_weight_fraction_leaf': minWeightFractionLeaf,
          'min_impurity_decrease': minImpurityDecrease,
          'learning_rate': 0.01,
          'loss': 'quantile',
          'alpha': alpha,
          'verbose': 0}

model = ensemble.GradientBoostingRegressor(**params)
model.fit(XTrain, yTrain, monitor=Monitor(XTest, yTest, 25))
It works very well. However, it is not clear to me which model this line

model.fit(XTrain, yTrain, monitor=Monitor(XTest, yTest, 25))

returns:

1) No model
2) The model trained up to the point where training stopped
3) The model from 25 iterations before the stop (note the monitor's parameter)

If it is not (3), is it possible to make the estimator return (3)? How can I do that?
It is worth mentioning that the xgboost library does this; however, it does not allow the loss function that I need.
The model returns the fit from before the "stopping rule" stopped the training, which means your answer (2) is the right one.
The problem with this monitor code is that the model chosen at the end will be the one that includes the 25 extra iterations; the chosen model should be your answer (3).
I think the easy (and admittedly crude) way to get that is to run the same model again (with a seed, to have the same results) but set the number of iterations equal to (i - max_consecutive_decreases).
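A sketch of that refit workaround (Monitor, params, XTrain/yTrain, and XTest/yTest come from the question above; the fixed random_state is an added assumption, needed so the two fits build the same trees):

from sklearn import ensemble

monitor = Monitor(XTest, yTest, 25)
model = ensemble.GradientBoostingRegressor(random_state=0, **params)
model.fit(XTrain, yTrain, monitor=monitor)

# The monitor appended one validation loss per boosting stage, so the number
# of stages actually trained is len(monitor.losses); drop the wasted ones.
best_n = len(monitor.losses) - monitor.max_consecutive_decreases

best_model = ensemble.GradientBoostingRegressor(
    random_state=0, **dict(params, n_estimators=best_n))
best_model.fit(XTrain, yTrain)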

"IndexError: Index X is out of bounds" generated when converting 3.X python script to 2.7

I have a script I wrote for Python 3.x that runs great; however, I needed to convert it to 2.7, and while doing so I came across an error I don't know how to solve.
This function corrects data by dividing it into periods and finding a percentile for each period to apply to said data.
import numpy
import math
import random
from bokeh.plotting import *
from bokeh.layouts import *
from __future__ import division

def rs_percent_corr(start, end, rs, rso, thresh, period):
    num_periods = int(math.ceil((end - start) / period))
    rs_period = numpy.zeros(period)
    rso_period = numpy.zeros(period)
    period_corr = numpy.zeros(num_periods)

    # Placing intervals in separate array for easy handling
    rs_interval = numpy.array(rs[start:end])
    rso_interval = numpy.array(rso[start:end])

    # separate the interval into predefined periods and compute correction
    count_one = 0    # index for full correction interval
    count_two = 0    # index for within each period
    count_three = 0  # index for number of periods

    while count_one < len(rs_interval):
        if (count_two < period) and count_one == len(rs_interval) - 1:
            # if statement handles final period
            rs_period[count_two] = rs_interval[count_one]
            rso_period[count_two] = rso_interval[count_one]
            count_one += 1
            count_two += 1
            while count_two < period:
                # This fills out the rest of the final period with NaNs so
                # they are not impacted by the remaining zeros
                rs_period[count_two] = numpy.nan
                rso_period[count_two] = numpy.nan
                count_two += 1
            ratio = numpy.divide(rs_period, rso_period)
            period_corr[count_three] = numpy.nanpercentile(ratio, thresh)
        elif count_two < period:
            # haven't run out of data points, and period still hasn't been filled
            rs_period[count_two] = rs_interval[count_one]
            rso_period[count_two] = rso_interval[count_one]
            count_one += 1
            count_two += 1
        else:
            # end of a period
            count_two = 0
            ratio = numpy.divide(rs_period, rso_period)
            period_corr[count_three] = numpy.nanpercentile(ratio, thresh)
            count_three += 1
    return period_corr
When running the script in 3.x it works, but running it in 2.7 generates "IndexError: index 110 is out of bounds for axis 0 with size 110" on the line period_corr[count_three] = numpy.nanpercentile(ratio, thresh).
What am I missing? Thank you in advance for your time.
The culprit is the line num_periods = int(math.ceil((end - start) / period)).
In Python 3, / is true division: 3 / 2 returns 1.5. In Python 2 this is not the case; on integers, / performs integer division, so 3 / 2 returns 1. That means (end - start) / period is already floored before math.ceil ever sees it, so num_periods comes out one too small whenever the interval does not divide evenly into periods, and period_corr is allocated one element short, which is exactly where the IndexError comes from.
If you need to support both versions at the same time, you can insert from __future__ import division as the first line of your script; Python 2's / will then behave like Python 3's.
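A quick worked example of the failure mode (the concrete numbers are hypothetical, chosen to match the reported error):

import math

# Suppose end - start = 331 data points and period = 3.
print(math.ceil(331 / 3))   # Python 3: ceil(110.33...) -> 111 periods
print(math.ceil(331 // 3))  # Python 2's `/`: ceil(110)  -> 110 periods

# In Python 2, period_corr is then allocated with only 110 slots, and the
# final write to period_corr[110] raises:
# IndexError: index 110 is out of bounds for axis 0 with size 110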

Using Parfor to create matrix from a vector

I am new to Matlab and would appreciate any assistance possible!
I am running a simulation and so the results vary with each run of the simulation. I want to collect the results for analysis.
For example, during the first simulation run, the level of a plasma coagulation factor may vary over 5 hours as such:
R(1) = [1.0 0.98 0.86 0.96 0.89]
In the second run, the levels at each time period may be slightly different, eg.
R(2) = [1.0 0.95 0.96 0.89 0.86]
I would like (perhaps by using the parfor function) to create a matrix, e.g.
R = [1.0 0.98 0.86 0.96 0.89
1.0 0.95 0.96 0.89 0.86]
I have encountered problems ranging from "In an assignment A(I) = B, the number of elements in B and I must be the same" to getting a matrix of zeros or ones (depending on what I use for the preallocation).
I will need the simulation to run about 10000 times in order to collect a meaningful amount of results.
Can anyone suggest how this might be achieved? Detailed guidance or (semi-)complete code would be much appreciated by someone new to Matlab like me.
Thanks in advance!
This is my actual code. As you can see, there are 4 variables that vary over 744 hours (31 days), each of which I would like to collect:
Iterations = 10000;
PGINR = zeros(Iterations, 744);
PGAmount = zeros(Iterations, 744);
CAINR = zeros(Iterations, 744);
CAAmount = zeros(Iterations, 744);
for iii = 1:Iterations
    [{PGINR(iii)}, {PGAmount(iii)}, {CAINR(iii)}, {CAAmount(iii)}] = ChineseTTRSimulationB();
end
filename = 'ChineseTTRSimulationResults.xlsx';
xlswrite(filename, PGINR, 2)
xlswrite(filename, PGAmount, 3)
xlswrite(filename, CAINR, 5)
xlswrite(filename, CAAmount, 6)
Are you looking for something like this?
I simplified your code a little for easier understanding and added some dummy data and a dummy function.
main.m
Iterations = 10;
PGINR = zeros(Iterations, 2);
PGAmount = zeros(Iterations, 2);
% fake data
x = rand(Iterations, 1);
y = rand(Iterations, 1);
parfor iii = 1:Iterations
    [PGINR(iii,:), PGAmount(iii,:)] = ChineseTTRSimulationB(x(iii), y(iii));
end
ChineseTTRSimulationB.m
function [PGINRi, PGAmounti] = ChineseTTRSimulationB(x, y)
    PGINRi = [x + y, x];
    PGAmounti = [x*y, y];
end
Alternatively, save each parfor result in a cell array and combine them later:
Iterations = 10000;
PGINR = cell(1, Iterations);
PGAmount = cell(1, Iterations);
CAINR = cell(1, Iterations);
CAAmount = cell(1, Iterations);
parfor i = 1:Iterations
    [PGINR{i}, PGAmount{i}, CAINR{i}, CAAmount{i}] = ChineseTTRSimulationB();
end
PGINR = cell2mat(PGINR); % 1x7440000 row vector
% and so on...

If each run returns a 1x744 row vector, use vertcat(PGINR{:}) instead of cell2mat to get the 10000x744 matrix directly.