This problem is an extension of an earlier question I asked:
Python pyomo : how and where to store sumproduct involving decision variables (1d array) and fixed data (matrix) (i have solved this piece).
Brief background : I am trying to solve an optimization problem where I need to select the best store from where an order can be fulfilled. For this illustration I have 2 orders (O1, O2) and 3 stores (str_1, str_2, str_3). While selecting the best store to fulfill an order, there are 4 factors : A, B, C and D. So for fulfilling order 1, each store will have 4 set of scores corresponding to each factor. Score will be between 0 and 1.
I need to determine the optimal weights for 4 factors (wtA, wtB, wtC, wtD - decision variables) such that sumproduct of weights and the score is maximum. (Weights should be between 0 and 100). For instance, say if we check if store 1 can service order 1, then sumproduct = wtA * score_O1_str_1_A + wtB * score_O1_str_1_B + wtC * score_O1_str_1_C + wtD * score_O1_str_1_D
I am holding the sumproduct in a dictionary now. The idea is to pick for each order the store with maximum score, and to maximize the sum of scores across all orders. E.g.
O1 -> str_1 = 88 ; O1 -> str_2 = 90 ; O1 -> str_3 = 86 ; O2 -> str_1 = 82 ; O2 -> str_2 = 92 ; O2 -> str_3 = 85 ;
Above are the weighted scores: the sumproduct we get by multiplying the weights (decision variables) by the scores (given to us). I want to pick weights such that the sum of the maximum score for each order is maximized. So in this case O1 -> str_2 (90) and O2 -> str_2 (92). Sum of maximums = 90 + 92 = 182. I need to maximize this. I have written the code, but it doesn't seem to give me the right answer. Your help is much appreciated!
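To make the quantity being maximized concrete, here is a minimal plain-Python sketch that evaluates it for one fixed set of weights. The equal weights (25 each) and the helper names `weighted_score` and `sum_of_maxima` are assumptions for illustration only; the scores are the ones used in the first listing of this question.

```python
# Scores per (order, store) for factors A, B, C, D (taken from the question's first listing).
order_str_scores = {
    ('O1', 'str_1'): [0.88, 0.85, 0.88, 0.93],
    ('O1', 'str_2'): [0.93, 0.91, 0.95, 0.86],
    ('O1', 'str_3'): [0.83, 0.83, 0.87, 0.9],
    ('O2', 'str_1'): [0.85, 0.86, 0.84, 0.98],
    ('O2', 'str_2'): [0.87, 0.8, 0.85, 0.87],
    ('O2', 'str_3'): [0.91, 0.87, 0.95, 0.83],
}

# Hypothetical weights that sum to 100; the optimizer is supposed to choose these.
weights = [25, 25, 25, 25]


def weighted_score(scores, weights):
    """Sumproduct of factor weights and factor scores for one (order, store) pair."""
    return sum(w * s for w, s in zip(weights, scores))


def sum_of_maxima(order_str_scores, weights):
    """Pick the best store per order and add up those best scores."""
    best = {}
    for (order, store), scores in order_str_scores.items():
        value = weighted_score(scores, weights)
        if order not in best or value > best[order][1]:
            best[order] = (store, value)
    total = sum(value for _, value in best.values())
    return best, total


best, total = sum_of_maxima(order_str_scores, weights)
print(best)
print(total)
```

The optimization problem is then to choose `weights` so that `total` is as large as possible.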
Please see the below code to see what I have done and where I am stuck:
from pyomo.environ import *
model = ConcreteModel(name="(weights)")
# scores for factors A, B, C and D for each order and store combination
order_str_scores = {
('O1', 'str_1') : [0.88, 0.85, 0.88, 0.93], # if order 1 is fulfilled from store 1 then these are the scores
('O1', 'str_2'): [0.93, 0.91, 0.95, 0.86],
('O1', 'str_3') : [0.83, 0.83, 0.87, 0.9],
('O2', 'str_1') : [0.85, 0.86, 0.84, 0.98],
('O2', 'str_2') : [0.87, 0.8, 0.85, 0.87],
('O2', 'str_3') : [0.91, 0.87, 0.95, 0.83],
}
model.orders = list(set([i[0] for i in order_str_scores.keys()]))
model.stores = list(set([i[1] for i in order_str_scores.keys()]))
# 4 factors (A, B, C & D) whose scores are mentioned in 'order_str_scores' dictionary
model.factors = ['A', 'B', 'C','D']
# below 4 decision variables (one for each factor) will hold the optimal number between 0 - 100
def dv_bounds(m, i):
    return (0, 100)
model.x1 = Var(model.factors, within=NonNegativeReals, bounds=dv_bounds)
#Sum of these 4 decision variables should be equal to 100
def sum_wts(m):
    return sum(m.x1[i] for i in model.factors) == 100
model.sum_wts = Constraint(rule=sum_wts)
# here I hold the sumproduct of scores and corresponding weights for each factor
D = {}
model.k = [('O1', 'str_1'), ('O1', 'str_2'), ('O1', 'str_3'), ('O2', 'str_1'), ('O2', 'str_2'), ('O2', 'str_3')]
for i in model.k:
    D[i] = sum(model.x1[n] * order_str_scores[i][q] for n,q in zip(model.factors,range(4)))
# BELOW I GET STUCK. IDEA IS TO SELECT FOR EACH ORDER THE STORE WHICH HAS THE MAXIMUM WEIGHTED SCORE, AND THE SUM OF THESE SCORES ACROSS ORDERS SHOULD BE MAXIMUM
# create decision variable for each order, which will hold the maximum weighted score
model.x3 = Var(model.orders, within=NonNegativeReals, bounds=dv_bounds)
# add constraints
model.cons = ConstraintList()
for i in model.k:
    model.cons.add(model.x3[i[0]] >= D[i])
# objective: sum of the per-order scores (obj_rule was missing from this listing; this definition is assumed from the description)
def obj_rule(m):
    return sum(m.x3[i] for i in m.orders)
model.obj = Objective(rule=obj_rule, sense=maximize)
opt = SolverFactory('glpk')
result_obj = opt.solve(model, tee=True)
model.display()
What the solver does (and it makes sense) is set model.x3['O1'] and model.x3['O2'] to 100. The upper bound I set is 100 for each, the constraints only bound x3 from below (x3 >= D), and since the objective maximizes the sum, the solver picks 100 for each.
However, what I want is for model.x3['O1'] and model.x3['O2'] to take the actual maximum value, corresponding to the best store, from the dictionary 'D' where I hold the sumproducts (weighted scores). The weights should then be chosen so that the sum of these maximum weighted scores across orders is maximized.
UPDATE: I was able to solve the problem. The complete code listing is below:
from pyomo.environ import *
import itertools
model = ConcreteModel(name="(weights)")
# scores for factors 'A', 'B', 'C','D' for each order - store combination
order_str_scores = {
('O1', 'str_1') : [0.90, 0.90, 0.71, 0.93],
('O1', 'str_2'): [0.83, 0.91, 0.95, 0.86],
('O1', 'str_3') : [0.83, 0.83, 0.87, 0.9],
('O2', 'str_1') : [0.71, 0.56, 0.84, 0.55],
('O2', 'str_2') : [0.97, 0.9, 0.95, 0.87],
('O2', 'str_3') : [0.91, 0.87, 0.95, 0.83],
('O3', 'str_1') : [0.81, 0.86, 0.84, 0.85],
('O3', 'str_2') : [0.89, 0.84, 0.95, 0.87],
('O3', 'str_3') : [0.97, 0.87, 0.95, 0.86],
('O4', 'str_1') : [0.95, 0.96, 0.84, 0.85],
('O4', 'str_2') : [0.89, 0.74, 0.95, 0.87],
('O4', 'str_3') : [0.87, 0.77, 0.85, 0.83],
('O5', 'str_1') : [0.61, 0.86, 0.94, 0.85],
('O5', 'str_2') : [0.99, 0.84, 0.98, 0.97],
('O5', 'str_3') : [0.77, 0.87, 0.95, 0.83],
}
# bounds for each of the factors
factor_bounds = {
'A' : [80, 99],
'B' : [0, 20],
'C': [0, 20],
'D' : [0, 20],
}
# list of unique orders and stores. These will act as indices
model.orders = sorted(list(set([i[0] for i in order_str_scores.keys()])))
model.stores = sorted(list(set([i[1] for i in order_str_scores.keys()])))
# 4 factors ('availability', 'distance', 'storeCapacity', 'smoothing', labelled 'A'-'D' below) whose scores are mentioned in 'order_str_scores' dictionary
model.factors = ['A', 'B', 'C','D']
# below 4 decision variables (one for each factor) will hold the optimal weight within the bounds given in 'factor_bounds'
def dv_bounds(m, i):
    return (factor_bounds[i][0], factor_bounds[i][1])
model.x1 = Var(model.factors, within=NonNegativeReals, bounds=dv_bounds)
#Sum of these 4 decision variables should be equal to 100
def sum_wts(m):
    return sum(m.x1[i] for i in model.factors) == 100
model.sum_wts = Constraint(rule=sum_wts)
# Hold the sumproduct of scores and corresponding weights for each factor
D = {}
model.k = list(itertools.product(model.orders, model.stores))
for i in model.k:
    D[i] = sum(model.x1[n] * order_str_scores[i][q] for n,q in zip(model.factors,range(4)))
# DV : Binary auxiliary variable to help find the store with max weighted score
model.is_max = Var(model.orders, model.stores, within=Binary)
# DV : Variable to hold the maximum weighted score for each order
model.max_value = Var(model.orders, model.stores, within=NonNegativeReals)
# 1st helper constraint : for each order the binary DVs sum to 1 (exactly one store is selected)
def is_max_1(m, i):
    return sum(m.is_max[i, j] for j in model.stores) == 1
model.is_max_1 = Constraint(model.orders, rule=is_max_1)
# 2nd helper constraint : max_value can be positive only for the selected store (big-M = 1000)
def is_max_const(m,i,j):
    return m.max_value[i,j] <= 1000 * m.is_max[i,j]
model.is_max_const = Constraint(model.orders, model.stores, rule=is_max_const)
# 3rd helper constraint : for the selected store, max_value must be at least the weighted score D of every store k
def const_2(m,i,j,k):
    return m.max_value[i, j] + (1000 * (1 - m.is_max[i, j])) >= D[i, k]
model.const_2 = Constraint(model.orders, model.stores, model.stores, rule=const_2)
# 4th helper constraint : max_value cannot exceed the store's own weighted score D
def const_3(m,i,j):
    return m.max_value[i,j] <= D[i, j]
model.const_3 = Constraint(model.orders, model.stores,rule=const_3)
# Define the objective function
def obj_rule(m):
    return sum(m.max_value[i,j] for i in m.orders for j in m.stores)
model.obj = Objective(rule=obj_rule, sense=maximize)
opt = SolverFactory('glpk')
result_obj = opt.solve(model, tee=True)
model.display()
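To see why the helper constraints pin max_value to the true per-order maximum, here is a small plain-Python sketch. It treats the weighted scores of one order as fixed numbers (in the Pyomo model they are linear expressions in x1) and lists which store selections satisfy is_max_const, const_2 and const_3 at the same time. The function name `feasible_selections` is hypothetical, and the sketch assumes every score stays below the big-M constant of 1000.

```python
BIG_M = 1000  # same constant as in is_max_const and const_2


def feasible_selections(d_row):
    """Return (store, max_value) pairs allowed when is_max[store] = 1.

    d_row maps store -> weighted score for a single order, treated as constants.
    """
    feasible = []
    highest = max(d_row.values())
    for store, score in d_row.items():
        # const_2 with is_max = 1: max_value >= D[k] for every store k.
        lower = highest
        # const_3 and is_max_const with is_max = 1: max_value <= D[store] and <= BIG_M.
        upper = min(score, BIG_M)
        if lower <= upper:
            # Maximizing the objective pushes max_value up to its upper limit.
            feasible.append((store, upper))
    return feasible


# Weighted scores from the worked example at the top of the question.
o1 = {'str_1': 88, 'str_2': 90, 'str_3': 86}
o2 = {'str_1': 82, 'str_2': 92, 'str_3': 85}

print(feasible_selections(o1))
print(feasible_selections(o2))
```

Only the store with the highest weighted score survives for each order, so the objective adds up exactly the per-order maxima (90 + 92 = 182 for these numbers). For every non-selected store, is_max_const forces max_value to 0.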
Related
I am using GLPK and I am struggling in understanding why an upper bound constraint is not respected.
I have something like this:
param n0;
param n1;
param n2;
param start_0{i in 0..n0};
param end_0{i in 0..n0};
param start_1{i in 0..n1};
param end_1{i in 0..n1};
param start_2{i in 0..n2};
param end_2{i in 0..n2};
var y0 {k in 0..n0} binary;
var y1 {k in 0..n1} binary;
var y2 {k in 0..n2} binary;
[...]
minimize obj: (sum{k in 0..n2}(end_2[k]*y2[k] + 600*y2[k]) - (sum{k in 0..n0} (start_0[k]*y0[k])));
[...]
s.t. c1: (sum{k in 0..n2} (end_2[k]*y2[k] ) - (sum{k in 0..n0} (start_0[k]*y0[k] ))) <= 7000;
s.t. c2_1: sum{k in 0..n0} y0[k] = 1 ;
s.t. c2_2: sum{k in 0..n1} y1[k] = 1 ;
s.t. c2_3: sum{k in 0..n2} y2[k] = 1 ;
[...]
solve;
[...]
printf (sum{k in 0..n2} (end_2[k]*y2[k] ) - (sum{k in 0..n0} (start_0[k]*y0[k] )));
The last printf gives me 7200. But the constraint c1 above should ensure that the difference is not greater than 7000.
The solver output is the following:
GLPK Integer Optimizer, v4.65
7 rows, 353 columns, 1332 non-zeros
353 integer variables, all of which are binary
Preprocessing...
6 rows, 353 columns, 1059 non-zeros
353 integer variables, all of which are binary
Scaling...
A: min|aij| = 1.000e+00 max|aij| = 1.625e+09 ratio = 1.625e+09
GM: min|aij| = 9.998e-01 max|aij| = 1.000e+00 ratio = 1.000e+00
EQ: min|aij| = 9.996e-01 max|aij| = 1.000e+00 ratio = 1.000e+00
2N: min|aij| = 7.561e-01 max|aij| = 1.000e+00 ratio = 1.323e+00
Constructing initial basis...
Size of triangular part is 6
Solving LP relaxation...
GLPK Simplex Optimizer, v4.65
6 rows, 353 columns, 1059 non-zeros
0: obj = -2.232000000e+05 inf = 1.329e-04 (2)
2: obj = 6.210000000e+04 inf = 0.000e+00 (0)
* 5: obj = 7.097321429e+03 inf = 0.000e+00 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
Long-step dual simplex will be used
+ 5: mip = not found yet >= -inf (1; 0)
+ 272: >>>>> 1.320000000e+04 >= 1.140000000e+04 13.6% (237; 2)
+ 304: mip = 1.320000000e+04 >= tree is empty 0.0% (0; 477)
INTEGER OPTIMAL SOLUTION FOUND
Time used: 0.0 secs
Memory used: 0.8 Mb (851464 bytes)
Display statement at line 22
[...]
Model has been successfully processed
What am I doing wrong?
Thanks a lot for your help.
I have encountered several optimization problems that involve identifying one or more indices in a vector that maximizes or minimizes a cost. Is there a way to identify such indices in linear programming? I'm open to solutions in mathprog, CVXR, CVXPY, or any other API.
For example, identifying an index is needed for change point problems (find the index at which the function changes), putting distance constraints on the traveling salesman problem (visit city X before cumulative distance Y).
As a simple example, suppose we want to identify the location in a vector where the sum on either side is the most equal (their difference is smallest). In this example, the solution is index 5:
x = c(1, 3, 6, 4, 7, 9, 6, 2, 3)
Attempt 1
Using CVXR, I tried declaring split_index and using that as an index (e.g., x[1:split]):
library(CVXR)
split_index = Variable(1, integer = TRUE)
objective = Minimize(abs(sum(x[1:split_index]) - sum(x[(split_index+1):length(x)])))
result = solve(objective)
It fails at 1:split_index with an "NA/NaN argument" error.
Attempt 2
Declare an explicit index-vector (indices) and do an elementwise logical test whether split_index <= indices. Then element-wise-multiply that binary vector with x to select one or the other side of the split:
indices = seq_along(x)
split_index = Variable(1, integer = TRUE)
is_first = split_index <= indices
objective = Minimize(abs(sum(x * is_first) - sum(x * !is_first)))
result = solve(objective)
It fails at x * is_first with a "non-numeric argument to binary operator" error. I suspect that this error arises because is_first is now an IneqConstraint object.
Symbols in red are decision variables and symbols in blue are constants.
R code:
> library(Rglpk)
> library(CVXR)
>
> x <- c(1, 3, 6, 4, 7, 9, 6, 2, 3)
> n <- length(x)
> delta <- Variable(n, boolean=T)
> y <- Variable(2)
> order <- list()
> for (i in 2:n) {
+ order[[as.character(i)]] <- delta[i-1] <= delta[i]
+ }
>
>
> problem <- Problem(Minimize(abs(y[1]-y[2])),
+ c(order,
+ y[1] == t(1-delta) %*% x,
+ y[2] == t(delta) %*%x))
> result <- solve(problem,solver = "GLPK", verbose=T)
GLPK Simplex Optimizer, v4.47
30 rows, 12 columns, 60 non-zeros
0: obj = 0.000000000e+000 infeas = 4.100e+001 (2)
* 7: obj = 0.000000000e+000 infeas = 0.000e+000 (0)
* 8: obj = 0.000000000e+000 infeas = 0.000e+000 (0)
OPTIMAL SOLUTION FOUND
GLPK Integer Optimizer, v4.47
30 rows, 12 columns, 60 non-zeros
9 integer variables, none of which are binary
Integer optimization begins...
+ 8: mip = not found yet >= -inf (1; 0)
+ 9: >>>>> 1.000000000e+000 >= 0.000000000e+000 100.0% (2; 0)
+ 9: mip = 1.000000000e+000 >= tree is empty 0.0% (0; 3)
INTEGER OPTIMAL SOLUTION FOUND
> result$getValue(delta)
[,1]
[1,] 0
[2,] 0
[3,] 0
[4,] 0
[5,] 0
[6,] 1
[7,] 1
[8,] 1
[9,] 1
> result$getValue(y)
[,1]
[1,] 21
[2,] 20
>
The absolute value is automatically linearized by CVXR.
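For reference, the usual way to linearize an absolute value in the objective is the epigraph form sketched below. The auxiliary variable t is introduced here for illustration; whether CVXR builds exactly this form internally is an assumption.

```latex
\begin{aligned}
\min_{t,\,\delta,\,y}\quad & t \\
\text{s.t.}\quad & t \ge y_1 - y_2, \\
                 & t \ge y_2 - y_1, \\
                 & y_1 = (1-\delta)^{\top} x, \qquad y_2 = \delta^{\top} x, \\
                 & \delta_{i-1} \le \delta_i, \qquad i = 2,\dots,n, \\
                 & \delta \in \{0,1\}^{n}.
\end{aligned}
```

At the optimum t equals |y_1 - y_2|, because minimization pushes t down onto the larger of the two lower bounds.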
At the end of the day, if you are selecting things by index, I think you need to work this with a set of corresponding binary selection variables. The fact that you are selecting "things in a row" as in your example problem is just something that needs to be handled with constraints on the binary variables.
To solve the problem you posed, I made a set of binary selection variables, call it s[i] where i = {0, 1, 2, ..., len(x) - 1}, and then constrained:
s[i] <= s[i-1] for i = {1, 2, ..., len(x) - 1}
which enforces "continuity": every element from the start up to the first non-selection is selected, and nothing after it is.
My solution is in Python. Let me know if you'd like me to post it. The concept above, I think, is what you are asking about.
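The idea can be checked without a solver. The sketch below is not the solution mentioned above; it is a brute-force enumeration, in plain Python, of every binary vector s that satisfies s[i] <= s[i-1], keeping the one with the smallest difference between the two sides. The function names are hypothetical, and a MILP solver would search this same feasible set far more efficiently.

```python
from itertools import product

x = [1, 3, 6, 4, 7, 9, 6, 2, 3]


def is_monotone(s):
    """Check the constraint s[i] <= s[i-1] for i = 1 .. len(s) - 1."""
    return all(s[i] <= s[i - 1] for i in range(1, len(s)))


def best_split(x):
    """Enumerate feasible selection vectors and minimize |left sum - right sum|."""
    best = None
    for s in product((0, 1), repeat=len(x)):
        if not is_monotone(s):
            continue  # infeasible: a selected element follows an unselected one
        left = sum(value for value, selected in zip(x, s) if selected)
        right = sum(x) - left
        gap = abs(left - right)
        if best is None or gap < best[0]:
            best = (gap, s)
    return best


gap, s = best_split(x)
print(gap, s, sum(s))
```

Here sum(s) is the number of selected leading elements, i.e. the split position.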
This is essentially the "Multiple Coins from Multiple Mints / Baseball Players" example from Doing Bayesian Data Analysis, Second Edition (DBDA2). I believe I have PyMC3 code which is functionally equivalent, but one works and the other does not. This is with PyMC version 3.5. In more detail,
Let's say I have the following data. Each row is an observation:
import numpy as np
import pandas as pd
import pymc3 as pm

observations_dict = {
'mint': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'coin': [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7],
'outcome': [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1]
}
observations = pd.DataFrame(observations_dict)
observations
One Mint, Several Coins
The below, which implements DBDA2 Figure 9.7, runs just fine:
num_coins = observations['coin'].nunique()
coin_idx = observations['coin']
with pm.Model() as hierarchical_model:
    # mint is characterized by omega and kappa
    omega = pm.Beta('omega', 1., 1.)
    kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
    kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
    # each coin is described by a theta
    theta = pm.Beta('theta', alpha=omega*(kappa-2)+1, beta=(1-omega)*(kappa-2)+1, shape=num_coins)
    # define the likelihood
    y = pm.Bernoulli('y', theta[coin_idx], observed=observations['outcome'])
Many Mints, Many Coins
However, once this is turned into a hierarchical model (as seen in DBDA2 Figure 9.13):
num_mints = observations['mint'].nunique()
mint_idx = observations['mint']
num_coins = observations['coin'].nunique()
coin_idx = observations['coin']
with pm.Model() as hierarchical_model2:
    # Hyper parameters
    omega = pm.Beta('omega', 1, 1)
    kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
    kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
    # Parameters for mints
    omega_c = pm.Beta('omega_c',
                      omega*(kappa-2)+1, (1-omega)*(kappa-2)+1,
                      shape = num_mints)
    kappa_c_minus2 = pm.Gamma('kappa_c_minus2',
                              0.01, 0.01,
                              shape = num_mints)
    kappa_c = pm.Deterministic('kappa_c', kappa_c_minus2 + 2)
    # Parameters for coins
    theta = pm.Beta('theta',
                    omega_c[mint_idx]*(kappa_c[mint_idx]-2)+1,
                    (1-omega_c[mint_idx])*(kappa_c[mint_idx]-2)+1,
                    shape = num_coins)
    y2 = pm.Bernoulli('y2', p=theta[coin_idx], observed=observations['outcome'])
The error is:
ValueError: operands could not be broadcast together with shapes (8,) (20,)
as the model has 8 thetas for 8 coins but sees 20 rows of data.
However, if the data is grouped such that each line represents the final statistics of an individual coin, as with the following
grouped = observations.groupby(['mint', 'coin']).agg({'outcome': [np.sum, np.size]}).reset_index()
grouped.columns = ['mint', 'coin', 'heads', 'total']
And the final likelihood variable is changed to a Binomial, as follows
num_mints = grouped['mint'].nunique()
mint_idx = grouped['mint']
num_coins = grouped['coin'].nunique()
coin_idx = grouped['coin']
with pm.Model() as hierarchical_model2:
    # Hyper parameters
    omega = pm.Beta('omega', 1, 1)
    kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
    kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
    # Parameters for mints
    omega_c = pm.Beta('omega_c',
                      omega*(kappa-2)+1, (1-omega)*(kappa-2)+1,
                      shape = num_mints)
    kappa_c_minus2 = pm.Gamma('kappa_c_minus2',
                              0.01, 0.01,
                              shape = num_mints)
    kappa_c = pm.Deterministic('kappa_c', kappa_c_minus2 + 2)
    # Parameter for coins
    theta = pm.Beta('theta',
                    omega_c[mint_idx]*(kappa_c[mint_idx]-2)+1,
                    (1-omega_c[mint_idx])*(kappa_c[mint_idx]-2)+1,
                    shape = num_coins)
    y2 = pm.Binomial('y2', n=grouped['total'], p=theta, observed=grouped['heads'])
Everything works. Now, the latter form is more efficient and generally preferred, but I believe the former should work as well. So I believe this is primarily a PyMC3 issue (or even more likely, a user error).
To quote DBDA Edition 1,
"The BUGS model uses a binomial likelihood distribution for total
correct, instead of using the Bernoulli distribution for individual
trials. This use of the binomial is just a convenience for shortening
the program. If the data were specified as trial-by-trial outcomes
instead of as total correct, then the model could include a
trial-by-trial loop and use a Bernoulli likelihood function"
What bothers me is that in the very first example (One Mint, Several Coins), it looks like PyMC3 can handle individual observations instead of aggregated observations just fine. So I believe the first form should work, but doesn't.
Code
http://nbviewer.jupyter.org/github/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%209.ipynb
References
PyMC3 - Differences in ways observations are passed to model -> difference in results?
https://discourse.pymc.io/t/pymc3-differences-in-ways-observations-are-passed-to-model-difference-in-results/501
http://www.databozo.com/deep-in-the-weeds-complex-hierarchical-models-in-pymc3
https://stats.stackexchange.com/questions/157521/is-this-correct-hierarchical-bernoulli-model
The length of mint_idx was 20 (one for each observation), but it should have been 8 (one for each coin).
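The length mismatch can be reproduced without PyMC3. This plain-Python sketch uses the `mint` and `coin` columns from the question and builds one mint index per coin, which is what the `theta` prior needs; the variable names `coin_to_mint` and `mint_idx_per_coin` are made up for illustration.

```python
# Observation-level columns copied from the question (20 rows).
mint = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
coin = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7]

# Each coin belongs to exactly one mint, so a coin -> mint lookup is well defined.
coin_to_mint = dict(zip(coin, mint))

# One entry per coin (length 8): this is the index the theta prior should use.
mint_idx_per_coin = [coin_to_mint[c] for c in sorted(coin_to_mint)]

print(len(mint), len(mint_idx_per_coin))
print(mint_idx_per_coin)
```

The observation-level `coin` list (length 20) is still the right index for the likelihood, since each of the 20 outcomes looks up its own coin's theta.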
Working answer, notice the mint_idx recalculation (rest remains the same):
grouped = observations.groupby(['mint', 'coin']).agg({'outcome': [np.sum, np.size]}).reset_index()
grouped.columns = ['mint', 'coin', 'heads', 'total']
num_mints = grouped['mint'].nunique()
mint_idx = grouped['mint']
num_coins = observations['coin'].nunique()
coin_idx = observations['coin']
with pm.Model() as hierarchical_model2:
    # Hyper parameters
    omega = pm.Beta('omega', 1, 1)
    kappa_minus2 = pm.Gamma('kappa_minus2', 0.01, 0.01)
    kappa = pm.Deterministic('kappa', kappa_minus2 + 2)
    # Parameters for mints
    omega_c = pm.Beta('omega_c',
                      omega*(kappa-2)+1, (1-omega)*(kappa-2)+1,
                      shape = num_mints)
    kappa_c_minus2 = pm.Gamma('kappa_c_minus2',
                              0.01, 0.01,
                              shape = num_mints)
    kappa_c = pm.Deterministic('kappa_c', kappa_c_minus2 + 2)
    # Parameters for coins
    theta = pm.Beta('theta',
                    omega_c[mint_idx]*(kappa_c[mint_idx]-2)+1,
                    (1-omega_c[mint_idx])*(kappa_c[mint_idx]-2)+1,
                    shape = num_coins)
    y2 = pm.Bernoulli('y2', p=theta[coin_idx], observed=observations['outcome'])
Many thanks to @junpenglao!!
https://discourse.pymc.io/t/why-cant-i-use-a-bernoulli-as-a-likelihood-variable-in-a-hierarchical-model-in-pymc3/2022/2
I am attempting to implement a perceptron. I have loaded a 100x2 array of values between 0 and 100. Each item in the array has a label of either -1 or 1.
I believe the perceptron is working; however, I cannot plot the decision boundary as shown in the question "plot decision boundary matplotlib".
When I run my code I only see a single color background. I would expect to see two colors, one color for each label in my data set (-1 and 1).
[Image: my current output; I expect to see 2 background colors, one for each label (-1 or 1)]
[Image: an example of what I hope to see, from the sklearn documentation]
import numpy as np
from matplotlib import pyplot as plt
def generate_data():
    #generate a dataset that is linearly separable
    group_1 = np.random.randint(50, 100, size=(50,2))
    group_1_labels = np.full((50,1), 1)
    group_2 = np.random.randint(0, 49, size =(50,2))
    group_2_labels = np.full((50,1), -1)
    #add a bias value of -1
    bias = np.full((50,1), -1)
    #add labels, upper right quadrant are 1, lower left are -1
    group_1_with_bias = np.hstack((group_1, bias))
    group_2_with_bias = np.hstack((group_2, bias))
    group_1_labeled = np.hstack((group_1_with_bias, group_1_labels))
    group_2_labeled = np.hstack((group_2_with_bias, group_2_labels))
    #merge our labeled data and shuffle!
    merged_data = np.vstack((group_1_labeled, group_2_labeled))
    np.random.shuffle(merged_data)
    return merged_data
data = generate_data()
#load data, strip labels, add a -1 bias value
X = data[:, :3]
#create labels matrix
l = np.ravel(data[:, 3:])
def perceptron_sgd(X, l, c, epochs):
    #initialize weights
    w = np.zeros(3)
    errors = []
    for epoch in range(epochs):
        total_error = 0
        for i, x in enumerate(X):
            if (np.dot(x, w) * l[i]) <= 0:
                total_error += (np.dot(x, w) * l[i])
                w = w + c * (x * l[i])
        errors.append(total_error * -1)
        print("epoch " + str(epoch) + ": " + str(w))
    return w, errors
def classify(X, l, w):
    z = np.dot(X, w)
    print(z)
    z[z <= 0] = -1
    z[z > 0] = 1
    #return a matrix of predicted labels
    return z
w, errors = perceptron_sgd(X, l, .001, 36)
# X - some data in 2dimensional np.array
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .2), np.arange(y_min, y_max, .2))
# here "model" is your model's prediction (classification) function
Z = classify(np.c_[xx.ravel(), yy.ravel()], l, w[:-1]) #strip the bias from weights
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('off')
#Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=l, cmap=plt.cm.Paired)
I got it to work.
Standardize your X
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(X[:, :-1])
X_trans = np.column_stack((scaler.transform(X[:, :-1]), X[:, -1]))
Better initialization than zero.
#initialize weights
r = np.sqrt(2)
w = np.random.uniform(-r, r, (3,))
Apply the learned bias during prediction. The bias input is -1 during training, so the bias weight is subtracted:
z = np.dot(X, w[:-1]) - w[-1]
Standardize during prediction as well (using the standardization learned from the training input)
Z = classify(scaler.transform(np.c_[xx.ravel(), yy.ravel()]),
l, w) # pass the full weight vector; classify applies the bias itself
It is generally a good idea to standardize the inputs.
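As a rough illustration of what "using the standardization learned from the input" means, here is a plain-Python sketch without scikit-learn. The helpers `fit_scaler` and `transform` are hypothetical stand-ins for `StandardScaler().fit` and `.transform`; like StandardScaler, they use the population standard deviation. The tiny `train` array is made-up data.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Learn per-column mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply the statistics learned from the training data to any rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up training points (two features each).
train = [[0, 10], [50, 30], [100, 50]]
means, stds = fit_scaler(train)

train_scaled = transform(train, means, stds)
# Grid points for the decision-boundary plot must be scaled with the SAME statistics.
grid_scaled = transform([[50, 30], [75, 40]], means, stds)

print(train_scaled)
print(grid_scaled)
```

The key point is that the mesh grid is transformed with the means and standard deviations from the training set, never refitted on the grid itself.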
Entire code:
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
def generate_data():
    #generate a dataset that is linearly separable
    group_1 = np.random.randint(50, 100, size=(50,2))
    group_1_labels = np.full((50,1), 1)
    group_2 = np.random.randint(0, 49, size =(50,2))
    group_2_labels = np.full((50,1), -1)
    #add a bias value of -1
    bias = np.full((50,1), -1)
    #add labels, upper right quadrant are 1, lower left are -1
    group_1_with_bias = np.hstack((group_1, bias))
    group_2_with_bias = np.hstack((group_2, bias))
    group_1_labeled = np.hstack((group_1_with_bias, group_1_labels))
    group_2_labeled = np.hstack((group_2_with_bias, group_2_labels))
    #merge our labeled data and shuffle!
    merged_data = np.vstack((group_1_labeled, group_2_labeled))
    np.random.shuffle(merged_data)
    return merged_data
data = generate_data()
#load data, strip labels, add a -1 bias value
X = data[:, :3]
#create labels matrix
l = np.ravel(data[:, 3:])
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(X[:, :-1])
X_trans = np.column_stack((scaler.transform(X[:, :-1]), X[:, -1]))
def perceptron_sgd(X, l, c, epochs):
    #initialize weights
    r = np.sqrt(2)
    w = np.random.uniform(-r, r, (3,))
    errors = []
    for epoch in range(epochs):
        total_error = 0
        for i, x in enumerate(X):
            if (np.dot(x, w) * l[i]) <= 0:
                total_error += (np.dot(x, w) * l[i])
                w = w + c * (x * l[i])
        errors.append(total_error * -1)
        print("epoch " + str(epoch) + ": " + str(w))
    return w, errors
def classify(X, l, w):
    # the bias input is -1 during training, so the bias weight is subtracted here
    z = np.dot(X, w[:-1]) - w[-1]
    print(z)
    z[z <= 0] = -1
    z[z > 0] = 1
    #return a matrix of predicted labels
    return z
w, errors = perceptron_sgd(X_trans, l, .01, 25)
# X - some data in 2dimensional np.array
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .1), np.arange(y_min, y_max, .1))
# here "model" is your model's prediction (classification) function
Z = classify(scaler.transform(np.c_[xx.ravel(), yy.ravel()]), l, w) # pass the full weight vector; classify applies the bias itself
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.4)
#plt.axis('off')
#Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=l, cmap=plt.cm.Paired)
Given a 1xN DataFrame, I need to pick the 5 largest values from the row and return the corresponding column names as a list.
This is a sample of the DataFrame:
                 5        2        13        15        37        8        89
PageRank  0.444384  0.44453  0.444695  0.444882  0.444759  0.44488  0.444648
I have tried this:
r = list(pr.loc['PageRank'].nlargest(5))
but the resulting list contains only the values from the row, not the column names.
How do I get the column names of the 5 largest cell values?
For example, for the given DataFrame it should return
[15, 8, 37, 13, 89]
You can get some added performance by using Numpy's np.argpartition. I'll use it on the negative of the values in order to get the correct direction.
I wanted to use np.argpartition instead of sorting because it is O(n) rather than sorting which is O(nlogn).
cols = pr.columns.values
rnks = -pr.values[0]
cols[np.argpartition(rnks, 5)[:5]].tolist()
['37', '15', '13', '8', '89']
Timing
You'll notice that pir1 outperforms the other methods. But also notice that nlargest asymptotically approaches the performance of pir1 because they are both O(n).
jez1 = lambda d: list(d.loc['PageRank'].nlargest(5).index)
jez2 = lambda d: d.columns[d.loc['PageRank'].values.argsort()[::-1]][:5].tolist()
jez3 = lambda d: d.columns[d.loc['PageRank'].values.argsort()[-1:-6:-1]].tolist()
pir1 = lambda d: d.columns.values[np.argpartition(-d.values[0], 5)[:5]].tolist()
from timeit import timeit
import numpy as np
import pandas as pd

res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000, 300000, 1000000],
    columns='jez1 jez2 jez3 pir1'.split(),
    dtype=float
)
for i in res.index:
    d = pd.DataFrame(dict(PageRank=np.random.rand(i))).T
    for j in res.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=200)
res.plot(loglog=True)
Timing Ratio
This table shows the ratio of each method's time relative to the minimum time taken for that particular length of array.
res.div(res.min(1), 0)
              jez1       jez2       jez3  pir1
10       20.740497   8.666576   6.738210   1.0
30       39.325125  11.962184  10.987012   1.0
100      30.121521  10.184435  10.173252   1.0
300      58.544734  11.963354  12.563072   1.0
1000     63.643729   9.361290   8.547374   1.0
3000     22.041026  15.977949  18.803516   1.0
10000     9.254778  11.620570  11.681464   1.0
30000     2.838243   7.522210   7.120721   1.0
100000    1.814005   7.486602   6.995017   1.0
300000    1.920776  13.213261  12.423890   1.0
1000000   1.332265   7.872120   7.225150   1.0
Use index:
r1 = list(pr.loc['PageRank'].nlargest(5).index)
print (r1)
[15, 8, 37, 13, 89]
Or:
r1 = pr.columns[pr.loc['PageRank'].values.argsort()][-1:-6:-1].tolist()
print (r1)
[15, 8, 37, 13, 89]
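For comparison, the same top-k selection can be sketched with only the standard library, using heapq.nlargest. This is a different technique from the pandas and NumPy answers here: it keeps a small heap of k items instead of sorting the whole row. The `scores` dictionary is a stand-in for the PageRank row with integer column labels assumed, and `top_k_columns` is a hypothetical helper name.

```python
import heapq

# Stand-in for the single 'PageRank' row: column label -> value (from the question).
scores = {
    5: 0.444384,
    2: 0.44453,
    13: 0.444695,
    15: 0.444882,
    37: 0.444759,
    8: 0.44488,
    89: 0.444648,
}


def top_k_columns(row, k=5):
    """Return the k column labels with the largest values, largest first."""
    # Iterating a dict yields its keys; key=row.get ranks them by their values.
    return heapq.nlargest(k, row, key=row.get)


print(top_k_columns(scores))
```

Unlike np.argpartition, this returns the labels already ordered from largest to smallest value.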