How can you appropriately index a pyomo parameter that is initialized with a list by a multi-dimensional pyomo set? - pyomo

For my application, I am trying to initialize a parameter for a concrete model using a function rule that outputs a list. The pyomo set I am using to index this parameter is multi-dimensional. It is important for this set to be multi-dimensional because it makes accessing the data structure I am pulling values from for the parameter much simpler. However, when I attempt to index with the set this way, I receive an index error.
Here is a simple test code that illustrates my issue.
First, I have my imports as necessary
## Testing pyomo sets and indexing
import pyomo.environ as pyo
import numpy as np
import itertools as iter
Then I define a concrete model and some numpy arrays
M = pyo.ConcreteModel()
a = np.arange(3)
b = np.arange(3)
Using an itertools product, I generate a list of two-dimensional tuples
T = list(iter.product(a,b))
M.T = pyo.Set(initialize = T)
M.T.pprint()
The output of this print is
T : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 2 : Any : 9 : {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
I define a function that returns a list and define a pyomo parameter that initializes with this function.
def paramListInitialize(M):
    List = []
    for i in M.T:
        List.append(i)
    return List
M.Param3 = pyo.Param(M.T, initialize = paramListInitialize(M))
I receive the following error.
ERROR: Rule failed for Param 'Param3' with index 0: KeyError: "Index '0' is
not valid for indexed component 'Param3'"
ERROR: Constructing component 'Param3' from data=None failed: KeyError: "Index
'0' is not valid for indexed component 'Param3'"
KeyError: "Index '0' is not valid for indexed component 'Param3'"
I am confused because I am able to define a pyomo variable with this multi-dimensional index set, and I can initialize parameters with lists if the index set is one-dimensional; however, I am unable to get the pyomo object to associate the tuples in the pyomo set with the values in the initialization list.
It would be very helpful to know why this is not working for me or if there is another way to generate a multi-dimensional pyomo set that is not in the form of a list of tuples.

You need to look more closely at the docs on how to use a rule to initialize a parameter here.
The rule needs to return a single value for the index or indices passed in.
Here is an example using your 2-dim set:
import pyomo.environ as pyo
import numpy as np
import itertools as iter
m = pyo.ConcreteModel()
a = np.arange(3)
b = np.arange(3)
T = list(iter.product(a,b))
m.T = pyo.Set(initialize = T)
def example(m, a, b):
    return a + b
m.p = pyo.Param(m.T, initialize=example)
m.pprint()
Yields:
1 Set Declarations
T : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 2 : Any : 9 : {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
1 Param Declarations
p : Size=9, Index=T, Domain=Any, Default=None, Mutable=False
Key : Value
(0, 0) : 0
(0, 1) : 1
(0, 2) : 2
(1, 0) : 1
(1, 1) : 2
(1, 2) : 3
(2, 0) : 2
(2, 1) : 3
(2, 2) : 4
2 Declarations: T p
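If the values really do arrive as a list, one option is to pair each list entry with its index tuple and pass a dictionary instead of the list. The error message in the question suggests the list was looked up by position (index 0) rather than by tuple, which a dictionary avoids. Below is a minimal sketch in plain Python, assuming the list is in the same order as the index tuples; the names `index_tuples`, `values` and `param_data` are illustrative and not from the question.

```python
import itertools

# Index tuples in the same order the question builds them.
index_tuples = list(itertools.product(range(3), range(3)))

# Hypothetical list of values, one per index tuple, in matching order.
values = [a * 10 + b for a, b in index_tuples]

# Key each value by its tuple so no positional (0, 1, 2, ...) lookup is needed.
param_data = dict(zip(index_tuples, values))

print(param_data[(1, 2)])
```

A dictionary like `param_data` can then be passed as `initialize=param_data` when declaring the parameter over the two-dimensional set.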

Related

How to include take the absolute value in objective function solving using pyomo and glpk

I have to find the optimum cost of building links between nodes. In my objective function, I am trying to minimise the cost. The problem can be solved to determine the variable, however the optimal value of my cost is incorrect as I want it to take the absolute value of each cost. How can I modify my codes as I cannot use abs() in the objective function?
cost+=(model.a[i,k]-model.a[j,k])*model.c*model.d[i,j]
This value can be negative if model.a[j,k]=1 or positive if model.a[i,k]=1
from pyomo.environ import *
# Creation of a Concrete Model
model = ConcreteModel()
# Sets
model.i = Set(initialize=[1,2,3,4,5,6,7,8,9,10,11,12,13], doc='Nodes')
model.k = Set(initialize=['Orange','SFR', 'Bouygues'], doc='Companies')
# Parameters
model.c = Param(initialize=25, doc='Cost of transforming an existing link into a backbone link in euro/km')
links={
(1, 2) : 1.8,
(1, 7) : 1,
(1, 13) : 5.4,
(2, 8) : 2.3,
(2, 3) : 1.7,
(2, 5) : 7,
(2, 7) : 2,
(2, 12) : 3,
(3, 4) : 2,
(3, 10) : 6.5,
(4, 5) : 1,
(4, 6) : 2,
(5, 8) : 5,
(5, 10) : 1,
(5, 11) : 1.5,
(6, 11) : 2.1,
(7, 12) : 2,
(8, 9) : 2,
(8, 13) : 0.7,
(9, 10) : 1.1,
(10, 11) : 1,
(12, 13) : 2.5,
}
model.d=Param(model.i, model.i,default=0, initialize=links, doc='distance in 10 km between nodes')
# Variables
model.a = Var(model.i, model.k, within=Binary, doc='Binary variable indicating whether node i belongs to company k (0 if it does not belong and 1 if it belongs)')
#Constraints#
def allocation_rule(model, i):
    return sum(model.a[i,k] for k in model.k) == 1
model.allocation = Constraint(model.i, rule=allocation_rule, doc='Each node can only belong to one company')
def minimum_rule(model, k):
    return sum(model.a[i,k] for i in model.i) >= 2
model.minimum = Constraint(model.k, rule=minimum_rule, doc='Each company must have at least 2 nodes')
#objective
def totalcost(model):
    cost=0
    for i in model.i:
        for j in model.i:
            if model.d[i,j]!=0:
                for k in model.k:
                    cost+=(model.a[i,k]-model.a[j,k])*model.c*model.d[i,j]
    return cost
model.z = Objective(rule=totalcost, sense=minimize, doc='Minimize the cost of implementing a backbone connecting the three sub-networks')
def total(model):
    return model.cost_postive-model.cost_negative
## Display of the output ##
optimizer = SolverFactory("glpk",executable='/usr/bin/glpsol') #creates an optimizer object that uses the glpk package installed to your usr/bin.
optimizer.solve(model) #tells your optimizer to solve the model object
model.display()
I have tried using cost+=abs((model.a[i,k]-model.a[j,k])*model.c*model.d[i,j]), but this makes the problem non-linear, so it cannot be solved.
Edit: I introduced a new variable p and added two constraints, p>=(model.a[i,k]-model.a[j,k])*model.c*model.d[i,j] and
p>=-(model.a[i,k]-model.a[j,k])*model.c*model.d[i,j]. However, this fails with the error: ERROR:pyomo.core:Rule failed for Param 'd' with index (1, 2):
from pyomo.environ import *
# Creation of a Concrete Model
model = ConcreteModel()
# Sets
model.i = Set(initialize=[1,2,3,4,5,6,7,8,9,10,11,12,13], doc='Nodes')
model.i = Set(initialize=['Orange','SFR', 'Bouygues'], doc='Companies')
# Parameters
model.c = Param(initialize=25, doc='Cost of transforming an existing link into a backbone link in euro/km')
links={
(1, 2) : 1.8,
(1, 7) : 1,
(2, 3) : 1.7,
(2, 5) : 7,
(2, 7) : 2,
(2, 12) : 3,
(3, 4) : 2,
(3, 10) : 6.5,
(4, 5) : 1,
(4, 6) : 2,
(5, 8) : 5,
(5, 10) : 1,
(5, 11) : 1.5,
(6, 11) : 2.1,
(7, 12) : 2,
(8, 9) : 2,
(8, 13) : 0.7,
(9, 10) : 1.1,
(10, 11) : 1,
(12, 13) : 2.5,
(1, 13) : 5.4,
(2, 8) : 2.3,
}
model.d=Param(model.i, model.i,default=0, initialize=links, doc='distance in 10 km between nodes')
# Variables
model.a = Var(model.i, model.k, within=Binary, doc='Binary variable indicating whether node i belongs to company k (0 if it does not belong and 1 if it belongs)')
model.p = Var(model.i,model.k, within=(0.0,None), doc='Cost of building backbone link p_ij')
#Constraints#
def allocation_rule(model, i):
    return sum(model.a[i,k] for k in model.k) == 1
model.allocation = Constraint(model.i, rule=allocation_rule, doc='Each node can only belong to one company')
def minimum_rule(model, k):
    return sum(model.a[i,k] for i in model.i) >= 2
model.minimum = Constraint(model.k, rule=minimum_rule, doc='Each company must have at least 2 nodes')
def absolute_rule1(model):
    return model.p >=(model.a[i,k]-model.a[j,k])*model.c*model.d[i,j]
model.absolute1 = Constraint(model.i, rule=absolute_rule1, doc='To take the positive cost')
def absolute_rule2(model):
    for i in model.i:
        for j in model.i:
            if model.d[i,j]!=0:
                for k in model.k:
                    return model.p >=-(model.a[i,k]-model.a[j,k])*model.c*model.d[i,j]
model.absolute2 = Constraint(model.i, rule=absolute_rule2, doc='To take the positive cost')
#objective
def totalcost(model):
    cost=0
    for i in model.i:
        for j in model.i:
            if model.d[i,j]!=0:
                for k in model.k:
                    cost+=model.p
    return cost
model.z = Objective(rule=totalcost, sense=minimize, doc='Minimize the cost of implementing a backbone connecting the three sub-networks')
Below is a slightly modified approach.
You could put in the helper variables to get to absolute value, but I think that might lead you a bit astray in your objective, as I mentioned in the comment. Specifically, if you have 3 companies, the best you could do for "ownership" would be 1 company owning it, so as you summed over all three companies, you would get one "zero" cost and two actual costs, which is probably not desired.
I reformulated a bit to something which kinda does the same thing with a couple new variables. Realize there is "upward pressure" in the model for link ownership... cost is reduced (good) if more links are owned, so the variable I put in assesses each link by company and only allows ownership if they own both nodes.
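To see why a constraint of the form own_link <= 0.5 * (a[n1, k] + a[n2, k]) only allows ownership when the company holds both end nodes, it can help to enumerate the four binary cases. The sketch below is plain Python, not Pyomo; the helper name `max_own_link` is hypothetical and simply picks the largest binary value that satisfies the inequality.

```python
import itertools

def max_own_link(a1, a2):
    """Largest binary own_link value satisfying own_link <= 0.5 * (a1 + a2)."""
    return max(v for v in (0, 1) if v <= 0.5 * (a1 + a2))

# Enumerate every combination of the two binary node-ownership variables.
truth_table = {
    (a1, a2): max_own_link(a1, a2)
    for a1, a2 in itertools.product((0, 1), repeat=2)
}

for nodes, own in sorted(truth_table.items()):
    print(nodes, own)
```

Only the (1, 1) case permits own_link = 1, so for binary variables the inequality behaves like a logical AND of the two node assignments.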
The other new variable indicates whether a link is owned or not, independent of company. I think you could probably do without that, but it adds a little clarity. You could get the same thing (remove the variable, I think) by observing:
build_link >= 1 - sum(own_link)
Also, a reminder... I didn't see in your original code that you were inspecting the solver results. Always, always, always do that to ensure the status is "optimal", or you are looking at a junk response.
Code:
from pyomo.environ import *
links={
(1, 2) : 1.8,
(1, 7) : 1,
(1, 13) : 5.4,
(2, 8) : 2.3,
(2, 3) : 1.7,
(2, 5) : 7,
(2, 7) : 2,
(2, 12) : 3,
(3, 4) : 2,
(3, 10) : 6.5,
(4, 5) : 1,
(4, 6) : 2,
(5, 8) : 5,
(5, 10) : 1,
(5, 11) : 1.5,
(6, 11) : 2.1,
(7, 12) : 2,
(8, 9) : 2,
(8, 13) : 0.7,
(9, 10) : 1.1,
(10, 11) : 1,
(12, 13) : 2.5,
}
# Creation of a Concrete Model
model = ConcreteModel()
# Sets
model.i = Set(initialize=[1,2,3,4,5,6,7,8,9,10,11,12,13], doc='Nodes')
model.k = Set(initialize=['Orange','SFR', 'Bouygues'], doc='Companies')
model.links = Set(within=model.i*model.i, initialize=links.keys())
# Parameters
model.c = Param(initialize=25, doc='Cost of transforming an existing link into a backbone link in euro/km')
model.d = Param(model.links, default=0, initialize=links, doc='distance in 10 km between nodes')
# Variables
model.a = Var(model.i, model.k, within=Binary, doc='Binary variable indicating whether node i belongs to company k (0 if it does not belong and 1 if it belongs)')
model.own_link = Var(model.links, model.k, within=Binary, doc='Own the link')
model.build_link = Var(model.links, within=Binary, doc='build link')
#Constraints#
def allocation_rule(model, i):
    return sum(model.a[i,k] for k in model.k) == 1
model.allocation = Constraint(model.i, rule=allocation_rule, doc='Each node can only belong to one company')
def minimum_rule(model, k):
    return sum(model.a[i,k] for i in model.i) >= 2
model.minimum = Constraint(model.k, rule=minimum_rule, doc='Each company must have at least 2 nodes')
def link_owner(model, k, n1, n2):
    return model.own_link[n1, n2, k] <= 0.5 * (model.a[n1, k] + model.a[n2, k])
model.link1 = Constraint(model.k, model.links, rule=link_owner)
# link the "build link" variable to lack of link ownership
def link_build(model, *link):
    return model.build_link[link] >= 1 - sum(model.own_link[link, k] for k in model.k)
model.build_constraint = Constraint(model.links, rule=link_build)
# objective
cost = sum(model.build_link[link]*model.c*model.d[link] for link in model.links)
model.z = Objective(expr=cost, sense=minimize, doc='Minimize the cost of implementing a backbone connecting the three sub-networks')
## Display of the output ##
optimizer = SolverFactory("glpk") #creates an optimizer object that uses the glpk package installed to your usr/bin.
result = optimizer.solve(model) #tells your optimizer to solve the model object
print(result)
print('Link Ownership Plan:')
for idx in model.own_link.index_set():
    if model.own_link[idx].value: # will be true if it is 1, false if 0
        print(idx, model.own_link[idx].value)
print('\nLink Build Plan:')
for idx in model.build_link.index_set():
    if model.build_link[idx].value: # will be true if it is 1, false if 0
        print(idx, model.build_link[idx].value)
Output:
Problem:
- Name: unknown
Lower bound: 232.5
Upper bound: 232.5
Number of objectives: 1
Number of constraints: 105
Number of variables: 128
Number of nonzeros: 365
Sense: minimize
Solver:
- Status: ok
Termination condition: optimal
Statistics:
Branch and bound:
Number of bounded subproblems: 2183
Number of created subproblems: 2183
Error rc: 0
Time: 0.21333098411560059
Solution:
- number of solutions: 0
number of solutions displayed: 0
Link Ownership Plan:
(1, 2, 'Orange') 1.0
(1, 7, 'Orange') 1.0
(1, 13, 'Orange') 1.0
(2, 8, 'Orange') 1.0
(2, 5, 'Orange') 1.0
(2, 7, 'Orange') 1.0
(2, 12, 'Orange') 1.0
(3, 10, 'SFR') 1.0
(4, 6, 'Bouygues') 1.0
(5, 8, 'Orange') 1.0
(6, 11, 'Bouygues') 1.0
(7, 12, 'Orange') 1.0
(8, 9, 'Orange') 1.0
(8, 13, 'Orange') 1.0
(12, 13, 'Orange') 1.0
Link Build Plan:
(2, 3) 1.0
(3, 4) 1.0
(4, 5) 1.0
(5, 10) 1.0
(5, 11) 1.0
(9, 10) 1.0
(10, 11) 1.0

Defining pyomo parameters from dataframe columns

I am trying to make a number of parameters from the columns of pandas dataframes, where the index will be the set elements, the column name would be the parameter name, and the column values would be the parameter values. Is there any way to do this automatically, rather than one by one?
An example is:
import pandas as pd
import numpy as np
from pyomo import environ as pe
model = pe.ConcreteModel()
df1 = pd.DataFrame(np.array([
[1, 5, 9],
[2, 4, 61],
[3, 24, 9]]),
columns=['p1', 'p2', 'p3'])
model.myset = pe.Set(initialize=df1.index.tolist())
def init1(myset, num):
    return df1['p1'][num]
def init2(myset, num):
    return df1['p2'][num]
def init3(myset, num):
    return df1['p3'][num]
model.p1 = pe.Param(model.myset, initialize=init1)
model.p2 = pe.Param(model.myset, initialize=init2)
model.p3 = pe.Param(model.myset, initialize=init3)
However, to make this more succinct I would like to
a) use a single function init for each column, by passing the column name (p1, p2, or p3) to the function, and
b) not have to write out a new line to define each parameter p1, p2, p3.
It seems that a) should be possible (though I haven't figured out how), but I'm not sure about b). I had tried looping over the columns of the dataframe, but from what I can tell, the pyomo parameter names must be declared explicitly.
Try something like this....
Recognize that I used your same dataframe, but I think you were trying to make the index the letters {a, b, c} so I just made that the df index and the controlling set, leaving the other 2 for your parameters. If you just want to let pandas auto-index it then you could use that integer set for your pyomo index.
Also, pyomo likes dictionary relationships for the key:value pair of the indexed parameters, so you just need to shoot it the pandas series in dictionary format as shown.
** edited to answer your second question about instantiating model components from columns with the model.add_component() function
import pandas as pd
import numpy as np
import pyomo.environ as pe
model = pe.ConcreteModel()
df1 = pd.DataFrame(np.array([
['a', 5, 9],
['b', 4, 61],
['c', 24, 9]]),
columns=['S', 'p1', 'p2'])
df1.set_index('S', inplace=True) # declare the index in the df
model.S = pe.Set(initialize=df1.index)
for c in df1.columns:
    df1[c] = pd.to_numeric(df1[c]) # convert the numeric types in columns p1, p2
    model.add_component(c, pe.Param(model.S, initialize=df1[c].to_dict(), within=pe.Reals))
model.pprint()
Yields:
1 Set Declarations
S : Size=1, Index=None, Ordered=Insertion
Key : Dimen : Domain : Size : Members
None : 1 : Any : 3 : {'a', 'b', 'c'}
2 Param Declarations
p1 : Size=3, Index=S, Domain=Reals, Default=None, Mutable=False
Key : Value
a : 5
b : 4
c : 24
p2 : Size=3, Index=S, Domain=Reals, Default=None, Mutable=False
Key : Value
a : 9
b : 61
c : 9
3 Declarations: S p1 p2
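For reference, calling `to_dict()` on a pandas Series gives a plain `{index_label: value}` mapping, which is the shape each parameter receives in the loop above. Here is a pandas-free sketch of that shape, assuming hypothetical `rows` and `columns` that mirror the example dataframe:

```python
# Hypothetical rows mirroring the example dataframe: index label, p1, p2.
rows = [("a", 5, 9), ("b", 4, 61), ("c", 24, 9)]
columns = ["p1", "p2"]

# One {index_label: value} dictionary per column, keyed by column name.
param_data = {
    col: {row[0]: row[pos + 1] for row in rows}
    for pos, col in enumerate(columns)
}

for name in columns:
    print(name, param_data[name])
```

Each inner dictionary could then be given to one indexed parameter, with the outer key serving as the component name.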

ValueError on tensorflow while_loop shape invariants

import tensorflow as tf
cluster_size = tf.constant(6) # size of the cluster
m = tf.constant(6) # number of contigs (column size)
n = tf.constant(3) # number of points in a single contigs (column size)
contigs_index = tf.reshape(tf.range(0, m, 1, dtype=tf.int32), [1, -1])
contigs = tf.constant(
[[1.1, 2.2, 3.3], [6.6, 5.5, 4.4], [7.7, 8.8, 9.9], [11.1, 22.2, 33.3],
[66.6, 55.5, 44.4], [77.7, 88.8, 99.9]])
# pad zero to the right till fixed length
def rpad_with_zero(points):
    points = tf.slice(tf.pad(points, tf.reshape(tf.concat(
        [tf.zeros([1, 2], tf.int32), tf.add(
            tf.zeros([1, 2], tf.int32),
            tf.subtract(cluster_size, tf.size(points)))], 0), [2, -1]), "CONSTANT"),
        (0, tf.subtract(cluster_size, tf.size(points))),
        (1, cluster_size))
    return points
#calculate pearson correlation coefficient r value
def calculate_pcc(row, contigs):
    r = tf.divide(tf.subtract(
        tf.multiply(tf.to_float(n), tf.reduce_sum(tf.multiply(row, contigs), 1)),
        tf.multiply(tf.reduce_sum(row, 1), tf.reduce_sum(contigs, 1))),
        tf.multiply(
            tf.sqrt(tf.subtract(
                tf.multiply(tf.to_float(n), tf.reduce_sum(tf.square(row), 1)),
                tf.square(tf.reduce_sum(row, 1)))),
            tf.sqrt(tf.subtract(tf.multiply(
                tf.to_float(n), tf.reduce_sum(tf.square(contigs), 1)),
                tf.square(tf.reduce_sum(contigs, 1)))
            )))
    return r
#slice first row from contigs
row = tf.slice(contigs, (0, 0), (1, 3))
#calculate pcc
r = calculate_pcc(row, contigs)
#cluster member index whose r value is greater than 0.90, then casting to
# int32,
members0_index = tf.cast(tf.reshape(tf.where(tf.greater(r, 0.90)), [1, -1]),
tf.int32)
#members = index <intersection> members, padding the members index with
# zeros at right, to keep the fixed cluster length
members0_index = rpad_with_zero(
tf.reshape(tf.sets.set_intersection(contigs_index, members0_index).values,
[1, -1]))
#update index with the rest element index from contigs, and padding
contigs_index = rpad_with_zero(
tf.reshape(tf.sets.set_difference(contigs_index, members0_index).values,
[1, -1]))
#def condition(contigs, contigs_index, members0_index):
def condition(contigs_index, members0_index):
    return tf.greater(tf.count_nonzero(contigs_index),
                      0)  # iterate until there is a contig
#def body(contigs, contigs_index, members0_index):
def body(contigs_index, members0_index):
    i = tf.reshape(tf.slice(contigs_index, [0, 0], [1, 1]),
                   [])  #the first element in the contigs_index
    row = tf.slice(contigs, (i, 0),
                   (1, 3))  #slice the ith contig from contigs
    r = calculate_pcc(row, contigs)
    members_index = tf.cast(tf.reshape(tf.where(tf.greater(r, 0.90)), [1, -1]),
                            tf.int32)
    members_index = rpad_with_zero(rpad_with_zero(
        tf.reshape(tf.sets.set_intersection(contigs_index, members_index).values,
                   [1, -1])))
    members0_index = tf.concat([members0_index, members_index], 0)
    contigs_index = rpad_with_zero(
        tf.reshape(tf.sets.set_difference(contigs_index, members_index).values,
                   [1, -1]))
    #return [contigs, contigs_index, members0_index]
    return [contigs_index, members0_index]
sess = tf.Session()
sess.run(tf.while_loop(condition, body,
#loop_vars=[contigs, contigs_index, members0_index],
loop_vars=[contigs_index, members0_index],
#shape_invariants=[contigs.get_shape(), contigs_index.get_shape(),
# tf.TensorShape([None, 6])]))
shape_invariants=[contigs_index.get_shape(), tf.TensorShape([None, 6])]))
The error is:
ValueError: The shape for while_12/Merge:0 is not an invariant for the
loop. It enters the loop with shape (1, 6), but has shape (?, ?) after
one iteration. Provide shape invariants using either the
shape_invariants argument of tf.while_loop or set_shape() on the
loop variables.
It seems the variable contigs_index is responsible, but I really don't know why! I unrolled the loop and executed each statement, but could not find any shape mismatch.
The argument shape_invariants=[contigs_index.get_shape(), tf.TensorShape([None, 6])] should become shape_invariants=[tf.TensorShape([None, None]), tf.TensorShape([None, 6])], to allow for shape changes of the contigs_index variable (in the rpad_with_zero call).

python: Finding min values of subsets of a list

I have a list that looks something like this
(The columns would essentially be acct, subacct, value.):
1,1,3
1,2,-4
1,3,1
2,1,1
3,1,2
3,2,4
4,1,1
4,2,-1
I want to update the list to look like this:
(The columns are now acct, subacct, value, min of the value for each account)
1,1,3,-4
1,2,-4,-4
1,3,1,-4
2,1,1,1
3,1,2,2
3,2,4,2
4,1,1,-1
4,2,-1,-1
The fourth value is derived by taking the min(value) for each account. So, for account 1, the min is -4, so col4 would be -4 for the three records tied to account 1.
For account 2, there is only one value.
For account 3, the min of 2 and 4 is 2, so the value for col 4 is 2 where account = 3.
I need to preserve col3, as I will need to use the value in column 3 for other calculations later. I also need to create this additional column for output later.
I have tried the following:
with open(file_name, 'rU') as f: #opens PW file
    data = zip(*csv.reader(f, delimiter = '\t'))
    # data = list(list(rec) for rec in csv.reader(f, delimiter='\t'))
    #reads csv into a list of lists
#print the first row
uniqAcct = []
data[0] not in used and (uniqAcct.append(data[0]) or True)
But short of looping through and matching on each unique count and then going back through and adding a new column, I am stuck. I think there must be a pythonic way of doing this, but I cannot figure it out. Any help would be greatly appreciated!
I cannot use numpy, pandas, etc as they cannot be installed on this server yet. I need to use just basic python2
So the problem here is your data structure; it's not trivial to index.
Ideally you'd change it to something readable and keep it in those containers. However, if you insist on changing it back into tuples, I'd go with this construction:
# dummy values
data = [
(1, 1, 3),
(1, 2,-4),
(1, 3, 1),
(2, 1, 1),
(3, 1, 2),
(3, 2, 4),
(4, 1, 1),
(4, 2,-1),
]
class Account:
    def __init__(self, acct):
        self.acct = acct
        self.subaccts = {} # maps sub account id to its value
    def as_tuples(self):
        min_value = min(val for val in self.subaccts.values())
        for subacct, val in self.subaccts.items():
            yield (self.acct, subacct, val, min_value)
def accounts_as_tuples(accounts):
    return [ summary for acct_obj in accounts.values() for summary in acct_obj.as_tuples() ]
accounts = {}
for acct, subacct, val in data:
    if acct not in accounts:
        accounts[acct] = Account(acct)
    accounts[acct].subaccts[subacct] = val
print(accounts_as_tuples(accounts))
But ideally, I'd keep it in the Account objects and just add a method that extracts the minimal value of the account when it's needed.
Here is another way using your initial approach.
Modify the way you import your data, so you can easily handle it in python.
import csv
mylist = []
with open(file_name, 'rU') as f: #opens PW file
    data = csv.reader(f, delimiter = '\t')
    for row in data:
        splitted = row[0].split(',')
        # this is in case you need integers
        splitted = [int(i) for i in splitted]
        mylist += [splitted]
Then, add the fourth column
updated = []
for acc in set(zip(*mylist)[0]):
    acclist = [x for x in mylist if x[0] == acc]
    # minimum of the value column (index 2) only, not of every field in the row
    m = min(x[2] for x in acclist)
    for l in acclist:
        l.append(m)
    updated += acclist
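The same result can also be had with two passes over the rows and a plain dictionary, which keeps the original row order and works without any third-party packages. This is a minimal sketch using the sample rows from the question; `min_by_acct` and `updated` are illustrative names.

```python
data = [
    (1, 1, 3), (1, 2, -4), (1, 3, 1),
    (2, 1, 1),
    (3, 1, 2), (3, 2, 4),
    (4, 1, 1), (4, 2, -1),
]

# First pass: record the smallest value seen for each account.
min_by_acct = {}
for acct, subacct, val in data:
    if acct not in min_by_acct or val < min_by_acct[acct]:
        min_by_acct[acct] = val

# Second pass: append the account minimum, keeping the original row order.
updated = [(acct, subacct, val, min_by_acct[acct]) for acct, subacct, val in data]

for row in updated:
    print(row)
```

Column 3 is left untouched, so it remains available for later calculations.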

Pandas Dataframe ValueError: Shape of passed values is (X, ), indices imply (X, Y)

I am getting an error and I'm not sure how to fix it.
The following seems to work:
def random(row):
    return [1,2,3,4]
df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df.apply(func = random, axis = 1)
and my output is:
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
However, when I change one of the columns to a value such as 1 or None:
def random(row):
    return [1,2,3,4]
df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1
df.apply(func = random, axis = 1)
I get the error:
ValueError: Shape of passed values is (5,), indices imply (5, 5)
I've been wrestling with this for a few days now and nothing seems to work. What is interesting is that when I change
def random(row):
    return [1,2,3,4]
to
def random(row):
    print [1,2,3,4]
everything seems to work normally.
This question is a clearer way of asking this question, which I feel may have been confusing.
My goal is to compute a list for each row and then create a column out of that.
EDIT: I originally start with a dataframe that has one column. I add 4 columns in 4 different apply steps, and then when I try to add another column I get this error.
If your goal is to add a new column to the DataFrame, just write your function so that it returns a scalar value (not a list), something like this:
>>> def random(row):
... return row.mean()
and then use apply:
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
A B C D new
0 0.201143 -2.345828 -2.186106 -0.784721 -1.278878
1 -0.198460 0.544879 0.554407 -0.161357 0.184867
2 0.269807 1.132344 0.120303 -0.116843 0.351403
3 -1.131396 1.278477 1.567599 0.483912 0.549648
4 0.288147 0.382764 -0.840972 0.838950 0.167222
I don't know if it is possible for your new column to contain lists, but it is definitely possible for it to contain tuples ((...) instead of [...]):
>>> def random(row):
... return (1,2,3,4,5)
...
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
A B C D new
0 0.201143 -2.345828 -2.186106 -0.784721 (1, 2, 3, 4, 5)
1 -0.198460 0.544879 0.554407 -0.161357 (1, 2, 3, 4, 5)
2 0.269807 1.132344 0.120303 -0.116843 (1, 2, 3, 4, 5)
3 -1.131396 1.278477 1.567599 0.483912 (1, 2, 3, 4, 5)
4 0.288147 0.382764 -0.840972 0.838950 (1, 2, 3, 4, 5)
I used the code below and it works just fine:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array(your_data), columns=columns)