I am trying to create a number of parameters from the columns of pandas DataFrames, where the index would be the set elements, the column name would be the parameter name, and the column values would be the parameter values. Is there any way to do this automatically, rather than one by one?
An example is:
import pandas as pd
import numpy as np
from pyomo import environ as pe
model = pe.ConcreteModel()
df1 = pd.DataFrame(np.array([
    [1, 5, 9],
    [2, 4, 61],
    [3, 24, 9]]),
    columns=['p1', 'p2', 'p3'])
model.myset = pe.Set(initialize=df1.index.tolist())
def init1(myset, num):
    return df1['p1'][num]
def init2(myset, num):
    return df1['p2'][num]
def init3(myset, num):
    return df1['p3'][num]
model.p1 = pe.Param(model.myset, initialize=init1)
model.p2 = pe.Param(model.myset, initialize=init2)
model.p3 = pe.Param(model.myset, initialize=init3)
However, to make this more succinct I would like to
a) use a single function init for each column, by passing the column name (p1, p2, or p3) to the function, and
b) not have to write out a new line to define each parameter p1, p2, p3.
It seems that a) should be possible (though I haven't figured out how), but I'm not sure about b). I tried looping over the columns of the dataframe, but from what I can tell, the Pyomo parameter names must be declared explicitly.
Try something like this....
Note that I used your same dataframe, but I think you were trying to make the index the letters {a, b, c}, so I just made that the df index and the controlling set, leaving the other two columns for your parameters. If you just want to let pandas auto-index it, then you could use that integer index as your Pyomo set.
Also, Pyomo likes dictionary relationships for the key:value pairs of indexed parameters, so you just need to pass it the pandas Series in dictionary format, as shown.
** Edited to answer your second question about instantiating model components from columns with the model.add_component() function.
import pandas as pd
import numpy as np
import pyomo.environ as pe
model = pe.ConcreteModel()
df1 = pd.DataFrame(np.array([
    ['a', 5, 9],
    ['b', 4, 61],
    ['c', 24, 9]]),
    columns=['S', 'p1', 'p2'])
df1.set_index('S', inplace=True)   # declare the index in the df
model.S = pe.Set(initialize=df1.index)
for c in df1.columns:
    df1[c] = pd.to_numeric(df1[c])   # convert the numeric types in columns p1, p2
    model.add_component(c, pe.Param(model.S, initialize=df1[c].to_dict(), within=pe.Reals))
model.pprint()
Yields:
1 Set Declarations
    S : Size=1, Index=None, Ordered=Insertion
        Key : Dimen : Domain : Size : Members
        None : 1 : Any : 3 : {'a', 'b', 'c'}

2 Param Declarations
    p1 : Size=3, Index=S, Domain=Reals, Default=None, Mutable=False
        Key : Value
        a : 5
        b : 4
        c : 24
    p2 : Size=3, Index=S, Domain=Reals, Default=None, Mutable=False
        Key : Value
        a : 9
        b : 61
        c : 9

3 Declarations: S p1 p2
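As a small aside on top of this answer: components created via add_component can still be looked up by their string name afterwards, e.g. with model.component or getattr. A brief usage sketch based on the model above:
p1 = model.component('p1')   # same object as model.p1
print(pe.value(p1['a']))     # 5, per the pprint output above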
Related
I am having some trouble filtering a pandas dataframe on a column (let's call it column_1) whose data type is a list. Specifically, I want to return only the rows where the intersection of column_1 and another predetermined list is not empty. However, when I try to put the logic inside the arguments of the .where function, I always get errors. Below are my attempts, with the errors returned.
Attempting to test whether or not a single element is inside the list:
table[element in table['column_1']]
returns the error ...
KeyError: False
Trying to compare a list to all of the lists in the rows of the dataframe:
table[[349569] == table.column_1]
returns the error
Arrays were different lengths: 23041 vs 1
I'm trying to get these two intermediate steps down before I test the intersection of the two lists.
Thanks for taking the time to read over my problem!
Consider the pd.Series s
s = pd.Series([[1, 2, 3], list('abcd'), [9, 8, 3], ['a', 4]])
print(s)
0 [1, 2, 3]
1 [a, b, c, d]
2 [9, 8, 3]
3 [a, 4]
dtype: object
And a testing list test
test = ['b', 3, 4]
Apply a lambda function that converts each element of s to a set and intersects it with test:
print(s.apply(lambda x: list(set(x).intersection(test))))
0 [3]
1 [b]
2 [3]
3 [4]
dtype: object
To use it as a mask, use bool instead of list
s.apply(lambda x: bool(set(x).intersection(test)))
0 True
1 True
2 True
3 True
dtype: bool
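To connect this back to the DataFrame in the question, the boolean mask can be used directly to filter rows. A minimal sketch, using a made-up table with a list-valued column_1 (the sample data is purely illustrative):
import pandas as pd
table = pd.DataFrame({'column_1': [[349569, 7], [1, 2], [349569]],
                      'other': ['x', 'y', 'z']})
test = [349569]
# True where the row's list shares at least one element with test
mask = table['column_1'].apply(lambda lst: bool(set(lst).intersection(test)))
filtered = table[mask]   # keeps rows 0 and 2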
Hi, for long-term use you can wrap the whole workflow in functions and apply those functions where you need them. As you did not include an example dataset, I am making one up and working through it. Suppose I have a text database: first I will collect the #tags into a list, then I will search for only the #tags I want and filter the data.
import re
import pandas as pd

# find all the tags in the message
def find_hashtags(post_msg):
    combo = r'#\w+'
    rx = re.compile(combo)
    hash_tags = rx.findall(post_msg)
    return hash_tags

# find the required match according to a tag list and return True or False
def match_tags(tag_list, htag_list):
    matched_items = bool(set(tag_list).intersection(htag_list))
    return matched_items
test_data = [{'text': 'Head nipid mõnusateks sõitudeks kitsastel tänavatel. #TipStop'},
{'text': 'Homses Rooli Võimus uus #Peugeot208!\nVaata kindlasti.'},
{'text': 'Soovitame ennast tulevikuks ette valmistada, electric car sest uus #PeugeotE208 on peagi kohal! ⚡️⚡️\n#UnboringTheFuture'},
{'text': "Aeg on täiesti uueks roadtrip'i kogemuseks! \nLase ennast üllatada - #Peugeot5008!"},
{'text': 'Tõeline ikoon, mille stiil avaldab muljet läbi eco car, electric cars generatsioonide #Peugeot504!'}
]
test_df = pd.DataFrame(test_data)
# find all the hashtags
test_df["hashtags"] = test_df["text"].apply(lambda x: find_hashtags(x))
# the only hashtags we are interested
tag_search = ["#TipStop", "#Peugeot208"]
# match the tags in our list
test_df["tag_exist"] = test_df["hashtags"].apply(lambda x: match_tags(x, tag_search))
# filter the data
main_df = test_df[test_df.tag_exist]
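If the intermediate hashtags and tag_exist columns are not needed, the same filter can be written as a single pass over text using the helper functions above (just a more compact variant of the same approach):
# one-pass filter: extract the tags and test the intersection per row
mask = test_df['text'].apply(lambda msg: match_tags(find_hashtags(msg), tag_search))
main_df = test_df[mask]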
I have a function myfunc, which does calculations on two pandas DataFrame columns. Output is a Numpy array.
def myfunc(df, args):
    import numpy
    return numpy.array([df.iloc[:,args[0]].sum, df.iloc[:,args[1]].sum])
This function is called within rolling_df_apply:
def rolling_df_apply(df, myfunc, window, *args):
    import pandas
    result = pandas.concat(pandas.DataFrame(myfunc(df.iloc[i:window+i], args), index=[df.index[i+window-1]]) for i in xrange(0, len(df)-window+1))
    return result
Running this via
import numpy
import pandas
df=pandas.DataFrame(numpy.random.randint(5,size=(5,2)))
window=3
args = [0,1]
result = rolling_df_apply(df, myfunc, window, *args)
gives ValueError within pandas.concat(): Shape of passed values is (1, 2), indices imply (1, 1).
What must be changed to get this running?
Which indices imply shape (1, 1)? The shape of all the dataframes to concatenate should be (1, 2), though.
In myfunc, .sum should be .sum().
Since myfunc returns an array of length 2,
pandas.DataFrame(myfunc(df.iloc[i:window+i],args), index=[df.index[i+window-1]])
is essentially the same as
pd.DataFrame([0,1], index=[0])
which raises
ValueError: Shape of passed values is (1, 2), indices imply (1, 1)
The error is saying that the value [0,1] implies 1 row and 2 columns,
while the index implies 1 row and 1 column.
One way to fix this would be to pass a dict instead of a list:
In [191]: pd.DataFrame({'a':0,'b':1}, index=[0])
Out[191]:
a b
0 0 1
So, to fix your code with minimal changes,
import pandas as pd
import numpy as np
def myfunc(df, args):
    return {'a': df.iloc[:,args[0]].sum(), 'b': df.iloc[:,args[1]].sum()}

def rolling_df_apply(df, myfunc, window, *args):
    frames = [pd.DataFrame(myfunc(df.iloc[i:window+i], args),
                           index=[df.index[i+window-1]])
              for i in xrange(0, len(df)-window+1)]
    result = pd.concat(frames)
    return result
np.random.seed(2015)
df = pd.DataFrame(np.random.randint(5,size=(5,2)))
window=3
args = [0,1]
result = rolling_df_apply(df, myfunc, window, *args)
print(result)
yields
a b
2 7 6
3 7 5
4 3 3
However, it would be much more efficient to replace myfunc and rolling_df_apply with a call to pd.rolling_sum:
result = pd.rolling_sum(df, window=3).dropna(axis=0)
yields the same result.
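One caveat: pd.rolling_sum has since been removed from pandas, so on a recent version the equivalent would be the .rolling accessor; a sketch that should give the same numbers:
# rolling sum over a 3-row window, dropping the incomplete leading rows
result = df.rolling(window=3).sum().dropna(axis=0)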
I have a pandas dataframe that resembles one generated as follows.
import numpy as np
import pandas as pd
x0 = pd.DataFrame(np.random.normal(size=(10, 4)))
x1 = pd.DataFrame({'x': [1,1,2,3,2,3,4,1,2,3]})
df = pd.concat((x0, x1), axis=1)
and a function:
def fun(df, n=100):
    z = np.random.normal(size=n)
    return np.dot(df[[0,1,2,3]], [0.5*z, -1*z, 0.3*z, 1.2*z])
I would like to:

1. use identical draws z for each unique value in x,
2. take the product of the output in the above step over the items of unique x.

Any suggestion?
Explanation:

1. Generate n=100 draws to get z such that len(z)=100.
2. For each element in z, evaluate the function fun.
3. For i in df.x.unique(), compute the product of the output in step (2) element-wise. I am expecting to get a DataFrame or array of dimension (len(df.x.unique()), n=100).
It sounds like you want to group by 'x', taking one of its instances (let's assume we take the first one observed).
Just call your function as follows:
f = fun(df.groupby('x').first())
>>> f.shape
Out[25]: (4, 100)
>>> len(df.x.unique())
Out[26]: 4
I am getting an error and I'm not sure how to fix it.
The following seems to work:
import numpy as np
import pandas

def random(row):
    return [1,2,3,4]

df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df.apply(func = random, axis = 1)
and my output is:
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
[1,2,3,4]
However, when I change one of the columns to a value such as 1 or None:
def random(row):
    return [1,2,3,4]

df = pandas.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
df['E'] = 1
df.apply(func = random, axis = 1)
I get the error:
ValueError: Shape of passed values is (5,), indices imply (5, 5)
I've been wrestling with this for a few days now and nothing seems to work. What is interesting is that when I change
def random(row):
    return [1,2,3,4]
to
def random(row):
    print [1,2,3,4]
everything seems to work normally.
This question is a clearer way of asking this question, which I feel may have been confusing.
My goal is to compute a list for each row and then create a column out of that.
EDIT: I originally start with a dataframe that has one column. I add 4 columns in 4 different apply steps, and then when I try to add another column I get this error.
If your goal is to add a new column to the DataFrame, just write your function as a function returning a scalar value (not a list), something like this:
>>> def random(row):
... return row.mean()
and then use apply:
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
A B C D new
0 0.201143 -2.345828 -2.186106 -0.784721 -1.278878
1 -0.198460 0.544879 0.554407 -0.161357 0.184867
2 0.269807 1.132344 0.120303 -0.116843 0.351403
3 -1.131396 1.278477 1.567599 0.483912 0.549648
4 0.288147 0.382764 -0.840972 0.838950 0.167222
I don't know if it is possible for your new column to contain lists, but it is definitely possible for it to contain tuples ((...) instead of [...]):
>>> def random(row):
... return (1,2,3,4,5)
...
>>> df['new'] = df.apply(func = random, axis = 1)
>>> df
A B C D new
0 0.201143 -2.345828 -2.186106 -0.784721 (1, 2, 3, 4, 5)
1 -0.198460 0.544879 0.554407 -0.161357 (1, 2, 3, 4, 5)
2 0.269807 1.132344 0.120303 -0.116843 (1, 2, 3, 4, 5)
3 -1.131396 1.278477 1.567599 0.483912 (1, 2, 3, 4, 5)
4 0.288147 0.382764 -0.840972 0.838950 (1, 2, 3, 4, 5)
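For completeness: newer pandas versions (0.23 and later) add a result_type argument to apply that keeps list results in a single column instead of trying to expand them. A hedged sketch, assuming a recent pandas:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))
def random(row):
    return [1, 2, 3, 4]
# result_type='reduce' asks apply to return a Series, so each cell holds the whole list
df['new'] = df.apply(random, axis=1, result_type='reduce')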
I use the code below and it works just fine:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array(your_data), columns=columns)
I have the following C++ code
std::map<std::string, std::vector<std::vector<std::vector<double> > > > details;
details["string"][index][index].push_back(123.5);
May I know what is the Pythonic way to declare an empty map of vector of vector of vector? :p
I tried to have
self.details = {}
self.details["string"][index][index].add(value)
I am getting
KeyError: 'string'
Probably the best way would be to use a dict for the outside container, with string keys mapping to an inner dictionary whose keys are tuples (the vector indices) and whose values are doubles:
d = {'abc': {(0,0,0): 1.2, (0,0,1): 1.3}}
It's probably less efficient (less time-efficient at least, it's actually more space-efficient I would imagine) than actually nesting the lists, but IMHO cleaner to access:
>>> d['abc'][0,0,1]
1.3
Edit
Adding keys as you go:
d = {} #start with empty dictionary
d['abc'] = {} #insert a new string key into outer dict
d['abc'][0,3,3] = 1.3 #insert new value into inner dict
d['abc'][5,3,3] = 2.4 #insert another value into inner dict
d['def'] = {} #insert another string key into outer dict
d['def'][1,1,1] = 4.4
#...
>>> d
{'abc': {(0, 3, 3): 1.3, (5, 3, 3): 2.4}, 'def': {(1, 1, 1): 4.4}}
Or if using Python >= 2.5, an even more elegant solution would be to use defaultdict: it works just like a normal dictionary, but can create values for keys that don't exist.
import collections
d = collections.defaultdict(dict) #The first parameter is the constructor of values for keys that don't exist
d['abc'][0,3,3] = 1.3
d['abc'][5,3,3] = 2.4
d['def'][1,1,1] = 4.4
#...
>>> d
defaultdict(<type 'dict'>, {'abc': {(0, 3, 3): 1.3, (5, 3, 3): 2.4}, 'def': {(1, 1, 1): 4.4}})
Python is a dynamic (latent-typed) language, so there is no such thing as a "map of vector of vector of vector" (or "dict of list of list of list" in Python-speak). Dicts are just dicts, and can contain values of any type. And an empty dict is simply: {}
Create a dict that contains a nested list which in turn contains a nested list:
dict1={'a':[[2,4,5],[3,2,1]]}
dict1['a'][0][1]
4
Using collections.defaultdict, you can try the lambda trick below. Note that you'll encounter problems pickling these objects.
from collections import defaultdict
# Regular dict with default float value, 1D
dict1D = defaultdict(float)
val1 = dict1D["1"] # string key type; val1 == 0.0 by default
# 2D
dict2D = defaultdict(lambda: defaultdict(float))
val2 = dict2D["1"][2] # string and integer key types; val2 == 0.0 by default
# 3D
dict3D = defaultdict(lambda: defaultdict(lambda: defaultdict(float)))
val3 = dict3D[1][2][3] # val3 == 0.0 by default
# N-D, arbitrary nested defaultdicts
dict4D = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(str))))
val4 = dict4D["abc"][10][9][90] # val4 == '' by default
You can basically nest as many of these defaultdict collection types as you like. Also, note that they behave like regular Python dictionaries and can take the usual key types (immutable and hashable). Best of luck!
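And to mirror the original C++ line details["string"][index][index].push_back(123.5) specifically, the innermost default can be a list, so that .append plays the role of push_back. A rough sketch (one of several reasonable variants):
from collections import defaultdict
# string-keyed outer level, two integer-keyed levels, plain lists at the bottom
details = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
details["string"][0][0].append(123.5)
print(details["string"][0][0])   # [123.5]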