Create a dataframe based on an old one - python-2.7

I have a dataframe like this:
A B C D
0 s 3 a
4 s 2 a
5 s 2 a
6 s 1 a
7 s 2 b
7 s 3 b
6 s 0 b
How can I create a new dataframe like the following?
A B C D
0 4 8 4-a
7 3 5 3-b
The new dataframe summarizes the old one by grouping the rows on column "D": "A" is the index, "B" is the count of elements, and "C" is the sum of the elements, in each case over the rows where "D" has the same value.

Well, assuming that your data is stored in df, it's a multistep process which could be done like this:
import pandas as pd
data = {'A': {0: 0, 1: 4, 2: 5, 3: 6, 4: 7, 5: 7, 6: 6},
        'B': {0: 's', 1: 's', 2: 's', 3: 's', 4: 's', 5: 's', 6: 's'},
        'C': {0: 3, 1: 2, 2: 2, 3: 1, 4: 2, 5: 3, 6: 0},
        'D': {0: 'a', 1: 'a', 2: 'a', 3: 'a', 4: 'b', 5: 'b', 6: 'b'}}
df = pd.DataFrame(data)
# Handling column A (first index per value in D);
# .copy() avoids a SettingWithCopyWarning on the assignments below
output_df = df.drop_duplicates(subset='D', keep='first').copy()
# Iterating through rows
for index, row in output_df.iterrows():
    # Calculating the counts in B
    output_df.loc[index, 'B'] = df[df.D == row.D].B.count()
    # Calculating the sum in C
    output_df.loc[index, 'C'] = df[df.D == row.D].C.sum()
# Finally changing values in D by concatenating values in B and D
output_df.loc[:, 'D'] = output_df.B.map(str) + "-" + output_df.D
Output :
A B C D
0 4 8 4-a
7 3 5 3-b
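The same summary can probably be built without an explicit loop by grouping on "D". The following is only a sketch, assuming the sample data from the question; the names grouped and summary are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question, written as plain lists.
df = pd.DataFrame({
    'A': [0, 4, 5, 6, 7, 7, 6],
    'B': ['s'] * 7,
    'C': [3, 2, 2, 1, 2, 3, 0],
    'D': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
})

# sort=False keeps the groups in order of first appearance in D.
grouped = df.groupby('D', sort=False)

# One aggregate per output column: first A, count of B, sum of C.
summary = pd.DataFrame({
    'A': grouped['A'].first(),
    'B': grouped['B'].count(),
    'C': grouped['C'].sum(),
}).reset_index()

# Prefix each D value with its group count, e.g. 'a' becomes '4-a'.
summary['D'] = summary['B'].astype(str) + '-' + summary['D']

# Restore the column order used in the question.
summary = summary[['A', 'B', 'C', 'D']]
print(summary)
```

Compared with the loop, this filters the frame once per group inside pandas rather than twice per row in Python, which may matter on larger inputs.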

Related

df.info() doesn't show any information about the dataframe

When I run
print df
the result is
A B C D
0 4 8 4-a
7 3 5 3-b
But when I select only one column
print df['D']
nothing is shown. Likewise,
print df.info()
shows nothing.
I can't understand what is wrong.
I set up the data using this code:
import pandas as pd
data = {'A': {0: 0, 1: 4, 2: 5, 3: 6, 4: 7, 5: 7, 6: 6},
        'B': {0: 's', 1: 's', 2: 's', 3: 's', 4: 's', 5: 's', 6: 's'},
        'C': {0: 3, 1: 2, 2: 2, 3: 1, 4: 2, 5: 3, 6: 0},
        'D': {0: 'a', 1: 'a', 2: 'a', 3: 'a', 4: 'b', 5: 'b', 6: 'b'}}
df = pd.DataFrame(data)
# Handling column A (first index per value in D)
output_df = df.drop_duplicates(subset='D', keep='first')
# Iterating through rows
for index, row in output_df.iterrows():
    # Calculating the counts in B
    output_df.loc[index, 'B'] = df[df.D == row.D].B.count()
    # Calculating the sum in C
    output_df.loc[index, 'C'] = df[df.D == row.D].C.sum()
# Finally changing values in D by concatenating values in B and D
output_df.loc[:, 'D'] = output_df.B.map(str) + "-" + output_df.D

Dictionary from Pandas dataframe

I read two columns of a large file (10 million lines) using pandas read_csv (first line is the header), and now I want to convert the dataframe to a dictionary where the 1st column is the key and the second column is the value.
import numpy as np
import pandas as pd
col_name = ['A', 'B']
df = pd.read_csv(f_loc, usecols=col_name, sep=r"\s+", dtype={'B': np.float16})
Set the first column as the index with set_index, then convert the remaining column with Series.to_dict:
df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
print (df)
   A  B
0  1  2
1  3  4
d = df.set_index('A')['B'].to_dict()
print (d)
{1: 2, 3: 4}
Another idea with zip:
d = dict(zip(df['A'], df['B']))
print (d)
{1: 2, 3: 4}
Or:
d = dict(df.values)
print (d)
{1: 2, 3: 4}
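One thing worth checking before picking any of these is how repeated keys behave, since a dictionary can hold each key only once. The sketch below uses a small made-up frame with a duplicated value in column A (the data is an assumption for illustration, not from the question) and suggests that both the set_index route and the zip route keep the last value seen for a repeated key.

```python
import pandas as pd

# Hypothetical data: the key 1 appears twice in column A.
df = pd.DataFrame([[1, 2], [3, 4], [1, 5]], columns=['A', 'B'])

# Route 1: index on A, then convert the B series to a dict.
by_index = df.set_index('A')['B'].to_dict()

# Route 2: pair the two columns up directly.
by_zip = dict(zip(df['A'], df['B']))

# Count how many rows would be silently overwritten by a later duplicate.
overwritten = int(df['A'].duplicated().sum())

print(by_index)
print(by_zip)
print(overwritten)
```

If overwritten is not zero for the real file, it may be safer to decide explicitly which row should win (for example with drop_duplicates) before converting.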

merge x python dictionaries into 1 with aggregated values

I have a list of dictionaries with the same key names. I want to consolidate them into one dictionary, averaging only the numeric values:
[{'a': 3, 'b': 'm', 'c': 7},
{'a': 1.0, 'b': 'm', 'c': 2},
{'a': 5, 'b': 'm', 'c': 4.0}]
into an averaged dictionary:
[{'a': 3, 'b': 'm', 'c': 4}]
If you can assume you have at least one dict in the list and all the dicts have all the keys, you can do:
import numbers
dicts = [{'a': 3, 'b': 'm', 'c': 7},
         {'a': 1.0, 'b': 'm', 'c': 2},
         {'a': 5, 'b': 'm', 'c': 4.0}]
avg_dict = {}
for key in dicts[0]:
    avg_dict[key] = sum([d[key] for d in dicts])/len(dicts) if isinstance(dicts[0][key], numbers.Number) else dicts[0][key]
Maybe not the most pythonic way, but it will do the job:
lst = [{'a': 3, 'b': 'm', 'c': 7},
{'a': 1.0, 'b': 'm', 'c': 2},
{'a': 5, 'b': 'm', 'c': 4.0}]
result = {}
for item in lst:
    for j in item:
        if type(item[j]) == str:
            result[j] = item[j]
        elif j in result:
            result[j] += item[j]
        else:
            result[j] = item[j]
for i in result:
    if type(result[i]) != str:
        result[i] = int(result[i] / len(lst))
print(result)
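For reuse, the same idea can be wrapped in a small function. This is a sketch under the same assumptions as the first answer (a non-empty list, every dict has every key); the name average_dicts is made up here. It divides by a float so that, on Python 2.7, a column containing only ints is not silently truncated by integer division.

```python
import numbers


def average_dicts(dicts):
    """Average numeric values key by key; copy other values from the first dict."""
    first = dicts[0]
    averaged = {}
    for key, value in first.items():
        if isinstance(value, numbers.Number):
            # float() forces true division on Python 2 as well as Python 3.
            averaged[key] = sum(d[key] for d in dicts) / float(len(dicts))
        else:
            # Non-numeric values are assumed to be identical across dicts.
            averaged[key] = value
    return averaged


dicts = [{'a': 3, 'b': 'm', 'c': 7},
         {'a': 1.0, 'b': 'm', 'c': 2},
         {'a': 5, 'b': 'm', 'c': 4.0}]
result = average_dicts(dicts)
print(result)
```

Note that this returns the exact mean for 'c' (13 divided by 3) rather than the truncated 4 shown in the question; wrap the division in int() or round() if whole numbers are wanted.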

Using Gurobi in Python and adding variables

I am trying to write my first Gurobi optimization code, and this is where I am stuck:
I have the following dictionary for my first subscript:
Input:
i = {}
for k in range(1,11):
    i[k] = int(k)
print i
Output:
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10}
And, I have the following dictionaries for my second subscript:
c_il = {1: 2, 2: 1, 3: 1, 4: 4, 5: 3, 6: 4, 7: 3, 8: 2, 9: 1, 10: 4}
c_iu = {1: 3, 2: 2, 3: 2, 4: 5, 5: 4, 6: 5, 7: 4, 8: 3, 9: 2, 10: 5}
I am trying to create variables as follows:
x = m.addVars(i, c_il, vtype=GRB.BINARY, name="x")
x = m.addVars(i, c_iu, vtype=GRB.BINARY, name="x")
Apparently, this does not give me what I am looking for. What I am looking for is x_(i),(c_il) and x_(i),(c_iu); ignore the parentheses.
More clearly, the following is what I am trying to obtain by using dicts i, c_il, and c_iu:
{1: <gurobi.Var x[1,2]>,
2: <gurobi.Var x[2,1]>,
3: <gurobi.Var x[3,1]>,
4: <gurobi.Var x[4,4]>,
5: <gurobi.Var x[5,3]>,
6: <gurobi.Var x[6,4]>,
7: <gurobi.Var x[7,3]>,
8: <gurobi.Var x[8,2]>,
9: <gurobi.Var x[9,1]>,
10: <gurobi.Var x[10,4]>,
11: <gurobi.Var x[1,3]>,
12: <gurobi.Var x[2,2]>,
13: <gurobi.Var x[3,2]>,
14: <gurobi.Var x[4,5]>,
15: <gurobi.Var x[5,4]>,
16: <gurobi.Var x[6,5]>,
17: <gurobi.Var x[7,4]>,
18: <gurobi.Var x[8,3]>,
19: <gurobi.Var x[9,2]>,
20: <gurobi.Var x[10,5]>}
Since I am using dictionaries everywhere, I want to keep it consistent by continuing to use dictionaries so that I can do multiplications and additions with my parameters which are all in dictionaries. Is there any way to create these variables with m.addVars or m.addVar?
Thanks!
Edit: Modified to make it more clear.
It looks like you want to create 10 variables that are indexed by something. The best way to do this is to create the two indexes as lists. If you want x[12], x[21], then write:
from gurobipy import *
m = Model()
il = [ 12, 21, 31, 44, 53, 64, 73, 82, 91, 104 ]
x = m.addVars(il, vtype=GRB.BINARY, name="x")
And if you want to write x[1,2], x[2,1], then write:
from gurobipy import *
m = Model()
il = [ (1,2), (2,1), (3,1), (4,4), (5,3), (6,4), (7,3), (8,2), (9,1), (10,4) ]
x = m.addVars(il, vtype=GRB.BINARY, name="x")
After a few years of experience, I can easily write the answer below. Since my past self was concerned with keeping the dictionaries as is (a choice I now strongly question...), a quick solution is as follows.
x = {}
for (i,j) in c_il.items():
    x[i,j] = m.addVar(vtype=GRB.BINARY, name="x%s"%str([i,j]))
for (i,j) in c_iu.items():
    x[i,j] = m.addVar(vtype=GRB.BINARY, name="x%s"%str([i,j]))
Alternatively,
x = {(i,j): m.addVar(vtype=GRB.BINARY, name="x%s"%str([i,j]))
     for (i,j) in c_il.items()}
for (i,j) in c_iu.items():
    x[i,j] = m.addVar(vtype=GRB.BINARY, name="x%s"%str([i,j]))
One-liner alternative:
x = {(i,j): m.addVar(vtype=GRB.BINARY, name="x%s"%str([i,j]))
     for (i,j) in [(k,l) for (k,l) in c_il.items()] + [(k,l) for (k,l) in c_iu.items()]}
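The list-of-tuples call to m.addVars from the earlier answer and the dictionaries from the question can likely be combined by building the index pairs first. The sketch below only constructs and checks the pairs, so it runs without Gurobi installed; the name index_pairs is introduced here for illustration, and the final addVars call is left as a comment because it needs a gurobipy model m.

```python
c_il = {1: 2, 2: 1, 3: 1, 4: 4, 5: 3, 6: 4, 7: 3, 8: 2, 9: 1, 10: 4}
c_iu = {1: 3, 2: 2, 3: 2, 4: 5, 5: 4, 6: 5, 7: 4, 8: 3, 9: 2, 10: 5}

# Each (key, value) item becomes one (i, j) index pair.
index_pairs = list(c_il.items()) + list(c_iu.items())

# A pair present in both dicts would map to the same variable key,
# so check for duplicates before creating any variables.
has_duplicates = len(set(index_pairs)) != len(index_pairs)
print(len(index_pairs), has_duplicates)

# With a gurobipy Model `m` (not created in this sketch), the pairs could be
# passed in the same way as the list of tuples in the answer above:
# x = m.addVars(index_pairs, vtype=GRB.BINARY, name="x")
```

Variables created this way are keyed by the pair, so x[4, 4] and x[4, 5] would be the two variables for i = 4.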

How do I Pythonically print a list with formatting?

I have a list:
L = [1, 2, 3, 4, 5, 6]
and I want to print
1 B 2 J 3 C 4 A 5 J 6 X
from that list.
How do I do that?
Do I have to make another list and zip them up, or is there some way I can have the letters in my format specifier?
You could do it either way:
L = [1, 2, 3, 4, 5, 6]
from itertools import chain
# new method
print "{} B {} J {} C {} A {} J {} X".format(*L)
# old method
print "%s B %s J %s C %s A %s J %s X" % tuple(L)
# without string formatting
print ' '.join(chain.from_iterable(zip(map(str, L), 'BJCAJX')))
See the docs on str.format and string formatting.
A nice way to do this is to have a dictionary mapping the numbers to their letters:
prefixes = {1: 'B', 2: 'J', 3: 'C', 4: 'A', 5: 'J', 6: 'X'}
Then you can do:
print ' '.join('%s %s' % (num, prefix) for num, prefix in sorted(prefixes.iteritems()))
If you also have a list of letters:
nums = [1, 2, 3, 4, 5, 6]
ltrs = ['B', 'J', 'C', 'A', 'J', 'X']
print ' '.join('%s %s' % (num, ltr) for num, ltr in zip(nums, ltrs))
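For code that has to run on both Python 2.7 and Python 3, the zip approach can be written with str.format and a parenthesized print. This is a minimal sketch using the two lists from the answer above; the variable name line is added here for illustration.

```python
nums = [1, 2, 3, 4, 5, 6]
ltrs = ['B', 'J', 'C', 'A', 'J', 'X']

# Pair each number with its letter, format each pair, then join with spaces.
line = ' '.join('{} {}'.format(num, ltr) for num, ltr in zip(nums, ltrs))

# print with a single parenthesized argument behaves the same on 2.7 and 3.
print(line)  # prints: 1 B 2 J 3 C 4 A 5 J 6 X
```

zip stops at the shorter input, so if the two lists can differ in length it may be worth comparing len(nums) and len(ltrs) first.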