Append new column by subtracting existing column in csv by using python - python-2.7

I tried to append new column to an existing csv file using python. it is not showing any error but the column is not created.
I have a CSV file with 5 columns and I want to add data in the 6th column by subtracting between existing columns.
ID,SURFACES,A1X,A1Y,A1Z,A2X
1,GROUND,800085.3323,961271.977,-3.07E-18,800080.8795
ADD THE COLUMN AX( = A1X - A2X)
CODE:
>>> x = csv.reader(open('E:/solarpotential analysis/iitborientation/trialcsv.csv','rb'))
>>> y = csv.writer(open('E:/solarpotential analysis/iitborientation/trial.csv','wb',buffering=0))
>>> for row in x:
a = float(row[0])
b = str(row[1])
c = float(row[2])
d = float(row[3])
e = float(row[4])
f = float(row[2] - row[5])
y.writerow([a,b,c,d,e,f])
it shows no error but not be updated in output file

You can do this by this way:
inputt=open("input.csv","r")
outputt=open("output.csv","w")
for line in inputt.readlines():
#print line.replace("\n","")
outputt.write(line.replace("\n","") + ";6column\n")
inputt.close()
outputt.close()

Related

find unique values row wise on comma separated values

For a dataframe like below:
df = pd.DataFrame({'col':['abc,def,ghi,jkl,abc','abc,def,ghi,def,ghi']})
How to get unique values of the column col row wise in a new column like as follows:
col unique_col
0 abc,def,ghi,jkl,abc abc,def,ghi,jkl
1 abc,def,ghi,def,ghi abc,def,ghi
I tried using iteritems but got Attribute error :
for i, item in df.col.iteritems():
print item.unique()
import pandas as pd
df = pd.DataFrame({'col':['abc,def,ghi,jkl,abc','abc,def,ghi,def,ghi']})
def unique_col(col):
return ','.join(set(col.split(',')))
df['unique_col'] = df.col.apply(unique_col)
result:
col unique_col
0 abc,def,ghi,jkl,abc ghi,jkl,abc,def
1 abc,def,ghi,def,ghi ghi,abc,def

Python - reading text file delimited by semicolon, ploting chart using openpyxl

I have copied the text file to excel sheet separating cells by ; delimiter.
I need to plot a chart using the same file which I achieved. Since all the values copied are type=str my chart gives me wrong points.
Please suggest to overcome this. Plot is should be made of int values
from datetime import date
from openpyxl import Workbook,load_workbook
from openpyxl.chart import (
LineChart,
Reference,
Series,
)
from openpyxl.chart.axis import DateAxis
excelfile = "C:\Users\lenovo\Desktop\how\openpychart.xlsx"
wb = Workbook()
ws = wb.active
f = open("C:\Users\lenovo\Desktop\sample.txt")
data = []
num = f.readlines()
for line in num:
line = line.split(";")
ws.append(line)
f.close()
wb.save(excelfile)
wb.close()
wb = load_workbook(excelfile, data_only=True)
ws = wb.active
c1 = LineChart()
c1.title = "Line Chart"
##c1.style = 13
c1.y_axis.title = 'Size'
c1.x_axis.title = 'Test Number'
data = Reference(ws, min_col=6, min_row=2, max_col=6, max_row=31)
series = Series(data, title='4th average')
c1.append(series)
data = Reference(ws, min_col=7, min_row=2, max_col=7, max_row=31)
series = Series(data, title='Defined Capacity')
c1.append(series)
##c1.add_data(data, titles_from_data=True)
# Style the lines
s1 = c1.series[0]
s1.marker.symbol = "triangle"
s1.marker.graphicalProperties.solidFill = "FF0000" # Marker filling
s1.marker.graphicalProperties.line.solidFill = "FF0000" # Marker outline
s1.graphicalProperties.line.noFill = True
s2 = c1.series[1]
s2.graphicalProperties.line.solidFill = "00AAAA"
s2.graphicalProperties.line.dashStyle = "sysDot"
s2.graphicalProperties.line.width = 100050 # width in EMUs
##s2 = c1.series[2]
##s2.smooth = True # Make the line smooth
ws.add_chart(c1, "A10")
##
##from copy import deepcopy
##stacked = deepcopy(c1)
##stacked.grouping = "stacked"
##stacked.title = "Stacked Line Chart"
##ws.add_chart(stacked, "A27")
##
##percent_stacked = deepcopy(c1)
##percent_stacked.grouping = "percentStacked"
##percent_stacked.title = "Percent Stacked Line Chart"
##ws.add_chart(percent_stacked, "A44")
##
### Chart with date axis
##c2 = LineChart()
##c2.title = "Date Axis"
##c2.style = 12
##c2.y_axis.title = "Size"
##c2.y_axis.crossAx = 500
##c2.x_axis = DateAxis(crossAx=100)
##c2.x_axis.number_format = 'd-mmm'
##c2.x_axis.majorTimeUnit = "days"
##c2.x_axis.title = "Date"
##
##c2.add_data(data, titles_from_data=True)
##dates = Reference(ws, min_col=1, min_row=2, max_row=7)
##c2.set_categories(dates)
##
##ws.add_chart(c2, "A61")
### setup and append the first series
##values = Reference(ws, (1, 1), (9, 1))
##series = Series(values, title="First series of values")
##chart.append(series)
##
### setup and append the second series
##values = Reference(ws, (1, 2), (9, 2))
##series = Series(values, title="Second series of values")
##chart.append(series)
##
##ws.add_chart(chart)
wb.save(excelfile)
wb.close()
I have modified below code in for loop and it worked.
f = open("C:\Users\lenovo\Desktop\sample.txt")
data = []
num = f.readlines()
for line in num:
line = line.split(";")
new_line=[]
for x in line:
if x.isdigit():
x=int(x)
new_line.append(x)
else:
new_line.append(x)
ws.append(new_line)
f.close()
wb.save(excelfile)
wb.close()
For each list,for each value check if its a digit, if yes converts to integer and store in another list.
Using x=map(int,x) didnt work since I have character values too.
I felt above is much more easy than using x=map(int,x) with try and Except
Thanks
Basha

Pandas append list to list of column names

I'm looking for a way to append a list of column names to existing column names in a DataFrame in pandas and then reorder them by col_start + col_add.
The DataFrame already contains the columns from col_start.
Something like:
import pandas as pd
df = pd.read_csv(file.csv)
col_start = ["col_a", "col_b", "col_c"]
col_add = ["Col_d", "Col_e", "Col_f"]
df = pd.concat([df,pd.DataFrame(columns = list(col_add))]) #Add columns
df = df[[col_start.extend(col_add)]] #Rearrange columns
Also, is there a way to capitalize the first letter for each item in col_start, analogous to title() or capitalize()?
Your code is nearly there, a couple things:
df = pd.concat([df,pd.DataFrame(columns = list(col_add))])
can be simplified to just this as col_add is already a list:
df = pd.concat([df,pd.DataFrame(columns = col_add)])
Also you can also just add 2 lists together so:
df = df[[col_start.extend(col_add)]]
becomes
df = df[col_start+col_add]
And to capitalise the first letter in your list just do:
In [184]:
col_start = ["col_a", "col_b", "col_c"]
col_start = [x.title() for x in col_start]
col_start
Out[184]:
['Col_A', 'Col_B', 'Col_C']
EDIT
To avoid the KeyError on the capitalised column names, you need to capitalise after calling concat, the columns have a vectorised str title method:
In [187]:
df = pd.DataFrame(columns = col_start + col_add)
df
Out[187]:
Empty DataFrame
Columns: [col_a, col_b, col_c, Col_d, Col_e, Col_f]
Index: []
In [188]:
df.columns = df.columns.str.title()
df.columns
Out[188]:
Index(['Col_A', 'Col_B', 'Col_C', 'Col_D', 'Col_E', 'Col_F'], dtype='object')
Here what you want to do :
import pandas as pd
#Here you have a first dataframe
d1 = pd.DataFrame([[1,2,3],[4,5,6]], columns=['col1','col2','col3'])
#a second one
d2 = pd.DataFrame([[8,7,3,8],[4,8,6,8]], columns=['col4','col5','col6', 'col7'])
#Here we can make a dataframe with d1 and d2
d = pd.concat((d1,d2), axis=1)
#We want a different order from the columns ?
d = d[col_start + col_add]
If you want to capitalize values from a column 'col', you can do
d['col'] = d['col'].str.capitalize()
PS: Update Pandas if ".str.capitalize()" doesn't work.
Or, what you can do :
df['col'] = df['col'].map(lambda x:x.capitalize())

Printing Results from Loops

I currently have a piece of code that works in two segments. The first segment opens the existing text file from a specific path on my local drive and then arranges, based on certain indices, into a list of sub list. In the second segment I take the sub-lists I have created and group them on a similar index to simplify them (starts at def merge_subs). I am getting no error code but I am not receiving a result when I try to print the variable answer. Am I not correctly looping the original list of sub-lists? Ultimately I would like to have a variable that contains the final product from these loops so that I may write the contents of it to a new text file. Here is the code I am working with:
from itertools import groupby, chain
from operator import itemgetter
with open ("somepathname") as g:
# reads text from lines and turns them into a list sub-lists
lines = g.readlines()
for line in lines:
matrix = line.split()
JD = matrix [2]
minTime= matrix [5]
maxTime= matrix [7]
newLists = [JD,minTime,maxTime]
L = newLists
def merge_subs(L):
dates = {}
for sub in L:
date = sub[0]
if date not in dates:
dates[date] = []
dates[date].extend(sub[1:])
answer = []
for date in sorted(dates):
answer.append([date] + dates[date])
new code
def openfile(self):
filename = askopenfilename(parent=root)
self.lines = open(filename)
def simplify(self):
g = self.lines.readlines()
for line in g:
matrix = line.split()
JD = matrix[2]
minTime = matrix[5]
maxTime = matrix[7]
self.newLists = [JD, minTime, maxTime]
print(self.newLists)
dates = {}
for sub in self.newLists:
date = sub[0]
if date not in dates:
dates[date] = []
dates[date].extend(sub[1:])
answer = []
for date in sorted(dates):
print(answer.append([date] + dates[date]))
enter code here
enter code here

Python dictionary construction from file with multiple similar values and keys

I am new to python (well to coding in general) and am trying to use it to analyze some data at work. I have a file like this:
HWI-ST591_0064:5:1101:1228:2111#0/1 + 7included 11 A>G - -
HWI-ST591_0064:5:1101:1205:2125#0/1 + genomic 17 A>G - -
HWI-ST591_0064:5:1101:1178:2129#0/1 + 7included 6 A>C 8 A>T
HWI-ST591_0064:5:1101:1176:2164#0/1 + 7included 6 A>T 8 A>G
HWI-ST591_0064:5:1101:1199:2234#0/1 + 7included 14 T>C 21 G>A
HWI-ST591_0064:5:1101:1208:2249#0/1 + 7included 32 C>T - -
Tab delimited. I am trying to create a dictionary that contains the first value of the line (a unique identifier) as a list of values that matches the joined last 4 values as the key, like this:
{'32C>T--': ['HWI-ST591_0064:5:1101:1208:2249#0/1'],
'6A>C8A>C': ['HWI-ST591_0064:5:1101:1318:2090#0/1'],
'36A>G--': ['HWI-ST591_0064:5:1101:1425:2093#0/1'],
'----': ['HWI-ST591_0064:5:1101:1222:2225#0/1'],
'6A>C8A>T': ['HWI-ST591_0064:5:1101:1178:2129#0/1','HWIST591_0064:5:1101:1176:2164#0/1']}
This way I can then get a list of the unique identifies and count or sort or do the other things I need to do. I can get the dictionary made, but when I try to output it to a file I get an error. I think the problem is because this is a list, I keep getting the error
File "trial.py", line 33, in
outFile.write("%s\t%s\n" % ('\t' .join(key, mutReadDict[key])))
TypeError: unhashable type: 'list'
Is there a way to make this work so I can have it in a file? I tried .iteritems() on the for loop making the dictionary but that didn't seem to work. Thanks and here is my code:
inFile = open('path', 'rU')
outFile = open('path', 'w')
from collections import defaultdict
mutReadDict = defaultdict(list)
for line in inFile:
entry = line.strip('\n').split('\t')
fastQ_ID = entry[0]
strand = entry[1]
chromosome = entry[2]
mut1pos = entry[3]
mut1base = entry[4]
mut2pos = entry[5]
mut2base = entry[6]
mutKey = mut1pos + mut1base + mut2pos + mut2base
if chromosome == '7included':
mutReadDict[mutKey].append(fastQ_ID)
else:
pass
keyList = [mutReadDict.keys()]
keyList.sort()
for key in keyList:
outFile.write("%s\t%s\n" % ('\t' .join(key, mutReadDict[key])))
outFile.close()
I think you want:
keyList = mutReadDict.keys()
instead of
keyList = [mutReadDict.keys()]
You probably mean this too:
for key in keyList:
outFile.write("%s\t%s\n" % (key, '\t'.join(mutReadDict[key])))