Use python to calculate data in CSV - python-2.7

As in the sample below, I want to read the CSV file, sum each row, and store the result in a new column (which needs to be created).
Sample:

import csv
new_rows = []
with open('file.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        row = [int(val) for val in row]
        row.append(sum(row))
        new_rows.append(row)
with open('file.csv', 'w') as csvfile:
    csv.writer(csvfile).writerows(new_rows)
turning file.csv from
1,2
3,4
into
1,2,3
3,4,7

Related

Cannot iterate through two csv files and compare

I'm relatively new to Python (2.7) and need help looping through two CSV files. The first file (outer loop) holds the rows I want to write if certain conditions are met against the second file (inner loop).
import csv
f = open('../CI Working Copy.csv')
with open('../first.csv', 'wb') as n:
    theWriter = csv.writer(n)
    csv_f = csv.reader(f)
    g = open('../second.csv')
    csv_g = csv.reader(g)
    for row in csv_f:
        cbd = row[3]
        ced = row[4]
        rbd = row[5]
        red = row[6]
        ciCn = row[10]
        for iRow in csv_g:
            cn = iRow[0]
            startDate = iRow[1]
            endDate = iRow[2]
            iId = iRow[3]
            writeRow = 'false'
            if ciCn == cn:
                if (cbd == startDate and ced == endDate) or (rbd == startDate and red == endDate):
                    theWriter.writerow(row)
g.close()
f.close()
It makes it into the second (inner loop) file, but never returns to the outer loop. I only need to write the row from the first file.
For each row of the first CSV file you consume the whole second file, so the inner reader is already exhausted when the outer loop moves on. You need to go back to the beginning of the second file on each iteration.
The solution is:
for row in csv_f:
    g.seek(0)  # go back to the start of the second file
    for iRow in csv_g:
        do_smth(iRow, row)
g.close()
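An alternative to rewinding the second file for every outer row is to read it once into a lookup table. The sketch below is only an illustration: it assumes the column positions used in the question (cn, start date and end date in columns 0 to 2 of the second file; columns 3 to 6 and 10 of the first), the helper names build_index and matching_rows are made up, and the sample rows are invented stand-ins for the real files.

```python
import csv
from collections import defaultdict

def build_index(second_rows):
    """Map each cn (column 0) to the set of (startDate, endDate) pairs seen for it."""
    index = defaultdict(set)
    for i_row in second_rows:
        index[i_row[0]].add((i_row[1], i_row[2]))
    return index

def matching_rows(first_rows, index):
    """Yield rows of the first file whose cbd/ced or rbd/red pair matches."""
    for row in first_rows:
        cbd, ced, rbd, red, ci_cn = row[3], row[4], row[5], row[6], row[10]
        dates = index.get(ci_cn, ())  # .get avoids adding empty entries
        if (cbd, ced) in dates or (rbd, red) in dates:
            yield row

# Tiny in-memory stand-ins for the two files (hypothetical data).
second = list(csv.reader(["A,2015-01-01,2015-12-31,7", "B,2016-01-01,2016-06-30,8"]))
first = list(csv.reader([
    "r1,x,x,2015-01-01,2015-12-31,,,x,x,x,A",
    "r2,x,x,2014-01-01,2014-12-31,,,x,x,x,A",
    "r3,x,x,,,2016-01-01,2016-06-30,x,x,x,B",
]))
matched = list(matching_rows(first, build_index(second)))
print([row[0] for row in matched])
```

Each matching row from the first file is yielded once, even if several rows in the second file match it, and the second file is read only once instead of once per outer row.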

Make a comma separated list out of co-ordinates from a csv file

I have x and y values in a CSV file, and I am reading them into a numpy array using the code below:
import numpy as np
import csv
data = np.loadtxt('datapoints.csv', delimiter=',')
# Putting data from csv file to variable
x = data[:, 0]
y = data[:, 1]
# Converting npArray to simple array
np.asarray(x)
np.asarray(y)
So now I have the values of x and y.
But I want them to be in this format:
[[x1,y1],[x2,y2], [x3,y3], ...... [xn,yn]]
How do I do that?
Use zip:
result = [list(a) for a in zip(np.asarray(x),np.asarray(y))]
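If the nested list is the end goal, the pairs can also be built straight from the rows, without splitting them into x and y first. A minimal sketch using only the standard csv module; the helper name read_pairs and the sample lines are illustrative, and it assumes the first two columns hold the numeric x and y values.

```python
import csv

def read_pairs(lines):
    """Return [[x1, y1], [x2, y2], ...] from an iterable of CSV lines."""
    return [[float(row[0]), float(row[1])] for row in csv.reader(lines) if row]

# In practice `lines` would be an open file object for datapoints.csv.
pairs = read_pairs(["1.0,2.5", "3.0,4.5", "5.0,6.5"])
print(pairs)
```

When the data is already loaded with np.loadtxt as in the question, data[:, :2].tolist() should give the same nested list.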

Read specific column of csv

The code works, but it prints a list of tuples in parentheses. I want to modify it so that it prints the values in proper column and row format, as in the CSV.
Expected output :
Ver Total
4 5
4 5
4 5
4 5
Actual Output:
(ver,total) (4,5) (4,5) (4,5)
Here is the code:
import csv
f = open("a.csv", 'r')
reader = csv.reader(f)
data = []
for line in f:
    cells = line.split(",")
    data.append((cells[0], cells[3]))
print data
Try this code:
import csv
with open('a.csv') as csvfile:
    reader = csv.reader(csvfile)
    rowcnt = 0
    for row in reader:
        if rowcnt == 0:
            print row[0], row[1]
        else:
            print row[0], ' ', row[1]
        rowcnt = rowcnt + 1
Provides the following output:
Ver Stat
4 5
4 5
4 5
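The question reads columns 0 and 3, so a variant that takes the wanted column positions as a parameter may be closer to what was asked. This is a sketch under assumptions: select_columns is a made-up helper, the sample lines stand in for a.csv, and the cells are stripped because splitting a raw line by hand (as in the question) leaves the trailing newline on the last column.

```python
import csv

def select_columns(lines, indexes):
    """Yield only the requested column positions from each non-empty CSV row."""
    for row in csv.reader(lines):
        if row:
            yield [row[i].strip() for i in indexes]

# Hypothetical stand-in for the contents of a.csv.
sample = ["Ver,a,b,Total", "4,x,y,5", "4,x,y,5"]
for cells in select_columns(sample, (0, 3)):
    print(" ".join(cells))
```

Passing an open file object instead of the sample list works the same way, since csv.reader accepts any iterable of lines.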

Convert a text file to a numpy array

I am new to python. I have a .txt file
SET = 20,21,23,21,23
45,46,42,23,55
with many rows. How would I convert this text file into an array, ignoring spaces and commas? Any help would be really appreciated.
l1=[]
file = open('list-num')
for l in file:
    l2 = map(int,l.split(','))
    l1 = l1 + l2
print l1
Your data looks like :
SET 1 = 13900100,13900141,13900306,13900442,13900453,13900461, 13900524,13900537,13900619,13900632,13900638,13900661, 13900665,13900758,13900766,13900825,13900964,13901123, 13901131,13901136,13901141,13901143,13901195,13901218,
You can use the numpy function np.genfromtxt():
import numpy as np
import matplotlib.pyplot as plt
data = np.genfromtxt("text.txt", delimiter=",")
data = data[np.logical_not(np.isnan(data))] #Remove nan value
print data
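I get (the first value, 13900100, is missing because the field "SET 1 = 13900100" is not numeric, so it is read as nan and filtered out):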
[ 13900141. 13900306. 13900442. 13900453. 13900461. 13900524.
13900537. 13900619. 13900632. 13900638. 13900661. 13900665.
13900758. 13900766. 13900825. 13900964. 13901123. 13901131.
13901136. 13901141. 13901143. 13901195. 13901218.]
It should work ;)
------------------------------------
Another way:
import numpy as np
f = open("text.txt", "r") #Open data file
data = f.read() #Read data file
cut = data.split() #Split data file
value = cut[2] #Pick the value part
array = np.array(value) #Value becomes an array
print array
I get :
13900100,13900141,13900306,13900442,13900453,13900461,13900524,13900537,13900619,13900632,13900638,13900661,13900665,13900758,13900766,13900825,13900964,13901123,13901131,13901136,13901141,13901143,13901195,13901218
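Note that the snippet above leaves the values as one comma-separated string rather than as numbers. A standard-library sketch that turns the text into a list of integers is shown below; parse_numbers is a hypothetical helper, and it assumes a single "SET ... =" prefix followed only by non-negative integers separated by commas, spaces, or newlines.

```python
import re

def parse_numbers(text):
    """Return every run of digits that follows the first '=' as an int."""
    _, _, values = text.partition("=")  # drop the 'SET ... =' prefix
    return [int(tok) for tok in re.findall(r"\d+", values)]

# Sample taken from the question; a real file would be read with f.read().
sample = "SET = 20,21,23,21,23\n45,46,42,23,55\n"
numbers = parse_numbers(sample)
print(numbers)
```

The resulting list can be passed to np.array(numbers) if a numpy array is needed. Text without any '=' yields an empty list under this sketch.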

Using Interval tree to find overlapping regions

I have two files
File 1
chr1:4847593-4847993
TGCCGGAGGGGTTTCGATGGAACTCGTAGCA
File 2
Pbsn|X|75083240|75098962|
TTTACTACTTAGTAACACAGTAAGCTAAACAACCAGTGCCATGGTAGGCTTGAGTCAGCT
CTTTCAGGTTCATGTCCATCAAAGATCTACATCTCTCCCCTGGTAGCTTAAGAGAAGCCA
TGGTGGTTGGTATTTCCTACTGCCAGACAGCTGGTTGTTAAGTGAATATTTTGAAGTCC
File 1 has approximately 8000 more lines, each a different header with its sequence below it.
I would first like to match the start and end coordinates from file 1 against file 2, or see whether they are close to each other, say within +-100. If they are, I want to match the sequence in file 2 and then print the header info for file 2 along with the matched sequence.
My approach: use an interval tree (in Python, which I am still trying to get the hang of) to store the coordinates?
I tried using re.match, but it's not giving me accurate results.
Any tips would be highly appreciated.
Thanks.
My first try:
However, I have now hit another roadblock. For my second file, if my start and end are 5000 and 8000 respectively, I want to change them by subtracting 2000, so that my new start and stop are 3000 and 6000. Here is my code:
from intervaltree import IntervalTree
from collections import defaultdict
binding_factor = 'some.txt'
genome = dict()
with open('file2', 'r') as rows:
    for row in rows:
        #print row
        if row.startswith('>'):
            row = row.strip().split('|')
            chrom_name = row[5]
            start = int(row[3])
            end = int(row[3])
            # one interval tree per chromosome
            if chrom_name not in genome:
                # first time we've encountered this chromosome, create tree
                genome[chrom_name] = IntervalTree()
            # index the feature
            genome[chrom_name].addi(start, end, row[2])
#for key,value in genome.iteritems():
    #print key, ":", value
mast = defaultdict(list)
with open('file1', 'r') as f:
    for row in f:
        row = row.strip().split()
        row[0] = row[0].replace('chr', '') if row[0].startswith('chr') else row[0]
        row[0] = 'MT' if row[0] == 'M' else row[0]
        #print row[0]
        mast[row[0]].append({
            'start': int(row[1]),
            'end': int(row[2])
        })
#for k,v in mast.iteritems():
    #print k, ":", v
with open(binding_factor, 'w') as f:
    for k, v in mast.iteritems():
        for i in v:
            g = genome[k].search(i['start'], i['end'])
            if g:
                print g
                l = g  # 'gene' was undefined in the original; assuming the search result g was meant
                f.write(str(l) + '\n')
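The two pieces of arithmetic in the question, shifting an interval and testing whether two intervals are within +-100 of each other, can be sketched without any library. This is not the interval tree approach itself but plain comparisons; the helper names are hypothetical, the header format is taken from the File 1 sample ("chr1:4847593-4847993"), and "close" is read here as a gap of at most `tolerance` between the two intervals, which is an assumption about what the question means.

```python
def parse_region(header):
    """Split 'chr1:4847593-4847993' into ('1', 4847593, 4847993)."""
    chrom, _, span = header.strip().lstrip(">").partition(":")
    start, _, end = span.partition("-")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(start), int(end)

def shift(start, end, offset):
    """Move both ends of an interval by the same offset."""
    return start + offset, end + offset

def near(a_start, a_end, b_start, b_end, tolerance=100):
    """True if interval b overlaps interval a widened by `tolerance` on each side."""
    return a_start - tolerance <= b_end and b_start <= a_end + tolerance

chrom, start, end = parse_region("chr1:4847593-4847993")
print(chrom, start, end)
print(shift(5000, 8000, -2000))
print(near(start, end, 4848050, 4849000))  # gap of 57, inside the tolerance
print(near(start, end, 4848200, 4849000))  # gap of 207, outside the tolerance
```

With an interval tree, the same widening can be applied to the query instead, for example by passing i['start'] - 100 and i['end'] + 100 to the genome[k].search(...) call used in the code above.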