Read specific column of csv - python-2.7

The code works, but it produces a list of tuples printed in parentheses. I want to modify it so that it prints the values in rows and columns, as they appear in the CSV file.
Expected output:
Ver Total
4 5
4 5
4 5
4 5
Actual output:
(ver,total) (4,5) (4,5) (4,5)
Here is the code:
import csv
f = open("a.csv", 'r')
reader = csv.reader(f)
data = []
for line in f:
    cells = line.split(",")
    data.append((cells[0], cells[3]))
print data

Try this code:
import csv
with open('a.csv') as csvfile:
    reader = csv.reader(csvfile)
    rowcnt = 0
    for row in reader:
        if rowcnt == 0:
            print row[0], row[1]
        else:
            print row[0], ' ', row[1]
        rowcnt = rowcnt + 1
This produces the following output:
Ver Stat
4 5
4 5
4 5
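
For comparison, here is a minimal sketch that selects columns by index and joins them with a space, so each CSV row becomes one output line. It is written for Python 3, and the sample data (including the extra Stat and Other columns) and the helper name pick_columns are made up for illustration; only the Ver and Total headers come from the question.

```python
import csv
import io


def pick_columns(csvfile, indexes):
    """Yield the selected columns of each CSV row as a list of strings."""
    for row in csv.reader(csvfile):
        if row:  # skip blank lines
            yield [row[i] for i in indexes]


# Hypothetical stand-in for a.csv: columns 0 and 3 are the ones wanted.
sample = io.StringIO("Ver,Stat,Other,Total\n4,1,x,5\n4,2,y,5\n")
lines = [" ".join(cells) for cells in pick_columns(sample, (0, 3))]
print("\n".join(lines))
```

Passing an open file object instead of the StringIO works the same way, since csv.reader accepts any iterable of lines.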

Related

Getting errors while inserting text file rows into list of lists

Code
with open(dataset, "r") as f:
    line = f.readline()
    while line != None and line != "":
        arr = line.split(splitter)
        if (len(arr) < 3):
            continue
        user, item, rating = int(arr[0]), int(arr[1]), int(arr[2])
        # user, item and rating are taken from the text file
        if (len(train) <= user):
            train.append([])
        train[user].append([item])  # appending to list of lists: error here
        train[user].append([rating])  # error here
Output error
train[user].append([item])
IndexError: list index out of range
Dataset
0 0 9
0 12345 8
0 4425 9
1 20110 10
1 1687759 10
1 13490 8
1 3259 10
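
One possible cause, offered as a guess since the sample above does not show it: if a user ID ever jumps ahead by more than one, a single train.append([]) is not enough and train[user] is out of range. The sketch below (Python 3) grows the outer list until the index exists. The function name load_ratings, the choice to store [item, rating] pairs, and the last sample row are assumptions for illustration, not taken from the question.

```python
def load_ratings(lines, splitter=None):
    """Build a list of lists: train[user] holds [item, rating] pairs."""
    train = []
    for line in lines:  # iterating avoids the manual readline() loop
        arr = line.split(splitter)
        if len(arr) < 3:
            continue  # skip malformed lines
        user, item, rating = int(arr[0]), int(arr[1]), int(arr[2])
        # Grow the outer list until train[user] exists, even if IDs skip.
        while len(train) <= user:
            train.append([])
        train[user].append([item, rating])
    return train


# First three rows are from the dataset above; the last one is hypothetical
# and shows a user ID that skips ahead.
sample_lines = ["0 0 9", "0 12345 8", "1 20110 10", "3 7 5"]
train = load_ratings(sample_lines)
print(train)
```

Iterating over the lines directly also sidesteps the case where continue is reached before the next line has been read.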

Reading CSV file and take majority voting of certain column

I need to calculate the majority vote for the TARGET_LABEL column of my CSV file in Python.
I have a data frame with a Row ID and an assigned TARGET_LABEL. What I need is the count of the majority TARGET_LABEL value. How do I do this?
For example, the data is in this form:
Row ID TARGET_LABEL
Row2 0
Row6 0
Row7 0
Row10 0
Row12 0
Row15 1
. .
. .
Row99999 1
I have a Python script that only reads the data from the CSV file. Here it is:
import csv
ifile = open('file1.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1
    rownum += 1
ifile.close()
If TARGET_LABEL does not contain NaN values, you could use:
counts = df['TARGET_LABEL'].value_counts()
max_counts = counts.max()
Otherwise, if it could contain NaN values, use
df = df.dropna(subset=['TARGET_LABEL'])
which removes all rows where TARGET_LABEL is NaN. Then
df['TARGET_LABEL'].value_counts().max()
should give you the max count, and
df['TARGET_LABEL'].value_counts().idxmax()
should give you the most frequent value.
The collections module contains the class Counter, which works like a dict (or, more precisely, like a defaultdict(lambda: 0)) and can be used to find the most frequent item.
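
A minimal sketch of the Counter approach, using only the standard library (Python 3). The function name majority_label and the inline sample are illustrative, and the sketch assumes a comma-separated file with a header row and at least one data row.

```python
import csv
import io
from collections import Counter


def majority_label(csvfile, column="TARGET_LABEL"):
    """Return (label, count) for the most frequent value in `column`."""
    counts = Counter(row[column] for row in csv.DictReader(csvfile))
    # most_common(1) returns a one-element list of (value, count) pairs.
    return counts.most_common(1)[0]


# Hypothetical stand-in for file1.csv.
sample = io.StringIO("Row ID,TARGET_LABEL\nRow2,0\nRow6,0\nRow7,0\nRow15,1\n")
label, count = majority_label(sample)
print(label, count)
```

The labels come back as strings because csv does not convert types; wrap row[column] in int() if numeric labels are needed.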

Using Interval tree to find overlapping regions

I have two files
File 1
chr1:4847593-4847993
TGCCGGAGGGGTTTCGATGGAACTCGTAGCA
File 2
Pbsn|X|75083240|75098962|
TTTACTACTTAGTAACACAGTAAGCTAAACAACCAGTGCCATGGTAGGCTTGAGTCAGCT
CTTTCAGGTTCATGTCCATCAAAGATCTACATCTCTCCCCTGGTAGCTTAAGAGAAGCCA
TGGTGGTTGGTATTTCCTACTGCCAGACAGCTGGTTGTTAAGTGAATATTTTGAAGTCC
File 1 has approximately 8000 more lines, each a different header with its sequence below it.
I would first like to match the start and end coordinates from file 1 to file 2, or see whether they are close to each other, say within +-100. If they are, I want to match the sequence in file 2 and then print out the header info for file 2 and the matched sequence.
My approach: use an interval tree (in Python; I am still trying to get the hang of it) to store the coordinates?
I tried using re.match, but it is not giving me accurate results.
Any tips would be highly appreciated.
Thanks.
My first try is below.
However, I have now hit another roadblock: for my second file, if my start and end are 5000 and 8000 respectively, I want to change them by subtracting 2000, so that my new start and stop are 3000 and 5000. Here is my code:
from intervaltree import IntervalTree
from collections import defaultdict

binding_factor = 'some.txt'
genome = dict()
with open('file2', 'r') as rows:
    for row in rows:
        #print row
        if row.startswith('>'):
            row = row.strip().split('|')
            chrom_name = row[5]
            start = int(row[3])
            end = int(row[3])  # as posted, the same field as start
            # one interval tree per chromosome
            if chrom_name not in genome:
                # first time we've encountered this chromosome, create tree
                genome[chrom_name] = IntervalTree()
            # index the feature
            genome[chrom_name].addi(start, end, row[2])
#for key,value in genome.iteritems():
    #print key, ":", value

mast = defaultdict(list)
with open('file1', 'r') as f:
    for row in f:
        row = row.strip().split()
        row[0] = row[0].replace('chr', '') if row[0].startswith('chr') else row[0]
        row[0] = 'MT' if row[0] == 'M' else row[0]
        #print row[0]
        mast[row[0]].append({
            'start': int(row[1]),
            'end': int(row[2])
        })
#for k,v in mast.iteritems():
    #print k, ":", v

with open(binding_factor, 'w') as f:
    for k, v in mast.iteritems():
        for i in v:
            g = genome[k].search(i['start'], i['end'])
            if g:
                print g
                l = g
                f.write(str(l) + '\n')
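
The "+-100" test from the question can be checked without any library before wiring it into a tree. The sketch below (Python 3) is a plain pairwise comparison, not an interval tree, and the field positions are assumed from the two sample headers shown at the top of the question (chr1:4847593-4847993 and Pbsn|X|75083240|75098962|); the function names are hypothetical.

```python
import re

# Matches headers such as 'chr1:4847593-4847993' (the 'chr' prefix is optional).
REGION_RE = re.compile(r'^(?:chr)?(\w+):(\d+)-(\d+)$')


def parse_region(header):
    """Parse 'chr1:4847593-4847993' into ('1', 4847593, 4847993)."""
    m = REGION_RE.match(header.strip().lstrip('>'))
    if m is None:
        raise ValueError('unrecognised region header: %r' % header)
    return m.group(1), int(m.group(2)), int(m.group(3))


def parse_gene_header(header):
    """Parse 'Pbsn|X|75083240|75098962|' into ('Pbsn', 'X', 75083240, 75098962)."""
    fields = header.strip().lstrip('>').split('|')
    return fields[0], fields[1], int(fields[2]), int(fields[3])


def overlaps(a_start, a_end, b_start, b_end, slack=100):
    """True if two closed intervals overlap or lie within `slack` bases."""
    return a_start <= b_end + slack and b_start <= a_end + slack


chrom, start, end = parse_region('chr1:4847593-4847993')
name, gene_chrom, gene_start, gene_end = parse_gene_header('Pbsn|X|75083240|75098962|')
# The two sample records are on different chromosomes, so they do not match.
is_match = chrom == gene_chrom and overlaps(start, end, gene_start, gene_end)
print(is_match)
```

With an interval tree, the same slack can be applied by widening the query range by 100 on each side before searching.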

create multiple lists for a given test data in python

I am relatively new to Python, so please excuse me if this is a very rudimentary question. This is my first time asking a question.
I have a test file which is of the format below.
1 2 4
1 3 2
1 4 1
2 1 2
2 2 1
2 3 1
3 2 3
3 7 1
4 1 1
....
I am trying to read the file line by line and, for each value in column 1 (1, 2, 3...), I need to create a list of the form below
list_1 = [[2,4], [3,2], [4,1]]
list_2 = [[1,2], [2,1], [3,1]]
list_3 = [[2,3], [7,1]]
list_4 = [[1,1]]
...
list_n
where values in the list are from column 2 and column 3 respectively.
Sincerely appreciate any guidance in this regard. Thank you
Use a defaultdict. This way, you don't have to check if your key already exists in the dictionary.
from collections import defaultdict

def parse(filename):
    result = defaultdict(list)
    with open(filename) as infile:
        for line in infile:
            c1, c2, c3 = map(int, line.split())
            result[c1].append([c2, c3])
    return result

def main():
    result = parse("test_data.txt")
    print(result)

if __name__ == '__main__':
    main()
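
To show how the dictionary stands in for list_1, list_2 and so on, here is a small variant of the same idea that takes lines directly instead of a filename, so it can be tried without a file. The name group_rows and the blank-line check are additions for illustration; the sample rows are taken from the question.

```python
from collections import defaultdict


def group_rows(lines):
    """Group [col2, col3] pairs under their col1 key."""
    result = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        c1, c2, c3 = map(int, line.split())
        result[c1].append([c2, c3])
    return result


sample = ["1 2 4", "1 3 2", "1 4 1", "2 1 2", "3 7 1"]
lists = group_rows(sample)
print(lists[1])  # plays the role of list_1
```

Looking up lists[n] replaces a separate variable per key, which also works when the number of distinct keys is not known in advance.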

write the same data into many lines in text file python

I would like to write the same information on many lines of a text file. Basically, I have a list of numbers. I want to write these numbers on one line and then copy that first line to the next 400 lines.
My code at the moment is
outfile = open(outfilename+'.dat','w')
for j in range (0,len(elevation_list)):
    outfile.write(elevation_list[j]+' ')
outfile.close()
And it only writes the first line.
For example, my elevation list is 1, 2, 3, 4, 5
I want my text file like the following
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
Can anyone please help me with this?
Do you have any line breaks in your elevation list? Try this instead:
elevations = ""
for elevation in elevation_list:
    # str() allows numeric values; the space keeps them separated
    elevations += str(elevation) + ' '
outfile = open(outfilename+'.dat','w')
for i in range(400):
    outfile.write(elevations + '\n')
outfile.close()
This is what you want:
class RepeatedWrite(object):
    def __init__(self, elevation_list, no_of_lines=5, outfilename="outfile.dat"):
        self.elevation_list = elevation_list
        self.no_of_lines = no_of_lines
        self.outfilename = outfilename

    def write_to_file(self):
        with open(self.outfilename, 'w') as fp:
            for i in xrange(self.no_of_lines):
                fp.write(' '.join([str(ele) for ele in self.elevation_list]))
                fp.write("\n")

elevation_list = [1, 2, 3, 4, 5]
RepeatedWrite(elevation_list).write_to_file()
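
If a class feels heavier than needed, the same result can be sketched as a single function: build the line once, then write it repeatedly. This is written for Python 3; the function name write_repeated and the temporary output path are made up for the example.

```python
import os
import tempfile


def write_repeated(path, values, copies=400):
    """Write `values` as one space-separated line, repeated `copies` times."""
    line = " ".join(str(v) for v in values) + "\n"
    with open(path, "w") as out:
        out.write(line * copies)  # string repetition builds all lines at once


# Demo with 3 copies in a temporary directory, then read the file back.
demo_path = os.path.join(tempfile.mkdtemp(), "elevations.dat")
write_repeated(demo_path, [1, 2, 3, 4, 5], copies=3)
with open(demo_path) as check:
    written = check.read()
print(written)
```

Joining with str() means the list can hold numbers or strings, which avoids the TypeError that string concatenation raises on integers.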