Read specific column of csv

Read specific column of csv - python-2.7

The code works, but it prints a list of tuples in parentheses. I want to modify it so that it prints the values in rows and columns, as they appear in the CSV.
Expected output:
Ver Total
4 5
4 5
4 5
4 5
Actual output:
(ver,total) (4,5) (4,5) (4,5)
Here is the code:
import csv
f = open("a.csv", 'r')
reader = csv.reader(f)
data = []
for line in f:
    cells = line.split(",")
    data.append((cells[0], cells[3]))
print data

Try this code:
import csv
with open('a.csv') as csvfile:
    reader = csv.reader(csvfile)
    rowcnt = 0
    for row in reader:
        if rowcnt == 0:
            print row[0], row[1]
        else:
            print row[0], ' ', row[1]
        rowcnt = rowcnt + 1
It provides the following output:
Ver Stat
4 5
4 5
4 5
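The question's code picks columns 0 and 3 (cells[0], cells[3]), while the answer prints columns 0 and 1. The following is a minimal sketch of selecting arbitrary columns and joining them with a space. The function name format_columns, its indices parameter, and the sample data are assumptions made for illustration; they do not come from the original post.

```python
import csv
import io


def format_columns(csvfile, indices):
    """Return one space-separated line per CSV row, keeping only `indices`."""
    reader = csv.reader(csvfile)
    lines = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        lines.append(' '.join(row[i].strip() for i in indices))
    return lines


# Hypothetical sample data standing in for a.csv
sample = io.StringIO(u"Ver,x,y,Total\n4,0,0,5\n4,0,0,5\n")
output = format_columns(sample, [0, 3])
print('\n'.join(output))
```

Returning the lines instead of printing them inside the loop keeps the formatting separate from the output, so the same function could also be used to write the result to a file.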

Related

Getting errors while inserting text file rows into list of lists

Code
with open(dataset, "r") as f:
    line = f.readline()
    while line != None and line != "":
        arr = line.split(splitter)
        if (len(arr) < 3):
            continue
        user, item, rating = int(arr[0]), int(arr[1]), int(arr[2])
        # user, item, rating are taken from the text file
        if (len(train) <= user):
            train.append([])
        train[user].append([item])  # appending to list of lists: error here
        train[user].append([rating])  # error here
Output error
train[user].append([item])
IndexError: list index out of range
Dataset
0 0 9
0 12345 8
0 4425 9
1 20110 10
1 1687759 10
1 13490 8
1 3259 10
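One possible cause of the IndexError, assuming train starts as an empty list: the if statement appends at most one inner list per line, so a user id that jumps ahead by more than one leaves train too short. The sketch below grows the list in a loop until the index exists. The function name load_ratings and the sample lines are hypothetical, and storing [item, rating] as one pair (instead of two separate appends, as in the posted code) is an assumed simplification.

```python
def load_ratings(lines, splitter=None):
    """Group [item, rating] pairs by user index from an iterable of text lines."""
    train = []
    for line in lines:  # iterating advances to the next line automatically
        arr = line.split(splitter)
        if len(arr) < 3:
            continue  # skip blank or short lines
        user, item, rating = int(arr[0]), int(arr[1]), int(arr[2])
        # Grow the outer list until index `user` exists; a single append
        # is not enough when user ids skip numbers.
        while len(train) <= user:
            train.append([])
        train[user].append([item, rating])
    return train


# Hypothetical lines; note that user id 2 is missing on purpose
dataset = ["0 0 9", "0 12345 8", "1 20110 10", "3 3259 10"]
train = load_ratings(dataset)
print(train)
```

Iterating over the file object with a for loop also avoids a second problem in the code as posted: readline() is called only once before the while loop, so continue would re-test the same line forever.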

Reading CSV file and take majority voting of certain column

I need to calculate the majority vote for the TARGET_LABEL column of my CSV file in Python.
I have a data frame with a Row ID and an assigned TARGET_LABEL. What I need is the count of the majority TARGET_LABEL. How do I do this?
For example, the data is in this form:
Row ID TARGET_LABEL
Row2 0
Row6 0
Row7 0
Row10 0
Row12 0
Row15 1
. .
. .
Row99999 1
I have a Python script that only reads the data from the CSV. Here it is:
import csv
ifile = open('file1.csv', "rb")
reader = csv.reader(ifile)
rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1
    rownum += 1
ifile.close()

If TARGET_LABEL does not contain NaN values, you could use:
counts = df['TARGET_LABEL'].value_counts()
max_counts = counts.max()
Otherwise, if it could contain NaN values, first use
df = df.dropna(subset=['TARGET_LABEL'])
which removes all rows where TARGET_LABEL is NaN. Then
df['TARGET_LABEL'].value_counts().max()
should give you the max count, and
df['TARGET_LABEL'].value_counts().idxmax()
should give you the most frequent value.

The collections module contains the class Counter, which works like a dict (or, more precisely, like a defaultdict(lambda: 0)) and can be used to find the most frequent item.
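A minimal sketch of the Counter approach using only the standard library. It assumes a comma-delimited file whose header row contains the column names; the function name majority_label and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def majority_label(csvfile, column='TARGET_LABEL'):
    """Return (most frequent value, its count) for `column` of a CSV file object."""
    reader = csv.DictReader(csvfile)
    counts = Counter(row[column] for row in reader)
    # most_common(1) returns a list holding the single top (value, count) pair
    return counts.most_common(1)[0]


# Hypothetical sample data standing in for file1.csv
sample = io.StringIO(u"Row ID,TARGET_LABEL\nRow2,0\nRow6,0\nRow7,0\nRow15,1\n")
label, count = majority_label(sample)
print('%s %s' % (label, count))
```

The csv module yields every field as a string, so the label comes back as '0' and not 0; convert it with int() if a number is needed.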

Using Interval tree to find overlapping regions

I have two files
File 1
chr1:4847593-4847993
TGCCGGAGGGGTTTCGATGGAACTCGTAGCA
File 2
Pbsn|X|75083240|75098962|
TTTACTACTTAGTAACACAGTAAGCTAAACAACCAGTGCCATGGTAGGCTTGAGTCAGCT
CTTTCAGGTTCATGTCCATCAAAGATCTACATCTCTCCCCTGGTAGCTTAAGAGAAGCCA
TGGTGGTTGGTATTTCCTACTGCCAGACAGCTGGTTGTTAAGTGAATATTTTGAAGTCC
File 1 has approximately 8000 more lines, each with a different header and a sequence below it.
I would first like to match the start and end coordinates from file 1 against file 2, or see whether they are close to each other, say within ±100. If so, I want to match the sequence in file 2 and then print the header info for file 2 and the matched sequence.
My approach: use an interval tree (in Python; I am still trying to get the hang of it) to store the coordinates?
I tried using re.match, but it is not giving me accurate results.
Any tips would be highly appreciated.
Thanks.
My first try is below.
However, I have now hit another roadblock: for my second file, if my start and end are 5000 and 8000 respectively, I want to change this by subtracting 2000, so that my new start and stop are 3000 and 5000. Here is my code:
from intervaltree import IntervalTree
from collections import defaultdict
binding_factor = 'some.txt'
genome = dict()
with open('file2', 'r') as rows:
    for row in rows:
        #print row
        if row.startswith('>'):
            row = row.strip().split('|')
            chrom_name = row[5]
            start = int(row[3])
            end = int(row[3])
            # one interval tree per chromosome
            if chrom_name not in genome:
                genome[chrom_name] = IntervalTree()
                # first time we've encountered this chromosome, create tree
            # index the feature
            genome[chrom_name].addi(start, end, row[2])
#for key,value in genome.iteritems():
    #print key, ":", value
mast = defaultdict(list)
with open('file1', 'r') as f:
    for row in f:
        row = row.strip().split()
        row[0] = row[0].replace('chr', '') if row[0].startswith('chr') else row[0]
        row[0] = 'MT' if row[0] == 'M' else row[0]
        #print row[0]
        mast[row[0]].append({
            'start': int(row[1]),
            'end': int(row[2])
        })
#for k,v in mast.iteritems():
    #print k, ":", v
with open(binding_factor, 'w') as f:
    for k, v in mast.iteritems():
        for i in v:
            g = genome[k].search(i['start'], i['end'])
            if g:
                print g
                l = gene
                f.write(str(l) + '\n')

create multiple lists for a given test data in python

I am relatively new to Python, so please excuse me if this is a very rudimentary question. This is my first time asking a question.
I have a test file which is of the format below.
1 2 4
1 3 2
1 4 1
2 1 2
2 2 1
2 3 1
3 2 3
3 7 1
4 1 1
....
I am trying to read the file line by line and, for each value in column 1 (1, 2, 3...), I need to create a list of the form below
list_1 = [[2,4], [3,2], [4,1]]
list_2 = [[1,2], [2,1], [3,1]]
list_3 = [[2,3], [7,1]]
list_4 = [[1,1]]
...
list_n
where values in the list are from column 2 and column 3 respectively.
Sincerely appreciate any guidance in this regard. Thank you

Use a defaultdict. This way, you don't have to check if your key already exists in the dictionary.
from collections import defaultdict

def parse(filename):
    result = defaultdict(list)
    with open(filename) as infile:
        for line in infile:
            c1, c2, c3 = map(int, line.split())
            result[c1].append([c2, c3])
    return result

def main():
    result = parse("test_data.txt")
    print(result)

if __name__ == '__main__':
    main()
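The parse function above raises a ValueError on any line that does not hold exactly three integers, such as a blank line or the "...." placeholder in the sample. The following is a sketch of a more tolerant variant that works on any iterable of lines; the name parse_lines and the decision to silently skip malformed lines are assumptions, not part of the original answer.

```python
from collections import defaultdict


def parse_lines(lines):
    """Group [col2, col3] pairs by col1, skipping lines that are not three ints."""
    result = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # blank line or wrong number of columns
        try:
            c1, c2, c3 = (int(p) for p in parts)
        except ValueError:
            continue  # a column is not an integer
        result[c1].append([c2, c3])
    return result


# Hypothetical sample, including a placeholder and a blank line
sample = ["1 2 4", "1 3 2", "1 4 1", "2 1 2", "....", "", "3 2 3"]
lists = parse_lines(sample)
print(lists[1])
```

Here lists[1] plays the role of list_1 from the question, lists[2] the role of list_2, and so on, so no separately named variables are needed.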

write the same data into many lines in text file python

I would like to write the same information on many lines of a text file. Basically, I have a list of numbers. I want to write these numbers on one line and then copy that line to the next 400 lines.
My code at the moment is
outfile = open(outfilename+'.dat','w')
for j in range(0, len(elevation_list)):
    outfile.write(elevation_list[j]+' ')
outfile.close()
And it only writes the first line.
For example, my elevation list is 1, 2, 3, 4, 5
I want my text file like the following
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
Can anyone please help me with this?

Do you have any line breaks in your elevation list? Try this instead:
# join the values once, separated by spaces
elevations = ' '.join(str(elevation) for elevation in elevation_list)
outfile = open(outfilename+'.dat','w')
for i in range(400):
    outfile.write(elevations + '\n')
outfile.close()

This is what you want:
class RepeatedWrite(object):
    def __init__(self, elevation_list, no_of_lines=5, outfilename="outfile.dat"):
        self.elevation_list = elevation_list
        self.no_of_lines = no_of_lines
        self.outfilename = outfilename

    def write_to_file(self):
        with open(self.outfilename, 'w') as fp:
            for i in xrange(self.no_of_lines):
                fp.write(' '.join([str(ele) for ele in self.elevation_list]))
                fp.write("\n")

elevation_list = [1, 2, 3, 4, 5]
RepeatedWrite(elevation_list).write_to_file()
