Python 2.7 - How to call individual columns from transposed csv file

I understand that the csv module exists; however, for my current project we are not allowed to use it to read csv files.
My code is as follows:
table = []
for line in open("data.csv"):
    data = line.split(",")
    table.append(data)
transposed = [[table[j][i] for j in range(len(table))] for i in range(len(table[0]))]
rows = transposed[1][1:]
rows = [float(i) for i in rows]
I'm really new to Python, so this is probably a very basic question. I've been scouring the internet all day and have struggled to find a solution. All I need is to be able to get the data from any individual column so I can analyse it. Thanks.

Your data is organized in a list of lists, where each sub-list represents a row. To illustrate this better, I would avoid list comprehensions because they are more difficult to read. I would also avoid variable names like 'i' and 'j' and use more descriptive names like row or column instead. Here is a simple example of how I would accomplish this:
def read_csv():
    table = []
    with open("data.csv") as fileobj:
        for line in fileobj.readlines():
            data = line.strip().split(',')
            table.append(data)
    return table

def get_column_data(data, column_index):
    column_data = []
    for row in data:
        cell_data = row[column_index]
        column_data.append(cell_data)
    return column_data

data = read_csv()
get_column_data(data, column_index=2) #example usage
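For comparison, here is a shorter sketch of the same idea. The question already transposes the table, and zip(*table) does that transposition in one step, so each column becomes a single list. The sample text, the helper name column_as_floats, and the assumption that the first row is a header are illustrative only and are not taken from the original data.

```python
# Hypothetical sample standing in for the contents of data.csv.
sample_text = "time,speed,height\n0,1.5,10\n1,2.5,20\n2,3.5,30\n"

# Build the table row by row, stripping whitespace and skipping blank lines.
table = [line.strip().split(",") for line in sample_text.splitlines() if line.strip()]

# zip(*table) transposes the rows into columns.
columns = [list(column) for column in zip(*table)]

def column_as_floats(columns, column_index):
    """Return one column as floats, skipping the assumed header cell."""
    return [float(cell) for cell in columns[column_index][1:]]

speeds = column_as_floats(columns, 1)
print(speeds)
```

With a real file, the list comprehension that builds table would iterate over the open file object instead of sample_text.splitlines().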

Related

Write tuple to csv by skipping missing columns

I have a list of ordered tuples, where each tuple contains a column name and value pair to be written to a csv, for example:
lst = [('name','bob'),('age',19),('loc','LA')]
which holds the info for bob: age 19 and location (loc) LA. I want to write this to a CSV file based on the column names, but sometimes some of these columns are missing, for example in another row:
lst2 = [('name','bob'),('loc','LA')]
Here age is missing. How can I write these rows properly to a csv in Python?
Those tuples can be used to initialize a dict, so csv.DictWriter seems the best choice. In this example I create a dict filled with default values. For each list of tuples, I copy the dict, update it with the known values, and write it out.
import csv
# sample data
lst = [('name','bob'),('age',19),('loc','LA')]
lst2 = [('name','jane'),('loc','LA')]
lists = [lst, lst2]
# columns need some sort of default... I just guessed
defaults = {'name':'', 'age':-1, 'loc':'N/A'}
with open('output.csv', 'wb') as outfile:
    writer = csv.DictWriter(outfile, fieldnames=sorted(defaults.keys()))
    writer.writeheader()
    for row_tuples in lists:
        # copy defaults then update with known values
        kv = defaults.copy()
        kv.update(row_tuples)
        writer.writerow(kv)
# debug...
print open('output.csv').read()
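If a single placeholder for every missing column is acceptable, csv.DictWriter can fill the gaps itself through its restval parameter, so no dict of defaults is needed. The following is a minimal sketch under that assumption, written for Python 3; the in-memory buffer and the fixed column order are choices made for the demonstration, not part of the original answer.

```python
import csv
import io

lst = [('name', 'bob'), ('age', 19), ('loc', 'LA')]
lst2 = [('name', 'jane'), ('loc', 'LA')]

# Assumed column order for the output file.
fieldnames = ['name', 'age', 'loc']

# In-memory buffer used instead of a real file, for demonstration only.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='',
                        lineterminator='\n')
writer.writeheader()
for row_tuples in [lst, lst2]:
    # Keys missing from the dict are written as restval (an empty string here).
    writer.writerow(dict(row_tuples))

output = buffer.getvalue()
print(output)
```

Per-column defaults, as in the answer above, are still the better fit when different columns need different placeholders.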
You should give more examples of what exactly is required. For instance, if the location is not given in lst2, what do you want to write to your csv? From what I understand, you can write a function with default arguments:
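import csv
import os

def write_tuples_to_csv(name="DefaultName", age="DefaultAge", loc="Default location"):
    path = "/path/to/csv/file"
    # write the header only when the file is new or empty
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a') as outfile:  # appending to a file
        writer = csv.writer(outfile)
        if need_header:
            writer.writerow(['name', 'age', 'location'])
        writer.writerow((name, age, loc))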
Now you can call this function for every item in the list. This should help you get started.

Python 2.7: Returning a value in a csv file from input

I've got a csv with:
T,8,101
T,10,102
T,5,103
and I need to search the 3rd column of the csv file for my input and, if it is found, return the 2nd column value from that same row (searching "102" would return "10"). I then need to save the result to use in another calculation. (I am just trying to print the result for now.) I am new to Python (2 weeks) and wanted to get a grasp of reading and writing csv files. None of the results I found in my searches gave me the answer I needed. Thanks.
Here is my code:
name = input("waiting")
import csv
with open('cards.csv', 'rt') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        if row[2] == name:
            print(row[1])
As stated in my comment above, I would implement a general approach without the csv module, like this:
import io
s = """T,8,101
T,10,102
T,5,103"""
# use io.StringIO to get a file-like object
f = io.StringIO(s)
lines = [tuple(line.split(',')) for line in f.read().splitlines()]
example_look_up = '101'
def find_element(look_up):
    for t in lines:
        if t[2] == look_up:
            return t[1]
result = find_element(example_look_up)
print(result)
Please keep in mind that this is Python 3 code. To use it with Python 2, replace print() with the print statement, and you may need to change the StringIO part, which I use here only to get a file-like object for demonstration purposes. However, this snippet should give you a basic idea of a possible solution.
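The same lookup can also read directly from a file, as in the rough sketch below. The function name find_in_file and its parameters are hypothetical, and the sample rows are written to a temporary file only so that the example runs on its own. One more thing to check: if the original script runs under Python 2.7, as the question title says, input() evaluates what is typed, so entering 102 gives the integer 102, which never equals the string '102' read from the file. raw_input() returns the text as typed.

```python
import os
import tempfile

def find_in_file(path, look_up, search_index=2, return_index=1):
    """Return the return_index field of the first row whose search_index
    field equals look_up, or None if no row matches."""
    with open(path) as fileobj:
        for line in fileobj:
            fields = line.strip().split(',')
            # Skip rows that are too short to hold both fields.
            if len(fields) <= max(search_index, return_index):
                continue
            if fields[search_index] == look_up:
                return fields[return_index]
    return None

# Write the sample rows to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("T,8,101\nT,10,102\nT,5,103\n")
    temp_path = tmp.name

result = find_in_file(temp_path, '102')
missing = find_in_file(temp_path, '999')
os.remove(temp_path)
print(result)
```

The returned value is still a string, so it would need int() or float() before being used in a calculation.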

Efficient means of mass converting thousands of different find and replaces within one file

I am trying to convert a map file for some SNP data I want to use from Affy ids to dbSNP rs ids.
I am trying to find an effective way to do this. I have the annotation file for the Axiom array from which the data comes, so I know the proper ids.
I was wondering if anyone could suggest a good bash/Python/Perl based method to do this. It amounts to >100,000 different replacements. The idea I had in mind was the
sed -i 's/Affy#/rs#/g' filename
method, but I figure this would not be the most efficient. Any suggestions? Thanks!
Python code, assuming your substitutions are stored as tab-separated pairs in subs.csv:
import csv
subs = dict(csv.reader(open('subs.csv'), delimiter='\t'))
source = csv.reader(open('all_snp.map'), delimiter='\t')
dest = csv.writer(open('all_snp_out.map', 'wb'), delimiter='\t')
for row in source:
    row[1] = subs.get(row[1], row[1])
    dest.writerow(row)
The line row[1] = subs.get(row[1], row[1]): row[1] is the Affx column, and it replaces it with a dictionary lookup which either gets the rsNumber equivalent if there is one, or returns the original Affx bit if there isn't one.
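The dictionary approach scales because the file is read once and each id costs one dictionary lookup, rather than one pass over the file per replacement. Here is a small sketch of that lookup on in-memory data, without the csv module. The ids, the four-column layout, and the function name replace_ids are made up for illustration and are not taken from a real annotation file.

```python
# Hypothetical mapping; in practice it would be loaded from the annotation file.
subs = {'AX-0001': 'rs111', 'AX-0002': 'rs222'}

# Hypothetical tab-separated map rows with the id in the second column.
map_lines = [
    "1\tAX-0001\t0\t1000",
    "1\tAX-0002\t0\t2000",
    "1\tAX-0003\t0\t3000",
]

def replace_ids(lines, subs, id_column=1):
    """Replace the id in each line using subs, keeping unknown ids as-is."""
    converted = []
    for line in lines:
        fields = line.split('\t')
        # dict.get falls back to the original id when there is no mapping.
        fields[id_column] = subs.get(fields[id_column], fields[id_column])
        converted.append('\t'.join(fields))
    return converted

converted = replace_ids(map_lines, subs)
for line in converted:
    print(line)
```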

Parsing Unbalanced Text into Tables in R

I am trying to pull data from some text files on the SEC's EDGAR webpage and I keep running into the same problem: there are tables that look very simple in the text file, but I have trouble parsing them into something useful in R. In particular, I can't figure out how to balance some of the tables when there are values missing in a column, especially at the end.
The approach I've taken so far is to read in the text files with readLines and split the strings based on the tab delimiters, but this doesn't always work when there are missing values. Is there a better approach or some way to intelligently coerce each row into a data frame? I can't seem to get rbind.fill to work in this case.
Here is my most recent attempt:
raw.data = readLines("http://www.sec.gov/Archives/edgar/data/1349353/0001349353-13-000002.txt")
# parse basic document information
companyName = gsub("\t\tCOMPANY CONFORMED NAME:\t\t\t","",raw.data[grep("\t\tCOMPANY CONFORMED NAME:\t\t\t",raw.data)])
cik = gsub("\t\tCENTRAL INDEX KEY:\t\t\t","",raw.data[grep("\t\tCENTRAL INDEX KEY:\t\t\t",raw.data)])
secfilename = gsub("<FILENAME>","",raw.data[grep("<FILENAME>",raw.data)])
# trim down to table
table13f = raw.data[(grep("<TABLE>",raw.data)+1):(grep("</TABLE>",raw.data)-1)]
table13f = table13f[!grepl("INFORMATION TABLE",table13f, ignore.case=TRUE)]
table13f = table13f[!grepl("VOTING AUTHORITY",table13f, ignore.case=TRUE)]
table13f = table13f[!grepl("NAME OF ISSUER",table13f, ignore.case=TRUE)]
table13f = table13f[nchar(table13f)>0]
# extract data vectors
splittable = strsplit(table13f,"\t")
splittable2 = data.frame(splittable)
Thanks in advance for the help and/or advice!
You should be able to parse the last table13f string using the following line:
data <- read.csv(text=table13f,header = T,quote = "\"", sep = "\t", fill = T)

String from CSV to list - Python

I don't get it. I have CSV data with the following content:
wurst;ball;hoden;sack
1;2;3;4
4;3;2;1
I want to iterate over the CSV data and put the heads in one list and the content in another list. Here's my code so far:
data = [ i.strip() for i in open('test.csv', 'r').readlines() ]
for i_c, i in enumerate(data):
    if i_c == 0:
        heads = i
    else:
        content = i
heads.split(";")
content.split(";")
print heads
That always returns the following string, not a valid list.
wurst;ball;hoden;sack
Why does split not work on this string?
Greetings and merry Christmas,
Jan
The split method returns a new list; it does not modify the string in place. Try:
heads = heads.split(";")
content = content.split(";")
I've noticed also that your data seems to all be integers. You might consider instead the following for content:
content = [int(i) for i in content.split(";")]
The reason is that split returns a list of strings, and it seems like you might need to deal with them as numbers in your code later on. Of course, disregard if you are expecting non-numeric data to show up at some point.
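Putting the pieces together, here is a minimal sketch that keeps the heads in one list and every content row in another. In the question's loop, content is overwritten on each iteration, so only the last row survives; collecting the rows in a list of lists avoids that. The inline sample_text stands in for test.csv, and the conversion assumes every content cell is an integer.

```python
# Sample text standing in for the contents of test.csv.
sample_text = "wurst;ball;hoden;sack\n1;2;3;4\n4;3;2;1\n"

# Strip whitespace and skip blank lines.
lines = [line.strip() for line in sample_text.splitlines() if line.strip()]

# The first line holds the column heads; split returns a new list.
heads = lines[0].split(";")

# Every remaining line becomes one list of integers.
content = [[int(cell) for cell in line.split(";")] for line in lines[1:]]

print(heads)
print(content)
```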