Python CSV export writing characters to new lines - list

I have been using multiple code snippets to create a solution that will allow me to write a list of players in a football team to a csv file.
import csv
data = []
string = input("Team Name: ")
fName = string.replace(' ', '') + ".csv"
print("When you have entered all the players, press enter.")
# while loop that will continue allowing entering of players
done = False
while not done:
    a = input("Name of player: ")
    if a == "":
        done = True
    else:
        string += a + ','
        string += input("Age: ") + ','
        string += input("Position: ")
        print(string)
file = open(fName, 'w')
output = csv.writer(file)
for row in string:
    tempRow = row
    output.writerow(tempRow)
file.close()
print("Team written to file.")
I would like the exported csv file to look like this:
player1,25,striker
player2,27,midfielder
and so on. However, when I check the exported csv file it looks more like this:
p
l
a
y
e
r
,
2
5
and so on.
Does anyone have an idea of where I'm going wrong?
Many thanks
Karl

Your string is a single string, not a list of strings. You are treating it as a list of strings when you do this:
for row in string:
When you iterate over a string, you iterate over its characters, which is why you are seeing one character per line.
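A quick demo of that behavior:

```python
# Iterating over a string visits one character at a time;
# that single character is what writerow() then receives.
row = "player1,25"
chars = [c for c in row]
print(chars[:3])  # → ['p', 'l', 'a']
```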
Declare a list of strings. And append every string to it like this:
done = False
strings_list = []
while not done:
    string = ""
    a = input("Name of player: ")
    if a == "":
        done = True
    else:
        string += a + ','
        string += input("Age: ") + ','
        string += input("Position: ") + '\n'
        strings_list.append(string)
Now iterate over this strings_list and print to the output file. Since you are putting the delimiter (comma) yourself in the string, you do not need a csv writer.
a_file = open(fName, 'w')
for row in strings_list:
    print(row)
    a_file.write(row)
a_file.close()
Note:
string is the name of a standard module in Python, so it is wise not to use it as a variable name in your program. The same goes for your variable file.
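For what it's worth, if you do want to keep using the csv module, collect each player as a list of fields and keep all the rows in a list of lists; writerows then adds the commas and newlines for you. A minimal sketch (the rows and the team.csv filename are made up for illustration):

```python
import csv

# Each row is a list of fields; csv.writer handles delimiters and quoting.
rows = [
    ["player1", "25", "striker"],     # example data, not from the question
    ["player2", "27", "midfielder"],
]
with open("team.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

This produces player1,25,striker and player2,27,midfielder on separate lines, which is the format asked for.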

Related

rstrip, split and sort a list from input text file

I am new to python. I am trying to rstrip spaces, split, and append the list into words and then sort in alphabetical order. I don't know what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
    line = line.rstrip()
    y = line.split()
    i = lst.append()
    k = y.sort()
print y
I have been able to fix my code and the expected result output.
This is what I was hoping to code:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
    words = line.split()
    for word in words:
        if word in wordlist: continue
        wordlist.append(word)
wordlist.sort()
print(wordlist)
If you are using Python 2.7, you need to use raw_input(); in Python 3.x, input() is correct. Also, you are not using append() correctly: append is a method of lists.
fname = raw_input("Enter filename: ") # Stores the filename given by the user input
fh = open(fname, "r") # 'r' opens the file in read mode
lines = fh.readlines() # This will create a list of the lines from the file
# Sort the lines alphabetically
lines.sort()
# Rstrip each line of the lines list
y = [l.rstrip() for l in lines]
# Print out the result
print y
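For completeness, the same idea in Python 3 can be written with sorted() in one pass; here a hard-coded list stands in for the lines read from the file:

```python
# lines stands in for open(fname).readlines()
lines = ["banana\n", "cherry\n", "apple\n"]
# rstrip each line, then sort alphabetically
y = sorted(line.rstrip() for line in lines)
print(y)  # → ['apple', 'banana', 'cherry']
```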

python script returning list with double bracket

I have the following code in python 3. I'm trying to read a text file and output a list of numerical values. These values will then be used when searching through a number of pdf invoices.
Here is what I have for the text file portion:
txt_numbers = []
for file in os.listdir(my_path):
    if file[-3:] == "txt":
        with open(my_path + file, 'r') as txt_file:
            txt = txt_file.readlines()
            for line in txt:
                # get the number between quotes
                num = re.findall(r'(?<=").*?(?=")', line)
                txt_numbers.append(num)
for c, value in enumerate(txt_numbers, 1):
    print(c, value)
Here is what is the output:
[[], ['51,500.00'], ['6,000.00'], ['77,000.00'], ['37,000.00']]
Question: How do I remove the "[" from within the list? I would like to have just ['51,500.00', '6,000.00', etc...]
I tried doing new_text_numbers = (", ".join(txt_numbers)) and then print(new_text_numbers)
Problem: I was appending a list to a list, which is allowed in Python, just not what I wanted.
Added lines:
new_num = (", ".join(num))
txt_numbers.append(new_num)
Solution:
txt_numbers = []
for file in os.listdir(my_path):
    if file[-3:] == "txt":
        with open(my_path + file, 'r') as txt_file:
            txt = txt_file.readlines()
            for line in txt:
                # get the number between quotes
                num = re.findall(r'(?<=").*?(?=")', line)
                new_num = (", ".join(num))
                txt_numbers.append(new_num)
for c, value in enumerate(txt_numbers, 1):
    print(c, value)
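An alternative to joining inside the loop is to flatten the nested list afterwards with extend(), which also drops the empty inner lists; a small sketch using the output shown above:

```python
nested = [[], ['51,500.00'], ['6,000.00'], ['77,000.00'], ['37,000.00']]
flat = []
for num in nested:
    flat.extend(num)  # extend unpacks each inner list; empty lists add nothing
print(flat)  # → ['51,500.00', '6,000.00', '77,000.00', '37,000.00']
```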

Do not write line that begins with certain string

I'm trying to omit writing the lines that begin with "KO"; however, when I run the code the lines are still written to the output file. I tried evaluating a boolean expression to see if "KO" was in geneData and it comes back as true. I'm stuck on just that part.
#Read in hsa links
hsa = []
with open('/users/skylake/desktop/pathway-HSAs.txt', 'r') as file:
    for line in file:
        line = line.strip()
        hsa.append(line)
#Import Modules | Create KEGG Variable
from bioservices.kegg import KEGG
import re
k = KEGG()
##Data Parsing | Writing to File
#for i in range(len(hsa)):
data = k.get(hsa[2])
dict_data = k.parse(data)
#Prep title of file
nameData = re.sub("\[u'", "", str(dict_data['NAME']))
nameData = re.sub(" - Homo sapiens(human)']", "", nameData)
f = open('/Users/Skylake/Desktop/pathway-info/' + nameData + '.txt', 'w')
#Prep gene data format
geneData = re.sub("', u'", "',\n", str(dict_data['GENE']))
geneData = re.sub("': u'", ": ", geneData)
geneData = re.sub("{u'", "", geneData)
geneData = re.sub("'}", "", geneData)
geneData = re.sub("\[KO", "\nKO", geneData)
f.write("Genes\n")
f.writelines([line for line in geneData if 'KO' not in line])
#Prep compound data format
if 'COMPOUND' in dict_data:
    compData = re.sub("\"", "'", str(dict_data['COMPOUND']))
    compData = re.sub("', u'", "\n", compData)
    compData = re.sub("': u'", ": ", compData)
    compData = re.sub("{u'", "", compData)
    compData = re.sub("'}", "", compData)
    f.write("\nCompounds\n")
    f.write(compData)
#Close file
f.close()
Your geneData variable is a single string. When you iterate over it, you are dealing with the individual characters of the string; your line variable is badly misnamed. The two-character string 'KO' is obviously not contained within any of these single characters, so your boolean condition is always True.
With no example input data, nor any expected output data, I can't tell what you're trying to do well enough to suggest a solution.
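That said, if the goal is to drop whole lines that begin with "KO", splitting the string into real lines first would do it; a sketch with invented stand-in data, since the real geneData comes from KEGG:

```python
# geneData here is a made-up stand-in for the KEGG-derived string
geneData = "TP53: tumor protein p53\nKO:K04451 entry\nMDM2: MDM2 proto-oncogene"
# splitlines() yields real lines, so startswith() checks whole lines, not characters
kept = [line for line in geneData.splitlines() if not line.startswith("KO")]
print("\n".join(kept))
```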

Simple way to refactor this Python code to reduce repetition

I'd like help refactoring this code to reduce redundant lines/concepts. The code for this def is basically repeated 3 times.
Restrictions:
- I'm new, so a really fancy list comprehension or turning things into objects with dunders and method overrides is way too advanced for me.
- Built-in modules only. This is Python 2.7 code, and it only imports os and re.
What the overall script does:
Finds files with a fixed prefix. The files are pipe-delimited text files. The first row is a header. It has a footer which can be 1 or more rows. Based on the prefix, the script throws away "columns" from the text file that aren't needed in another step. It saves the data, comma-separated, in a new file with a .csv extension.
The bulk of the work is done in processRawFiles(). This is what I'd like refactored, since it's wildly repetitive.
def separateTranslationTypes(translationFileList):
    '''Takes in the list of all files to process and finds which are roomtypes,
    ratecodes or sourcecodes. The type of file determines how it will be processed.'''
    rates = []
    rooms = []
    sources = []
    for afile in translationFileList:
        rates.append([m.group() for m in re.finditer('cf_ratecodeheader+(.*)', afile)])
        rooms.append([m.group() for m in re.finditer('cf_roomtypes+(.*)', afile)])
        sources.append([m.group() for m in re.finditer('cf_sourcecodes+(.*)', afile)])
    # an empty list is falsy, so x is kept only if its list is non-empty
    rates = [x[0] for x in rates if x]
    rooms = [x[0] for x in rooms if x]
    sources = [x[0] for x in sources if x]
    print '... rateCode files :: ', rates, '\n'
    print '... roomType files :: ', rooms, '\n'
    print '... sourceCode files :: ', sources, '\n'
    return {'rateCodeFiles': rates,
            'roomTypeFiles': rooms,
            'sourceCodeFiles': sources}
groupedFilestoProcess = separateTranslationTypes(allFilestoProcess)
def processRawFiles(groupedFileDict):
    for key in groupedFileDict:
        # Process the rateCodes file
        if key == 'rateCodeFiles':
            for fname_Value in groupedFileDict[key]: # fname_Value is the filename
                if os.path.exists(fname_Value):
                    workingfile = open(fname_Value, 'rb')
                    filedatastring = workingfile.read() # turns entire file contents into a single string
                    workingfile.close()
                    outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt or any other 3-char extension
                    outputfile = open(outname, 'wb')
                    filedatalines = filedatastring.split('\n') # a list containing each line of the file
                    rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
                    parsedheaders = rawheaders.split('|') # turn the header string into a list where | was the delimiter
                    print '\n'
                    print 'outname: ', outname, '\n'
                    # print 'rawheaders: ', rawheaders, '\n'
                    # print 'parsedheaders: ', parsedheaders, '\n'
                    # print filedatalines[0:2]
                    print '\n'
                    ratecodeindex = parsedheaders.index('RATE_CODE')
                    ratecodemeaning = parsedheaders.index('DESCRIPTION')
                    for dataline in filedatalines:
                        if dataline[:4] == 'LOGO':
                            firstuselessline = filedatalines.index(dataline)
                            # print firstuselessline
                    # ignore the first line, which was the headers
                    # stop before the line that starts with LOGO - the first useless line
                    for dataline in filedatalines[1:firstuselessline-1:]:
                        # print dataline.split('|')
                        theratecode = dataline.split('|')[ratecodeindex]
                        theratemeaning = dataline.split('|')[ratecodemeaning]
                        # print theratecode, '\t', theratemeaning, '\n'
                        linetowrite = theratecode + ',' + theratemeaning + '\n'
                        outputfile.write(linetowrite)
                    outputfile.close()
        # Process the roomTypes file
        if key == 'roomTypeFiles':
            for fname_Value in groupedFileDict[key]: # fname_Value is the filename
                if os.path.exists(fname_Value):
                    workingfile = open(fname_Value, 'rb')
                    filedatastring = workingfile.read() # turns entire file contents into a single string
                    workingfile.close()
                    outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt or any other 3-char extension
                    outputfile = open(outname, 'wb')
                    filedatalines = filedatastring.split('\n') # a list containing each line of the file
                    rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
                    parsedheaders = rawheaders.split('|') # turn the header string into a list where | was the delimiter
                    print '\n'
                    print 'outname: ', outname, '\n'
                    # print 'rawheaders: ', rawheaders, '\n'
                    # print 'parsedheaders: ', parsedheaders, '\n'
                    # print filedatalines[0:2]
                    print '\n'
                    ratecodeindex = parsedheaders.index('LABEL')
                    ratecodemeaning = parsedheaders.index('SHORT_DESCRIPTION')
                    for dataline in filedatalines:
                        if dataline[:4] == 'LOGO':
                            firstuselessline = filedatalines.index(dataline)
                            # print firstuselessline
                    # ignore the first line, which was the headers
                    # stop before the line that starts with LOGO - the first useless line
                    for dataline in filedatalines[1:firstuselessline-1:]:
                        # print dataline.split('|')
                        theratecode = dataline.split('|')[ratecodeindex]
                        theratemeaning = dataline.split('|')[ratecodemeaning]
                        # print theratecode, '\t', theratemeaning, '\n'
                        linetowrite = theratecode + ',' + theratemeaning + '\n'
                        outputfile.write(linetowrite)
                    outputfile.close()
        # Process the sourceCodes file
        if key == 'sourceCodeFiles':
            for fname_Value in groupedFileDict[key]: # fname_Value is the filename
                if os.path.exists(fname_Value):
                    workingfile = open(fname_Value, 'rb')
                    filedatastring = workingfile.read() # turns entire file contents into a single string
                    workingfile.close()
                    outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt or any other 3-char extension
                    outputfile = open(outname, 'wb')
                    filedatalines = filedatastring.split('\n') # a list containing each line of the file
                    rawheaders = filedatalines[0] # 1st element of the list is the first row of the file, with the headers
                    parsedheaders = rawheaders.split('|') # turn the header string into a list where | was the delimiter
                    print '\n'
                    print 'outname: ', outname, '\n'
                    # print 'rawheaders: ', rawheaders, '\n'
                    # print 'parsedheaders: ', parsedheaders, '\n'
                    # print filedatalines[0:2]
                    print '\n'
                    ratecodeindex = parsedheaders.index('SOURCE_CODE')
                    ratecodemeaning = parsedheaders.index('DESCRIPTION')
                    for dataline in filedatalines:
                        if dataline[:4] == 'LOGO':
                            firstuselessline = filedatalines.index(dataline)
                            # print firstuselessline
                    # ignore the first line, which was the headers
                    # stop before the line that starts with LOGO - the first useless line
                    for dataline in filedatalines[1:firstuselessline-1:]:
                        # print dataline.split('|')
                        theratecode = dataline.split('|')[ratecodeindex]
                        theratemeaning = dataline.split('|')[ratecodemeaning]
                        # print theratecode, '\t', theratemeaning, '\n'
                        linetowrite = theratecode + ',' + theratemeaning + '\n'
                        outputfile.write(linetowrite)
                    outputfile.close()
processRawFiles(groupedFilestoProcess)
I had to redo my code because a new incident came up where the files had neither the header row nor the footer row. However, since the columns I want still occur in the same order, I can keep just those. Also, any row with fewer columns than the larger of the two indices used is skipped.
As for reducing repetition, processRawFiles now contains two defs that remove the need to repeat a lot of the parsing code from before.
def separateTranslationTypes(translationFileList):
    '''Takes in the list of all files to process and finds which are roomtypes,
    ratecodes or sourcecodes. The type of file determines how it will be processed.'''
    rates = []
    rooms = []
    sources = []
    for afile in translationFileList:
        rates.append([m.group() for m in re.finditer('cf_ratecode+(.*)', afile)])
        rooms.append([m.group() for m in re.finditer('cf_roomtypes+(.*)', afile)])
        sources.append([m.group() for m in re.finditer('cf_sourcecodes+(.*)', afile)])
    # an empty list is falsy, so x is kept only if its list is non-empty
    rates = [x[0] for x in rates if x]
    rooms = [x[0] for x in rooms if x]
    sources = [x[0] for x in sources if x]
    print '... rateCode files :: ', rates, '\n'
    print '... roomType files :: ', rooms, '\n'
    print '... sourceCode files :: ', sources, '\n'
    return {'rateCodeFiles': rates,
            'roomTypeFiles': rooms,
            'sourceCodeFiles': sources}
groupedFilestoProcess = separateTranslationTypes(allFilestoProcess)
def processRawFiles(groupedFileDict):
    def someFixedProcess(bFileList, codeIndex, codeDescriptionIndex):
        for fname_Value in bFileList: # fname_Value is the filename
            if os.path.exists(fname_Value):
                workingfile = open(fname_Value, 'rb')
                filedatastring = workingfile.read() # turns entire file contents into a single string
                workingfile.close()
                outname = 'forUpload_' + fname_Value[:-4:] + '.csv' # removes .txt or any other 3-char extension
                outputfile = open(outname, 'wb')
                filedatalines = filedatastring.split('\n') # a list containing each line of the file
                # print '\n', 'outname: ', outname, '\n\n'
                # HEADERS ARE NOT IGNORED! The file might not have headers.
                print outname
                for dataline in filedatalines:
                    # print filedatalines.index(dataline), dataline.split('|')
                    # e.g. index 13 requires len 14, so len > index is needed
                    if len(dataline.split('|')) > codeDescriptionIndex:
                        thecode_text = dataline.split('|')[codeIndex]
                        thedescription_text = dataline.split('|')[codeDescriptionIndex]
                        linetowrite = thecode_text + ',' + thedescription_text + '\n'
                        outputfile.write(linetowrite)
                outputfile.close()
    def processByType(aFileList, itsType):
        typeDict = {'rateCodeFiles': {'CODE_INDEX': 4, 'DESC_INDEX': 7},
                    'roomTypeFiles': {'CODE_INDEX': 1, 'DESC_INDEX': 13},
                    'sourceCodeFiles': {'CODE_INDEX': 2, 'DESC_INDEX': 3}}
        # print 'someFixedProcess(', aFileList, typeDict[itsType]['CODE_INDEX'], typeDict[itsType]['DESC_INDEX'], ')'
        someFixedProcess(aFileList,
                         typeDict[itsType]['CODE_INDEX'],
                         typeDict[itsType]['DESC_INDEX'])
    for key in groupedFileDict:
        processByType(groupedFileDict[key], key)
processRawFiles(groupedFilestoProcess)
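If this were ever ported to Python 3, the same table-driven idea could lean on the csv module for both the pipe-delimited input and the comma-separated output; a sketch under that assumption (the sample data, function names, and the one-entry config are invented for illustration):

```python
import csv
import io

# Column positions per file type, mirroring typeDict above (one entry shown).
TYPE_CONFIG = {
    "sourceCodeFiles": {"CODE_INDEX": 2, "DESC_INDEX": 3},
}

def extract_columns(text, code_index, desc_index):
    """Keep the two wanted columns from each sufficiently long pipe-delimited row."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(text), delimiter="|"):
        # skip rows that are too short to hold both columns
        if len(row) > desc_index:
            writer.writerow([row[code_index], row[desc_index]])
    return out.getvalue()

sample = "a|b|WEB|Web booking|x\nshort|row\n"
cfg = TYPE_CONFIG["sourceCodeFiles"]
print(extract_columns(sample, cfg["CODE_INDEX"], cfg["DESC_INDEX"]))
```

Letting csv.reader split the rows also handles quoted fields correctly, which a bare split('|') would not.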

Retrieve particular parts of string from a text file and save it in a new file in MATLAB

I am trying to retrieve particular parts of a string in a text file such as the one below, and I would like to save them in a new text file in MATLAB.
Original text file
D 1m8ea_ 1m8e A: d.174.1.1 74583 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74583
D 1m8eb_ 1m8e B: d.174.1.1 74584 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74584
D 3e7ia1 3e7i A:77-496 d.174.1.1 158052 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158052
D 3e7ib1 3e7i B:77-496 d.174.1.1 158053 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158053
D 2bhja1 2bhj A:77-497 d.174.1.1 128533 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=128533
So basically, I would like to retrieve the pdbcode, which is labeled as "1m8e", the chainid, labeled as "A", the start value, which is "77", and the stop value, which is "496", and I would like all of these values to be used inside of an fprintf statement.
Is there some kind of method I can use in RegExp, stating at which index it all starts, to retrieve those strings based on their position in each line of the text file?
In the end, all I want to have in the fprintf statement is 1m8e, A, 77, 496.
So far I have two fopen calls, one that reads a file and one that writes to a new file, a loop that reads line by line, and an fprintf statement:
pdbcode = '';
chainid = '';
start = '';
stop = '';
fin = fopen('dir.cla.scop.txt_1.75.txt', 'r');
fout = fopen('output_scop.txt', 'w');
% TODO: Add error check!
while true
    line = fgetl(fin); % Get the next line from the file
    if ~ischar(line)
        % End of file
        break;
    end
    % Print result into output_scop.txt file
    fprintf(fout, 'INSERT INTO cath_domains (scop_pdbcode, scop_chainid, scopbegin, scopend) VALUES("%s", %s, %s, %s);\n', pdbcode, chainid, start, stop);
end
fclose(fin);
fclose(fout);
Thank you.
You should be able to strsplit on whitespace, get the third ("1m8e") and fourth elements ("A:77-496"), then repeat the process on the fourth element using ":" as the split character, and then again on the second of those two arguments using "-" as the split character. That's one approach. For example, you could do:
% split on spaces and tabs, collapsing runs of delimiters
tokens = strsplit(line, {' ', '\t'}, 'CollapseDelimiters', true);
pdbcode = tokens{3};
% split the fourth token from the previous split on the colon
tokens = strsplit(tokens{4}, ':');
chainid = tokens{1};
% split the second token from the previous split on the dash
tokens = strsplit(tokens{2}, '-');
start = tokens{1};
stop = tokens{2};
If you really wanted to use regular expressions, you could try the following
pattern = '\S+\s+\S+\s+(\S+)\s+([A-Za-z]+):([0-9]+)-([0-9]+)';
[mat, tok] = regexp(line, pattern, 'match', 'tokens');
tok = tok{1}; % unwrap the cell array holding the tokens of the first match
pdbcode = tok{1};
chainid = tok{2};
start = tok{3};
stop = tok{4};