I'm trying to skip writing the lines that begin with "KO", but when I run the code those lines are still written to the output file. I tried evaluating a boolean expression to check whether "KO" is in geneData, and it comes back True. That is the only part I'm stuck on.
#Read in hsa links
hsa = []
with open ('/users/skylake/desktop/pathway-HSAs.txt', 'r') as file:
    for line in file:
        line = line.strip()
        hsa.append(line)
#Import Modules | Create KEGG Variable
from bioservices.kegg import KEGG
import re
k = KEGG()
##Data Parsing | Writing to File
#for i in range(len(hsa)):
data = k.get(hsa[2])
dict_data = k.parse(data)
#Prep title of file
nameData = re.sub("\[u'", "", str(dict_data['NAME']))
nameData = re.sub(" - Homo sapiens(human)']", "", nameData)
f = open('/Users/Skylake/Desktop/pathway-info/' + nameData + '.txt' , 'w')
#Prep gene data format
geneData = re.sub("', u'", "',\n", str(dict_data['GENE']))
geneData = re.sub("': u'", ": ", geneData)
geneData = re.sub("{u'", "", geneData)
geneData = re.sub("'}", "", geneData)
geneData = re.sub("\[KO", "\nKO", geneData)
f.write("Genes\n")
f.writelines([line for line in geneData if 'KO' not in line])
#Prep compound data format
if 'COMPOUND' in dict_data:
    compData = re.sub("\"", "'", str(dict_data['COMPOUND']))
    compData = re.sub("', u'", "\n", compData)
    compData = re.sub("': u'", ": ", compData)
    compData = re.sub("{u'", "", compData)
    compData = re.sub("'}", "", compData)
    f.write("\nCompounds\n")
    f.write(compData)
#Close file
f.close()
Your geneData variable is a single string. When you iterate over it, you get its individual characters, so your line variable is badly misnamed. The two-character string 'KO' can never be contained in a single character, so the condition 'KO' not in line is always True and every character gets written.
With no example input data, nor any expected output data, I can't tell what you're trying to do well enough to suggest a solution.
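Assuming geneData does end up as one newline-separated string, the usual fix is to split it into lines before filtering. A minimal sketch follows; drop_ko_lines and the sample text are hypothetical (not real KEGG output), and it uses startswith() because the question asks about lines that begin with "KO".

```python
def drop_ko_lines(gene_data: str) -> str:
    """Return gene_data without the lines that begin with 'KO'."""
    # splitlines() yields whole lines; iterating over the string itself yields single characters
    kept = [line for line in gene_data.splitlines() if not line.startswith('KO')]
    # rejoin with newlines so the result can be passed straight to f.write()
    return "".join(line + "\n" for line in kept)

# hypothetical sample text, not real KEGG output
sample = "7157 TP53; tumor protein p53\nKO:K04451\n7158 TP53BP1\n"
print(drop_ko_lines(sample), end="")

# why the original filter kept everything: each item it tested was a single character
print(all('KO' not in ch for ch in "KO:K04451"))
```

In the question's code this would replace the writelines() call with f.write(drop_ko_lines(geneData)). If KO entries can also appear in the middle of a line, 'KO' not in line is the test to use instead.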
I am new to Python. I am trying to strip trailing whitespace with rstrip(), split each line into words, append them to a list, and then sort the list alphabetically. I don't know what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
    line = line.rstrip()
    y = line.split()
    i = lst.append()
    k = y.sort()
print y
I have been able to fix my code, and it now gives the expected output.
This is what I was aiming for:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
    words = line.split()
    for word in words:
        if word in wordlist: continue
        wordlist.append(word)
wordlist.sort()
print(wordlist)
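The if word in wordlist check scans the whole list for every word. A set does the same de-duplication with a hash lookup; this is a swapped-in technique, not the one the post uses. A minimal sketch, with unique_sorted_words as a hypothetical name:

```python
def unique_sorted_words(lines):
    """Return the distinct whitespace-separated words in lines, sorted alphabetically."""
    seen = set()
    for line in lines:
        # split() with no argument splits on runs of whitespace and ignores the trailing newline
        seen.update(line.split())
    # sorted() returns a new list; uppercase letters sort before lowercase ones
    return sorted(seen)

print(unique_sorted_words(["But soft what light\n", "It is the east\n", "soft light\n"]))
```

Passing the open file handle works the same way, since iterating over a file yields its lines. Note that the default sort is case-sensitive; sorted(seen, key=str.lower) ignores case.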
If you are using Python 2.7, you need raw_input(); in Python 3.x, input() is correct. Also, you are not using append() correctly: append() is a list method that takes exactly one argument, the item to add, so lst.append() with no argument raises a TypeError.
fname = raw_input("Enter filename: ") # Stores the filename typed by the user
fh = open(fname,"r") # 'r' opens the file in read mode
lines = fh.readlines() # This creates a list of the lines in the file
# Sort the lines alphabetically (this sorts whole lines; it does not split them into words)
lines.sort()
# Rstrip each line in the list
y = [l.rstrip() for l in lines]
# Print out the result
print y
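Applying the append() point to the question's own loop, a minimal Python 3 sketch (sorted_words is a hypothetical name) collects every word with extend() and then sorts in place. Note also that list.sort() returns None, so k = y.sort() in the question leaves k set to None.

```python
def sorted_words(lines):
    """Split every line into words and return all of them in one alphabetically sorted list."""
    words = []
    for line in lines:
        # extend() adds each word from this line; append() takes exactly one item
        # and would add the whole list as a single element
        words.extend(line.rstrip().split())
    # list.sort() sorts in place and returns None, so its result is not assigned
    words.sort()
    return words

print(sorted_words(["the quick fox\n", "a lazy dog\n"]))
```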
I have the following code in Python 3. I'm trying to read a text file and output a list of numerical values. These values will then be used when searching through a number of PDF invoices.
Here is what I have for the text file portion:
txt_numbers = []
for file in os.listdir(my_path):
    if file[-3:] == "txt":
        with open(my_path + file, 'r') as txt_file:
            txt = txt_file.readlines()
            for line in txt:
                # get number between quotes
                num = re.findall(r'(?<=").*?(?=")', line)
                txt_numbers.append(num)
for c, value in enumerate(txt_numbers, 1):
    print(c, value)
Here is the output:
[[], ['51,500.00'], ['6,000.00'], ['77,000.00'], ['37,000.00']]
Question: How do I remove the inner "[" brackets from the list? I would like to have just ['51,500.00', '6,000.00', ...]
I tried new_text_numbers = (", ".join(txt_numbers)) followed by print(new_text_numbers), but join() only accepts an iterable of strings, and txt_numbers contains lists.
Problem: I was appending a list to a list, which Python allows but is not what I wanted.
Added lines:
new_num = (", ".join(num))
txt_numbers.append(new_num)
Solution:
import os
import re

txt_numbers = []
for file in os.listdir(my_path):
    if file[-3:] == "txt":
        with open(my_path + file, 'r') as txt_file:
            txt = txt_file.readlines()
            for line in txt:
                # get number between quotes
                num = re.findall(r'(?<=").*?(?=")', line)
                # join the matches on this line into one string
                new_num = ", ".join(num)
                txt_numbers.append(new_num)
for c, value in enumerate(txt_numbers, 1):
    print(c, value)
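An alternative to joining each line's matches into a string is list.extend(), which adds the matches one by one and skips lines with no match, so no empty entries appear. The sketch below also swaps the lookaround pattern for "([^"]*)": on a line containing two quoted values, the lookaround version also matches the text between them. extract_quoted and the sample lines are hypothetical.

```python
import re

# Capture whatever sits between a pair of double quotes
QUOTED = re.compile(r'"([^"]*)"')

def extract_quoted(lines):
    """Return one flat list with every quoted value found across lines."""
    values = []
    for line in lines:
        # extend() adds each match separately; append() would add the whole list as one item
        values.extend(QUOTED.findall(line))
    return values

sample_lines = ['header without quotes\n', 'Total "51,500.00"\n', 'Fee "6,000.00"\n']
print(extract_quoted(sample_lines))
```

Inside the directory loop, the per-line code would become txt_numbers.extend(QUOTED.findall(line)).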
I have been using multiple code snippets to create a solution that will allow me to write a list of players in a football team to a csv file.
import csv
data = []
string = input("Team Name: ")
fName = string.replace(' ', '') + ".csv"
print("When you have entered all the players, press enter.")
# while loop that will continue allowing entering of players
done = False
while not done:
    a = input("Name of player: ")
    if a == "":
        done = True
    else:
        string += a + ','
        string += input("Age: ") + ','
        string += input("Position: ")
print (string)
file = open(fName, 'w')
output = csv.writer(file)
for row in string:
    tempRow = row
    output.writerow(tempRow)
file.close()
print("Team written to file.")
I would like the exported csv file to look like this:
player1,25,striker
player2,27,midfielder
and so on. However, when I check the exported csv file it looks more like this:
p
l
a
y
e
r
,
2
5
and so on.
Does anyone have an idea of where I'm going wrong?
Many thanks
Karl
Your string variable is a single string, not a list of strings, but you treat it as a list of strings here:
for row in string:
When you iterate over a string, you iterate over its characters, which is why you see one character per line.
Instead, declare a list of strings and append each player's row to it, like this:
done = False
strings_list = []
while not done:
    string = ""
    a = input("Name of player: ")
    if a == "":
        done = True
    else:
        string += a + ','
        string += input("Age: ") + ','
        string += input("Position: ") + '\n'
        strings_list.append(string)
Now iterate over strings_list and write each row to the output file. Since you are adding the delimiter (comma) to each string yourself, you do not need a csv writer.
a_file = open(fName, 'w')
for row in strings_list:
    print(row)
    a_file.write(row)
a_file.close()
Note:
string is the name of a standard-library module in Python, so it is wise not to use it as a variable name. The same goes for your variable file, which was a built-in type in Python 2.
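If you would rather keep csv.writer, the same point applies: writerow() expects one row as a sequence of fields, not a string. A minimal sketch that collects each player as a tuple and writes them with writerows(); players_to_csv and the io.StringIO buffer are only for illustration. Unlike the hand-built strings, csv.writer also quotes a field that contains a comma.

```python
import csv
import io

def players_to_csv(players, out):
    """Write an iterable of (name, age, position) rows to the text stream out."""
    writer = csv.writer(out)
    # writerows() wants rows, and each row must be a sequence of fields;
    # a plain string is treated as a sequence of characters
    writer.writerows(players)

buffer = io.StringIO()
players_to_csv([("player1", "25", "striker"), ("player2", "27", "midfielder")], buffer)
print(buffer.getvalue())
```

csv.writer ends rows with \r\n by default. When writing to a real file instead of a buffer, the csv documentation recommends opening it with newline='', e.g. open(fName, 'w', newline='').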
I am reading a text file called 'test.txt' into MATLAB; it is structured as follows:
$variable1 = answer1;
$variable2 = answer2;
$variable3 = answer3;
I read the text file into MATLAB line by line using the following code:
fid = fopen('test.txt');
tline = fgetl(fid);
tracks = {};
while ischar(tline)
    tracks{end+1} = regexp(tline, '(?<=^.*\=\s*)(.*)(?=\s*;$)', 'match', 'once');
    tline = fgetl(fid);
end
fclose(fid);
This piece of code returns the value of each variable line by line and would output:
answer1
answer2
answer3
What I want to do is modify my regexp call so that I can specify the name of a variable and have MATLAB output the value assigned to it.
For example, if I ask for the value of $variable2, MATLAB should return:
answer2
Regards
One possible solution:
function [tracks] = GetAnswer(Filename, VariableName)
    fid = fopen(Filename);
    tline = fgetl(fid);
    tracks = {};
    % prefix all $ in VariableName with \ for `regexp` and `regexprep`
    VariableName = regexprep(VariableName, '\$', '\\$');
    while ischar(tline)
        if (regexp(tline, [ '(', VariableName, ')', '( = )', '(.*)', '(;)' ]))
            tracks{end+1} = regexprep(tline, [ '(', VariableName, ')', '( = )', '(.*)', '(;)' ], '$3');
            % if you want all matches (not only the 1st one),
            % remove the following `break` line.
            break;
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    return
You can call it this way:
Answer = GetAnswer('test.txt', '$variable2')
Answer =
'answer2'
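The regex idea can be cross-checked outside MATLAB. Below is a minimal Python sketch, assuming every line has the form $name = value;, with get_answer as a hypothetical helper. Python's re module does not allow the variable-length lookbehind used in the question, so the value is captured with a group instead, and re.escape() plays the role of the regexprep call that escapes $.

```python
import re

def get_answer(text, variable_name):
    """Return the value assigned to variable_name in lines like '$name = value;', or None."""
    # re.escape turns '$' into '\$' so it is matched literally instead of as an end anchor
    pattern = re.compile(
        r'^\s*' + re.escape(variable_name) + r'\s*=\s*(.*?)\s*;\s*$', re.MULTILINE)
    match = pattern.search(text)
    return match.group(1) if match else None

sample = "$variable1 = answer1;\n$variable2 = answer2;\n$variable3 = answer3;\n"
print(get_answer(sample, "$variable2"))
```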
I am trying to extract particular parts of each line in a text file like the one below, and I would like to save them to a text file in MATLAB.
Original text file:
D 1m8ea_ 1m8e A: d.174.1.1 74583 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74583
D 1m8eb_ 1m8e B: d.174.1.1 74584 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74584
D 3e7ia1 3e7i A:77-496 d.174.1.1 158052 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158052
D 3e7ib1 3e7i B:77-496 d.174.1.1 158053 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158053
D 2bhja1 2bhj A:77-497 d.174.1.1 128533 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=128533
Basically, I would like to retrieve the PDB code (e.g. "1m8e"), the chain ID (e.g. "A"), the start value (e.g. "77") and the stop value (e.g. "496"), and write all of these values out with an fprintf statement.
Is there a way to use regexp to say at which position each field starts, and retrieve those strings by their position in each line?
In the end, all I want in the fprintf statement is 1m8e, A, 77, 496.
So far I have two fopen calls, one that opens the input file for reading and one that opens a new file for writing, a loop that reads the input line by line, and an fprintf statement:
pdbcode = '';
chainid = '';
start = '';
stop = '';
fin = fopen('dir.cla.scop.txt_1.75.txt', 'r');
fout = fopen('output_scop.txt', 'w');
% TODO: Add error check!
while true
    line = fgetl(fin); % Get the next line from the file
    if ~ischar(line)
        % End of file
        break;
    end
    % Print result into output_scop.txt file (the chain ID is text, so it is quoted)
    fprintf(fout, 'INSERT INTO cath_domains (scop_pdbcode, scop_chainid, scopbegin, scopend) VALUES("%s", "%s", %s, %s);\n', pdbcode, chainid, start, stop);
end
fclose(fin);
fclose(fout);
Thank you.
You should be able to use strsplit on whitespace and take the third ("1m8e") and fourth ("A:77-496") elements, then repeat the process on the fourth element using ":" as the delimiter, and then again on the second of the resulting two pieces using "-" as the delimiter. That's one approach. For example, you could do:
% split on whitespace; strsplit collapses consecutive spaces/tabs by default
tokens = strsplit(strtrim(line));
pdbcode = tokens{3};
% split the fourth token (e.g. 'A:77-496') on the colon
parts = strsplit(tokens{4}, ':');
chainid = parts{1};
if numel(parts) == 2 && ~isempty(parts{2})
    % split the range (e.g. '77-496') on the dash
    rangeParts = strsplit(parts{2}, '-');
    start = rangeParts{1};
    stop = rangeParts{2};
else
    % lines such as 'A:' have no start-stop range
    start = '';
    stop = '';
end
If you really want to use regular expressions, you could try the following:
pattern = '\S+\s+\S+\s+(\S+)\s+([A-Za-z]+):([0-9]+)-([0-9]+)';
% 'once' returns a 1x4 cell of the captured tokens, or an empty cell if the line does not match
tok = regexp(line, pattern, 'tokens', 'once');
if ~isempty(tok)
    pdbcode = tok{1};
    chainid = tok{2};
    start = tok{3};
    stop = tok{4};
end
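To check the pattern outside MATLAB, here is a minimal Python sketch that applies the same regular expression to one line and builds the INSERT statement. to_insert is a hypothetical helper; it uses match(), so the pattern is anchored at the start of the line, and it quotes the chain ID because it is text. It returns None for lines such as the 1m8e entries, whose A: field has no start-stop range. The string formatting is only suitable for generating a script file; when talking to a database directly, parameterized queries are safer.

```python
import re

# The regular expression from the answer: two leading fields, then the PDB code,
# then "chain:start-stop"
SCOP_LINE = re.compile(r'\S+\s+\S+\s+(\S+)\s+([A-Za-z]+):([0-9]+)-([0-9]+)')

def to_insert(line):
    """Build one INSERT statement from a line, or return None if it has no start-stop range."""
    m = SCOP_LINE.match(line)
    if m is None:
        return None
    pdbcode, chainid, start, stop = m.groups()
    # the chain ID is text, so it is quoted like the PDB code
    return ('INSERT INTO cath_domains (scop_pdbcode, scop_chainid, scopbegin, scopend) '
            f'VALUES("{pdbcode}", "{chainid}", {start}, {stop});')

line = ('D 3e7ia1 3e7i A:77-496 d.174.1.1 158052 '
        'cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158052')
print(to_insert(line))
```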