Python- Writing all results from a loop to a variable - python-2.7

I have a .txt file with dozens of columns and hundreds of rows. I want to store all the values of two specific columns in two variables. I don't have a great deal of experience with for loops, but here is my attempt to loop through the file.
a = open('file.txt', 'r') #<--This puts the file in read mode
header = a.readline() #<-- This skips the strings in the 0th row indicating the labels of each column
for line in a:
    line = line.strip() #removes '\n' characters in text file
    columns = line.split() #Splits the white space between columns
    x = float(columns[0]) # the 1st column of interest
    y = float(columns[1]) # the 2nd column of interest
    print(x, y)
a.close()
Outside of the loop, printing x or y only displays the last value of the text file. I want it to have all the values of the specified columns of the file. I know of the append command but I am unsure how to apply it in this situation within the for loop.
Does anyone have any suggestions or easier methods on how to do this?

Make two lists x and y before you start the loop and append to them in the loop:
a = open('file.txt', 'r') #<--This puts the file in read mode
header = a.readline() #<-- This skips the strings in the 0th row indicating the labels of each column
x = []
y = []
for line in a:
    line = line.strip() #removes '\n' characters in text file
    columns = line.split() #Splits the white space between columns
    x.append(float(columns[0])) # the 1st column of interest
    y.append(float(columns[1])) # the 2nd column of interest
a.close()
print('all x:')
print(x)
print('all y:')
print(y)

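Your code rebinds x and y on every iteration, so after the loop they hold only the values from the last row. I'm not sure whether that is your entire code, but if you want to keep all the values of each column, I would suggest appending them to a list and then printing the lists outside of the loop.
listx = []
listy = []
a = open('openfile', 'r')
a.readline() # skip the header
for line in a:
    columns = line.split() # split the line on whitespace
    # take the x and y values from the columns of interest
    listx.append(float(columns[0]))
    listy.append(float(columns[1]))
a.close()
# print outside of the loop
print(listx)
print(listy)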
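Both answers above come down to the same pattern: create the lists before the loop and append inside it. As a minimal sketch of that pattern wrapped in a function (the name read_two_columns and its parameters are made up for illustration, and it assumes whitespace-separated columns with a single header row):

```python
def read_two_columns(path, first=0, second=1):
    """Return two lists of floats taken from two whitespace-separated columns."""
    xs, ys = [], []
    with open(path) as handle:      # the file is closed automatically
        next(handle, None)          # skip the header row
        for line in handle:
            columns = line.split()
            if len(columns) <= max(first, second):
                continue            # skip blank or short lines
            xs.append(float(columns[first]))
            ys.append(float(columns[second]))
    return xs, ys
```

It would be called as x, y = read_two_columns('file.txt'). The with block is only a convenience; it removes the need for an explicit close() call.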

Related

rstrip, split and sort a list from input text file

I am new to Python. I am trying to rstrip whitespace from each line, split the lines into words, append the words to a list, and then sort the list alphabetically. I don't know what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
    line = line.rstrip()
    y = line.split()
    i = lst.append()
    k = y.sort()
print y
I have been able to fix my code and get the expected output.
This is what I was hoping to code:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
    words = line.split()
    for word in words:
        if word in wordlist: continue
        wordlist.append(word)
wordlist.sort()
print(wordlist)
If you are using Python 2.7, I believe you need to use raw_input(); in Python 3.x, input() is correct. Also, you are not using append() correctly: it is a list method and needs the item to add as its argument, as in lst.append(item).
fname = raw_input("Enter filename: ") # Stores the filename given by the user input
fh = open(fname,"r") # Here we are adding 'r' as the file is opened as read mode
lines = fh.readlines() # This will create a list of the lines from the file
# Sort the lines alphabetically
lines.sort()
# Rstrip each line of the lines list
y = [l.rstrip() for l in lines]
# Print out the result
print y
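Note that the answer above sorts whole lines, while the asker's working version collects unique words. As a rough sketch of the word-based approach, a set can take care of the duplicate check (the function name sorted_unique_words is hypothetical, and words are assumed to be separated by whitespace):

```python
def sorted_unique_words(path):
    """Return the distinct whitespace-separated words of a file, sorted."""
    words = set()
    with open(path) as handle:
        for line in handle:
            # split() with no argument also discards the trailing newline
            words.update(line.split())
    return sorted(words)
```

The membership test on a list scans the whole list for every word, so a set is usually the better fit once the file gets large. The sort is case-sensitive, as in the original code.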

Crystal get from n line to n line from a file

How can I get specific lines in a file and add them to an array?
For example: I want to get lines 200-300 and put them inside an array, and while doing that count the total number of lines in the file. The file can be quite big.
File.each_line is a good reference for this:
lines = [] of String
index = 0
range = 200..300
File.each_line(file, chomp: true) do |line|
  index += 1
  if range.includes?(index)
    lines << line
  end
end
Now lines holds the lines in range and index is the number of total lines in the file.
To prevent reading the entire file and allocating a new array for all of its content, you can use the File.each_line iterator. Because it breaks after line 300, this version does not count the total number of lines:
lines = [] of String
File.each_line(file, chomp: true).with_index(1) do |line, idx|
  case idx
  when 1...200 then next # omit lines before line 200 (note exclusive range)
  when 200..300 then lines << line # collect lines 200-300
  else break # early break, to be efficient
  end
end

Extracting data using regular expressions: Python

The basic outline of this problem is to read the file, look for integers using re.findall() with the regular expression [0-9]+, convert the extracted strings to integers, and sum them.
I am having trouble appending to the list. With my code below, it only appends the first match (index 0) of a line. Please help me. Thank you.
import re
hand = open ('a.txt')
lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    if len(stuff)!= 1 : continue
    num = int (stuff[0])
    lst.append(num)
print sum(lst)
import re
ls = []
text = open('C:/Users/pvkpu/Desktop/py4e/file1.txt')
for line in text:
    line = line.rstrip()
    l = re.findall('[0-9]+', line)
    if len(l) == 0:
        continue
    ls += l
for i in range(len(ls)):
    ls[i] = int(ls[i])
print(sum(ls))
Great, thank you for including the whole txt file! Your main problem was the if len(stuff)... line, which skipped a line not only when stuff was empty but also when it held 2, 3, or more matches. You were only keeping stuff lists of length 1. I put comments in the code, but please ask any questions if something is unclear.
import re
hand = open ('a.txt')
str_num_lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    #If we didn't find anything on this line then continue
    if len(stuff) == 0: continue
    #if len(stuff)!= 1: continue #<-- This line was wrong as it skips lists with more than 1 element
    #If we did find something, stuff will be a list of strings:
    #(i.e. stuff = ['9607', '4292', '4498'] or stuff = ['4563'])
    #For now let's just add this list onto our str_num_lst
    #without worrying about converting to int.
    #We use '+=' instead of 'append' since both stuff and str_num_lst are lists
    str_num_lst += stuff
#Print out the str_num_lst to check if everything's ok
print str_num_lst
#Get an overall sum by looping over the string numbers in the str_num_lst
#Can convert to int inside the loop
overall_sum = 0
for str_num in str_num_lst:
    overall_sum += int(str_num)
#Print sum
print 'Overall sum is:'
print overall_sum
EDIT:
You are right, reading in the entire file as one line is a good solution, and it's not difficult to do. Check out this post. Here is what the code could look like.
import re
hand = open('a.txt')
all_lines = hand.read() #Reads in all lines as one long string
all_str_nums_as_one_line = re.findall('[0-9]+',all_lines)
hand.close() #<-- can close the file now since we've read it in
#Go through all the matches to get a total
tot = 0
for str_num in all_str_nums_as_one_line:
    tot += int(str_num)
print('Overall sum is:',tot) #editing to add ()
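The read-everything-at-once version can be condensed further. This is only a sketch of the same idea, with a made-up function name (sum_numbers), and it assumes the file is small enough to read into memory in one go:

```python
import re

def sum_numbers(path):
    """Sum every run of digits found anywhere in the file."""
    with open(path) as handle:
        text = handle.read()  # whole file as one string
    # findall returns the matches as strings, so convert each one to int
    return sum(int(match) for match in re.findall(r'[0-9]+', text))
```

Calling sum_numbers('a.txt') should give the same total as the loop above, since both convert every [0-9]+ match to an int and add them up.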

Why does this code only read the first line rather than the whole .txt file?

I have some code here in Python 2.7 that is supposed to tell me the frequency of a letter or word within a single text file.
def frequency_a_in_text(textfile, a):
    """Counts how many "a" letters are in the text file.
    """
    try:
        f = open(textfile,'r')
        lines = f.readlines()
        f.close()
    except IOError:
        return -1
    tot = 0
    for line in lines:
        split = str(line.split())
        k = split.count(a)
        tot = tot + k
        return tot
print frequency_a_in_text("RandomTextFile.txt", "a")
There's a little bit of extra coding in there - the "try" and "except", but that's just telling me that if I can't open the text file, then it'll return a "-1" to me.
Whenever I run it, it seems to just read the first line and tell me how many "a" letters there are.
You are returning out of the function after the first iteration of your loop.
The return statement should be outside of the loop.
for line in lines:
    split = str(line.split())
    k = split.count(a) # count the function's parameter a
    tot = tot + k
return tot
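For comparison, here is a compact sketch of the whole function with the return outside the loop. The name count_in_file is made up, and it counts occurrences directly in each line instead of in str(line.split()), which is a small change from the original:

```python
def count_in_file(path, target):
    """Count occurrences of target across all lines; return -1 if the file cannot be opened."""
    try:
        with open(path) as handle:
            # sum() consumes every line before the function returns
            return sum(line.count(target) for line in handle)
    except IOError:
        return -1
```

Because the total is built by sum() over all lines, there is no way to return after only the first iteration.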

Retrieve particular parts of string from a text file and save it in a new file in MATLAB

I am trying to retrieve particular parts of each line in a text file such as the one below, and I would like to save them to a text file using MATLAB.
Original text file
D 1m8ea_ 1m8e A: d.174.1.1 74583 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74583
D 1m8eb_ 1m8e B: d.174.1.1 74584 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74584
D 3e7ia1 3e7i A:77-496 d.174.1.1 158052 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158052
D 3e7ib1 3e7i B:77-496 d.174.1.1 158053 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158053
D 2bhja1 2bhj A:77-497 d.174.1.1 128533 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=128533
So basically, I would like to retrieve the PDB code (e.g. "1m8e"), the chain ID (e.g. "A"), the start value (e.g. "77") and the stop value (e.g. "496"), and pass all of these values to an fprintf statement.
Is there a way to use regexp to say at which position each field starts, and retrieve those strings from each line of the text file based on position?
In the end, all I want to have in the fprintf statement is 1m8e, A, 77, 496.
So far I have two fopen calls, one that reads a file and one that writes to a new file, a loop that reads the file line by line, and an fprintf statement:
pdbcode = '';
chainid = '';
start = '';
stop = '';
fin = fopen('dir.cla.scop.txt_1.75.txt', 'r');
fout = fopen('output_scop.txt', 'w');
% TODO: Add error check!
while true
    line = fgetl(fin); % Get the next line from the file
    if ~ischar(line)
        % End of file
        break;
    end
    % Print result into output_scop.txt file
    fprintf(fout, 'INSERT INTO cath_domains (scop_pdbcode, scop_chainid, scopbegin, scopend) VALUES("%s", %s, %s, %s);\n', pdbcode, chainid, start, stop);
end
fclose(fin);
fclose(fout);
Thank you.
You should be able to strsplit on whitespace, get the third ("1m8e") and fourth ("A:77-496") elements, then repeat the process on the fourth element using ":" as the split character, and then again on the second of those two parts using "-" as the split character. That's one approach. For example, you could do:
% split on spaces and tabs; consecutive delimiters are collapsed by default
tokens = strsplit(strtrim(line), {' ', '\t'});
pdbcode = tokens{3};
% split the fourth token (e.g. 'A:77-496') on the colon
parts = strsplit(tokens{4}, ':');
chainid = parts{1};
% split the residue range on the dash; it is empty for lines such as 'A:'
range = strsplit(parts{2}, '-');
start = range{1};
stop = range{end};
If you really wanted to use regular expressions, you could try the following:
pattern = '\S+\s+\S+\s+(\S+)\s+([A-Za-z]+):([0-9]+)-([0-9]+)';
% with 'once', tok is a cell array holding the four captured strings;
% it is empty when the line has no start-stop range (e.g. 'A:')
tok = regexp(line, pattern, 'tokens', 'once');
pdbcode = tok{1};
chainid = tok{2};
start = tok{3};
stop = tok{4};