If line not present in the text file - python - python-2.7

I have a list with a set of strings and another dynamic list:
arr = ['sample1','sample2','sample3']
applist=[]
I am reading a text file line by line, and if a line starts with any of the strings in arr, then I append it to applist, as follows:
for line in open('test.txt').readlines():
    for word in arr:
        if line.startswith(word):
            applist.append(line)
Now, if I do not have a line with any of the strings in the arr list, then I want to append 'NULL' to applist instead. I tried:
for line in open('test.txt').readlines():
    for word in arr:
        if line.startswith(word):
            applist.append(line)
        elif word not in 'test.txt':
            applist.append('NULL')
But it obviously doesn't work (it inserts many unnecessary NULLs). How do I go about it? Also, there are other lines in the text file besides the three lines starting with the strings in arr. But I want to append only these three lines. Thanks in advance!

for line in open('test.txt').readlines():
    found = False
    for word in arr:
        if line.startswith(word):
            applist.append(line)
            found = True
            break
    if not found:
        applist.append('NULL')
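Here is a minimal Python 3 sketch of the same logic. It is not taken from the answer. Python's for/else replaces the found flag, because the else branch runs only when the inner loop finishes without hitting break. The function name collect_matches is an assumption, as is passing the lines in as a list instead of reading test.txt; both keep the example self-contained. Like the answer above, it appends 'NULL' once for every line that matches no prefix.

```python
def collect_matches(lines, prefixes):
    """Keep lines that start with one of prefixes; add 'NULL' for the others."""
    applist = []
    for line in lines:
        for word in prefixes:
            if line.startswith(word):
                applist.append(line)
                break
        else:
            # Reached only when the inner loop ran to the end without break.
            applist.append('NULL')
    return applist
```

With a real file, pass open('test.txt') as lines.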

I think this might be what you are looking for:
found1 = 'NULL'
found2 = 'NULL'
found3 = 'NULL'
for line in open('test.txt').readlines():
    if line.startswith(arr[0]):
        found1 = line
    elif line.startswith(arr[1]):
        found2 = line
    elif line.startswith(arr[2]):
        found3 = line
applist = [found1, found2, found3]
You could clean that up and make it look nicer, but that should give you the logic you're going for.
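One way to clean it up is sketched below in Python 3. It assumes you want exactly one entry per string in arr, in arr's order. A dict starts with 'NULL' for every prefix, and an entry is overwritten when a line matches. The names line_per_prefix and missing are hypothetical. As in the three-variable version, a later matching line replaces an earlier one, and all other lines in the file are ignored.

```python
def line_per_prefix(lines, prefixes, missing='NULL'):
    """Return one entry per prefix: the matching line, or missing if none matched."""
    found = {word: missing for word in prefixes}
    for line in lines:
        for word in prefixes:
            if line.startswith(word):
                found[word] = line  # a later match overwrites an earlier one
                break               # like the elif chain: one prefix per line
    return [found[word] for word in prefixes]
```

For the question's data, this would be applist = line_per_prefix(open('test.txt'), arr).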

Related

how to not remove space in file

How do I keep the spaces between the words?
My code deletes them and prints the characters in a column. How can I print them in a row, with the spaces kept?
s ='[]'
f = open('q4.txt', "r")
for line in f:
    for word in line:
        b = word.strip()
        c = list(b)
        for j in b:
            if ord(j) == 32:
                print ord(33)
            if ord(j) == 97:
                print ord(123)
            if ord(j) == 65:
                print ord(91)
            chr_nums = chr(ord(j) - 1)
            print chr_nums
f.close()
Short answer: remove the word.strip() call, because that is what deletes the space. Then put a comma after the print statement to suppress the newline: print chr_nums,
There are several problems with your code aside from what you ask about here:
ord() takes a string (a single character), not an int, so ord(33) will raise a TypeError. Use chr() to turn a number into a character.
for word in line: iterates over characters, not words, so word is always a single character and the inner for j in b loop is unnecessary.
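Here is a minimal Python 3 sketch of the fixed loop. It assumes the goal is the question's chr(ord(j) - 1) shift, applied to every character except spaces, which pass through unchanged. The special cases for 'a' and 'A' are left out because their intent is unclear. The function name shift_line is hypothetical. The result is built with ''.join and printed once, so each line stays on a single row.

```python
def shift_line(line):
    """Shift each non-space character down one code point, keeping spaces."""
    shifted = []
    # Iterating over a string yields characters, so no inner loop is needed.
    for ch in line.rstrip('\n'):
        if ch == ' ':
            shifted.append(ch)  # assumption: spaces are copied as-is
        else:
            shifted.append(chr(ord(ch) - 1))
    return ''.join(shifted)
```

Inside the for line in f: loop, print(shift_line(line)) prints each shifted line as one row.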
Take a look at the first for loop:
for line in f:
Here the variable named 'line' holds one line from the text file you are reading, so it is a string. Now take a look at the second for loop:
for word in line:
Here you are looping over the string 'line' from the previous loop. So the variable 'word' does not hold a word; it holds single characters, one at a time. Let me demonstrate this with a simple example:
for word in "how are you?":
    print(word)
The output of this code will be as follows (each of the two spaces prints on its own blank-looking line):
h
o
w
 
a
r
e
 
y
o
u
?
You are already getting individual characters from the line, so you don't need another for loop like your 'for j in b:'. I hope this helps.

Crystal get from n line to n line from a file

How can I get specific lines from a file and add them to an array?
For example, I want to get lines 200-300 and put them in an array, and at the same time count the total number of lines in the file. The file can be quite big.
File.each_line is a good reference for this:
lines = [] of String
index = 0
range = 200..300

File.each_line(file, chomp: true) do |line|
  index += 1
  if range.includes?(index)
    lines << line
  end
end
Now lines holds the lines in the range, and index is the total number of lines in the file.
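For comparison, here is a Python 3 sketch of the same single-pass idea. It is not Crystal and not part of the answer. enumerate numbers the lines from 1, lines inside the inclusive range are kept, and the last number seen is the total line count. The function name lines_in_range is hypothetical. If you pass an open file object, the file is streamed line by line instead of being loaded all at once.

```python
def lines_in_range(lines, first, last):
    """Collect lines first..last (1-based, inclusive) and count every line."""
    selected = []
    total = 0
    for number, line in enumerate(lines, start=1):
        total = number  # after the loop this is the total number of lines
        if first <= number <= last:
            selected.append(line.rstrip('\n'))
    return selected, total
```

Typical use: with open(path) as f: lines, total = lines_in_range(f, 200, 300).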
To avoid reading the entire file and allocating a new array for all of its content, you can use the File.each_line iterator and break out early:
lines = [] of String

File.each_line(file, chomp: true).with_index(1) do |line, idx|
  case idx
  when 1...200 then next            # omit lines before line 200 (note exclusive range)
  when 200..300 then lines << line  # collect lines 200-300
  else break                        # early break, to be efficient
  end
end
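The early-break version has a close Python 3 analogue, sketched here under stated assumptions. itertools.islice skips the first lines and stops pulling from the iterator once the last wanted line has been taken, so the rest of a large file is never read. Like the Crystal code above, it does not count the total number of lines. The name lines_in_range_lazy is hypothetical.

```python
from itertools import islice


def lines_in_range_lazy(lines, first, last):
    """Return lines first..last (1-based, inclusive), reading no further."""
    # islice uses 0-based, half-open bounds: skip first - 1 lines, stop at index last.
    return [line.rstrip('\n') for line in islice(lines, first - 1, last)]
```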

Extracting data using regular expressions: Python

The basic outline of this problem is to read the file, look for integers using re.findall() with the regular expression [0-9]+, then convert the extracted strings to integers and sum them.
I am having trouble appending to the list. With my code below, it only appends the first (index 0) number of a line. Please help me. Thank you.
import re
hand = open('a.txt')
lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    if len(stuff) != 1: continue
    num = int(stuff[0])
    lst.append(num)
print sum(lst)
import re
ls = []
text = open('C:/Users/pvkpu/Desktop/py4e/file1.txt')
for line in text:
    line = line.rstrip()
    l = re.findall('[0-9]+', line)
    if len(l) == 0:
        continue
    ls += l
for i in range(len(ls)):
    ls[i] = int(ls[i])
print(sum(ls))
Great, thank you for including the whole txt file! Your main problem was the if len(stuff) != 1 line: it skipped a line whenever stuff had zero matches, and also whenever it had two, three, or more. You were only keeping stuff lists of length 1. I put comments in the code, but please ask if anything is unclear.
import re
hand = open('a.txt')
str_num_lst = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('[0-9]+', line)
    #If we didn't find anything on this line then continue
    if len(stuff) == 0: continue
    #if len(stuff)!= 1: continue #<-- This line was wrong as it skips lists with more than 1 element
    #If we did find something, stuff will be a list of strings:
    #(e.g. stuff = ['9607', '4292', '4498'] or stuff = ['4563'])
    #For now let's just add this list onto our str_num_lst
    #without worrying about converting to int.
    #We use '+=' instead of 'append' since both stuff and str_num_lst are lists
    str_num_lst += stuff
#Print out the str_num_lst to check if everything's ok
print str_num_lst
#Get an overall sum by looping over the string numbers in the str_num_lst
#Can convert to int inside the loop
overall_sum = 0
for str_num in str_num_lst:
    overall_sum += int(str_num)
#Print sum
print 'Overall sum is:'
print overall_sum
EDIT:
You are right, reading in the entire file as one line is a good solution, and it's not difficult to do. Check out this post. Here is what the code could look like.
import re
hand = open('a.txt')
all_lines = hand.read() #Reads in all lines as one long string
all_str_nums_as_one_line = re.findall('[0-9]+',all_lines)
hand.close() #<-- can close the file now since we've read it in
#Go through all the matches to get a total
tot = 0
for str_num in all_str_nums_as_one_line:
    tot += int(str_num)
print('Overall sum is:',tot) #editing to add ()
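Here is a compact Python 3 sketch of the same whole-file approach. It assumes the file fits comfortably in memory. re.findall returns every run of digits no matter how many appear on a line, so there is no length check to get wrong. The helper name sum_numbers is hypothetical.

```python
import re


def sum_numbers(text):
    """Sum every run of digits in text, however many appear on a line."""
    return sum(int(num) for num in re.findall(r'[0-9]+', text))
```

For example: with open('a.txt') as hand: print('Overall sum is:', sum_numbers(hand.read())).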

how to skip multiple header lines using python

I am new to Python. I'm trying to write a script that uses the numeric columns from a file which also contains a header. Here is an example of a file:
#File_Version: 4
PROJECTED_COORDINATE_SYSTEM
#File_Version____________-> 4
#Master_Project_______->
#Coordinate_type_________-> 1
#Horizon_name____________->
sb+
#Horizon_attribute_______-> STRUCTURE
474457.83994 6761013.11978
474482.83750 6761012.77069
474507.83506 6761012.42160
474532.83262 6761012.07251
474557.83018 6761011.72342
474582.82774 6761011.37433
474607.82530 6761011.02524
I'd like to skip the header. Here is what I tried. It works, of course, if I know which characters will appear in the header, like "#" and "#". But how can I skip all lines containing any letter character?
import re

in_file1 = open(input_file1_short, 'r')
out_file1 = open(output_file1_short, "w")
lines = in_file1.readlines()
x = []
y = []
for line in lines:
    if "#" not in line and "#" not in line:
        strip_line = line.strip()
        replace_split = re.split(r'[ ,|;"\t]+', strip_line)
        x = (replace_split[0])
        y = (replace_split[1])
        out_file1.write("%s\t%s\n" % (str(x), str(y)))
in_file1.close()
out_file1.close()
Thank you very much!
I think you could use some built-ins like this:
import string
for line in lines:
    if any([letter in line for letter in string.ascii_letters]):
        print "there is an ascii letter somewhere in this line"
This only looks for ASCII letters, however.
You could also do:
import unicodedata
for line in lines:
    if any([unicodedata.category(unicode(letter)).startswith('L') for letter in line]):
        print "there is a unicode letter somewhere in this line"
But that only works if I understand my Unicode categories correctly.
Even cleaner (using suggestions from other answers; this works for both unicode and byte strings):
for line in lines:
    if any([letter.isalpha() for letter in line]):
        print "there is a letter somewhere in this line"
But, interestingly, if you do:
In [57]: u'\u2161'.isdecimal()
Out[57]: False
In [58]: u'\u2161'.isdigit()
Out[58]: False
In [59]: u'\u2161'.isalpha()
Out[59]: False
The Unicode character for the Roman numeral two (U+2161) is none of those, but unicodedata.category(u'\u2161') returns 'Nl' (Number, letter), and u'\u2161'.isnumeric() is True.
This checks the first character of each line and skips all lines that don't start with a digit:
for line in lines:
    if line[0].isdigit():
        # we've got a line starting with a digit
        pass  # replace with your processing
Use a generator pipeline to filter your input stream.
This takes the lines from your original input, but only passes through the lines that contain no letters at all.
from functools import reduce  # a builtin in Python 2; lives in functools in Python 3
input_stream = (line for line in lines
                if reduce((lambda x, y: (not y.isalpha()) and x), line, True))
for line in input_stream:
    strip_line = ...
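The following minimal Python 3 sketch combines the letter test with the parsing step. It assumes the data columns are separated by whitespace, as in the sample; the question's re.split also accepts other delimiters. The name parse_xy is hypothetical. Blank lines are skipped too, because they contain no letters but also no numbers.

```python
def parse_xy(lines):
    """Return (x, y) float pairs from lines that contain no letters."""
    points = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and any line with a letter (header rows, "sb+", ...).
        if not stripped or any(ch.isalpha() for ch in stripped):
            continue
        x, y = stripped.split()[:2]  # assumption: whitespace-separated columns
        points.append((float(x), float(y)))
    return points
```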

Retrieve particular parts of string from a text file and save it in a new file in MATLAB

I am trying to retrieve particular parts of each line in a text file like the one below, and I would like to save them to a text file in MATLAB.
Original text file
D 1m8ea_ 1m8e A: d.174.1.1 74583 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74583
D 1m8eb_ 1m8e B: d.174.1.1 74584 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=74584
D 3e7ia1 3e7i A:77-496 d.174.1.1 158052 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158052
D 3e7ib1 3e7i B:77-496 d.174.1.1 158053 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=158053
D 2bhja1 2bhj A:77-497 d.174.1.1 128533 cl=53931,cf=56511,sf=56512,fa=56513,dm=56514,sp=56515,px=128533
So basically, I would like to retrieve the PDB code (e.g. "1m8e"), the chain id (e.g. "A"), the start value (e.g. "77") and the stop value (e.g. "496"), and I would like all of these values to be written out by an fprintf statement.
Is there a method I can use with regexp that specifies the index where each field starts, and retrieves those strings based on their position in each line of the text file?
In the end, all I want in the fprintf statement is 1m8e, A, 77, 496.
So far I have two fopen calls, one that reads a file and one that writes to a new file, a loop that reads the input line by line, and an fprintf statement:
pdbcode = '';
chainid = '';
start = '';
stop = '';
fin = fopen('dir.cla.scop.txt_1.75.txt', 'r');
fout = fopen('output_scop.txt', 'w');
% TODO: Add error check!
while true
    line = fgetl(fin); % Get the next line from the file
    if ~ischar(line)
        % End of file
        break;
    end
    % Print result into output_scop.txt file
    fprintf(fout, 'INSERT INTO cath_domains (scop_pdbcode, scop_chainid, scopbegin, scopend) VALUES("%s", %s, %s, %s);\n', pdbcode, chainid, start, stop);
end
fclose(fin);
fclose(fout);
Thank you.
You should be able to strsplit on whitespace and get the third ("1m8e") and fourth ("A:77-496") elements. Then repeat the process on the fourth element using ":" as the split character, and again on the second of those two pieces using "-" as the split character. That's one approach. Note that strsplit returns a cell array, so use braces to get the strings out. For example, you could do:
% split on whitespace (strsplit's default), collapsing repeated delimiters
tokens = strsplit(line);
pdbcode = tokens{3};
% split the fourth token (e.g. 'A:77-496') on the colon
parts = strsplit(tokens{4}, ':');
chainid = parts{1};
% split the second part (e.g. '77-496') on the dash;
% rows such as 'A:' have no range, so check parts{2} before this step
limits = strsplit(parts{2}, '-');
start = limits{1};
stop = limits{2};
If you really wanted to use regular expressions, you could try the following:
pattern = '\S+\s+\S+\s+(\S+)\s+([A-Za-z]+):([0-9]+)-([0-9]+)';
% with 'once', tok is a 1x4 cell of strings, or empty if the line has no range
tok = regexp(line, pattern, 'tokens', 'once');
pdbcode = tok{1};
chainid = tok{2};
start = tok{3};
stop = tok{4};
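For comparison, here is the same regular expression in Python 3, as a sketch rather than a MATLAB solution. re.match anchors at the start of the line. Lines whose fourth field has no start-stop range, such as the "A:" rows for 1m8e, simply do not match. Skipping those rows is an assumption. The names extract_domains, to_sql and SQL_TEMPLATE are hypothetical; the template reuses the INSERT statement from the question's fprintf call.

```python
import re

# Third field is the PDB code; fourth is chain:start-stop, e.g. "A:77-496".
RECORD = re.compile(r'^\S+\s+\S+\s+(\S+)\s+([A-Za-z]+):(\d+)-(\d+)')

SQL_TEMPLATE = ('INSERT INTO cath_domains (scop_pdbcode, scop_chainid, '
                'scopbegin, scopend) VALUES("%s", %s, %s, %s);')


def extract_domains(lines):
    """Return (pdbcode, chainid, start, stop) tuples for lines with a range."""
    rows = []
    for line in lines:
        match = RECORD.match(line)
        if match:  # assumption: rows without a start-stop range are skipped
            rows.append(match.groups())
    return rows


def to_sql(row):
    """Format one extracted row with the question's INSERT statement."""
    return SQL_TEMPLATE % tuple(row)
```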