Remove whitespaces from specific part of file - python-2.7

code:

with open(filename) as f:
    file_list = f.readlines()
    file_list = [line.strip() for line in file_list]  # remove whitespaces from each line of file
Then there is code to process the data between the start and end tags (these tags can have whitespace around them, which is why I strip each line above).
This code works fine for me, but if the file is too big I don't think it's sensible to copy the whole file into a list and then strip whitespace from every line.
How can I strip whitespace from only a specific part of the file, so that only that part is saved in the list?
I tried:

with open(filename) as f:
    for line in f.readlines():
        if line.strip() == "start":
            start = f.readlines.index("start")
        if line.strip() == "end":
            end = f.readlines.index("end")
    file_list = f.readlines[start:end]
But it's giving an error:
start = f.readlines.index("start")
AttributeError: 'builtin_function_or_method' object has no attribute 'index'
I just want an efficient version of the code at the top of this post.

The problem with your code is that the file object f is an iterator, and once you call f.readlines() it is exhausted, so finding the index of a line by calling f.readlines() again can't work. Also, calling readlines() at all negates your effort of storing only the interesting parts of the file, as readlines() would read the entire file into memory anyway.
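A minimal sketch of that exhaustion problem (assuming filename is any existing text file):

with open(filename) as f:
    first = f.readlines()    # reads every line; the file pointer is now at the end
    second = f.readlines()   # returns [] because the file object is already exhausted
print len(first), len(second)   # e.g. "42 0" for a 42-line file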
Instead, just remember whether you've already seen the start-line and add the following lines to the list until you see the end-line.
with open(filename) as f:
    started, lines = False, []
    for line in f:
        stripped = line.strip()
        if stripped == "end": break
        if started: lines.append(stripped)
        if stripped == "start": started = True
Alternatively, you could also use itertools.takewhile to get all the lines up to the end-line.
import itertools

with open(filename) as f:
    for line in f:
        if line.strip() == "start":
            lines = itertools.takewhile(lambda l: l.strip() != "end", f)
            lines = map(str.strip, lines)
            break
Or even shorter, using another takewhile to read (and discard) the lines before the start-line:
with open("test.txt") as f:
    list(itertools.takewhile(lambda l: l.strip() != "start", f))
    lines = itertools.takewhile(lambda l: l.strip() != "end", f)
    lines = map(str.strip, lines)
In all cases, lines holds the (stripped) lines between the start- and the end-line, both exclusive.
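As a quick sanity check (StringIO stands in for a real file here, and the sample content is made up):

import itertools
from StringIO import StringIO

f = StringIO("junk\n  start\nfoo\n  bar\n end\ntrailer\n")
list(itertools.takewhile(lambda l: l.strip() != "start", f))   # skip everything up to "start"
lines = map(str.strip, itertools.takewhile(lambda l: l.strip() != "end", f))
print lines   # ['foo', 'bar']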

Tobias's first answer can be modified a bit with continue ...
with open(filename) as f:
    started, lines = False, []
    for line in f:
        stripped = line.strip()
        if stripped == "end": break
        if stripped == "start":
            started = True
            continue
        if not started: continue
        # process line here, no need to store it in a list ...


Need improvement in the while loop in a Python program

With some help from this forum (#COLDSPEED... thanks a lot) I have been able to read the latest file created in the directory. The program looks for the file with the maximum creation timestamp. But I need two improvements:
1. What if 2 files are created with the same timestamp?
2. I want to skip the file that has already been read (in case no new file arrives) while the while loop keeps checking for the latest file.
import os
import time

def detect_suspects(file_path, word_list):
    with open(file_path) as LogFile:
        Summary = {word: [] for word in word_list}
        failure = ':'
        for num, line in enumerate(LogFile, start=1):
            for word in word_list:
                if word in line:
                    failure += '<li>' + line + '</li>'
        return failure

while True:
    files = os.listdir('.')
    latest_file = max(files, key=os.path.getmtime)
    Error_Suspects = ['Error', 'ERROR', 'Failed', 'Failure']
    print(latest_file)
    Result = detect_suspects(latest_file, Error_Suspects)
    print(Result)
    time.sleep(5)
To address your first question: when 2 files have the exact same timestamp, max picks one and returns it. The first file in the list that has the maximum modification time is the one returned.
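If you do need every file that shares that newest timestamp, one possible sketch (not tested against your directory layout) is to compute the maximum modification time first and then collect all files that match it:

files = os.listdir('.')
max_mtime = max(os.path.getmtime(f) for f in files)
newest_files = [f for f in files if os.path.getmtime(f) == max_mtime]   # every file tied for newest
for latest_file in newest_files:
    Result = detect_suspects(latest_file, Error_Suspects)
    print(Result)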
For your second question, you could make a small addition to your existing code by keeping track of the previous file and previous modification time.
Error_Suspects = ['Error', 'ERROR', 'Failed', 'Failure']

prev_file = None
prev_mtime = None
while True:
    files = os.listdir('.')
    latest_file = max(files, key=os.path.getmtime)
    if latest_file != prev_file or (latest_file == prev_file and prev_mtime != os.path.getmtime(latest_file)):
        Result = detect_suspects(latest_file, Error_Suspects)
        prev_file = latest_file
        prev_mtime = os.path.getmtime(latest_file)
    time.sleep(5)
In this code, the if condition will run your processing only if 1) the new file is different from the old file, or 2) the old and new file are the same but it has been modified since the last check.

How to direct this to a file

This is not directing output to file p:
import collections
import sys

with open('/var/tmp/out3') as f:
    before = collections.deque(maxlen=1)
    for line in f:
        if 'disk#g5000cca025a1ee6c' in line:
            sys.stdout.writelines(before)
            p.write(before)
Try this:

import sys

filename = '/var/tmp/out3'
expression = 'disk#g5000cca025a1ee6c'

with open(filename, 'r') as f:
    with open('p', 'w') as p_file:
        previous = next(f)
        for line in f:
            if expression in line:
                p_file.write(previous)
            previous = line
If the expression is found, you should find a file 'p' in your current directory containing the line preceding each match.
It worked when I tried it on Python 2.7.10. I took the code from this answer: Refer to previous line when iterating through file with Python.
Hope this helps.
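If you would rather keep the deque from your original snippet, a sketch of that variant (same filename and pattern as above) is to append each line to the deque and write its contents to an explicitly opened file:

import collections

with open('/var/tmp/out3') as f, open('p', 'w') as p_file:
    before = collections.deque(maxlen=1)
    for line in f:
        if 'disk#g5000cca025a1ee6c' in line:
            p_file.writelines(before)   # the deque holds at most the one preceding line
        before.append(line)             # remember the current line for the next iteration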

python readline from big text file

When I run this:
import os.path
import pyproj

srcProj = pyproj.Proj(proj='longlat', ellps='GRS80', datum='NAD83')
dstProj = pyproj.Proj(proj='longlat', ellps='WGS84', datum='WGS84')

f = file(os.path.join("DISTAL-data", "countries.txt"), "r")
heading = f.readline()  # Ignore field names.

with open('C:\Python27\DISTAL-data\geonames_20160222\countries.txt', 'r') as f:
    for line in f.readlines():
        parts = line.rstrip().split("|")
        featureName = parts[1]
        featureClass = parts[2]
        lat = float(parts[9])
        long = float(parts[10])
        if featureClass == "Populated Place":
            long, lat = pyproj.transform(srcProj, dstProj, long, lat)
f.close()
I get this error:

File "C:\Python27\importing world datacountriesfromNAD83 toWGS84.py", line 13
    for line in f.readlines():
MemoryError
I have downloaded the countries file from http://geonames.nga.mil/gns/html/namefiles.html as the entire country files dataset.
Please help me get out of this.
readlines() on a large file builds a large structure in memory; you can try iterating over the file object instead:

f = open('somefilename', 'r')
for line in f:
    dosomething()
The answer given by Yael is helpful, but I would like to improve it. A good way to read a file, even a large one, is:

with open(filename) as f:
    for line in f:
        print line

I like to use the 'with' statement, which ensures the file will be properly closed.
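Applied to the snippet from the question, that means dropping readlines() and iterating over the file object directly; a sketch (keeping the pyproj calls from the question unchanged):

import pyproj

srcProj = pyproj.Proj(proj='longlat', ellps='GRS80', datum='NAD83')
dstProj = pyproj.Proj(proj='longlat', ellps='WGS84', datum='WGS84')

with open(r'C:\Python27\DISTAL-data\geonames_20160222\countries.txt', 'r') as f:
    heading = f.readline()   # skip the field names
    for line in f:           # one line at a time, never the whole file in memory
        parts = line.rstrip().split("|")
        featureClass = parts[2]
        if featureClass == "Populated Place":
            lat = float(parts[9])
            long = float(parts[10])
            long, lat = pyproj.transform(srcProj, dstProj, long, lat)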

Python: copy line, conditional criteria

I have been searching for a Python solution to selectively copy lines from one txt file to another. I can copy the whole file, but when copying only a few lines I get an error.
My code:
f = open(from_file, "r")
g = open(to_file, "w")
#copy = open(to_file, "w") # this instruction copies whole file
rowcond2 = 'xxxx' # look for this string sequence in every line
for line in f:
    if rowcond2 in f:
        copy.write(line,"w") in g # write every corresponding line to destination
f.close()
# copy.close() # code receive error to close destination
g.close()
So without the rowcond2 condition I can copy the whole file, yet with the condition nothing is written to the destination file.
Thank you for your help.
Why not put your condition inside the for loop?
for line in f:
    if condition:
        copy.write(line)
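Putting that together with the code from the question, a sketch of a working version (keeping the rowcond2 name and using with so both files are closed automatically) could look like:

rowcond2 = 'xxxx'   # look for this string sequence in every line

with open(from_file, "r") as f, open(to_file, "w") as g:
    for line in f:
        if rowcond2 in line:   # test the line, not the file object
            g.write(line)      # copy the matching line to the destination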
I have been able to solve this case by searching on SO:
Using python to write specific lines from one file to another file
@Lukas Graf: thank you for your detailed step-by-step explanation.

python 2.7: reading a file only up to a known line

If I wanted to read starting from a given line I can do:
with open(myfile) as f:
    for x in range(from_here):
        next(f)
    for line in f:
        # do stuff
How can I do the opposite: reading only up to a given line?
I was thinking about a for loop: is there another way?
The obvious answer is to use a loop that just counts:
with open(myfile) as f:
    for i in xrange(number_of_wanted_lines):
        line = next(f)
        # do stuff with line
Regarding the second part of your question, you can also read the full file into a list of lines and then use slices:
with open(myfile) as f:
    lines = f.readlines()[start_line_number:end_line_number+1]

for line in lines:
    # do stuff with line
If you don't want to load the whole file into memory, you can also use islice (from itertools) instead of list slices:
import itertools

with open(myfile) as f:
    for line in itertools.islice(f, start_line_number, end_line_number + 1):
        # do stuff with line
with open(myfile) as f:
    for x in range(until_here):
        line = next(f)
        # do stuff with line
    # do stuff with the rest of f
or
import itertools as it

with open(myfile) as f:
    for line in it.islice(f, until_here):
        # do stuff
    # do stuff with the rest of f