Python script to limit the length of words in a text file - python-2.7

I have an input file like:
input.txt:
to
the
cow
eliphant
pigen
then
enthosiastic
I want to remove the words whose length is <= 4 characters, and if a word has more than 8 characters, write only its first 8 characters to a new file.
The output should look like:
output.txt:
eliphant
pigen
enthosia
This is my code:
f2 = open('output.txt', 'w+')
x2 = open('input.txt', 'r').readlines()
for y in x2:
    if (len(y) <= 4):
        y = y.replace(y, '')
        f2.write(y)
    elif (len(y) > 8):
        y = y[0:8]
        f2.write(y)
    else:
        f2.write(y)
f2.close()
print "Done!"
When I run it, it gives output like this:
eliphantpigen
then
enthosia
It also writes the 4-character word. I don't understand what the problem is, or how to write the code so that it limits the length of the words in a text file.

Use with when working with files; it guarantees that the file is closed.
You get "then" in your results because you are reading lines, not words.
Each line ends with a newline character '\n'. So when you read the word "then", you actually have the string
'then\n', and the length of this string is 5.
with open('output.txt', 'w+') as ofp, open('input.txt', 'r') as ifp:
    for line in ifp:
        line = line.strip()
        if len(line) > 8:
            line = line[:8]
        elif len(line) <= 4:
            continue
        ofp.write(line + '\n')
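To make the newline issue easy to check without touching any files, here is a minimal sketch of the same filtering as a plain function. The name limit_words and its min_len/max_len parameters are made up for this illustration, and it is written for Python 3 (print as a function).

```python
def limit_words(lines, min_len=5, max_len=8):
    """Keep words with at least min_len characters, cut to max_len characters."""
    result = []
    for line in lines:
        word = line.strip()  # drop the trailing '\n' before measuring the length
        if len(word) < min_len:
            continue  # too short, skip it
        result.append(word[:max_len])
    return result


# Lines as readlines() would return them, newline included.
raw = ['to\n', 'the\n', 'cow\n', 'eliphant\n', 'pigen\n', 'then\n', 'enthosiastic\n']

print(len('then\n'))     # 5, which is why "then" slipped past the <= 4 check
print(limit_words(raw))  # ['eliphant', 'pigen', 'enthosia']
```

Writing the result to output.txt is then just a matter of joining the list with '\n'.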

Related

What is the best way to sum the numbers in a big text file?

What is the best way to sum the numbers in a big text file?
The text file contains numbers separated by commas (',').
The numbers can be of any numeric type.
There is no limit on the number or length of lines.
for example:
1 ,-2, -3.45-7.8j ,99.6,......
...
...
Input: path to the text file
Output: the sum of the numbers
I tried to write a solution myself and would like to know if there are better ones.
This is my attempt:
I read the data in chunks rather than line by line. Because the end of a chunk may contain only part of a number (for example just -2 instead of -2+3j), I process only the "safe piece" up to the last comma (',') and keep the rest for the next chunk.
import re
CHUNK_SIZE = 1017
def calculate_sum(file_path):
    _sum = 0
    with open(file_path, 'r') as _f:
        chunk = _f.read(CHUNK_SIZE)
        while chunk:
            chunk = chunk.replace(' ', '')
            safe_piece = chunk.rfind(',')
            next_chunk = chunk[safe_piece:] if safe_piece != 0 else ''
            if safe_piece != 0:
                chunk = chunk[:safe_piece]
            _sum += sum(map(complex, re.findall(r"[+-]\d*\.?\d*[+-]?\d*\.?\d*j|[+-]?\d+(?:\.\d+)?", chunk)))
            chunk = next_chunk + _f.read(CHUNK_SIZE)
    return _sum
Thanks!
This will add up any amount of numbers in a text file. Example:
input.txt
1,-2,-3.45-7.8j,99.6
-1,1-2j
1.5,2.5,1+1j
example.py
import csv
with open('input.txt','rb') as f:
    r = csv.reader(f)
    total = 0
    for line in r:
        total += sum(complex(col) for col in line)
print total
Output
(100.15-8.8j)
If you have really long lines and insufficient memory to read it in one go, then you could use a buffering class to chunk the reads and split numbers out of the buffer using a generator function:
import re
class Buffer:
    def __init__(self,filename,chunksize=4096):
        self.filename = filename
        self.chunksize = chunksize
        self.buf = ''
    def __iter__(self):
        with open(self.filename) as f:
            while True:
                if ',' in self.buf or '\n' in self.buf:
                    data,self.buf = re.split(r',|\n',self.buf,1) # split off the text up to the first separator
                    yield complex(data)
                else:
                    chunk = f.read(self.chunksize)
                    if not chunk: # if no more data to read, return the remaining buffer and exit function
                        if self.buf:
                            yield complex(self.buf)
                        return
                    self.buf += chunk
total = 0
for num in Buffer('input.txt'):
    total += num
print total
Output:
(100.15-8.8j)
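For comparison, here is a rough Python 3 sketch of the same chunked idea written as a generator that splits each chunk on separators and carries the unfinished tail over to the next read. The name iter_numbers is hypothetical, and the sketch assumes every token between separators is something complex() can parse once whitespace is removed. It takes an open file object, so it can be tried on an in-memory string.

```python
import io


def iter_numbers(stream, chunksize=4096):
    """Yield complex numbers from comma/newline separated text, read in chunks."""
    tail = ""  # text after the last separator; it may be an incomplete number
    while True:
        chunk = stream.read(chunksize)
        if not chunk:
            break
        # Treat newlines as separators too, then split on commas.
        parts = (tail + chunk).replace("\n", ",").split(",")
        tail = parts.pop()  # the last piece may continue in the next chunk
        for part in parts:
            part = "".join(part.split())  # remove any whitespace inside the token
            if part:
                yield complex(part)
    tail = "".join(tail.split())
    if tail:
        yield complex(tail)  # whatever is left after the final chunk


data = io.StringIO("1 ,-2, -3.45-7.8j ,99.6\n-1,1-2j\n1.5,2.5,1+1j\n")
total = sum(iter_numbers(data, chunksize=5), 0)
print(total)  # approximately (100.15-8.8j), up to floating-point rounding
```

The deliberately tiny chunksize in the example forces numbers to be split across reads, which is the case the tail variable is there to handle. With a real file you would pass the object returned by open().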

Process a text file to find a value above the PE score threshold of 3.19

The text file can be found at this link. What I am interested in is the value of PE score. Graphically, it appears under the column Feature2 sys.
This is my code:
def main():
    file = open ( "combined_scores.txt" , "r" )
    lines = file.readlines()
    file.close()
    count_pe=0
    for line in lines:
        line=line.strip()
        line=line[24:31] # problem 1: this range is not the same on every line of the file
        if line.find( "3.19") != -1 : # I need values >= 3.19, not only 3.19
            count_pe = count_pe + 1
    print ( ">=3.19: ", count_pe ) # at the end I need how many times PE > 3.19 occurs
main()
I suggest you split each line on the tab character (\t), convert the PE score field to a float, and compare it numerically with 3.19. It should be something like below (Python 2.7):
with open('combined_scores.txt') as f:
    lines = f.readlines()[1:] # remove the header line
# reset counter
n = 0
for line in lines:
    if float(line.split('\t')[-3]) >= 3.19:
        n = n + 1
# print total count
print 'total=', n
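The same idea can be sketched as a small function that also skips rows where the field is missing or not a number. This is Python 3, and since the real file is not shown here, the sample rows, the column names in them, and the function name count_at_least are invented for illustration. Only the "third field from the end" position is taken from the code above.

```python
def count_at_least(lines, threshold=3.19, column=-3, has_header=True):
    """Count rows whose tab-separated field at `column` is >= threshold."""
    count = 0
    for number, line in enumerate(lines):
        if has_header and number == 0:
            continue  # skip the header line
        fields = line.rstrip("\n").split("\t")
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # short row or non-numeric field: ignore it
        if value >= threshold:
            count += 1
    return count


# Hypothetical sample data; the real column layout may differ.
sample = [
    "id\tfeature1\tpe_score\tx\ty\n",
    "a\t1\t3.19\t0\t0\n",
    "b\t1\t2.50\t0\t0\n",
    "c\t1\t10.4\t0\t0\n",
    "d\t1\tn/a\t0\t0\n",
]
print(count_at_least(sample))  # 2
```

Note that a value such as 10.4 is counted here, whereas searching the text for "3.19" would miss it. That is the reason for converting to float before comparing.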

Python : count function does not work

I am stuck on an exercise from a Coursera Python course, this is the question:
"Open the file mbox-short.txt and read it line by line. When you find a line that starts with 'From ' like the following line:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
You will parse the From line using split() and print out the second word in the line (i.e. the entire address of the person who sent the message). Then print out a count at the end.
Hint: make sure not to include the lines that start with 'From:'.
You can download the sample data at http://www.pythonlearn.com/code/mbox-short.txt"
Here is my code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    words = line.split()
    if len(words) > 2 and words[0] == 'From':
        print words[1]
        count = count + 1
    else:
        continue
print "There were", count, "lines in the file with From as the first word"
The output should be a list of email addresses followed by their count, but it doesn't work and I don't know why. The actual output is "There were 0 lines in the file with From as the first word".
I used your code and downloaded the file from the link, and I get this output:
There were 27 lines in the file with From as the first word
Have you checked that the downloaded file is in the same location as the code file?
fname = input("Enter file name: ")
counter = 0
fh = open(fname)
for line in fh :
    line = line.rstrip()
    if not line.startswith('From '): continue
    words = line.split()
    print (words[1])
    counter +=1
print ("There were", counter, "lines in the file with From as the first word")
fname = input("Enter file name: ")
fh = open(fname)
count = 0
for line in fh :
    if line.startswith('From '): # consider the lines which start with the word "From "
        y=line.split() # we split the line into words and store it in a list
        print(y[1]) # print the word present at index 1
        count=count+1 # increment the count variable
print("There were", count, "lines in the file with From as the first word")
I have commented every step in case anyone has difficulty following it; if you need help, feel free to contact me. I think this is about as simple as the code can get. Hope you benefit from my answer.
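fname = input('Enter the file name:')
fh = open(fname)
count = 0
for line in fh:
    if line.startswith('From '): # trailing space so that "From:" lines are not counted
        linesplit =line.split()
        print(linesplit[1])
        count = count +1
print("There were", count, "lines in the file with From as the first word")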
fname = input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for i in fh:
    i=i.rstrip()
    if not i.startswith('From '): continue
    word=i.split()
    count=count+1
    print(word[1])
print("There were", count, "lines in the file with From as the first word")
fname = input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if line.startswith('From '): # trailing space so that "From:" lines are not counted
        line=line.rstrip()
        lt=line.split()
        if len(lt)>=2: # the address is the second word of the line
            print(lt[1])
            count=count+1
print("There were", count, "lines in the file with From as the first word")
My code looks like this and works like a charm:
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
fh = open(fname)
count = 0 # initialize the counter to 0
for line in fh: # iterate over the file line by line
    words = line.split() # split the line into words
    if not len(words) < 2 and words[0] == "From": # keep lines with at least 2 words whose first word is "From"
        print(words[1]) # print the word at index 1 of the list (the address)
        count += 1 # count the line
    else:
        continue
print("There were", count, "lines in the file with From as the first word")
It is a nice exercise from Dr. Chuck's course.
There is also another way: store the addresses you find in a separate list and then print the length of the list. It gives the same result.
My tested code is as follows:
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
fh = open(fname)
newl = list()
for line in fh:
    words = line.split()
    if not len(words) < 2 and words[0] == 'From':
        newl.append(words[1])
    else:
        continue
print(*newl, sep = "\n")
print("There were", len(newl), "lines in the file with From as the first word")
I passed the exercise with it as well. Enjoy and keep up the good work. Python is so much fun to me, even though I always hated programming.
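Since the answers above differ mainly in how they tell 'From ' lines apart from 'From:' lines, here is a small Python 3 sketch that isolates just that check so it can be tried on a few lines without downloading the file. The function name sender_addresses is made up for this example, and the sample lines only imitate the mbox format.

```python
def sender_addresses(lines):
    """Return the second word of every line that starts with 'From ' (with a space)."""
    addresses = []
    for line in lines:
        if not line.startswith("From "):  # the space excludes "From:" header lines
            continue
        words = line.split()
        if len(words) > 1:  # guard against a bare "From " line
            addresses.append(words[1])
    return addresses


sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n",
    "Return-Path: <postmaster@collab.sakaiproject.org>\n",
    "From: stephen.marquard@uct.ac.za\n",
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n",
]
found = sender_addresses(sample)
print(*found, sep="\n")
print("There were", len(found), "lines in the file with From as the first word")
```

On these four lines it reports 2, because the 'From:' line is skipped. With the real file you would pass the open file handle instead of the list.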

How do I dynamically search for text in a file and write to another file

I'll try to be as specific as I can. Keep in mind I just started learning this language last week so I'm not a professional. I'm trying to make a program that will read a vocabulary file that I created and write the definition for the word to another preexisting file with a different format.
Example of the two formats and what I'm trying to do here:
Word 1 - Definition
Word 1 (page 531) - Definition from other file
What I'm currently doing with it is I'm opening both files and searching a word based on user input, which isn't working. What I want to do is I want the program to go into the output file and find the word, then find the same word in the input file, get the definition only, and paste it into the output file. Then move to the next word and loop until it finds the end of file. I really don't know how to do that so I'm currently stuck. How would you python pros here on stackoverflow handle this?
Also for those who are suspicious of my reasons for this program, I'm not trying to cheat on an assignment, I'm trying to get some of my college work done ahead of time and I don't want to run into conflicts with my formatting being different from the teachers. This is just to save me time so I don't have to do the same assignment twice.
Edit 1
Here is the full code pasted from my program currently.
import os
print("Welcome to the Key Terms Finder Program. What class is this for?\n[A]ccess\n[V]isual Basic")
class_input = raw_input(">>")
if class_input == "A" or class_input == "a":
    class_input = "Access"
    chapter_num = 11
elif class_input == "V" or class_input == "v":
    class_input = "Visual Basic"
    chapter_num = 13
else:
    print("Incorrect Input")
print("So the class is " + class_input)
i = 1
for i in range(1, chapter_num + 1):
    try:
        os.makedirs("../Key Terms/" + class_input + "/Chapter " + str(i) + "/")
    except WindowsError:
        pass
print("What Chapter is this for? Enter just the Chapter number. Ex: 5")
chapter_input = raw_input(">>")
ChapterFolder = "../Key Terms/" + class_input + "/Chapter " + str(chapter_input) + "/"
inputFile = open(ChapterFolder + "input.txt", "r")
outputFile = open(ChapterFolder + "output.txt", "w")
line = inputFile.readlines()
i = 0
print("Let's get down to business. Enter the word you are looking to add to the file.")
print("To stop entering words, enter QWERTY")
word_input = ""
while word_input != "QWERTY":
    word_input = raw_input(">>")
    outputArea = word_input
    linelen = len(line)
    while i < linelen:
        if line[i] == word_input:
            print("Word Found")
            break
        else:
            i = i + 1
    print(i)
    i = 0
inputFile.close()
outputFile.close()
I'm not a Python pro; however, I will try to answer your question.
output=[]
word=[]
definition=[]
with open('input.txt','r') as f:
    for line in f:
        if "-" not in line:
            continue  # skip lines that have no "word - definition" pair
        parts = line.split("-", 1)  # split only on the first "-"
        word.append(parts[0].strip())  # strip surrounding whitespace, keep inner spaces
        definition.append(parts[1].strip())
with open('output.txt','r') as f:
    for line in f:
        new_line = line.strip()
        try:
            index = word.index(new_line)
            print index
            meaning = definition[index]
            print meaning
            output.append(new_line+" - "+meaning)
        except ValueError as e:
            output.append(new_line+" - meaning not found")
            print e
f=open("output.txt","w")
f.write("\n".join(output))
f.close()
Here, input.txt is the file that contains the words and their definitions.
output.txt is the file that contains only words (it was unclear to me what output.txt contains, so I assumed only words).
The code above reads each word from output.txt, looks it up in the data from input.txt, and writes the definition if it is found; otherwise it writes "meaning not found".
The assumption is that the word and its definition are separated by -
Does this help?
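If the entries in output.txt really look like "Word 1 (page 531)" as in the question, the page marker has to be removed before the lookup. Below is a rough Python 3 sketch of that, using a dictionary instead of two parallel lists. The function names, the " - " separator, and the exact "(page N)" pattern are assumptions based on the two example lines in the question, not on the real files.

```python
import re

# Assumed page marker format: "(page 531)" at the end of the entry.
PAGE_RE = re.compile(r"\s*\(page \d+\)\s*$")


def load_definitions(lines):
    """Build a {word: definition} dict from 'Word - Definition' lines."""
    definitions = {}
    for line in lines:
        word, sep, meaning = line.partition(" - ")
        if sep:  # only lines that actually contain the separator
            definitions[word.strip()] = meaning.strip()
    return definitions


def fill_definitions(definitions, entries):
    """Append the matching definition to each 'Word (page N)' entry."""
    filled = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        key = PAGE_RE.sub("", entry)  # "Word 1 (page 531)" -> "Word 1"
        meaning = definitions.get(key, "meaning not found")
        filled.append(entry + " - " + meaning)
    return filled


defs = load_definitions(["Word 1 - Definition\n", "Word 2 - Another one\n"])
for row in fill_definitions(defs, ["Word 1 (page 531)\n", "Word 3 (page 7)\n"]):
    print(row)
```

The dictionary makes each lookup a single step regardless of how many words there are. With real files you would pass the open file objects (or their readlines() results) to the two functions and write the returned list back out.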

Python print a line that contains a number greater than "x" in a file

I'm new to Python. I have a script that prints all lines in a file that contain the number 9:
#!/usr/bin/env python
import re
testFile = open("test.txt", "r")
for line in testFile:
    if re.findall("\\b9\\b", line):
        print line
Now, how can I print all lines that contain a number greater than 9?
test.txt:
number1 9
number2 10
number3 5
number4 6
number5 15
You can use regular expression grouping:
for line in testFile:
    m = re.search(r"\b(\d+)\b", line)
    if m is not None and int(m.group(1)) > 9:
        print line
The (\d+) extracts the text matched by that part of the regex into m.group(1). Then the int() converts that to an integer and compares with 9.
This will extract the first instance of a number within each line. If you want to search all numbers in a line, you will need to use something like re.finditer() in combination with the above.
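As a sketch of the re.finditer() variant mentioned above (Python 3; the helper name has_number_above and the extra sample line "a 3 b 12" are made up for illustration), every standalone number in the line is checked rather than only the first one:

```python
import re

NUMBER_RE = re.compile(r"\b\d+\b")  # whole numbers that stand on their own


def has_number_above(line, limit=9):
    """True if any standalone number in the line is greater than limit."""
    return any(int(m.group()) > limit for m in NUMBER_RE.finditer(line))


lines = ["number1 9", "number2 10", "number3 5", "number4 6", "number5 15", "a 3 b 12"]
for line in lines:
    if has_number_above(line):
        print(line)
```

The \b anchors keep the digit in "number2" from being treated as a number of its own, so only the value after the space is compared.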
This prints the line if there is any space-separated number greater than 9.
testFile = open("test.txt", "r")
for line in testFile:
    for word in line.split():
        try:
            if int(word) > 9:
                print line
                break
        except ValueError:
            pass
Or, for your example:
testFile = open("test.txt", "r")
for line in testFile:
    if int(line.split()[1]) > 9:
        print line