I am stuck on an exercise from a Coursera Python course. This is the question:
"Open the file mbox-short.txt and read it line by line. When you find a line that starts with 'From ' like the following line:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
You will parse the From line using split() and print out the second word in the line (i.e. the entire address of the person who sent the message). Then print out a count at the end.
Hint: make sure not to include the lines that start with 'From:'.
You can download the sample data at http://www.pythonlearn.com/code/mbox-short.txt"
Here is my code:
fname = raw_input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    words = line.split()
    if len(words) > 2 and words[0] == 'From':
        print words[1]
        count = count + 1
    else:
        continue
print "There were", count, "lines in the file with From as the first word"
The output should be a list of email addresses followed by their count, but it doesn't work and I don't know why. The actual output is "There were 0 lines in the file with From as the first word".
I used your code and downloaded the file from the link. And I am getting this output:
There were 27 lines in the file with From as the first word
Have you checked that the downloaded file is in the same location as the code file?
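One way to check the file location is to print the path Python actually resolves the name to. This is only a minimal sketch, assuming Python 3; the helper name `locate` is made up for illustration.

```python
import os

def locate(fname):
    """Return the absolute path Python would open for fname, and whether it exists."""
    full_path = os.path.abspath(fname)  # resolved against the current working directory
    return full_path, os.path.isfile(full_path)

path, found = locate("mbox-short.txt")
print("Looking for:", path)
print("Found:", found)
```

If "Found" is False, the script is being run from a different folder than the one holding the download.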
fname = input("Enter file name: ")
counter = 0
fh = open(fname)
for line in fh :
    line = line.rstrip()
    if not line.startswith('From '): continue
    words = line.split()
    print (words[1])
    counter +=1
print ("There were", counter, "lines in the file with From as the first word")
fname = input("Enter file name: ")
fh = open(fname)
count = 0
for line in fh :
    if line.startswith('From '):  # consider only the lines that start with "From "
        y = line.split()  # split the line into words and store them in a list
        print(y[1])  # print the word at index 1
        count = count + 1  # increment the count variable
print("There were", count, "lines in the file with From as the first word")
I have commented every line in case anyone has difficulty; if you need help, feel free to contact me. I think this is about as simple as the code can get. Hope you benefit from my answer.
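fname = input('Enter the file name:')
fh = open(fname)
count = 0
for line in fh:
    if line.startswith('From '):  # the trailing space excludes 'From:' lines
        linesplit = line.split()
        print(linesplit[1])
        count = count + 1
print("There were", count, "lines in the file with From as the first word")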
fname = input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for i in fh:
    i = i.rstrip()
    if not i.startswith('From '): continue
    word = i.split()
    count = count + 1
    print(word[1])
print("There were", count, "lines in the file with From as the first word")
fname = input("Enter file name: ")
if len(fname) < 1 : fname = "mbox-short.txt"
fh = open(fname)
count = 0
for line in fh:
    if line.startswith('From '):  # the trailing space excludes 'From:' lines
        line = line.rstrip()
        lt = line.split()
        if len(lt) >= 2:  # make sure there is a second word to print
            print(lt[1])
            count = count + 1
print("There were", count, "lines in the file with From as the first word")
My code looks like this and works like a charm:
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
fh = open(fname)
count = 0  # initialize the counter to 0
for line in fh:  # iterate over the file line by line
    words = line.split()  # split the line into words
    if not len(words) < 2 and words[0] == "From":  # keep lines with at least 2 words whose first word is "From"
        print(words[1])  # print the word at index 1 of the list
        count += 1  # count the line
    else:
        continue
print("There were", count, "lines in the file with From as the first word")
It is a nice exercise from Dr. Chuck's course.
There is also another way: you can store the found words in a separate, initially empty list and then print the length of the list. It delivers the same result.
My tested code is as follows:
fname = input("Enter file name: ")
if len(fname) < 1:
    fname = "mbox-short.txt"
fh = open(fname)
newl = list()
for line in fh:
    words = line.split()
    if not len(words) < 2 and words[0] == 'From':
        newl.append(words[1])
    else:
        continue
print(*newl, sep = "\n")
print("There were", len(newl), "lines in the file with From as the first word")
I passed the exercise with it as well. Enjoy, and keep up the good work. Python is so much fun to me, even though I always hated programming.
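The answers in this thread share one pattern: keep only lines that start with 'From ' (with the trailing space), take the second word, and count the matches. A minimal sketch of that pattern as a reusable function, assuming Python 3; the name `sender_addresses` and the sample lines are made up for illustration.

```python
def sender_addresses(lines):
    """Return the second word of every line that starts with 'From '."""
    addresses = []
    for line in lines:
        # The trailing space keeps 'From:' header lines out.
        if not line.startswith("From "):
            continue
        words = line.split()
        if len(words) > 1:  # guard against a bare 'From ' line
            addresses.append(words[1])
    return addresses

# Made-up sample lines in the same shape as the exercise file.
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "Subject: hello\n",
]
found = sender_addresses(sample)
print(*found, sep="\n")
print("There were", len(found), "lines in the file with From as the first word")
```

To run it on the real data, pass an open file handle instead of the sample list, for example inside `with open("mbox-short.txt") as fh:`.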
I am new to Python. I am trying to rstrip whitespace, split the lines into words, append the words to a list, and then sort the list in alphabetical order. I don't know what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
    line = line.rstrip()
    y = line.split()
    i = lst.append()
    k = y.sort()
print y
I have been able to fix my code and get the expected output.
This is what I was hoping to code:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
    words = line.split()
    for word in words:
        if word in wordlist: continue
        wordlist.append(word)
wordlist.sort()
print(wordlist)
If you are using Python 2.7, I believe you need to use raw_input(); in Python 3.x, input() is correct. Also, you are not using append() correctly: append() is a list method, and it takes the item to add as its argument.
fname = raw_input("Enter filename: ") # Stores the filename given by the user input
fh = open(fname,"r") # Here we are adding 'r' as the file is opened as read mode
lines = fh.readlines() # This will create a list of the lines from the file
# Sort the lines alphabetically
lines.sort()
# Rstrip each line of the lines list
y = [l.rstrip() for l in lines]
# Print out the result
print y
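The two points above, calling append() with an argument and not assigning the result of sort(), can be seen together in a short Python 3 sketch. The function name `unique_sorted_words` and the sample lines are assumptions made for illustration, not part of the original posts.

```python
def unique_sorted_words(lines):
    """Collect each distinct word once and return them in alphabetical order."""
    words = []
    for line in lines:
        for word in line.rstrip().split():
            if word not in words:
                words.append(word)  # append() takes the item to add
    words.sort()  # sorts in place and returns None, so don't assign its result
    return words

# Made-up sample lines standing in for the lines of a file.
sample = ["But soft what light\n", "It is the east and Juliet is the sun\n"]
print(unique_sorted_words(sample))
```

Note that the default sort is case-sensitive, so capitalized words come before lowercase ones.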
I would like to count the different universities from which mail was sent, for which I used the following code:
fname = raw_input('Enter the file name: ')
try:
    fhan = open(fname)
except:
    print 'File cannot be opened:', fname
count = 0
sum = 0
for i in fhan:
    if i.startswith('From'):
        x = i.find('@')
        y = i.find(' ', x)
        str1 = i[x+1:y].strip()
        print str1
        count = count + 1
print count
The final output gives me the handles, but can I remove the repeated ones? Once uct.ac.za has been printed, it should not be printed or counted again.
Link to the file: www.py4inf.com/code/mbox-short.txt
You can append the handles to a list instead of printing them, and then convert that list to a set. A set has no repeated elements, so you will get a set of unique universities. Finally, you can iterate through the set and print the universities.
To get the count, use the len function, which returns the number of universities in the set.
This is the modified code:
fname = raw_input('Enter the file name: ')
try:
    fhan = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()  # stop here: fhan is undefined if open() failed
universities = []
for i in fhan:
    if i.startswith('From'):
        x = i.find('@')
        y = i.find(' ', x)
        str1 = i[x+1:y].strip()
        universities.append(str1)
universities = set(universities)
for i in universities:
    print i
print len(universities)
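The same idea can be sketched in Python 3 by adding to a set directly, which skips the intermediate list. This sketch assumes the addresses use '@' as in a normal mbox file; the function name `unique_domains` and the sample lines are made up for illustration.

```python
def unique_domains(lines):
    """Return the set of mail domains found on lines starting with 'From '."""
    domains = set()
    for line in lines:
        if not line.startswith("From "):
            continue
        words = line.split()
        if len(words) < 2 or "@" not in words[1]:
            continue  # skip malformed lines
        # Everything after the first '@' is the domain.
        domains.add(words[1].split("@", 1)[1])
    return domains

# Made-up sample lines; two of the three share a domain.
sample = [
    "From alice@uct.ac.za Sat Jan  5 09:14:16 2008\n",
    "From bob@umich.edu Fri Jan  4 18:10:48 2008\n",
    "From carol@uct.ac.za Fri Jan  4 16:10:39 2008\n",
]
domains = unique_domains(sample)
for domain in sorted(domains):
    print(domain)
print(len(domains))
```

Sorting before printing is optional; it only makes the output order predictable, since sets are unordered.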
The text file can be found at this link. What I am interested in is the value of PE score. Graphically, it appears under the column Feature2 sys.
This is my code:
def main():
    file = open("combined_scores.txt", "r")
    lines = file.readlines()
    file.close()
    count_pe = 0
    for line in lines:
        line = line.strip()
        line = line[24:31]  # problem 1: the range is not fixed across the lines of the file
        if line.find("3.19") != -1:  # I need values >= 3.19, not only 3.19
            count_pe = count_pe + 1
    print(">=3.19: ", count_pe)  # at the end I need how many times PE >= 3.19 occurs
main()
I suggest you split each line on the tab character (\t) and compare the column's numeric value with 3.19. It should be something like the code below (Python 2.7):
with open('combined_scores.txt') as f:
    lines = f.readlines()[1:]  # remove the header line
# reset counter
n = 0
for line in lines:
    if float(line.split('\t')[-3]) >= 3.19:
        n = n + 1
# print total count
print 'total=', n
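A slightly more defensive Python 3 sketch of the same tab-splitting approach is shown here. It keeps the column index -3 from the answer above, but the function name `count_at_least` and the sample rows are assumptions, since the real layout of combined_scores.txt is not shown in the post.

```python
def count_at_least(lines, threshold=3.19, column=-3):
    """Count rows whose tab-separated `column` is a number >= threshold."""
    total = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        try:
            value = float(fields[column])
        except (IndexError, ValueError):
            continue  # skip blank, short, or non-numeric rows such as a header
        if value >= threshold:
            total += 1
    return total

# Made-up rows; the third field from the end plays the role of the PE score.
rows = [
    "id1\t0.5\t3.19\tx\ty\n",
    "id2\t0.7\t2.80\tx\ty\n",
    "id3\t0.9\t4.02\tx\ty\n",
]
print("total=", count_at_least(rows))
```

Because non-numeric rows are skipped, the header line does not need to be removed by hand in this version.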
I have an input file like:
input.txt:
to
the
cow
eliphant
pigen
then
enthosiastic
I want to remove the words whose length is <= 4 characters, and if a word has more than 8 characters, write only its first 8 characters to the new file.
output should be like:
output.txt:
eliphant
pigen
enthosia
This is my code:
f2 = open('output.txt', 'w+')
x2 = open('input.txt', 'r').readlines()
for y in x2:
    if (len(y) <= 4):
        y = y.replace(y, '')
        f2.write(y)
    elif (len(y) > 8):
        y = y[0:8]
        f2.write(y)
    else:
        f2.write(y)
f2.close()
print "Done!"
When I run it, it gives output like:
eliphantpigen
then
enthosia
It also writes the 4-character word ("then"). I don't understand what the problem is, or how to write code that limits the length of the words in a text file.
Use with when working with files; this guarantees that the file will be closed.
You have "then" in your results because you are reading lines, not words.
Each line ends with the newline character '\n', so when you read the word "then" you actually get the string 'then\n', whose length is 5.
with open('output.txt', 'w+') as ofp, open('input.txt', 'r') as ifp:
    for line in ifp:
        line = line.strip()
        if len(line) > 8:
            line = line[:8]
        elif len(line) <= 4:
            continue
        ofp.write(line + '\n')
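The effect of the trailing newline on the length can be checked without any files. This is a minimal Python 3 sketch using the words from the question; the function name `filter_words` and its parameters are made up for illustration.

```python
def filter_words(lines, min_len=5, max_len=8):
    """Drop words shorter than min_len and cut longer ones to max_len characters."""
    kept = []
    for line in lines:
        word = line.strip()  # remove the trailing '\n' before measuring
        if len(word) < min_len:
            continue
        kept.append(word[:max_len])
    return kept

print(len("then\n"))          # the newline counts as a character
print(len("then\n".strip()))  # stripped, only the letters are counted

sample = ["to\n", "the\n", "cow\n", "eliphant\n", "pigen\n", "then\n", "enthosiastic\n"]
print(filter_words(sample))
```

Here a minimum length of 5 is the same rule as dropping words of length <= 4.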
There is a problem in the mapper.py file when I run it on the cluster. The error is " unexpected syntax before line" at "strl = line.strip()".
There is no error when I test it locally. I want to read the words from a text file, change their format, count them, and send the output to an S3 bucket.
Any guidance is most welcome. Thanks.
mapper:
import sys, re
for line in sys.stdin:
    strl = line.strip()
    words = strl.split()
    for word in words:
        word = word.lower()
        result = ""
        charref = re.compile("[a-f]")
        match = charref.search(word[0])
        if match:
            result += "TR2234J"
        else:
            result += ""
        print result, "\t"
reducer:
import sys
for line in sys.stdin:
    line = line.strip()
    new_word = ""
    words = line.split("\t")
    final_count = len(words)
    my_num = final_count / 6
    for i in range(my_num):
        new_word = "".join(words[i*6:10+(i*6)])
        print new_word, "\t"