Python: count frequency of words in a txt file

Python: count frequency of words in a txt file - list

I am required to count the frequency of the key words in a text file. I am not allowed to use dictionaries or sets, and I also cannot import any Python modules. I honestly cannot figure out how to do it!
This is how it's supposed to display:
car 4
dog 4
egg 3
Here's what I have so far, and it absolutely does not work:
fname = input("enter file name:")
ifile = open(fname, 'r')
list1 = ['car', 'dog', 'cat'.....'ect']
list2 = []
for word in ifile:
    if word in list1:
        list2.index(word)[1] += 1
    else:
        list2.append([word,])
print(list2,)

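I played with this a little. Note that I had to enter the file name in quotes: in Python 2, input() evaluates what you type as a Python expression, so use raw_input() if you want to type the name without quotes.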
fname = input('enter file name:')
ifile = open(fname, 'r')
list1 = []
list2 = []
for line in ifile.readlines():
    for word in line.split(' '):
        word = word.strip()
        if word in list1:
            list2[list1.index(word)] += 1
        else:
            list1.append(word)
            list2.append(1)
for item in list1:
    print item, list2[list1.index(item)]
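For Python 3, a minimal sketch of the same two-parallel-lists idea follows, wrapped in a function so it can be tried on any iterable of lines (an open file works). `count_words` and its optional `keywords` filter are hypothetical names introduced here; the filter stands in for the question's `list1` of key words. It uses no dictionaries, sets or imports, and sorts by count (highest first, ties alphabetical) to match the `car 4` display.

```python
def count_words(lines, keywords=None):
    """Hypothetical helper: count words using two parallel lists (no dict, set or import)."""
    words = []   # unique words, in the order first seen
    counts = []  # counts[i] is how many times words[i] was seen
    for line in lines:
        # split() with no argument splits on any whitespace and never yields empty strings
        for word in line.split():
            if keywords is not None and word not in keywords:
                continue  # assumption: only the requested key words are counted
            if word in words:
                counts[words.index(word)] += 1
            else:
                words.append(word)
                counts.append(1)
    # Pair each word with its count; highest count first, ties broken alphabetically.
    return sorted(zip(words, counts), key=lambda pair: (-pair[1], pair[0]))
```

With a file, `with open(fname) as ifile:` followed by `for word, count in count_words(ifile, list1): print(word, count)` prints one `word count` pair per line. Because `words.index(word)` scans the list, this gets slower as the number of distinct words grows; that is the cost of avoiding a dictionary.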

Since you can't use dictionaries or sets, why not avoid container types altogether and use another string to record the unique words you encounter, incrementing a word's count when it already exists? Pseudocode:
1. create an empty string for storage
2. parse the file and extract the words
3. iterate over the words
4. check each word against the string (if it exists: increment its count; if not: add it with a count of 1)
5. output the string
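A minimal Python 3 sketch of that pseudocode follows. `count_with_string` is a hypothetical name, and the `\nword:count` entry format is an assumption made here; it only works if the words contain no `:` characters. `str.split()` is still used to extract the words, but the counts themselves live in a single string.

```python
def count_with_string(text):
    """Hypothetical helper: count words using one string as storage."""
    # Each entry is stored as "\nword:count" (assumed format). The leading "\n"
    # marks the start of an entry, so "cat" cannot match inside "bobcat".
    storage = ""
    for word in text.split():
        key = "\n" + word + ":"
        start = storage.find(key)
        if start == -1:
            # Not seen yet: add the word with a count of 1.
            storage += key + "1"
        else:
            # Seen before: locate the digits after the key, increment, splice back in.
            count_start = start + len(key)
            count_end = storage.find("\n", count_start)
            if count_end == -1:  # this entry is the last one in the string
                count_end = len(storage)
            count = int(storage[count_start:count_end]) + 1
            storage = storage[:count_start] + str(count) + storage[count_end:]
    # Turn "\ncar:3\ndog:2" into "car 3\ndog 2" for display.
    return storage.lstrip("\n").replace(":", " ")
```

Calling `print(count_with_string(ifile.read()))` prints one `word count` line per distinct word, in the order the words were first seen.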

Related

AttributeError: 'dict' object has no attribute 'append' on line 9?

Q.)8.4 Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.
This code gives AttributeError: 'dict' object has no attribute 'append' on line 9:
fname = input("Enter file name: ")
fh = open(fname)
lst = {}
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        if word not in lst:
            lst.append(word)
print(sorted(lst))

A Python dictionary has no append method.
append() is a list method. Make lst a list, not a dictionary. The only change I made to your code below is replacing
lst = {} #creation of an empty dictionary
to
lst = [] #creation of an empty list
The full code:
fname = input("Enter file name: ")
fh = open(fname)
lst = []
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        if word not in lst:
            lst.append(word)
print(sorted(lst))
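The exercise asks for a list, and the fix above is the right one for it. Note, though, that `word not in lst` scans the whole list on every check. Outside the exercise, a set does the de-duplication more cheaply; a sketch, with `unique_sorted_words` as a hypothetical helper name:

```python
def unique_sorted_words(lines):
    """Hypothetical helper: distinct words from an iterable of lines, sorted."""
    words = set()
    for line in lines:
        # update() adds every word on the line; duplicates are ignored by the set
        words.update(line.split())
    # sorted() accepts any iterable and returns a new list
    return sorted(words)
```

Passing the open file handle, `print(unique_sorted_words(fh))` prints the same list as the version above. Uppercase letters sort before lowercase ones in both.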

rstrip, split and sort a list from input text file

I am new to Python. I am trying to rstrip whitespace, split each line into words, append them to a list and then sort it alphabetically. I don't know what I am doing wrong.
fname = input("Enter file name: ")
fh = open(fname)
lst = list(fh)
for line in lst:
    line = line.rstrip()
    y = line.split()
    i = lst.append()
    k = y.sort()
print y

I have been able to fix my code and get the expected output.
This is what I was aiming for:
name = input('Enter file: ')
handle = open(name, 'r')
wordlist = list()
for line in handle:
    words = line.split()
    for word in words:
        if word in wordlist: continue
        wordlist.append(word)
wordlist.sort()
print(wordlist)

If you are using Python 2.7, use raw_input(); in Python 3.x, input() is correct. Also, you are not using append() correctly: it is a list method that takes the item to add as its argument, so lst.append() with no argument raises a TypeError.
fname = raw_input("Enter filename: ") # Stores the filename given by the user input
fh = open(fname,"r") # 'r' opens the file in read mode
lines = fh.readlines() # This will create a list of the lines from the file
# Sort the lines alphabetically
lines.sort()
# rstrip each line in the list
y = [l.rstrip() for l in lines]
# Print out the result
print y
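The answer above sorts whole lines, while the question asked for the words. A Python 3 sketch of the word version follows; `sorted_words` is a hypothetical helper name. It shows the two points the original code tripped on: `extend()` versus `append()`, and the fact that `list.sort()` sorts in place and returns `None` (so `k = y.sort()` leaves `k` as `None`).

```python
def sorted_words(lines):
    """Hypothetical helper: collect every word from the lines into one sorted list."""
    words = []
    for line in lines:
        # extend() adds each word of the split line individually;
        # append(line.split()) would add the whole sub-list as one item.
        words.extend(line.rstrip().split())
    words.sort()  # sorts in place and returns None, so don't assign its result
    return words
```

Unlike the 8.4 exercise, this keeps duplicate words; add an `if word not in words` check before adding each word if only unique words are wanted.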

How can I correctly print out this dictionary so that each word is sorted by the number of times (frequency) it appears in the text?

slova = dict()
for line in text:
    line = re.split('[^a-z]',text)
    line[i] = filter(None,line)
    i =+ 1
i = 0
for line in text:
    for word in line:
        if word not in slova:
            slova[word] = i
            i += 1

I'm not sure what your text looks like, and you haven't provided example output, so here is my best guess. If this doesn't help, please update your question and I'll try again. The code uses Counter from collections to do the heavy lifting: first the words from all lines of the text are flattened into a single list, then that list is passed to Counter. The keys of the Counter (the words) are then sorted by their counts and printed.
CODE:
from collections import Counter
import re
text = ['hello hi hello yes hello',
        'hello hi hello yes hello']
# flatten all lines into one list of words; `if w` drops the empty strings
# that re.split produces around punctuation or repeated separators
all_words = [w for l in text for w in re.split('[^a-z]',l) if w]
word_counts = Counter(all_words)
sorted_words = sorted(word_counts.keys(),
                      key=lambda k: word_counts[k],
                      reverse = True)
#Print out the word and counts
for word in sorted_words:
    print word,word_counts[word]
OUTPUT:
hello 6
yes 2
hi 2
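Counter also has a `most_common()` method that returns `(word, count)` pairs already sorted from most to least frequent, so the manual `sorted()` call can be dropped. A Python 3 sketch, with `word_frequencies` as a hypothetical helper name; lower-casing each line first is an assumption about the desired behaviour:

```python
from collections import Counter
import re


def word_frequencies(lines):
    """Hypothetical helper: (word, count) pairs, most frequent first."""
    # '[^a-z]+' splits on runs of non-letters; `if w` drops empty strings at the edges.
    words = [w for line in lines for w in re.split('[^a-z]+', line.lower()) if w]
    return Counter(words).most_common()
```

For the sample text, `for word, count in word_frequencies(text): print(word, count)` prints `hello 6` first. Words with equal counts come out in the order they were first seen (Python 3.7 and later).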

The code for counting a frequency plot of letters in a body of text in Python

I am writing a program that produces a frequency plot of the letters in a body of text. However, there is an error in my code that I cannot spot. Any ideas?
def letter_count(word,freqs,pmarks):
    for char in word:
        freqs[char]+=1
def letter_freq(fname):
    fhand = open(fname)
    freqs = dict()
    alpha = list(string.uppercase[:26])
    for let in alpha: freqs[let] = freqs.get(let,0)
    for line in fhand:
        line = line.rstrip()
        words = line.split()
        pmarks = list(string.punctuation)
        words = [word.upper() for word in words]
        for word in words:
            letter_count(word,freqs,pmarks)
    fhand.close()
    return freqs.values

You are calling
    freqs[char]+=1
with char = '.' without having initialized freqs['.'] = 0, which raises a KeyError.
Before line 3, check whether the key already exists, because the += 1 operation only works on keys that are already in the dictionary.
So something like:
    for char in word:
        if char in freqs:  # has_key() also works in Python 2 but was removed in Python 3
            freqs[char]+=1
See also: "Python: how can I check if the key of a dictionary exists?"
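Putting the pieces together, a Python 3 sketch of the letter counter follows. It is an adaptation, not the original code: `letter_freq` here takes an iterable of lines (an open file works) instead of a file name, uses `string.ascii_uppercase` because `string.uppercase` no longer exists in Python 3, and returns the dictionary itself rather than `freqs.values` (which, without parentheses, returns the method rather than the counts).

```python
import string


def letter_freq(lines):
    """Adapted sketch: count A-Z letters in an iterable of text lines."""
    # Pre-seed every uppercase letter with 0 so all 26 appear in the result.
    freqs = {letter: 0 for letter in string.ascii_uppercase}
    for line in lines:
        for char in line.upper():
            # Only pre-seeded keys are counted, so punctuation, digits
            # and whitespace are skipped instead of raising KeyError.
            if char in freqs:
                freqs[char] += 1
    return freqs
```

`with open(fname) as fhand: freqs = letter_freq(fhand)` then gives counts for all 26 letters, ready to plot.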

TypeError: unhashable type: 'list' - creating frequency function

I am taking a text file as input and writing a function that finds which word occurs most frequently. If two or more words tie for the highest count, I print all of them.
def wordOccurance(userFile):
    userFile.seek(0)
    line = userFile.readline()
    lines = []
    while line != "":
        if line != "\n":
            line = line.lower() # making lower case
            line = line.rstrip("\n") # cleaning
            line = line.rstrip("?") #cleans the whole document by removing "?"
            line = line.rstrip("!") #cleans the whole document by removing "!"
            line = line.rstrip(".") #cleans the whole document by removing "."
            line = line.split(" ") #splits the texts into space
            lines.append(line)
        line = userFile.readline() # keep reading lines from document.
    words = lines
    wordDict = {} #creates the clean word Dic, from above
    for word in words: #
        if word in wordDict.keys():
            wordDict[word] = wordDict[word] + 1
        else:
            wordDict[word] = 1
    largest_value = max(wordDict.values())
    for k in wordDict.keys():
        if wordDict[k] == largest_value:
            print(k)
    return wordDict
Please help me with this function.

In this line you are creating a list of strings:
line = line.split(" ") #splits the texts into space
Then you append it to a list, so you have a list of lists:
lines.append(line)
Later you loop through that list of lists, and try to use a sublist as a key:
for word in words: #
    if word in wordDict.keys():
        wordDict[word] = wordDict[word] + 1
    else:
        wordDict[word] = 1 # Here you will try to assign a list (`word`) as a key, which is not allowed
One easy fix would be to flatten the list of lists first:
words = [item for sublist in lines for item in sublist]
for word in words: #
    if word in wordDict.keys():
        wordDict[word] = wordDict[word] + 1
    else:
        wordDict[word] = 1
The list comprehension [item for sublist in lines for item in sublist] will loop through lines, then loop through the sublists created by line.split(" ") and return a new list consisting of the items in each sublist. For you, lines probably looks something like this:
[['words', 'on', 'line', 'one'], ['words', 'on', 'line', 'two']]
The list comprehension will turn it into this:
['words', 'on', 'line', 'one', 'words', 'on', 'line', 'two']
If you would like to use something a little less complicated, you could just use nested loops:
# words = lines
# just use `lines` in your for loop instead of creating an identical list
wordDict = {} #creates the clean word Dic, from above
for line in lines:
    for word in line:
        if word in wordDict.keys():
            wordDict[word] = wordDict[word] + 1
        else:
            wordDict[word] = 1
largest_value = max(wordDict.values())
This is arguably less "Pythonic", but it may be easier to wrap your head around.
Also, you may want to consider splitting each line into words before cleaning the data, because if you clean the lines first, you will only remove punctuation at the end of lines rather than at the end of words. However, this might not be necessary depending on the nature of your data.
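A Python 3 sketch combining both suggestions (working on individual words, and cleaning each word rather than each line) is below. `most_frequent_words` is a hypothetical name, and stripping `string.punctuation` from both ends of each word is an assumption about what "cleaning" should mean for your data. It also uses `dict.get` in place of the `if`/`else`, and returns the tied words instead of printing them.

```python
import string


def most_frequent_words(lines):
    """Hypothetical helper: (sorted list of the most frequent words, their count)."""
    counts = {}
    for line in lines:
        # Split first, then clean each word, so punctuation is removed from
        # every word rather than only from the end of the line.
        for word in line.lower().split():
            word = word.strip(string.punctuation)
            if word:  # skip tokens that were pure punctuation
                counts[word] = counts.get(word, 0) + 1
    if not counts:
        return [], 0  # max() would fail on an empty file
    largest = max(counts.values())
    # Collect every word that ties for the highest count.
    winners = sorted(w for w, c in counts.items() if c == largest)
    return winners, largest
```

Because the function accepts any iterable of lines, `most_frequent_words(userFile)` works directly on the open file, with no need for the `seek`/`readline` loop.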
