If my sequence contains a character that is not in my key list, such as "X", how do I skip it and continue without doing anything to it? I am getting KeyError: 'X' because there is an X in the sequence I am looking at.
keys = ["A", "C", "D", "E"]
for char in keys:
    counts[char] = 0
for line in gpcr:
    if line.startswith(">"):
        line = line.replace(' ','')
        header = line.split()
        number = header[0].split('|')
        print "Id:",number[2]
        continue
        fo.write(number[2])
        fo.write('\n')
    for char in line.strip():
        if char
            counts[char] += 1
total = float(sum(counts.values()))
toReturn = ''
for key in keys:
    aa_per = (counts[key]/total)*100
    toReturn = toReturn + '%.2f'%aa_per + '%'+ '\t'
fo.write(number[1])
fo.write('\n')
fo.write(''.join(str(x) for x in toReturn))
fo.write('\n')
print toReturn
fo.close()
I am slightly confused by your question. I guess the problematic line is
aa_per = (counts[key]/total)*100
You can check for a KeyError by using a try block:
try:
    aa_per = (counts[key]/total)*100
except KeyError:
    aa_per = 0
I guess if the key doesn't occur the percentage should be 0 here.
In general, try blocks are a powerful tool for handling exceptions. See also https://docs.python.org/3/tutorial/errors.html
If you still want to count the number of occurrences of this X character, you can use defaultdict:
from collections import defaultdict
counts = defaultdict(int)
counts will then be a dictionary that, instead of raising a KeyError, returns 0 (and stores it) when the key does not exist. This way you can skip the dictionary initialization altogether.
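As a minimal sketch of that behaviour (the sample sequence below is made up for illustration and is not from the question), incrementing a key that was never initialized simply starts from 0:

```python
from collections import defaultdict

# Missing keys are created on first access with the value int() == 0.
counts = defaultdict(int)

# Hypothetical sequence containing an unexpected "X".
for char in "ACXDEA":
    counts[char] += 1  # no KeyError, "X" is counted like any other character

print(dict(counts))
```

Note that the unexpected characters then also contribute to sum(counts.values()), so percentages computed from that total would include them.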
Update:
If you want to use if-else, I think it should be enough to do:
for char in line.strip():
    if char in keys:
        counts[char] += 1
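Putting the membership check together with the percentage calculation from the question, a Python 3 sketch might look like this. The function name composition and the sample sequence are hypothetical, and file handling is left out on purpose:

```python
def composition(sequence, keys=("A", "C", "D", "E")):
    """Return the percentage of each key among the counted characters."""
    counts = {key: 0 for key in keys}
    for char in sequence:
        if char in counts:       # characters such as "X" are skipped
            counts[char] += 1
    total = sum(counts.values())
    if total == 0:               # avoid dividing by zero on empty input
        return {key: 0.0 for key in keys}
    return {key: 100.0 * counts[key] / total for key in keys}

# "X" is ignored, so the percentages are taken over the 4 counted characters.
print(composition("AACXD"))
```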
Related
I am using regexpi to find a string in a phrase, but I ran into behaviour that I did not intend.
Let's say the words I need to find are anandalak and nandaki.
str1 = {'anandalak'};
str2 = {'nanda'};
button = {'nanda'};
Both of the following return me logical 1:
~cellfun('isempty',regexpi(str1,button))
~cellfun('isempty',regexpi(str2,button))
How can I avoid this? I need logical 0 in first case and logical 1 in the second.
You probably need to use word boundaries (\< and \>) to get the match you require.
You may try:
str1 = {'anandalak'}
str2 = {'nanda'}
button = {'\<nanda\>'} % Notice this
~cellfun(@isempty,regexpi(str1,button)) % Returns ans = 0 No match
~cellfun(@isempty,regexpi(str2,button)) % Returns ans = 1 Exact match
First of all, I am sorry about the weird question heading. Couldn't express it in one line.
So, the problem statement is,
If I am given the following string --
"('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
I have to parse it as
list1 = ["'James Gosling'", 'jamesgosling', 'jame gosling']
list2 = ["'SUN Microsystem'", 'sunmicrosystem']
list3 = [ list1, list2, keyword]
That way, if I enter James Gosling Sun Microsystem keyword, it should tell me that what I have entered is 100% correct.
And if I enter J Gosling Sun Microsystem keyword, it should say I am only 66.66% correct.
This is what I have tried so far.
import re

def main():
    print("starting")
    sentence = "('James Gosling'/jamesgosling/jame gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
    splited = sentence.split(",")
    number_of_primary_keywords = len(splited)
    #print(number_of_primary_keywords, "primary keywords length")
    number_of_brackets = 0
    inside_quotes = ''
    inside_quotes_1 = ''
    inside_brackets = ''
    for n in range(len(splited)):
        #print(len(re.findall('\w+', splited[n])), "length of splitted")
        inside_brackets = splited[n][splited[n].find("(") + 1: splited[n].find(")")]
        synonyms = inside_brackets.split("/")
        for x in range(len(synonyms)):
            try:
                inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
                print(inside_quotes_1)
            except:
                pass
            try:
                inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
                print(inside_quotes)
            except:
                pass
            #print(synonyms[x])
        number_of_brackets += 1
    print(number_of_brackets)

if __name__ == '__main__':
    main()
Output is as follows
'James Gosling
jamesgoslin
jame goslin
'SUN Microsystem
SUN Microsystem
sunmicrosyste
sunmicrosyste
3
As you can see, the last letters of some words are missing.
So, if you have read this far, I hope you can help me get the expected output.
Unfortunately, your code has a logic issue that I could not pin down, but it may be in these lines:
inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
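Both slices end at synonyms[n].find(...) where synonyms[x] was probably intended, and find returns -1 when the quote is absent, so the slice stops one character short of the end. By the way, the quote characters can also be written as hex escapes: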
inside_quotes_1 = synonyms[x][synonyms[x].find("\x22") + 1: synonyms[n].find("\x22")]
inside_quotes = synonyms[x][synonyms[x].find("\x27") + 1: synonyms[n].find("\x27")]
Other than that, you seem to want to extract the words together with their indices. The words can be extracted with a basic expression:
(\w+)
Then you need a simple way to locate the index of each word and associate each word with it.
Example Test
# -*- coding: UTF-8 -*-
import re
string = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
expression = r'(\w+)'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
    print('🙀 Sorry! No matches! Something is not right! Call 911 👮')
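For the parsing and scoring the question actually asks for, here is one possible Python 3 sketch. It is not the asker's or answerer's method. The names parse_groups and score are hypothetical, and it assumes that commas never appear inside the parentheses and that a group counts as matched when any of its synonyms occurs, case-insensitively, as a substring of the input:

```python
import re

def parse_groups(sentence):
    """Split the sentence into synonym lists (parenthesised) and plain keywords."""
    groups = []
    # Assumption: no commas inside the parentheses, so a plain split is enough.
    for part in sentence.split(","):
        part = part.strip()
        match = re.fullmatch(r"\((.*)\)", part)
        if match:
            groups.append([s.strip() for s in match.group(1).split("/")])
        else:
            groups.append(part)
    return groups

def score(groups, text):
    """Percentage of groups with at least one synonym found in text."""
    text = text.lower()
    hits = 0
    for group in groups:
        synonyms = group if isinstance(group, list) else [group]
        # Strip surrounding quotes before comparing, ignoring case.
        if any(s.strip("'\"").lower() in text for s in synonyms):
            hits += 1
    return 100.0 * hits / len(groups)

sentence = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
groups = parse_groups(sentence)
print(groups)
print(score(groups, "James Gosling Sun Microsystem keyword"))
print(round(score(groups, "J Gosling Sun Microsystem keyword"), 2))
```

Splitting on "/" and stripping whitespace avoids the index arithmetic with find entirely, which is where the missing last letters came from.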
I am writing a program that produces a frequency plot of the letters in a body of text. However, there is an error in my code that I cannot spot. Any ideas?
def letter_count(word,freqs,pmarks):
    for char in word:
        freqs[char]+=1

def letter_freq(fname):
    fhand = open(fname)
    freqs = dict()
    alpha = list(string.uppercase[:26])
    for let in alpha: freqs[let] = freqs.get(let,0)
    for line in fhand:
        line = line.rstrip()
        words = line.split()
        pmarks = list(string.punctuation)
        words = [word.upper() for word in words]
        for word in words:
            letter_count(word,freqs,pmarks)
    fhand.close()
    return freqs.values
You are calling
freqs[char]+=1
with char = '.' without having initialized freqs['.'] = 0.
You should check before line 3 whether the key already exists, since the += 1 operation only works on keys that are already in the dictionary.
So something like:
for char in word:
    if char in freqs:
        freqs[char]+=1
See also: "Python: how can I check if the key of a dictionary exists?"
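As a rough Python 3 sketch of the same idea (string.uppercase became string.ascii_uppercase in Python 3), the following counts only the letters A to Z and ignores punctuation. Taking the text as a string instead of a file name is an assumption made to keep the example self-contained:

```python
from collections import Counter
import string

def letter_freq(text):
    """Count A-Z occurrences in text, ignoring case and everything else."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    # Report every letter, including those that never occur.
    return {letter: counts[letter] for letter in string.ascii_uppercase}

freqs = letter_freq("Hello, world.")
print(freqs["L"], freqs["O"], freqs["Z"])
```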
I am trying to extract all the numbers from a string composed of digits, symbols and letters.
If the numbers are multi-digit, I have to extract them as multi-digit numbers; e.g. from "shsgd89shs2011%%5swts" I have to pull the numbers out as they appear (89, 2011 and 5).
So far, what I have written loops through and returns all the digits cumulatively, which I like, but I cannot figure out how to make it stop after finishing with one set of digits:
def StringThings(strng):
    nums = []
    number = ""
    for each in range(len(strng)):
        if strng[each].isdigit():
            number += strng[each]
        else:
            continue
        nums.append(number)
    return nums
Running it on "6wtwyw66hgsgs" returns ['6', '66', '666'].
What simple way is there of breaking out of the loop once I have gotten what I needed?
Using your function, just use a temp variable to accumulate each run of digits, and yield it whenever you hit a non-digit and the temp variable is not an empty string:
def string_things(strng):
    temp = ""
    for ele in strng:
        if ele.isdigit():
            temp += ele
        elif temp:  # if we have a sequence
            yield temp
            temp = ""  # reset temp
    if temp:  # catch ending sequence
        yield temp
Output
In [9]: s = "shsgd89shs2011%%5swts"
In [10]: list(string_things(s))
Out[10]: ['89', '2011', '5']
In [11]: s ="67gobbledegook95"
In [12]: list(string_things(s))
Out[12]: ['67', '95']
Or you could translate the string replacing letters and punctuation with spaces then split:
from string import ascii_letters, punctuation, maketrans
s = "shsgd89shs2011%%5swts"
replace = ascii_letters+punctuation
tbl = maketrans(replace," " * len(replace))
print(s.translate(tbl).split())
['89', '2011', '5']
L2 = []
file_Name1 = 'shsgd89shs2011%%5swts'
from itertools import groupby
for k,g in groupby(file_Name1, str.isdigit):
    a = list(g)
    if k == 1:
        L2.append("".join(a))
print(L2)
Result: ['89', '2011', '5']
Updated to account for trailing numbers:
def StringThings(strng):
    nums = []
    number = ""
    for each in range(len(strng)):
        if strng[each].isdigit():
            number += strng[each]
            if each == len(strng)-1:
                if number != '':
                    nums.append(number)
        if each != 0:
            if strng[each].isdigit() == False:
                if strng[each-1].isdigit():
                    nums.append(number)
                    number = ""
                    continue
    return nums
print StringThings("shsgd89shs2011%%5swts34")
# returns ['89', '2011', '5', '34']
So, when we reach a character which is not a number, and if the previously observed character was a number, append the contents of number to nums and then simply empty our temporary container number, to avoid it containing all the old stuff.
Note, I don't know Python so the solution may not be very pythonic.
Alternatively, save yourself all the work and just do:
import re
print re.findall(r'\d+', 'shsgd89shs2011%%5swts');
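In Python 3, where print is a function, the same regular-expression approach could be sketched as follows. The function name extract_numbers is hypothetical, and converting each match to int is an extra step that the question does not strictly require:

```python
import re

def extract_numbers(text):
    """Return every maximal run of digits in text as an int."""
    return [int(run) for run in re.findall(r"\d+", text)]

print(extract_numbers("shsgd89shs2011%%5swts"))  # [89, 2011, 5]
```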
Hello everybody, I am new to Python and need to write a program that removes punctuation and then counts the number of words in a string. So far I have this:
import sys
import string

def removepun(txt):
    for punct in string.punctuation:
        txt = txt.replace(punct,"")
    print txt
    mywords = {}
    for i in range(len(txt)):
        item = txt[i]
        count = txt.count(item)
        mywords[item] = count
    return sorted(mywords.items(), key = lambda item: item[1], reverse=True)
The problem is that it counts letters, not words as I had hoped. Can you help me with this?
How about this?
>>> import string
>>> from collections import Counter
>>> s = 'One, two; three! four: five. six##$,.!'
>>> occurrence = Counter(s.translate(None, string.punctuation).split())
>>> print occurrence
Counter({'six': 1, 'three': 1, 'two': 1, 'four': 1, 'five': 1, 'One': 1})
After removing the punctuation:
numberOfWords = len(txt.split(" "))
This assumes exactly one space between words.
EDIT:
a={}
for w in txt.split(" "):
    if w in a:
        a[w] += 1
    else:
        a[w] = 1
How it works:
a is initialized as an empty dict
the words in txt are iterated over
if a already has an entry for w, one is added to it
if there is no entry, one is created and initialized to 1
The output is the same as Haidro's excellent answer: a dict with the words as keys and the count of each word as values.
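Both answers above are written for Python 2. A Python 3 sketch of the same idea might look like this; the function name word_counts is hypothetical. It uses str.maketrans with a third argument to delete punctuation, and split() without arguments so that runs of whitespace do not produce empty words:

```python
from collections import Counter
import string

def word_counts(txt):
    """Strip punctuation, then count how often each word occurs."""
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", string.punctuation)
    words = txt.translate(table).split()
    return Counter(words)

counts = word_counts("One, two; three! four: five. six##$,.! two")
print(counts.most_common(1))   # [('two', 2)]
print(sum(counts.values()))    # total number of words: 7
```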