Extracting Numbers from a String Without Regular Expressions - python-2.7

I am trying to extract all the numbers from a string composed of digits, symbols and letters.
Multi-digit numbers have to be extracted whole, in the order they appear: from "shsgd89shs2011%%5swts" I need 89, 2011 and 5.
What I have so far loops through the string and returns the digits cumulatively, which I like, but I cannot figure out how to make it stop after finishing one run of digits:
def StringThings(strng):
    nums = []
    number = ""
    for each in range(len(strng)):
        if strng[each].isdigit():
            number += strng[each]
        else:
            continue
        nums.append(number)
    return nums
Running this value: "6wtwyw66hgsgs" returns ['6', '66', '666']
What simple way is there of breaking out of the loop once I have gotten what I needed?

Using your function, just use a temp variable to concat each sequence of digits, yielding the groups each time you encounter a non-digit if the temp variable is not an empty string:
def string_things(strng):
    temp = ""
    for ele in strng:
        if ele.isdigit():
            temp += ele
        elif temp:  # if we have a sequence
            yield temp
            temp = ""  # reset temp
    if temp:  # catch ending sequence
        yield temp
Output
In [9]: s = "shsgd89shs2011%%5swts"
In [10]: list(string_things(s))
Out[10]: ['89', '2011', '5']
In [11]: s ="67gobbledegook95"
In [12]: list(string_things(s))
Out[12]: ['67', '95']
Or you could translate the string replacing letters and punctuation with spaces then split:
from string import ascii_letters, punctuation, maketrans
s = "shsgd89shs2011%%5swts"
replace = ascii_letters+punctuation
tbl = maketrans(replace," " * len(replace))
print(s.translate(tbl).split())
['89', '2011', '5']
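The table above only blanks out ASCII letters and punctuation. As a rough sketch of the same idea that does not depend on listing the characters to drop, every non-digit can be turned into a space before splitting. The name digit_runs is made up for this illustration, and the code is written to run on Python 3 as well as 2.7:

```python
def digit_runs(text):
    # Replace every character that is not a digit with a space ...
    spaced = "".join(ch if ch.isdigit() else " " for ch in text)
    # ... then split on whitespace to get the runs of digits.
    return spaced.split()

print(digit_runs("shsgd89shs2011%%5swts"))  # ['89', '2011', '5']
print(digit_runs("a1 b22\tc333"))           # ['1', '22', '333']
```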

from itertools import groupby

L2 = []
file_Name1 = 'shsgd89shs2011%%5swts'
# groupby yields (key, group) pairs; the key is True for runs of digits
for k, g in groupby(file_Name1, str.isdigit):
    if k:
        L2.append("".join(g))
print(L2)
Result: ['89', '2011', '5']
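If the goal is actual numbers rather than strings of digits, the groupby idea can be wrapped in a small function that converts each run with int(). This is only a sketch; extract_ints is a hypothetical name, and it assumes plain non-negative integers (no signs or decimal points):

```python
from itertools import groupby

def extract_ints(text):
    # Keep only the groups whose key is True (runs of digits) and convert them.
    return [int("".join(group))
            for is_digit, group in groupby(text, key=lambda ch: ch.isdigit())
            if is_digit]

print(extract_ints("shsgd89shs2011%%5swts"))  # [89, 2011, 5]
```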

Updated to account for trailing numbers:
def StringThings(strng):
    nums = []
    number = ""
    for each in range(len(strng)):
        if strng[each].isdigit():
            number += strng[each]
            # The string ends with a digit: flush the trailing number.
            if each == len(strng) - 1:
                nums.append(number)
        elif each != 0 and strng[each - 1].isdigit():
            # A run of digits just ended: store it and reset the buffer.
            nums.append(number)
            number = ""
    return nums

print StringThings("shsgd89shs2011%%5swts34")
# returns ['89', '2011', '5', '34']
So, when we reach a character which is not a number, and if the previously observed character was a number, append the contents of number to nums and then simply empty our temporary container number, to avoid it containing all the old stuff.
Note, I don't know Python so the solution may not be very pythonic.
Alternatively, save yourself all the work and just do:
import re
print re.findall(r'\d+', 'shsgd89shs2011%%5swts')

Text processing to get if else type condition from a string

First of all, I am sorry about the weird question heading. Couldn't express it in one line.
So, the problem statement is,
If I am given the following string --
"('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
I have to parse it as
list1 = ["'James Gosling'", 'jamesgosling', 'jame gosling']
list2 = ["'SUN Microsystem'", 'sunmicrosystem']
list3 = [ list1, list2, keyword]
So that if I enter "James Gosling Sun Microsystem keyword", it should tell me that what I have entered is 100% correct.
And if I enter "J Gosling Sun Microsystem keyword", it should say I am only 66.66% correct.
This is what I have tried so far.
import re

def main():
    print("starting")
    sentence = "('James Gosling'/jamesgosling/jame gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
    splited = sentence.split(",")
    number_of_primary_keywords = len(splited)
    #print(number_of_primary_keywords, "primary keywords length")
    number_of_brackets = 0
    inside_quotes = ''
    inside_quotes_1 = ''
    inside_brackets = ''
    for n in range(len(splited)):
        #print(len(re.findall('\w+', splited[n])), "length of splitted")
        inside_brackets = splited[n][splited[n].find("(") + 1: splited[n].find(")")]
        synonyms = inside_brackets.split("/")
        for x in range(len(synonyms)):
            try:
                inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
                print(inside_quotes_1)
            except:
                pass
            try:
                inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
                print(inside_quotes)
            except:
                pass
            #print(synonyms[x])
        number_of_brackets += 1
    print(number_of_brackets)

if __name__ == '__main__':
    main()
Output is as follows
'James Gosling
jamesgoslin
jame goslin
'SUN Microsystem
SUN Microsystem
sunmicrosyste
sunmicrosyste
3
As you can see, the last letters of some words are missing.
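So, if you read this far, I hope you can help me in getting the expected output.
Your code has a logic issue, most likely in these two lines: the end of each slice is computed from synonyms[n] rather than synonyms[x], and find() returns -1 when the quote character is absent, so the slice drops the last character: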
inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
Incidentally, the quote characters can also be written as hex escapes, which avoids the backslash-escaped quote:
inside_quotes_1 = synonyms[x][synonyms[x].find("\x22") + 1: synonyms[n].find("\x22")]
inside_quotes = synonyms[x][synonyms[x].find("\x27") + 1: synonyms[n].find("\x27")]
Other than that, you seem to want to extract the words along with their positions. The words themselves can be matched with a basic expression:
(\w+)
You would then need a simple way to locate the index of each word and associate the word with it.
Example Test
# -*- coding: UTF-8 -*-
import re
string = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
expression = r'(\w+)'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
    print('🙀 Sorry! No matches! Something is not right! Call 911 👮')
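To go from matching single words to the parse and the percentage the question asks for, here is a minimal sketch. The helper names parse_groups and match_percentage are invented for this example, and it assumes that a group counts as matched when any of its synonyms (quotes stripped, case ignored) occurs as a substring of the input; the question does not specify the exact matching rule.

```python
import re

def parse_groups(spec):
    """Split a spec such as "('A'/a), ('B'/b), kw" into lists of synonyms."""
    groups = []
    # Each item is either a parenthesised group or a bare keyword.
    for inner, bare in re.findall(r"\(([^)]*)\)|([^,()]+)", spec):
        if inner:
            groups.append([s.strip() for s in inner.split("/")])
        elif bare.strip():
            groups.append([bare.strip()])
    return groups

def match_percentage(spec, text):
    """Percentage of groups with at least one synonym found in text."""
    groups = parse_groups(spec)
    if not groups:
        return 0.0
    text = text.lower()
    hits = 0
    for synonyms in groups:
        # Assumption: case-insensitive substring match, surrounding quotes ignored.
        if any(s.strip("'\"").lower() in text for s in synonyms):
            hits += 1
    return 100.0 * hits / len(groups)

spec = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
print(parse_groups(spec))
print(match_percentage(spec, "James Gosling Sun Microsystem keyword"))  # 100.0
print(round(match_percentage(spec, "J Gosling Sun Microsystem keyword"), 2))  # 66.67
```

Splitting on the parentheses first, rather than on commas, keeps each synonym list together even if a synonym were to contain a comma.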

Python 2.7 RE Search by condition

I have a problem when using re.search.
For example:
a = '<span class="chapternum">1 </span>abc,def.</span>'
How can I search for the number '1'?
Or, how can I match digits that start after ">" and end with whitespace?
I tried:
test = re.search('(^>)(\d+)(\s$)', a)
print test
>> []
It fails to get the number "1".
^ and $ indicate the beginning and the end of the string. If you get rid of them you have your answer:
>>> test = re.search('(>)(\d+)(\s)', a)
>>> test.groups()
('>', '1', ' ')
Not sure that you need the first and last groups though (capturing with parentheses):
>>> a = '<span class="chapternum">23 </span>abc,def.</span>'
>>> test = re.search('>(\d+)\s', a)
>>> test.group(1)
'23'
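The effect of the anchors can be checked side by side. This is just an illustrative sketch using the string from the question; the variable names anchored and unanchored are made up here:

```python
import re

a = '<span class="chapternum">1 </span>abc,def.</span>'

# Anchored: ">" would have to be the first character and the whitespace the last.
anchored = re.search(r'^>(\d+)\s$', a)
# Unanchored: ">" followed by digits and whitespace may appear anywhere.
unanchored = re.search(r'>(\d+)\s', a)

print(anchored)             # None
print(unanchored.group(1))  # 1
```

Because re.search returns None when nothing matches, it is worth testing the result before calling group() on it.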

How to get a letter from an element in a list

I have some homework where I am given a list, and for each element I have to collect its first letter and its last letter whenever they are vowels, returning all of those vowels as a single string. So for example:
["Roberto", "Jessie", "A", "Geoffrey", "Eli"]
turns into
oeaei
So far I have this code:
def get_first_last_vowels(a_list):
    vowels = "aeiou"
    new_list = []
    for words in a_list:
        a_list = [words.lower() for words in a_list]
    for letters in vowels:
        if a_list[0] == vowels or a_list[-1] == vowels:
            new_list += a_list[vowels]
    return new_list
But I get the error
[]
[]
Traceback (most recent call last):
  File "C:\Users\Miraj\Desktop\Q3.py", line 27, in <module>
    test_get_first_last_vowels()
  File "C:\Users\Miraj\Desktop\Q3.py", line 24, in test_get_first_last_vowels
    print(get_first_last_vowels([]))
  File "C:\Users\Miraj\Desktop\Q3.py", line 17, in get_first_last_vowels
    if a_list[0] == vowels or a_list[-1] == vowels:
IndexError: list index out of range
So can I get some help where I'm going wrong. Thank you.
You could try this:
a_list = ["Roberto", "Jessie", "A", "Geoffrey", "Eli"]

def start_end_vowels(a_list):
    vowels = "aeiou"
    result = ""
    for words in a_list:
        words = words.lower()
        for vowel in vowels:
            if len(words) == 1:
                if words == vowel:
                    result += vowel
            else:
                if words.startswith(vowel):
                    result += vowel
                if words.endswith(vowel):
                    result += vowel
    return result
# Output
>>> a_list = ["Roberto", "Jessie", "A", "Geoffrey", "Eli"]
>>> start_end_vowels(a_list)
'oeaei'
>>> a_list = ["Abba"]
>>> start_end_vowels(a_list)
'aa'
This does work for your example, but I would double check with other test cases to be sure. It's good to know how to do this sort of question different ways.
Update: edited it to work for cases where starting vowel and ending vowel are the same.
You are working on an empty list. Further, this will never work:
>>> new_list += a_list[vowels]
TypeError: list indices must be integers, not str
This fails because vowels is a string, not an integer. You want to use append() instead.
You are also checking the wrong condition:
if a_list[0] == vowels or a_list[-1] == vowels:
Should be:
if a_list[0] == letters or a_list[-1] == letters:
This needs to be executed for every word in a_list, so make sure it is inside the loop and not stand-alone.
This works:
def find_vowels(a_list):
    vowels = set('aeiou')
    res = []
    for word in a_list:
        if not word:
            continue
        word = word.lower()
        if word[0] in vowels:
            res.append(word[0])
        if len(word) > 1 and word[-1] in vowels:
            res.append(word[-1])
    return ''.join(res)
Now:
>>> a_list = ["Roberto", "Jessie", "A", "Geoffrey", "Eli"]
>>> find_vowels(a_list)
'oeaei'
Your code has several flaws in light of what you are trying to do. For example, you need not write
for words in a_list:
when you have written
a_list = [words.lower() for words in a_list]
because the second line alone already loops over all the words.
Also when you say
if a_list[0] == vowels or a_list[-1] == vowels:
then a_list[0] or a_list[-1] is matched with the whole string 'aeiou', which will never be true. You need to match a_list[0] or a_list[-1] with individual vowels.
And lastly, as @Idos said,
new_list += a_list[vowels]
this line is not going to work.
So I have written fresh code that takes all of this into account and also handles the special case of a single-letter word. The code is given below:
a_list = ["Roberto", "Jessie", "A", "Geoffrey", "Eli"]
vowels = ['a','e','i','o','u']
new_list = []
for word in a_list:
    if len(word)>=2:
        if word[0].lower() in vowels:
            new_list.append(word[0].lower())
        if word[-1].lower() in vowels:
            new_list.append(word[-1].lower())
    elif len(word)==1:
        if word.lower() in vowels:
            new_list.append(word.lower())
print (''.join(new_list))
You can employ listcomps or genexps to write a succinct program as follows. It has 3 lines and reads like English.
words = ["Roberto", "Jessie", "A", "Geoffrey", "Eli"]
vowels = 'aeiou'
# collect
groups = (word if len(word) == 1 else (word[0], word[-1]) for word in words)
# flatten and filter
chars = (char for group in groups for char in group if char.lower() in vowels)
# consume the iterator
''.join(chars) # 'oeAEi' (original case is kept; add .lower() to get 'oeaei')
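Since the traceback in the question comes from a test that passes an empty list, here is one possible sketch that covers that case as well as empty strings and single-letter words. It reuses the function name from the traceback, but the body is only an illustration, not the asker's intended solution:

```python
def get_first_last_vowels(words):
    vowels = "aeiou"
    found = []
    for word in words:
        lowered = word.lower()
        # For "" or a single letter, look at the word itself (once);
        # otherwise look at the first and the last letter, in that order.
        ends = lowered if len(lowered) <= 1 else lowered[0] + lowered[-1]
        found.extend(ch for ch in ends if ch in vowels)
    return "".join(found)

print(get_first_last_vowels(["Roberto", "Jessie", "A", "Geoffrey", "Eli"]))  # oeaei
print(get_first_last_vowels([]))  # prints an empty line
```

Looping over the words (rather than indexing a_list[0]) is what avoids the IndexError on an empty list.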

Python letter swapping

I'm making a program that scrambles words for fun and I've hit a roadblock. I am attempting to switch all the letters in a string and I'm not quite sure how to go about it (hello = ifmmp). I've looked all around and haven't been able to find any answers to this specific question. Any help would be great!
You want a simple randomized cypher? The following will work for all lowercase inputs, and can easily be extended.
import random
import string
swapped = list(string.lowercase)
random.shuffle(swapped)
cipher = string.maketrans(string.lowercase, ''.join(swapped))
def change(val):
    return string.translate(val, cipher)
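string.lowercase and string.maketrans exist only in Python 2. As a rough sketch of the same randomized substitution that also runs on Python 3 and can be undone, the mapping can be kept in plain dicts. The names make_cipher and apply_cipher, and the optional seed parameter, are assumptions made for this example:

```python
import random
import string

def make_cipher(seed=None):
    # A seeded Random instance makes the shuffle repeatable when needed.
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    encode = dict(zip(string.ascii_lowercase, shuffled))
    # Invert the mapping so scrambled text can be turned back.
    decode = {v: k for k, v in encode.items()}
    return encode, decode

def apply_cipher(text, table):
    # Characters without an entry (spaces, digits, capitals) pass through.
    return "".join(table.get(ch, ch) for ch in text)

encode, decode = make_cipher(seed=42)
scrambled = apply_cipher("hello world", encode)
print(scrambled)
print(apply_cipher(scrambled, decode))  # hello world
```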
You can probably modify this example to achieve what you need. Here every vowel in a string is replaced by its vowel position:
from string import maketrans # Required to call maketrans function.
intab = "aeiou"
outtab = "12345"
trantab = maketrans(intab, outtab)
text = "this is string example....wow!!!"
print text.translate(trantab)
# this is the output
"th3s 3s str3ng 2x1mpl2....w4w!!!"
Try maketrans in combination with the string.translate function. This code first removes the letters of your word from the pool of letters you are scrambling with, so that no letter is replaced by a letter already in the word. If you want lowercase only, use string.lowercase instead of string.letters.
>>> import string, random
>>> letters = list(string.letters)
>>> random.shuffle(letters)
>>> letters = "".join(letters)
>>> word = 'hello'
>>> for letter in word:
...     letters = letters.replace(letter, '')
...
>>> transtab = string.maketrans(word, letters[:len(word)])
>>> print word.translate(transtab)
XQEEN
The "scrambling" you appear to be after is called Caesar's cipher, with a right shift of 1. The following Python will achieve what you're after:
def caesar(str):
    from string import maketrans
    fromalpha = "abcdefghijklmnopqrstuvwxyz"
    # Move the first char to the end of the string
    toalpha = fromalpha[1:] + fromalpha[:1]
    # Make it work with capital letters
    fromalpha += fromalpha.upper()
    toalpha += toalpha.upper()
    x = maketrans(fromalpha, toalpha)
    return str.translate(x)
If you're interested in the general case, this function will do the job. (Note that it is conventional to express Caesar ciphers in terms of left shifts, rather than right.)
def caesar(str, lshift):
    from string import maketrans
    fromalpha = "abcdefghijklmnopqrstuvwxyz"
    toalpha = fromalpha[-lshift:] + fromalpha[:-lshift]
    fromalpha += fromalpha.upper()
    toalpha += toalpha.upper()
    x = maketrans(fromalpha, toalpha)
    return str.translate(x)
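The functions above rely on string.maketrans, which is Python 2 only. Here is a sketch of the same rotation that avoids it and so runs on both Python 2.7 and 3. The name caesar_shift is hypothetical, and unlike the general function above it takes a right shift, so that a shift of 1 gives the hello = ifmmp example from the question:

```python
import string

def caesar_shift(text, shift=1):
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # Normalise the shift so negative or large values wrap around the alphabet.
    shift %= len(lower)
    table = {}
    for alphabet in (lower, upper):
        rotated = alphabet[shift:] + alphabet[:shift]
        table.update(zip(alphabet, rotated))
    # Characters that are not ASCII letters are left unchanged.
    return "".join(table.get(ch, ch) for ch in text)

print(caesar_shift("hello"))       # ifmmp
print(caesar_shift("ifmmp", -1))   # hello
```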

Removing punctuation then counting the number of occurrences of every word using Python

Hello everybody I am new to python and need to write a program to eliminate punctuation then count the number of words in a string. So I have this:
import sys
import string
def removepun(txt):
    for punct in string.punctuation:
        txt = txt.replace(punct,"")
    print txt
    mywords = {}
    for i in range(len(txt)):
        item = txt[i]
        count = txt.count(item)
        mywords[item] = count
    return sorted(mywords.items(), key = lambda item: item[1], reverse=True)
The problem is that it returns letters and their counts, not words as I had hoped. Can you help me with this?
How about this?
>>> import string
>>> from collections import Counter
>>> s = 'One, two; three! four: five. six##$,.!'
>>> occurrence = Counter(s.translate(None, string.punctuation).split())
>>> print occurrence
Counter({'six': 1, 'three': 1, 'two': 1, 'four': 1, 'five': 1, 'One': 1})
After removing the punctuation:
numberOfWords = len(txt.split(" "))
This assumes exactly one space between words.
EDIT:
a = {}
for w in txt.split(" "):
    if w in a:
        a[w] += 1
    else:
        a[w] = 1
How it works:
a is set to be a dict.
The words in txt are iterated over.
If there is already an entry a[w] in the dict, add one to it.
If there is no entry, set one up, initialized to 1.
The output is the same as in Haidro's excellent answer: a dict with the words as keys and the count of each word as values.
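Putting the two steps together, a small sketch might look like this. The two-argument s.translate(None, string.punctuation) call used earlier works only on Python 2 byte strings, so this version filters characters instead and runs on Python 3 too. The name count_words is invented for the example, and split() with no argument is used so that runs of several spaces do not produce empty "words":

```python
import string
from collections import Counter

def count_words(txt):
    # Drop every punctuation character, then split on any whitespace.
    cleaned = "".join(ch for ch in txt if ch not in string.punctuation)
    return Counter(cleaned.split())

counts = count_words("One, two; three! two: one. two##$,.!")
print(counts.most_common(1))  # [('two', 3)]
```

Counting is case-sensitive here, so 'One' and 'one' are separate entries; lowercase the text first if that is not what you want.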