Python - Check If string Is In bigger String - regex

I'm working with Python v2.7, and I'm trying to find out if you can tell if a word is in a string.
If for example i have a string and the word i want to find:
str = "ask and asked, ask are different ask. ask"
word = "ask"
How should i code so that i know that the result i obtain doesn't include words that are part of other words. In the example above i want all the "ask" except the one "asked".
I have tried with the following code but it doesn't work:
def exact_Match(str1, word):
match = re.findall(r"\\b" + word + "\\b",str1, re.I)
if len(match) > 0:
return True
return False
Can someone please explain how can i do it?

You can use the following function :
>>> test_str = "ask and asked, ask are different ask. ask"
>>> word = "ask"
>>> def finder(s,w):
... return re.findall(r'\b{}\b'.format(w),s,re.U)
...
>>> finder(text_str,word)
['ask', 'ask', 'ask', 'ask']
Note that you need \b for boundary regex!
Or you can use the following function to return the indices of words :
in splitted string :
>>> def finder(s,w):
... return [i for i,j in enumerate(re.findall(r'\b\w+\b',s,re.U)) if j==w]
...
>>> finder(test_str,word)
[0, 3, 6, 7]

Related

How to limit the consecutive occurrences of elements in string using python regular expression?

I have string with repetition of characters. If I want to limit the repetition what would be my pattern?
e.g.
suppose my string is "aaaaajkefhejkffdddddrhigjlkglhhhh" .
I want a output aajkefhejkffddrhigjlkglhh.MOre than 4 consecutive repetitions
should be replaced by two occurrences.
I tried the below piece of pattern.
str1=re.sub(r'(\w)\1+',r'\1{2}',str1)
str1="aaaaajkefhejkffdddddrhigjlkglhhhh"
import re
str1=re.sub(r'(\w)\1+',r'\1{2}',str1)
print (str1)
I expect the output "a{2}jkefhejkf{2}d{2}rhigjlkglh{2}" but the actual output is "aajkefhejkffddrhigjlkglhh"
Try:
import re
str1="aaaaajkefhejkffdddddrhigjlkglhhhhzzz"
str1=re.sub(r'(.)\1{3,}', r"\1\1",str1)
print(str1)
Output:
aajkefhejkffddrhigjlkglhhzzz
input = "aaaaajkefhejkffdddddrhigjlkglhhhh"
def duplicate_character_shortener(input,length):
index = 0
while index<len(input)-length+1:
if input[index:index+length]==input[index+1:index+length+1]:
input = input[:index] + input[index+1:]
else:
index+=1
return input
output = duplicate_character_shortener(input,2)
print(output)
>>> aajkefhejkffddrhigjlkglhh

Text processing to get if else type condition from a string

First of all, I am sorry about the weird question heading. Couldn't express it in one line.
So, the problem statement is,
If I am given the following string --
"('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
I have to parse it as
list1 = ["'James Gosling'", 'jamesgosling', 'jame gosling']
list2 = ["'SUN Microsystem'", 'sunmicrosystem']
list3 = [ list1, list2, keyword]
So that, if I enter James Gosling Sun Microsystem keyword it should tell me that what I have entered is 100% correct
And if I enter J Gosling Sun Microsystem keyword it should say i am only 66.66% correct.
This is what I have tried so far.
import re
def main():
print("starting")
sentence = "('James Gosling'/jamesgosling/jame gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
splited = sentence.split(",")
number_of_primary_keywords = len(splited)
#print(number_of_primary_keywords, "primary keywords length")
number_of_brackets = 0
inside_quotes = ''
inside_quotes_1 = ''
inside_brackets = ''
for n in range(len(splited)):
#print(len(re.findall('\w+', splited[n])), "length of splitted")
inside_brackets = splited[n][splited[n].find("(") + 1: splited[n].find(")")]
synonyms = inside_brackets.split("/")
for x in range(len(synonyms)):
try:
inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
print(inside_quotes_1)
except:
pass
try:
inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
print(inside_quotes)
except:
pass
#print(synonyms[x])
number_of_brackets += 1
print(number_of_brackets)
if __name__ == '__main__':
main()
Output is as follows
'James Gosling
jamesgoslin
jame goslin
'SUN Microsystem
SUN Microsystem
sunmicrosyste
sunmicrosyste
3
As you can see, the last letters of some words are missing.
So, if you read this far, I hope you can help me in getting the expected output
Unfortunately, your code has a logic issue that I could not figure it out, however there might be in these lines:
inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
which by the way you can simply use:
inside_quotes_1 = synonyms[x][synonyms[x].find("\x22") + 1: synonyms[n].find("\x22")]
inside_quotes = synonyms[x][synonyms[x].find("\x27") + 1: synonyms[n].find("\x27")]
Other than that, you seem to want to extract the words with their indices, which you can extract them using a basic expression:
(\w+)
Then, you might want to find a simple way to locate the indices, where the words are. Then, associate each word to the desired indices.
Example Test
# -*- coding: UTF-8 -*-
import re
string = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
expression = r'(\w+)'
match = re.search(expression, string)
if match:
print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
print('🙀 Sorry! No matches! Something is not right! Call 911 👮')

Python 2.7 RE Search by condition

When I am using re.search, I have some problem.
For example:
a = '<span class="chapternum">1 </span>abc,def.</span>'
How can I search the number '1'?
Or how to search by matching digit start with ">" and end with writespace?
I tried:
test = re.search('(^>)(\d+)(\s$)', a)
print test
>> []
It is fail to get the number "1"
^ and $ indicate the beginning and the end of the string. If you get rid of them you have your answer:
>>> test = re.search('(>)(\d+)(\s)', a)
>>> test.groups()
('>', '1', ' ')
Not sure that you need the first and last groups though (capturing with parenthesis):
>>> a = '<span class="chapternum">23 </span>abc,def.</span>'
>>> test = re.search('>(\d+)\s', a)
>>> test.group(1)
'23'

Python letter swapping

I'm making a program that scrambles words for fun and I've hit a roadblock. I am attempting to switch all the letters in a string and I'm not quite sure how to go about it (hello = ifmmp). I've looked all around and haven't been able to find any answers to this specific question. Any help would be great!
You want a simple randomized cypher? The following will work for all lowercase inputs, and can easily be extended.
import random
import string
swapped = list(string.lowercase)
random.shuffle(swapped)
cipher = string.maketrans(string.lowercase, ''.join(swapped))
def change(val):
return string.translate(val, cipher)
You can probably modify this example to achieve what you need. Here every vowel in a string is replaced by its vowel position:
from string import maketrans # Required to call maketrans function.
intab = "aeiou"
outtab = "12345"
trantab = maketrans(intab, outtab)
str = "this is string example....wow!!!";
print str.translate(trantab);
# this is the output
"th3s 3s str3ng 2x1mpl2....w4w!!!"
Try maketrans in combination with the string.translate function. This code removes letters from your word from the letters you are scrambling with first. If you just want lowercase only use string.lowercase instead of string.letters.
>>> import string, random
>>> letters = list(string.letters)
>>> random.shuffle(letters)
>>> letters = "".join(letters)
>>> word = 'hello'
>>> for letter in word:
... letters = letters.replace(letter, '')
...
>>> transtab = string.maketrans(word, letters[:len(word)])
>>> print word.translate(transtab)
XQEEN
The "scrambling" you appear to be after is called Caesar's cipher, with a right shift of 1. The following Python will achieve what you're after:
def caesar(str):
from string import maketrans
fromalpha = "abcdefghijklmnopqrstuvwxyz"
# Move the last 1 chars to the start of the string
toalpha = fromalpha[1:] + fromalpha[:1]
# Make it work with capital letters
fromalpha += fromalpha.upper()
toalpha += toalpha.upper()
x = maketrans(fromalpha, toalpha)
return str.translate(x)
If you're interested in the general case, this function will do the job. (Note that it is conventional to express Caesar ciphers in terms of left shifts, rather than right.)
def caesar(str, lshift):
from string import maketrans
fromalpha = "abcdefghijklmnopqrstuvwxyz"
toalpha = fromalpha[-lshift:] + fromalpha[:-lshift]
fromalpha += fromalpha.upper()
toalpha += toalpha.upper()
x = maketrans(fromalpha, toalpha)
return str.translate(x)

removing punctuation then counting the no of every word occurance using python

Hello everybody I am new to python and need to write a program to eliminate punctuation then count the number of words in a string. So I have this:
import sys
import string
def removepun(txt):
for punct in string.punctuation:
txt = txt.replace(punct,"")
print txt
mywords = {}
for i in range(len(txt)):
item = txt[i]
count = txt.count(item)
mywords[item] = count
return sorted(mywords.items(), key = lambda item: item[1], reverse=True)
The problem is it returns back letters and counts them and not words as I hoped. Can you help me in this matter?
How about this?
>>> import string
>>> from collections import Counter
>>> s = 'One, two; three! four: five. six##$,.!'
>>> occurrence = Counter(s.translate(None, string.punctuation).split())
>>> print occurrence
Counter({'six': 1, 'three': 1, 'two': 1, 'four': 1, 'five': 1, 'One': 1})
after removing the punctuation
numberOfWords = len(txt.split(" "))
Assuming one space between words
EDIT:
a={}
for w in txt.split(" "):
if w in a:
a[w] += 1
else:
a[w] = 1
how it works
a is set to be a dict
the words in txt are iterated
if there is an entry already for dict a[w] then add one to it
if there is no entry then set one up, initialized to 1
output is the same as Haidro's excellent answer, a dict with keys of the words and values of the count of each word