Extracting indexes of regex matches with Python

I am trying to get certain strings and their indexes, but I always get only the index of the first one. Could anyone tell me what I am doing wrong?
Thanks.
Here is what I have written:
import re
f = open("topology_seq.txt")
strToSearch = ""
for line in f:
    strToSearch += line
patFinder = re.compile("I(L|p)")
findpattern = re.search(patFinder, strToSearch)
findpattern1 = re.findall(patFinder, strToSearch)
for i in findpattern1:
    print(i),
    print (findpattern.end())
Output:
L
143
p
143
L
143

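re.search returns only the first match, so findpattern.end() prints the same index on every iteration. To get the position of every match, you can use the finditer method:
itr = re.finditer(patFinder, strToSearch)
indexes = [match.start(0) for match in itr]
From the documentation:
re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string.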
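To pair each captured string with its own position, the match objects from finditer can be read directly. The following is a minimal sketch for Python 3; the sample text is made up and merely stands in for the contents of topology_seq.txt.

```python
import re

# Hypothetical sample text standing in for the contents of topology_seq.txt.
text = "xxILxxIpxxIL"
pattern = re.compile(r"I(L|p)")

# Each match object carries its own captured group and its own start/end,
# unlike the single match object returned by re.search.
hits = [(m.group(1), m.start(), m.end()) for m in pattern.finditer(text)]

for letter, start, end in hits:
    print(letter, start, end)
```

Here m.end() is the index just past the match, which corresponds to what findpattern.end() printed in the question, but now once per match.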

Text processing to get if else type condition from a string

First of all, I am sorry about the weird question heading. Couldn't express it in one line.
So, the problem statement is,
If I am given the following string --
"('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
I have to parse it as
list1 = ["'James Gosling'", 'jamesgosling', 'jame gosling']
list2 = ["'SUN Microsystem'", 'sunmicrosystem']
list3 = [ list1, list2, keyword]
So that if I enter James Gosling Sun Microsystem keyword, it should tell me that what I have entered is 100% correct.
And if I enter J Gosling Sun Microsystem keyword, it should say I am only 66.66% correct.
This is what I have tried so far.
import re
def main():
    print("starting")
    sentence = "('James Gosling'/jamesgosling/jame gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
    splited = sentence.split(",")
    number_of_primary_keywords = len(splited)
    #print(number_of_primary_keywords, "primary keywords length")
    number_of_brackets = 0
    inside_quotes = ''
    inside_quotes_1 = ''
    inside_brackets = ''
    for n in range(len(splited)):
        #print(len(re.findall('\w+', splited[n])), "length of splitted")
        inside_brackets = splited[n][splited[n].find("(") + 1: splited[n].find(")")]
        synonyms = inside_brackets.split("/")
        for x in range(len(synonyms)):
            try:
                inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
                print(inside_quotes_1)
            except:
                pass
            try:
                inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
                print(inside_quotes)
            except:
                pass
            #print(synonyms[x])
        number_of_brackets += 1
    print(number_of_brackets)
if __name__ == '__main__':
    main()
Output is as follows
'James Gosling
jamesgoslin
jame goslin
'SUN Microsystem
SUN Microsystem
sunmicrosyste
sunmicrosyste
3
As you can see, the last letters of some words are missing.
So, if you have read this far, I hope you can help me get the expected output.
The logic issue is most likely in these two lines:
inside_quotes_1 = synonyms[x][synonyms[x].find("\"") + 1: synonyms[n].find("\"")]
inside_quotes = synonyms[x][synonyms[x].find("'") + 1: synonyms[n].find("'")]
The end of each slice is computed from synonyms[n] (the outer loop index) rather than synonyms[x], and find() returns -1 when the quote character is absent, so the slice ends at -1 and drops the last character.
Incidentally, the quote characters can also be written as hex escapes, which avoids having to escape them (this alone does not change the behavior):
inside_quotes_1 = synonyms[x][synonyms[x].find("\x22") + 1: synonyms[n].find("\x22")]
inside_quotes = synonyms[x][synonyms[x].find("\x27") + 1: synonyms[n].find("\x27")]
Other than that, you seem to want to extract the words along with their indices. The words can be matched with a basic expression:
(\w+)
You would then need a way to locate the index of each word and associate each word with its index.
Example Test
# -*- coding: UTF-8 -*-
import re
string = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
expression = r'(\w+)'
match = re.search(expression, string)
# re.search returns only the first match, which here is "James".
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
    print('🙀 Sorry! No matches! Something is not right! Call 911 👮')
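For the scoring described in the question, one possible approach is to parse the specification into groups of synonyms and count how many groups are satisfied by the input. This is only a sketch under several assumptions: the helper names parse_groups and score are hypothetical, quotes around synonyms are stripped rather than kept, and a group counts as matched when any of its synonyms occurs as a case-insensitive substring of the answer.

```python
import re

def parse_groups(spec):
    """Split a spec string into groups of synonyms (hypothetical helper).

    Parenthesised groups are split on '/'; a bare item forms a one-element group.
    """
    groups = []
    # Match either "( ... )" or a run of characters without commas or parentheses.
    for bracketed, bare in re.findall(r"\(([^)]*)\)|([^,()]+)", spec):
        if bracketed:
            # Assumption: surrounding quotes are not significant, so strip them.
            synonyms = [s.strip().strip("'\"") for s in bracketed.split("/")]
        else:
            synonyms = [bare.strip()]
        synonyms = [s for s in synonyms if s]  # drop whitespace-only fragments
        if synonyms:
            groups.append(synonyms)
    return groups

def score(spec, answer):
    """Return the percentage of groups with at least one synonym in the answer."""
    groups = parse_groups(spec)
    if not groups:
        return 0.0
    text = answer.lower()
    matched = sum(any(s.lower() in text for s in group) for group in groups)
    return 100.0 * matched / len(groups)

spec = "('James Gosling'/jamesgosling/james gosling) , ('SUN Microsystem'/sunmicrosystem), keyword"
print(parse_groups(spec))
print(score(spec, "James Gosling Sun Microsystem keyword"))
print(round(score(spec, "J Gosling Sun Microsystem keyword"), 2))
```

Substring matching is a deliberately loose choice; stricter matching (for example on word boundaries) may be more appropriate depending on the data.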

Return first instance of capturing group if found, otherwise empty string

My inputs are strings that may or may not contain a pattern:
p = '(\d)'
s = 'abcd3f'
I want to return the capturing group for the first match of this pattern if it is found, and an empty string otherwise.
result = re.search(p, s)[1]
This returns the captured group of the first match. But if s = 'abcdef', then search returns None and the indexing raises a TypeError. Instead of that, I'd like it to just return an empty string. I can do:
g = re.search(p, s)
result = ''
if g is not None: result = g[1]
Or even:
try:
    result = re.search(p, s)[1]
except TypeError:
    result = ''
But these both seem pretty complicated for something so simple. Is there a more elegant way of accomplishing what I want, preferably in one line?
You could use an `is None` check on the match object to accomplish that. For example:
result = '' if m is None else m.group(1)
Example for Python:
import re
m = re.search(r'(\d)', 'ab1cdf')
result = '' if m is None else m.group(1)
print(result)
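Since the question asks for something compact, here are a few possible forms for Python 3, offered as a sketch rather than the one right way. The helper name first_group is hypothetical, and the assignment-expression variant assumes Python 3.8 or later.

```python
import re

def first_group(pattern, string, default=""):
    """Return group 1 of the first match, or `default` when nothing matches."""
    m = re.search(pattern, string)
    return m.group(1) if m else default

p = r"(\d)"

# Helper-function form.
found = first_group(p, "abcd3f")
missing = first_group(p, "abcdef")

# One-line form using finditer: take the first match lazily, else fall back to "".
lazy = next((m.group(1) for m in re.finditer(p, "abcdef")), "")

# One-line form using an assignment expression (Python 3.8+).
walrus = m[1] if (m := re.search(p, "abcd3f")) else ""

print(repr(found), repr(missing), repr(lazy), repr(walrus))
```

All three avoid a bare try/except, so unrelated errors (such as an invalid pattern) are not silently swallowed.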

How to count words with one syllable in a list of strings of one word using regular expressions

I'm trying to count the number of words with one syllable in a fairly long text. A one-syllable word is defined here as zero or more consonants, followed by one or more vowels, followed by zero or more consonants.
The text has been lowercased and split into a list of single-word strings. Yet every time I try to use regular expressions to get the count, I get an error because the object is a list and not a string.
How would I do this with a list?
f = open('pg36.txt')
war = f.read()
warlow = war.lower()
warsplit = warlow.split()
import re
def syllables():
    count = len(re.findall('[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*', warsplit))
    return count
    print (count)
syllables()
The error occurs because you are calling findall on a list, and findall only works on a string. You could instead loop over the list and test each word:
import re
f = open('file')
war = f.read()
warlow = war.lower()
warsplit = warlow.split()
def syllables():
    count = 0
    for i in warsplit:
        if re.match(r'^[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*$', i):
            count += 1
    return count
print syllables()
f.close()
Or use the findall function directly on the warlow string, with lookarounds so that each match spans a whole whitespace-delimited word:
import re
f = open('file')
war = f.read()
warlow = war.lower()
print len(re.findall(r'(?<!\S)[bcdfghjklmnpqrstvwxyz]*[aeiou]+[bcdfghjklmnpqrstvwxyz]*(?!\S)', warlow))
f.close()
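Alternatively, try this regex on each word. Note that [^aeiouAEIOU] matches any non-vowel character, including digits and punctuation, not only consonants: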
^[^aeiouAEIOU]*[aeiouAEIOU]+[^aeiouAEIOU]*$
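The per-word approach can be sketched for Python 3 using re.fullmatch, which removes the need for explicit ^ and $ anchors. The function name count_one_syllable and the sample sentence are illustrative assumptions; the definition of a syllable is the one given in the question (with y treated as a consonant).

```python
import re

CONSONANTS = "bcdfghjklmnpqrstvwxyz"
# Zero or more consonants, one or more vowels, zero or more consonants.
ONE_SYLLABLE = re.compile(r"[{c}]*[aeiou]+[{c}]*".format(c=CONSONANTS))

def count_one_syllable(words):
    """Count the words that match the one-syllable pattern in full."""
    return sum(1 for word in words if ONE_SYLLABLE.fullmatch(word))

# Hypothetical sample text in place of the book file used in the question.
sample = "The cat sat on a table near the old piano".lower().split()
print(count_one_syllable(sample))
```

In this sample, "table" and "piano" are rejected because a second vowel group follows the trailing consonants, so the count is 8.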

Use Python regular expression to extract special strings

Given strings like:
str = '12-1 abcd fadf adfad'
I want to get 12-1. How can I do this in Python?
I'm using the following code, but it does not work.
m = re.search('.*(\number+-\number+).*', str)
if m:
    found = m.group(0)
    print found
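\number is not a regex token (and in a non-raw string, \n is read as a newline); use \d to match a digit. Try:
import re
str = '12-1 abcd fadf adfad'
m = re.search(r'(\d+-\d+)', str)
if m:
    found = m.group(0)
    print found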
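The same idea can be wrapped in a small function that also handles the no-match case. This is a sketch for Python 3; the name first_number_pair is hypothetical, and the variable is called text here to avoid shadowing the built-in str.

```python
import re

def first_number_pair(text):
    """Return the first 'digits-digits' token in text, or None if there is none."""
    m = re.search(r"\d+-\d+", text)
    return m.group(0) if m else None

print(first_number_pair("12-1 abcd fadf adfad"))
print(first_number_pair("no numbers here"))
```

Because the whole pattern is the part of interest, no capturing group is needed and group(0) returns the full match.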

using input() with regular expressions in python

Is it possible to use input() with a regex?
I've written something like this:
import re
words = ['cats', 'cates', 'dog', 'ship']
for l in words:
    m = re.search( r'cat..', l)
    if m:
        print l
    else:
        print 'none'
This prints 'cates' (and 'none' for the other words).
But now I want to be able to use my own input() in ' m = re.search( r'cat..', l) ',
something like:
import re
words = ['cats', 'cates', 'dog', 'ship']
target = input()
for l in words:
    m = re.search( r'target..', l)
    if m:
        print l
    else:
        print 'none'
This doesn't work, of course (I know it will search for the word 'target' and not for the input()).
Is there a way to do this, or are regular expressions not the right solution for my problem?
You could construct the RegEx dynamically:
target = raw_input() # use raw_input() to avoid automatically eval()-ing (Python 2; in Python 3, input() already returns a string).
rx = re.compile(re.escape(target) + '..')
# use re.escape() to escape special characters.
for l in words:
    m = rx.search(l)
    ....
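But it is also possible without RegEx, if a word should match only when it is exactly the target followed by two more characters (the regex search above matches anywhere in the word):
target = raw_input()
for l in words:
    if l[:-2] == target:
        print l
    else:
        print 'none'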
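A Python 3 version of the dynamic pattern might look like the sketch below. The function name words_matching is hypothetical, and the target is passed as a parameter so the example runs without waiting for keyboard input; in an interactive script it could come from input().

```python
import re

def words_matching(target, words):
    """Return the words that contain `target` followed by any two characters."""
    # re.escape makes characters such as '.' or '*' in the target match literally.
    rx = re.compile(re.escape(target) + r"..")
    return [word for word in words if rx.search(word)]

words = ['cats', 'cates', 'dog', 'ship']
print(words_matching("cat", words))
print(words_matching("c.t", words))
```

With "cat" only 'cates' qualifies, since 'cats' has just one character after the target. With "c.t" nothing matches, because the dot is escaped and treated as a literal character instead of a wildcard.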