Regex to find words with one character diff

I have a word dictionary and I'm looking for a regex that can help me get words that differ by only one character. For example, for the word BIG it could be BIT, BUG, etc. The words should be the same length.
Thank you!

/\b([a-z]ig|b[a-z]g|bi[a-z])\b/i
You'd have to do this with every word. Regex alone is probably not the best tool for this job.

Use something like this, perhaps?
>>> def word_difference(word1, word2):
...     c1, c2 = list(word1), list(word2)
...     return [(i, c1[i], c2[i]) for i in range(len(c1)) if c1[i] != c2[i]]
...
>>> word_difference("foo", "bar")
[(0, 'f', 'b'), (1, 'o', 'a'), (2, 'o', 'r')]
>>> word_difference("big", "bug")
[(1, 'i', 'u')]
The length of the returned list is the number of characters that differ. Note that the function assumes both words have the same length. I assume this is what you want, since you didn't say whether the characters may be in different positions; if they may, that's just as easy with sets.
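Building on this, filtering a whole dictionary only requires keeping the words whose difference list would have length one. The sketch below is my own illustration, not part of the original answer: it assumes words count as neighbours only when they have the same length and exactly one position differs (a Hamming distance of 1), and it compares case-insensitively because the question writes BIG in capitals. The sample word list is made up.

```python
def differs_by_one(word1, word2):
    """Return True if the words have equal length and differ at exactly one position."""
    if len(word1) != len(word2):
        return False
    # Hamming distance: the number of positions whose characters differ
    return sum(c1 != c2 for c1, c2 in zip(word1, word2)) == 1


def one_char_neighbours(word, dictionary):
    """Return the dictionary words one substitution away from word (case-insensitive)."""
    target = word.lower()
    return [w for w in dictionary if differs_by_one(target, w.lower())]


words = ['BIT', 'bug', 'big', 'dog', 'bigs', 'pig']
print(one_char_neighbours('BIG', words))  # ['BIT', 'bug', 'pig']
```

The word itself is excluded (distance 0), as are words of a different length such as bigs.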

I found nearly the same solution as the one on ideone.
But, as vkolodrevskiy wrote, the goal is "to get words with only one character diff", so I respected that requirement.
My code is in Python, since the question didn't specify a language.
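import re

word = 'main'
# One alternative per position i: keep the other letters, require a different letter at i
RE = '|'.join(word[:i] + '(?!' + char + ')[a-z]' + word[i+1:] for i, char in enumerate(word))
# \b on both sides so only whole words match, not substrings of longer words
RE = r'\b(' + RE + r')\b'
print(RE)
ch = 'the main reason is pain due to rain. hello muin, where is maih ?'
print(re.findall(RE, ch))  # ['pain', 'rain', 'muin', 'maih']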

Well, you could write a bunch of complicated (or ingenious) regular expressions, but I found something that may be a lot easier.
Check out the Levenshtein module to get the Hamming distance between two strings, then keep only the words with a distance of one.
To install it, use pip install python-levenshtein. On Ubuntu or a similar distribution you can use sudo apt-get install python-levenshtein. On Windows, pip needs a C++ compiler to build the module (such as Visual C++ 2010 Express for Python 3, or Visual C++ 2008 Express for Python 2.x); you can download those for free from Microsoft, so search the web for them if you need them.
import Levenshtein #Note the capital L
help(Levenshtein) #See the documentation
Levenshtein.hamming("cat", "sat") #Returns 1; they must be the same length, as you specified
There are lots of other useful functions besides hamming. Read the help (via the help function in the code above); the functions are surprisingly well documented. Press q to quit the help.

In the end I didn't use a regex; my solution looks like this:
public boolean diffOneChar(String word1, String word2) {
    int diff = 0;
    if (word1 == null || word2 == null) return false;
    if (word1.length() == 0 || word2.length() == 0) return false;
    if (word1.length() != word2.length()) return false;
    for (int i = 0; i < word1.length(); i++) {
        if (word1.charAt(i) != word2.charAt(i))
            diff++;
    }
    return diff == 1;
}

Related

Exact match of string in pandas python

I have a column in a data frame, for example df:
A
0 Good to 1. Good communication EI : tathagata.kar#ae.com
1 SAP ECC Project System EI: ram.vaddadi#ae.com
2 EI : ravikumar.swarna Role:SSE Minimum Skill
I have a list of strings:
ls=['tathagata.kar#ae.com','a.kar#ae.com']
Now, to filter, I do:
for i in range(len(ls)):
    df1 = df[df['A'].str.contains(ls[i])]
    if len(df1.columns!=0):
        print ls[i]
I get the output
tathagata.kar#ae.com
a.kar#ae.com
But I need only tathagata.kar#ae.com
How can this be achieved?
As you can see, I've tried str.contains, but I need something for an exact match.
You could simply use ==
string_a == string_b
It should return True if the two strings are equal. But this does not solve your issue.
Edit 2: You should use len(df1.index) instead of len(df1.columns). Indeed, len(df1.columns) will give you the number of columns, and not the number of rows.
Edit 3: After reading your second post, I've understood your problem. The solution you propose could lead to some errors.
For instance, if you have:
ls=['tathagata.kar#ae.com','a.kar#ae.com', 'tathagata.kar#ae.co']
the first and the third element will match str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i])
And this is an unwanted behaviour.
You could add a check on the end of the string: str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')
Like this:
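import re
for i in range(len(ls)):
    # re.escape makes the dots in the address match only a literal '.'
    pattern = r'(?:\s|^|Ei:|EI:|EI-)' + re.escape(ls[i]) + r'(?:\s|$)'
    df1 = df[df['A'].str.contains(pattern)]
    if len(df1.index) != 0:
        print(ls[i])
(The parentheses in print are required in Python 3; Python 2.7 also accepts them around a single argument.)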
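Because pandas' Series.str.contains applies re.search to each cell by default, the pattern above can be checked with the standard re module on the sample rows from the question. This is a minimal sketch under that assumption: the exact_matches helper is a hypothetical name, and the re.escape call is an addition so that a '.' in an address cannot match an arbitrary character.

```python
import re

# Sample cells from the question (addresses use '#' as in the original post)
rows = [
    'Good to 1. Good communication EI : tathagata.kar#ae.com',
    'SAP ECC Project System EI: ram.vaddadi#ae.com',
    'EI : ravikumar.swarna Role:SSE Minimum Skill',
]
candidates = ['tathagata.kar#ae.com', 'a.kar#ae.com', 'tathagata.kar#ae.co']


def exact_matches(rows, candidates):
    """Return the candidates that appear in some row as a whole address."""
    found = []
    for email in candidates:
        # Same start/end guards as the answer; re.escape makes '.' match only a literal dot
        pattern = r'(?:\s|^|Ei:|EI:|EI-)' + re.escape(email) + r'(?:\s|$)'
        if any(re.search(pattern, row) for row in rows):
            found.append(email)
    return found


print(exact_matches(rows, candidates))  # ['tathagata.kar#ae.com']
```

a.kar#ae.com is rejected because it is preceded by a letter, and tathagata.kar#ae.co because it is followed by one.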
Thanks for the help. But it seems I found a solution that works for now.
I used str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i])
This seems to solve the problem.
Thanks anyway to #IsaacDj for his help.
Why not just use:
df1 = df[df['A'].str.match(ls[i])]
It is the equivalent of re.match, so note that the pattern must match at the start of each string.

python hangman code for a beginner

I started learning Python about a week ago and tried to create a simple hangman game today. All of my code works so far, but there is one thing I can't figure out how to implement. I want the code to print 'you win' when the player correctly types 'python' letter by letter, but I can't get it to end after they get it right. It only ends if they type 'python' in one attempt, as opposed to letter by letter. My attempt is on the line with the .join, but I can't figure it out. Any help or advice for a new programmer would be greatly appreciated.
guesses = []
count = 1
ans = 'python'
word = ''
while count < 10:
    guess = raw_input('guess a letter: ')
    guesses.append(guess)
    if ''.join(word) == ans:
        print 'you win'
        break
    elif len(guess) > 1 and ans == guess:
        print ans
        print 'you win'
        break
    else:
        for char in ans:
            if char in guesses:
                word.append(char)
                print char,
            else:
                print '_',
        count += 1
else:
    print '\nyou lose'
First, unless you are dealing with legacy code or a library that only works in 2.7, do not use Python 2.7; use Python 3.x instead (currently 3.6). Python 2.7 is reaching end of life (its support ends in 2020), and 3.6+ has a lot more features and many quality-of-life improvements to the syntax and language that you will appreciate.
With that said, I'll translate it to 3.6 for you; it barely makes a difference.
guesses = []
count = 1
ans = 'python'
word = ''
while count < 10:
    guess = input('guess a letter: ')
    guesses.append(guess)
    if ''.join(word) == ans:
        print('you win')
        break
    elif len(guess) > 1 and ans == guess:
        print(ans)
        print('you win')
        break
    else:
        for char in ans:
            if char in guesses:
                word.append(char)
                print(char, end=' ')
            else:
                print('_', end=' ')
        count += 1
else:
    print('\nyou lose')
The only changes here are that print now requires parentheses, so every print 'stuff' is now print('stuff') (and the trailing comma in print char, becomes end=' '), and raw_input is now input('input prompt'). Other than that, I'm surprised you were able to get away with word.append(char): you cannot use append() on a Python str in either 2.7 or 3.x, so that line raises an AttributeError as soon as a guessed letter is found. I think you were trying to use word as a list, since that is the only reason you would use ''.join(word). To fix this, I would use word = [] instead of word = ''. Note also that word is never cleared, so it keeps growing every turn; rebuild it from scratch on each pass over ans so that ''.join(word) can actually equal ans.
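A minimal sketch of a corrected Python 3 version follows. It is my own restructuring rather than the original poster's code: the function name play, the max_turns parameter, and passing the guesses as an iterable (so the game can be tested without typing) are all assumptions. It keeps the guessed letters in a set, rebuilds the displayed word from scratch every turn, and checks for a win right after each guess, keeping the original limit of nine guesses.

```python
from itertools import islice


def play(ans, guesses, max_turns=9):
    """Play one hangman game over an iterable of guesses; return True on a win."""
    guessed = set()
    # islice stops after max_turns guesses without asking for another one
    for guess in islice(guesses, max_turns):
        guess = guess.strip().lower()
        if guess == ans:  # the whole word typed in one attempt
            print(ans)
            print('you win')
            return True
        if len(guess) == 1:  # only single letters count as letter guesses
            guessed.add(guess)
        # Rebuild the displayed word from scratch every turn instead of appending to it
        shown = [char if char in guessed else '_' for char in ans]
        print(' '.join(shown))
        if '_' not in shown:  # every letter has been guessed
            print('you win')
            return True
    print('\nyou lose')
    return False
```

To play interactively, you could call play('python', (input('guess a letter: ') for _ in range(9))).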
I would advise you to take the next step and try to add the following to your program: if the user doesn't enter a single character, don't add it to the guesses list; try making this a main.py file if you haven't already; split parts of the program into functions; add a new-game command; print an actual hangman in characters every turn; and add file I/O to read the guess words (i.e. instead of just python, you could put many words in a file to choose from).

Use list to generate lowercase letter and number in python

I am trying to use a list such as:
[(1,4),(2,2)]
To get:
[('a',4),('b',2)]
I am trying to use string.ascii_lowercase. How would I accomplish this in Python 3? This is for a coding challenge where the goal is to use as few characters as possible.
Thanks for any help!
I'm not going to solve it for you, but I'd suggest you look into the chr and ord functions. Note that the ASCII code for "a" is 97, so to convert 1 to "a" you would have to add 96 to it.
I would suggest following B.Eckles' suggestion above, as you would learn more and probably find a shorter (character-wise) solution.
However, if you want to stick with using string.ascii_lowercase, the code snippet below could be useful to start from:
import string
a = [(1,4),(2,2)]
b = []
for (first, second) in a:
    b.append(
        (string.ascii_lowercase[(first-1) % len(string.ascii_lowercase)],
         second))
print(b)
In this case, the printed solution would be:
[('a', 4), ('b', 2)]
I have added the modulo (i.e. % len(string.ascii_lowercase)) to avoid out-of-bounds accesses. Just be careful: the value 0 produces 'z' this way.
Hope it helps!
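For comparison, the chr-based approach from the first answer fits in a single list comprehension. This is a sketch, assuming every number is in the range 1 to 26; unlike the modulo version above, it does not wrap around, so other values produce non-letter characters.

```python
pairs = [(1, 4), (2, 2)]
# ord('a') == 97, so chr(n + 96) maps 1 -> 'a', 2 -> 'b', ..., 26 -> 'z'
letters = [(chr(n + 96), count) for n, count in pairs]
print(letters)  # [('a', 4), ('b', 2)]
```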

Testing for an item in lists - Python 3

As part of a school project we are creating a troubleshooting program. I have come across a problem that I cannot solve:
begin=['physical','Physical','Software','software',]
answer=input()
if answer in begin[2:3]:
    print("k")
    software()
if answer in begin[0:1]:
    print("hmm")
    physical()
When I try to input software/Software no output is created. Can anybody see a hole in my code as it is?
In Python, slice end values are exclusive. You are slicing a smaller list than you think you are:
>>> begin=['physical','Physical','Software','software',]
>>> begin[2:3]
['Software']
>>> begin[0:1]
['physical']
Use begin[2:4] and begin[0:2] or even begin[2:] and begin[:2] to get all elements from the 3rd to the end, and from the start until the 2nd (inclusive):
>>> begin[2:]
['Software', 'software']
>>> begin[2:4]
['Software', 'software']
>>> begin[:2]
['physical', 'Physical']
>>> begin[0:2]
['physical', 'Physical']
Better yet, use str.lower() to limit the number of inputs you need to provide:
if answer.lower() == 'software':
With only one string to test, you can now put your functions in a dictionary; this gives you the option to list the various valid answers too:
options = {'software': software, 'physical': physical}
while True:
    answer = input('Please enter one of the following options: {}\n'.format(
        ', '.join(options)))
    answer = answer.lower()
    if answer in options:
        options[answer]()
        break
    else:
        print("Sorry, {} is not a valid option, try again".format(answer))
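To try the dispatch idea without typing at a prompt, the sketch below wraps the dictionary lookup in a small function. The handle function and the bodies of software and physical are hypothetical placeholders, since the real troubleshooting functions are not shown in the question; only the lower()-then-look-up approach comes from the answer.

```python
def software():
    # Hypothetical stand-in for the project's real software troubleshooting step
    return 'software checks'


def physical():
    # Hypothetical stand-in for the project's real physical troubleshooting step
    return 'physical checks'


OPTIONS = {'software': software, 'physical': physical}


def handle(answer):
    """Call the handler for answer, ignoring case and surrounding spaces.

    Returns the handler's result, or None if answer is not a valid option.
    """
    handler = OPTIONS.get(answer.strip().lower())
    return handler() if handler is not None else None


print(handle('Software'))   # software checks
print(handle('PHYSICAL '))  # physical checks
print(handle('network'))    # None
```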
Your list slicing is wrong; try the following script:
begin=['physical','Physical','Software','software',]
answer=input()
if answer in begin[2:4]:
    print("k")
    software()
if answer in begin[0:2]:
    print("hmm")
    physical()

Regular Expressions to Update a Text File in Python

I'm trying to write a script to update a text file by replacing instances of certain characters (e.g. 'a', 'w') with a word (e.g. 'airplane', 'worm').
If a single line of the text was something like this:
a.function(); a.CallMethod(w); E.aa(w);
I'd want it to become this:
airplane.function(); airplane.CallMethod(worm); E.aa(worm);
The difference is subtle but important: I'm only changing 'a' and 'w' where they are used as variables, not where they are just characters inside some other word. There are many lines like this in the file. Here's what I've done so far:
import re

original = open('original.js', 'r')
modified = open('modified.js', 'w')
# iterate through each line of the file
for line in original:
    # Search for the character 'a' when not part of a word of some sort
    line = re.sub(r'\W(a)\W', 'airplane', line)
    modified.write(line)
original.close()
modified.close()
I think my RE pattern is wrong, and I think I'm using the re.sub() method incorrectly as well. Any help would be greatly appreciated.
If you're concerned about the semantic meaning of the text you're changing with a regular expression, you'd likely be better served by parsing it instead. Luckily, Python has two modules to help you parse Python code: look at the ast (Abstract Syntax Tree) module and the parser module (note that parser was removed in Python 3.10). There are probably others for JavaScript, if that's what you're doing, like slimit.
For future reference on regular expression questions, there's a lot of helpful information here:
https://stackoverflow.com/tags/regex/info
Reference - What does this regex mean?
And it took me 30 minutes from never having used this JavaScript parser in Python (replete with installation issues: please note the right ply version) to writing a basic solution given your example. You can too.
# Note: sudo pip3 install ply==3.4 && sudo pip3 install slimit
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

data = 'a.funktion(); a.CallMethod(w); E.aa(w);'
tree = Parser().parse(data)
for node in nodevisitor.visit(tree):
    # Rename only identifier nodes, so 'a' inside other names or strings is left alone
    if isinstance(node, ast.Identifier):
        if node.value == 'a':
            node.value = 'airplane'
        elif node.value == 'w':
            node.value = 'worm'
print(tree.to_ecma())
It runs to give this output:
$ python3 src/python_renames_js_test.py
airplane.funktion();
airplane.CallMethod(worm);
E.aa(worm);
Caveats:
function is a reserved word, so I used funktion.
The to_ecma method pretty-prints; there is likely another way to produce output closer to the original input.
line = re.sub(r'\ba\b', 'airplane', line)
should get you closer. However, note that it will also turn a.CallMethod("That is a house") into airplane.CallMethod("That is airplane house"), and open("file.txt", "a") into open("file.txt", "airplane"). Getting it right in a complex syntax environment using regular expressions is hard to impossible.
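Extending the \b suggestion to both letters at once, a single compiled pattern with a replacement function can apply every rename in one pass. This is a sketch, assuming the REPLACEMENTS dictionary and the rename_vars function (both hypothetical names) hold all the renames you need; it shares the string-literal limitation described above.

```python
import re

# Hypothetical mapping of single-letter variable names to their replacements
REPLACEMENTS = {'a': 'airplane', 'w': 'worm'}
# Alternation of the escaped keys, with \b on both sides so only whole words match
PATTERN = re.compile(r'\b(' + '|'.join(map(re.escape, REPLACEMENTS)) + r')\b')


def rename_vars(line):
    """Replace every standalone key of REPLACEMENTS found in line."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], line)


print(rename_vars('a.function(); a.CallMethod(w); E.aa(w);'))
# airplane.function(); airplane.CallMethod(worm); E.aa(worm);
```

In the question's loop, line = rename_vars(line) would take the place of the re.sub call.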