Replacing a word in a sentence with another word - python-2.7

I have a list of sentences. I need to traverse the sentences and check whether any word appears in two of them. If it does, the word in the second sentence should be replaced with a third word that is initialized in advance (var3). For example:
Input: Rahul is eating an apple. Rahul drinks milk.
Output: Rahul is eating an apple. He drinks milk.
var3 = 'तो'  # word to replace if words are same
summary = ['Rahul drinks milk', 'Rahul eats rice', 'Seema is going to the market']
for sent in summary:
    occurences = [index for index, value in enumerate(summary) if value == sent]
    if len(occurences) > 1:
        for i in range(len(summary)):
            for word in i:
                var1 = sent[i]
                var2 = sent[i+1]
                if(var1 == var2):
                    var3 = var1
summary is the list of sentences. In this case there are three sentences, and "Rahul" appears in two of them, so the word in the second sentence should be replaced.
Can somebody please help me out with this?

class People():
    def __init__(self, name, replace_with):
        self.name = name
        self.replace_with = replace_with
        self.first_encountered = False
    def __str__(self):
        return self.name + " -- " + str(self.first_encountered)

sentences = ["Rahul is eating an apple.",
             "Rahul drinks milk.",
             "Rahul also drinks Beer.",
             "Rahul likes Pizza",
             "Seema is going to the market",
             "Seema also drinks beer",
             "and i am going to hell"
             ]
names = ["Rahul", "Seema"]
replaces = ["He", "She"]
people = [People(n, r) for n, r in zip(names, replaces)]
new_sentence = []
for sentence in sentences:
    # Track, for this sentence only, which names were found in it
    found_in_any = [False] * len(people)
    for index, person in enumerate(people):
        if sentence.find(person.name) != -1:
            found_in_any[index] = True
            if not person.first_encountered:
                # First mention of this person: keep the sentence unchanged
                person.first_encountered = True
                new_sentence.append(sentence)
            else:
                # Later mention: swap the name for its replacement
                new_sentence.append(sentence.replace(person.name, person.replace_with))
    # Check once per sentence, after every name has been tried
    if not any(found_in_any):
        new_sentence.append(sentence)
print(new_sentence)
Output:
['Rahul is eating an apple.',
 'He drinks milk.',
 'He also drinks Beer.',
 'He likes Pizza',
 'Seema is going to the market',
 'She also drinks beer',
 'and i am going to hell']

Here is a suggested solution:
sen1 = "Rahul is eating an apple"
sen2 = "Rahul drinks milk"
var = "He"
for i in sen1.split(" "):
    if i in sen2.split(" "):
        sen2 = sen2.replace(i, var)
print(sen1)
print(sen2)
Output:
Rahul is eating an apple
He drinks milk

sentences = ["Rahul is eating an apple.", "Rahul drinks milk.", "Rahul also drinks Beer.", "Rahul likes Pizza", "Seema is going to the market"]
replace = "Rahul"    # name to look for (missing from the original snippet; inferred from the output)
replace_with = "He"  # its replacement (likewise inferred from the output)
new_sentence = []
first_encountered = False
for sentence in sentences:
    if sentence.find(replace) != -1:
        if not first_encountered:
            # First mention: keep the sentence unchanged
            first_encountered = True
            new_sentence.append(sentence)
            continue
        new_sentence.append(sentence.replace(replace, replace_with))
    else:
        new_sentence.append(sentence)
print(new_sentence)
Output :
['Rahul is eating an apple.',
'He drinks milk.',
'He also drinks Beer.',
'He likes Pizza',
'Seema is going to the market']
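The loops above rely on str.replace and str.find, which also match a name inside a longer word. Below is a minimal sketch of the same "keep the first mention, replace later ones" idea using word-boundary matching. It assumes Python 3, and the function name replace_repeated_names and the pronouns mapping are illustrative, not part of the answers above.

```python
import re

def replace_repeated_names(sentences, pronouns):
    """Keep the first mention of each name; replace later mentions.

    `pronouns` maps a name to its replacement, e.g. {"Rahul": "He"}.
    """
    seen = set()
    result = []
    for sentence in sentences:
        for name, pronoun in pronouns.items():
            # \b makes sure "Rahul" does not match inside "Rahuls"
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, sentence):
                if name in seen:
                    sentence = re.sub(pattern, pronoun, sentence)
                else:
                    seen.add(name)
        result.append(sentence)
    return result

example = replace_repeated_names(
    ["Rahul is eating an apple.", "Rahul drinks milk.", "Seema also drinks beer"],
    {"Rahul": "He", "Seema": "She"},
)
print(example)
```

Each sentence is appended exactly once, so a sentence that mentions several names cannot be duplicated in the result.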

Related

Madlibs program throws ValueError

I'm learning Python through Codecademy, and I'm having trouble with their Madlibs exercise. I viewed the walkthrough after I began having trouble, but I can't see any differences between their code and mine. This is my code:
STORY = "This morning % woke up feeling %. 'It is going to be a % day!' Outside, a bunch of %s were protesting to keep % in stores. They began to % to the rhythm of the %, which made all the %s very %. Concerned, % texted %, who flew % to % and dropped % in a puddle of frozen %. % woke up in the year %, in a world where %s ruled the world."
print "Let the Madlibs begin!"
name = raw_input("Enter a name: ")
print "Please provide three adjectives: "
adj_1 = raw_input("1: ")
adj_2 = raw_input("2: ")
adj_3 = raw_input("3: ")
verb = raw_input("Enter a verb: ")
print "Now, input two nouns:"
noun_1 = raw_input("1: ")
noun_2 = raw_input("2: ")
print "Please provide a word for:"
animal = raw_input("An animal: ")
food = raw_input("A food: ")
fruit = raw_input("A fruit: ")
superhero = raw_input("A superhero: ")
country = raw_input("A country: ")
dessert = raw_input("A dessert: ")
year = raw_input("A year: ")
print STORY % (name, adj_1, adj_2, animal, food, verb, noun_1, noun_2, adj_3, name, superhero, name, country, name, dessert, name, year, noun_2)
When I run the program, I get the following error:
Traceback (most recent call last):
  File "Madlibs.py", line 34, in <module>
    print STORY % (name, adj_1, adj_2, animal, food, verb, noun_1, noun_2, adj_3, name, superhero, name, country, name, dessert, name, year, noun_2)
ValueError: unsupported format character 'w' (0x77) at index 15
Please help me see what I'm missing. Thank you!

Your format string (STORY) has some invalid placeholders in it. When you're formatting a string, you have to specify what type of data will be put at each placeholder. You do this by putting a letter after the % sign. In this case, since you're always putting in a string, that should be an s. So, STORY should start like this:
STORY = "This morning %s woke up feeling %s. [...]"
There are more details about this syntax in the Python documentation, which explains how to do things like format numbers in a certain way.
(However, it's worth bearing in mind that in modern Python we normally use a newer syntax using str.format(), which looks like this:
STORY = "This morning {name} woke up feeling {adj_1}. [...]"
print STORY.format(name="James", adj_1="terrible")
)
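To make the difference concrete, here is a small sketch that triggers the same kind of error on a shortened string and then shows both working styles. It is written for Python 3 (print as a function); the shortened strings and the variable names are illustrative only.

```python
# A bare "%" followed by a space and a letter is not a valid placeholder.
try:
    "This morning % woke up" % ("James",)
except ValueError as error:
    print("ValueError:", error)

# %-style formatting: every placeholder needs a conversion type such as "s".
story = "This morning %s woke up feeling %s."
percent_result = story % ("James", "terrible")
print(percent_result)

# str.format() with named fields: values are matched by name, not position.
template = "This morning {name} woke up feeling {adj_1}."
format_result = template.format(name="James", adj_1="terrible")
print(format_result)
```

With named fields, a value that appears several times in the story (such as the name) only has to be passed once.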

Python: Trying to loop through a string to find matching characters

I am trying to create a simple "guess the word" game in Python. My output is something like:
String: _____ _____
Guess a word: 'e'
String:_e__o __e_e
Guess a word: 'h'
(and so on)
String: hello there
I have a function to do this, and within this function I have this code:
def guessing(word):
    count = 0
    blanks = "_" * len(word)
    letters_used = "" #empty string
    while count<len(word):
        guess = raw_input("Guess a letter:")
        blanks = list(blanks)
        #Checks if guesses are valid
        if len(guess) != 1:
            print "Please guess only one letter at a time."
        elif guess not in ("abcdefghijklmnopqrstuvwxyz "):
            print "Please only guess letters!"
        #Checks if guess is found in word
        if guess in word and guess not in letters_used:
            x = word.index(guess)
            for x in blanks:
                blanks[x] = guess
            letters_used += guess
            print ("".join(blanks))
        print "Number of misses remaining:", len(word)-counter
        print "There are", str(word.count(guess)) + str(guess)
guess is the raw input I get from the user for a guess, and letters_used is just a collection of guesses that the user has already input. What I'm trying to do is loop through blanks based on the word.index(guess). Unfortunately, this returns:
Guess a letter: e
e___
Yes, there are 1e
Help would be much appreciated!

Your code was almost correct. There were a few mistakes, which I have corrected:
def find_all(needle, haystack):
    """
    Finds all occurrences of the string `needle` in the string `haystack`
    To be invoked like this - `list(find_all('l', 'hello'))` => [2, 3]
    """
    start = 0
    while True:
        start = haystack.find(needle, start)
        if start == -1: return
        yield start
        start += 1

def guessing(word):
    letters_uncovered_count = 0
    blanks = "_" * len(word)
    blanks = list(blanks)
    letters_used = ""
    while letters_uncovered_count < len(word):
        guess = raw_input("Guess a letter:")
        # Checks if guesses are valid; invalid input skips the rest of the turn
        if len(guess) != 1:
            print "Please guess only one letter at a time."
            continue
        elif guess not in ("abcdefghijklmnopqrstuvwxyz"):
            print "Please only guess letters!"
            continue
        if guess in letters_used:
            print("This character has already been guessed correctly before!")
            continue
        # Checks if guess is found in word
        if guess in word:
            guess_positions = list(find_all(guess, word))
            # Fill in every position where the guessed letter occurs
            for guess_position in guess_positions:
                blanks[guess_position] = guess
                letters_uncovered_count += 1
            letters_used += guess
            print ("".join(blanks))
            print "Number of misses remaining:", len(word)-letters_uncovered_count
            print "There are", str(word.count(guess)) + str(guess)
        else:
            print("Wrong guess! Try again!")
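The key step, filling in every position of the guessed letter, can also be written without searching for indexes. The following is only a sketch of that one step, assuming Python 3; the helper name reveal is hypothetical and not part of the answer above.

```python
def reveal(word, blanks, guess):
    """Return a new list of blanks with every occurrence of `guess` filled in."""
    return [
        guess if letter == guess else blank
        for letter, blank in zip(word, blanks)
    ]

blanks = ["_"] * len("hello")
blanks = reveal("hello", blanks, "l")
print("".join(blanks))  # __ll_
```

Walking over word and blanks in parallel with zip avoids the original problem of word.index(guess) returning only the first matching position.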

Creating a list of words does not work with a list of sentences

I am trying to take a list of sentences and split each sentence into a new list containing the words of that sentence.
def create_list_of_words(file_name):
    for word in file_name:
        word_list = word.split()
    return word_list
sentence = ['a frog ate the dog']
x = create_list_of_words(sentence)
print x
This is fine as my output is
['a', 'frog', 'ate', 'the', 'dog']
However, when I try it with a list of several sentences, it no longer behaves the same way.
my_list = ['the dog hates you', 'you love the dog', 'a frog ate the dog']
for i in my_list:
    x = create_list_of_words(i)
    print x
Now my out

There are a few issues in your second script:
i is the string 'the dog hates you', whereas in the first script the parameter was the list ['a frog ate the dog']. One is a string and the other is a list.
word_list = word.split(): with this line inside the loop, word_list is reassigned on every iteration. Use append instead, as in my code sample.
When passing a string to the function, you need to split the string before looping over the words.
Try this:
def create_list_of_words(str_sentence):
    sentence = str_sentence.split()
    word_list = []
    for word in sentence:
        word_list.append(word)
    return word_list

li_sentence = ['the dog hates you', 'you love the dog', 'a frog ate the dog']
for se in li_sentence:
    x = create_list_of_words(se)
    print x
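Since str.split() already returns a list of words, the same result can be had with a list comprehension. This is a short sketch assuming Python 3; the function name split_sentences is illustrative.

```python
def split_sentences(sentences):
    """Return one list of words per sentence."""
    return [sentence.split() for sentence in sentences]

li_sentence = ['the dog hates you', 'you love the dog', 'a frog ate the dog']
for words in split_sentences(li_sentence):
    print(words)
```

This takes the whole list of sentences at once, so the caller does not need to decide whether to pass a string or a list.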

Count strings in a file, some single words, some full sentences

I want to count the occurrence of certain words and names in a file. The code below incorrectly counts fish and chips as one case of fish and one case of chips, instead of one count of fish and chips.
ngh.txt = 'test file with words fish, steak fish chips fish and chips'
import re
from collections import Counter
wanted = '''
"fish and chips"
fish
chips
steak
'''
cnt = Counter()
words = re.findall('\w+', open('ngh.txt').read().lower())
for word in words:
    if word in wanted:
        cnt[word] += 1
print cnt
Output:
Counter({'fish': 3, 'chips': 2, 'and': 1, 'steak': 1})
What I want is:
Counter({'fish': 2, 'fish and chips': 1, 'chips': 1, 'steak': 1})
(And ideally, I can get the output like this:
fish: 2
fish and chips: 1
chips: 1
steak: 1
)

Definition:
Wanted item: A string that is being searched for within the text.
To count wanted items, without re-counting them within longer wanted items, first count the number of times each one occurs within the string. Next, go through the wanted items, from longest to shortest, and as you encounter smaller wanted items that occur in a longer wanted item, subtract the number of results for the longer item from the shorter item. For example, assume your wanted items are "a", "a b", and "a b c", and your text is "a/a/a b/a b c". Searching for each of those individually produces: { "a": 4, "a b": 2, "a b c": 1 }. The desired result is: { "a b c": 1, "a b": #("a b") - #("a b c") = 2 - 1 = 1, "a": #("a") - #("a b c") - #("a b") = 4 - 1 - 1 = 2 }.
import re

def get_word_counts(text, wanted):
    counts = {}  # The number of times a wanted item was read
    # Dictionary mapping word lengths onto wanted items
    # (in the form of a dictionary where keys are wanted items)
    lengths = {}
    # Find the number of times each wanted item occurs
    for item in wanted:
        matches = re.findall('\\b' + re.escape(item) + '\\b', text)
        counts[item] = len(matches)
        l = len(item)  # Length of wanted item
        # No wanted item of the same length has been encountered
        if l not in lengths:
            # Create new dictionary of items of the given length
            lengths[l] = {}
        # Add wanted item to dictionary of items with the given length
        lengths[l][item] = 1
    # Get and sort lengths of wanted items from largest to smallest
    keys = sorted(lengths, reverse=True)
    # Remove overlapping wanted items from the counts working from
    # largest strings to smallest strings
    for i in range(1, len(keys)):
        for j in range(0, i):
            for i_item in lengths[keys[i]]:
                for j_item in lengths[keys[j]]:
                    #print str(i)+','+str(j)+': '+i_item+' , '+j_item
                    matches = re.findall('\\b' + re.escape(i_item) + '\\b', j_item)
                    counts[i_item] -= len(matches) * counts[j_item]
    return counts
The following code contains test cases:
tests = [
{
'text': 'test file with words fish, steak fish chips fish and '+
'chips and fries',
'wanted': ["fish and chips","fish","chips","steak"]
},
{
'text': 'fish, fish and chips, fish and chips and burgers',
'wanted': ["fish and chips","fish","fish and chips and burgers"]
},
{
'text': 'fish, fish and chips and burgers',
'wanted': ["fish and chips","fish","fish and chips and burgers"]
},
{
'text': 'My fish and chips and burgers. My fish and chips and '+
'burgers',
'wanted': ["fish and chips","fish","fish and chips and burgers"]
},
{
'text': 'fish fish fish',
'wanted': ["fish fish","fish"]
},
{
'text': 'fish fish fish',
'wanted': ["fish fish","fish","fish fish fish"]
}
]
for i in range(0,len(tests)):
    test = tests[i]['text']
    print test
    print get_word_counts(test, tests[i]['wanted'])
    print ''
The output is as follows:
test file with words fish, steak fish chips fish and chips and fries
{'fish and chips': 1, 'steak': 1, 'chips': 1, 'fish': 2}
fish, fish and chips, fish and chips and burgers
{'fish and chips': 1, 'fish and chips and burgers': 1, 'fish': 1}
fish, fish and chips and burgers
{'fish and chips': 0, 'fish and chips and burgers': 1, 'fish': 1}
My fish and chips and burgers. My fish and chips and burgers
{'fish and chips': 0, 'fish and chips and burgers': 2, 'fish': 0}
fish fish fish
{'fish fish': 1, 'fish': 1}
fish fish fish
{'fish fish fish': 1, 'fish fish': 0, 'fish': 0}
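For comparison, here is a sketch of a different technique from the subtraction approach above: a single regular expression that tries the longest wanted items first, so a phrase is consumed before its shorter parts can match. It assumes Python 3, and the function name count_longest_first is hypothetical.

```python
import re
from collections import Counter

def count_longest_first(text, wanted):
    """Count wanted items, letting longer items win over the items they contain."""
    # Alternation tries branches left to right, so put the longest items first.
    ordered = sorted(wanted, key=len, reverse=True)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(item) + r"\b" for item in ordered)
    )
    return Counter(pattern.findall(text))

text = "test file with words fish, steak fish chips fish and chips"
counts = count_longest_first(text, ["fish and chips", "fish", "chips", "steak"])
for item, number in counts.most_common():
    print("%s: %d" % (item, number))
```

Items that never match are simply absent from the Counter, and looking them up returns 0.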

So this solution works with your test data (and with some added terms to the test data, just to be thorough), though it can probably be improved upon.
The crux of it is to find occurrences of 'and' in the words list, replace 'and' and its neighbours with a compound word (the neighbours joined by 'and'), and add this back to the list along with a copy of 'and'.
I also converted the 'wanted' string to a list to handle the 'fish and chips' string as a distinct item.
import re
from collections import Counter
# changed 'wanted' string to a list
wanted = ['fish and chips','fish','chips','steak', 'and']
cnt = Counter()
words = re.findall('\w+', open('ngh.txt').read().lower())
for word in words:
    # look for 'and', replace it and neighbours with 'comp_word'
    # slice, concatenate, and append to make new words list
    if word == 'and':
        and_pos = words.index('and')
        comp_word = str(words[and_pos-1]) + ' and ' +str(words[and_pos+1])
        words = words[:and_pos-1] + words[and_pos+2:]
        words.append(comp_word)
        words.append('and')
for word in words:
    if word in wanted:
        cnt[word] += 1
print cnt
The output from your text would be:
Counter({'fish':2, 'and':1, 'steak':1, 'chips':1, 'fish and chips':1})
As noted in the comment above, it's unclear why you want/expect output to be 2 for fish, 2 for chips, and 1 for fish-and-chips in your ideal output. I'm assuming it's a typo, since the output above it has 'chips':1

I am suggesting two algorithms that will work on any patterns and any file.
The first algorithm has a run time proportional to (number of characters in the file) * (number of patterns).
1> For every pattern, search all the other patterns and create a list of super-patterns. This can be done by matching one pattern, such as 'cat', against all patterns to be searched.
patterns = ['cat', 'cat and dogs', 'cat and fish']
superpattern['cat'] = ['cat and dogs', 'cat and fish']
2> Search for 'cat' in the file; let's say the result is cat_count.
3> Now search for every super-pattern of 'cat' in the file and subtract its count:
for sp in superpattern['cat']:
    sp_count = number of matches of sp in the file
    cat_count = cat_count - sp_count
This is a general, brute-force solution. We should be able to come up with a linear-time solution if we arrange the patterns in a trie:
Root-->f-->i-->s-->h-->a and so on.
Now when you are at the h of fish and you do not get an a, increment fish_count and go to the root. If you get 'a', continue. Any time you get something unexpected, increment the count of the most recently found pattern and go to the root, or go to some other node (the one for the longest prefix of some pattern that is a suffix of the text matched so far). This is the Aho-Corasick algorithm; you can look it up on Wikipedia or at:
http://www.cs.uku.fi/~kilpelai/BSA05/lectures/slides04.pdf
This solution is linear in the number of characters in the file.
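Steps 1 to 3 of the brute-force algorithm can be sketched as follows. This is a simplified illustration, assuming Python 3, plain substring counting with str.count (no word boundaries), super-patterns that are not nested inside one another, and a pattern that occurs only once inside each of its super-patterns. The function name count_without_superpatterns is hypothetical.

```python
def count_without_superpatterns(text, patterns):
    """Count each pattern, excluding occurrences inside its super-patterns."""
    # Raw count of every pattern in the text
    raw = {pattern: text.count(pattern) for pattern in patterns}
    result = {}
    for pattern in patterns:
        # Step 1: super-patterns are the other patterns that contain this one
        supers = [other for other in patterns
                  if other != pattern and pattern in other]
        # Steps 2 and 3: subtract every super-pattern count from the raw count
        result[pattern] = raw[pattern] - sum(raw[sp] for sp in supers)
    return result

patterns = ['cat', 'cat and dogs', 'cat and fish']
text = "cat, cat and dogs, cat and fish, cat"
print(count_without_superpatterns(text, patterns))
```

If super-patterns can contain each other, the raw counts would be subtracted more than once, so the counts would need to be adjusted from longest to shortest instead.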

perform join python returns a "none"

My "output" list seems to populate fine; however, when I perform the join I get "None" returned.
Any ideas?
def englishify_sentence(s):
    """English"""
    words = s.lower()
    words = words.split()# splits sentence into individual words
    output = []
    for i in range(len(words)):
        if words[i].endswith("way"):
            opt1 = words[i][:-3]
            letters = list(opt1)#breaks given word into individual letters
            translate = ("("+opt1+" or "+"w"+opt1+")")
            output.append(translate)
        else:
            opt2 = words[i][:-2]
            opt2_letters = list(opt2)#breaks given word into individual letters
            first_letter = (opt2_letters.pop(-1))
            opt3 = words[i][:-3]
            translate2 = (first_letter+opt3)
            output.append(translate2)
    english = " ".join(output)#removes speech marks and creates a "real" word
    print(output)
english = englishify_sentence("oday ouyay antway anway eggway")
print(english)

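You forgot to return the value. A function that ends without a return statement returns None, so add this as the last line of the function:
    return english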
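A minimal sketch of the difference, assuming Python 3; both function names are made up for the illustration.

```python
def join_without_return(words):
    sentence = " ".join(words)  # computed, but never handed back to the caller

def join_with_return(words):
    sentence = " ".join(words)
    return sentence

print(join_without_return(["today", "you"]))  # None
print(join_with_return(["today", "you"]))     # today you
```

The join itself works in both functions; only the second one passes the result back to the caller.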

Is it the print(output) that's giving you a none or the print(english)?
