How to write RNG code in Python 2.7 that writes Shakespeare

For fun, I'm trying to write Python code that maps a random number to a letter of the alphabet or a punctuation mark and appends that letter to a list. I want the code to keep generating new lists of random letters until one spells "to be or not to be, that is the question." Then I want to print that list and see how many attempts it took. This is what I have so far:
from random import *
alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',',',' ','.']
sentence = []
numbers = []
def random(x):
    randval = x
    return randval
count = 0
for i in range(1000): # trying to place an upper bound on how many times to try
    for i in range(41): # the number of characters in the sentence
        randomness = random(randint(0,28)) # the number of entries in the alphabet list
        numbers.append(randomness)
    for i in numbers:
        count += 1
        sentence.append(alphabet[i])
    if sentence!=['t','o',' ','b','e',' ','o','r',' ','n','o','t',' ','t','o',' ','b','e',',',' ','t','h','a','t',' ','i','s','t','h','e',' ','q','u','e','s','t','i','o','n','.']:
        sentence = [] ### This is supposed to empty the list if it gets the wrong order, but doesn't quite do that.
    if sentence == ['t','o',' ','b','e',' ','o','r',' ','n','o','t',' ','t','o',' ','b','e',',',' ','t','h','a','t',' ','i','s','t','h','e',' ','q','u','e','s','t','i','o','n','.']:
        print sentence
        print count
        break
new_sentence = ''.join(sentence)
print new_sentence
I'm not sure what I'm doing wrong. The list keeps growing instead of staying at a length of 41. Any suggestions?
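One likely cause: numbers is never cleared, so every pass of the outer loop appends another 41 values and sentence is rebuilt from all of them. The target list also appears to be missing the space between 'is' and 'the', so it has 40 entries rather than 41. Below is a minimal sketch that builds a fresh candidate on every attempt. The names ALPHABET and attempts_until_match are illustrative, not from the question, and the demo uses a two-character target because matching the full 41-character sentence by chance has probability 29**-41 per attempt, far beyond any practical loop bound.

```python
import random

# Same 29 symbols as the question's alphabet list.
ALPHABET = "abcdefghijklmnopqrstuvwxyz, ."

def attempts_until_match(target, max_attempts, rng):
    """Return how many attempts it took to draw target, or None if never drawn."""
    for attempt in range(1, max_attempts + 1):
        # Build a brand-new candidate each time so nothing accumulates.
        candidate = "".join(rng.choice(ALPHABET) for _ in range(len(target)))
        if candidate == target:
            return attempt
    return None

rng = random.Random(0)  # seeded only to make the demo repeatable
result = attempts_until_match("to", 100000, rng)
print(result)
```

With a two-character target there are only 29**2 = 841 possible candidates, so the loop finishes quickly; each extra character multiplies the expected number of attempts by 29.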

Related

Pseudo code to find number of occurrence of characters in a documents

I am trying to write pseudocode for a MapReduce job that counts the number of occurrences of each character in a document. For example:
m: 1000 times, M: 5000 times, "": 3000 times, \n: 100 times, .:20000 times etc.
Can someone please let me know whether this is correct or whether I can improve it?
I have written the pseudocode shown below:
def Map(documentName, documentContent)
    For Character in documentContent
        EmitIntermediate(Character, 1)
def Reduce(Character, Counts)
    Char_Count = 0
    For count in Counts
        Char_Count += count
    Emit(Character, Char_Count)
I wrote this after looking at some MapReduce pseudocode available online.
For example, the following pseudocode counts the occurrences of each word in a document:
def map(documentName, documentContent):
    for line in documentContent:
        words = line.split(" ")
        for word in words:
            EmitIntermediate(word, 1)
def reduce(word, counts):
    wordCount = 0
    for count in counts:
        wordCount += count
    Emit(word, wordCount)
def Map(documentName, documentContent)
    For line in documentContent
        Line_String = line
        For Character in Line_String
            EmitIntermediate(Character, 1)
def Reduce(Character, Counts)
    Char_Count = 0
    For count in Counts
        Char_Count += count
    Emit(Character, Char_Count)
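To sanity-check the pseudocode, the map and reduce steps can be simulated in plain Python. This is only a single-process sketch: run_job is a hypothetical stand-in for the grouping (shuffle) step that a real MapReduce framework performs between the two phases, and the function names are illustrative.

```python
from collections import defaultdict

def map_chars(document_name, document_content):
    # Emit (character, 1) for every character, including spaces and newlines.
    for character in document_content:
        yield character, 1

def reduce_counts(character, counts):
    # Sum all the 1s emitted for this character.
    return character, sum(counts)

def run_job(documents):
    # Hypothetical driver: group intermediate pairs by key, then reduce each group.
    grouped = defaultdict(list)
    for name, content in documents.items():
        for key, value in map_chars(name, content):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())

char_counts = run_job({"doc1": "aMm\n", "doc2": "m."})
print(char_counts)
```

Because the mapper iterates over characters directly, 'm' and 'M' are counted separately, which matches the example counts in the question.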

Is there a pythonic way to count the number of leading matching characters in two strings?

For two given strings, is there a pythonic way to count how many consecutive characters of both strings (starting at position 0 of the strings) are identical?
For example, in 'aaa_Hello' and 'aa_World' the "leading matching characters" are 'aa', with a length of 2. In 'another' and 'example' there are no leading matching characters, which gives a length of 0.
I have written a function to achieve this, but it uses a for loop and thus seems very unpythonic to me:
def matchlen(string0, string1): # Note: does not work if a string is ''
    for counter in range(min(len(string0), len(string1))):
        # run until there is a mismatch between the characters in the strings
        if string0[counter] != string1[counter]:
            # in this case the function terminates
            return(counter)
    return(counter+1)
matchlen(string0='aaa_Hello', string1='aa_World') # returns 2
matchlen(string0='another', string1='example') # returns 0
You could use zip and enumerate:
def matchlen(str1, str2):
    i = -1 # needed if you don't enter the loop (an empty string)
    for i, (char1, char2) in enumerate(zip(str1, str2)):
        if char1 != char2:
            return i
    return i+1
A perhaps unexpected function, os.path.commonprefix, can help: it is not limited to file paths and works on any strings. It can also take more than two input strings. From its documentation:
Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string ('').
from os.path import commonprefix
print(len(commonprefix(["aaa_Hello","aa_World"])))
output:
2
from itertools import takewhile
common_prefix_length = sum(
    1 for _ in takewhile(lambda x: x[0]==x[1], zip(string0, string1)))
zip will pair up letters from the two strings; takewhile will yield them as long as they're equal; and sum will see how many there are.
As bobble bubble says, this indeed does exactly the same thing as your loopy thing. Its sole pro (and also its sole con) is that it is a one-liner. Take it as you will.
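As a quick check of the edge cases, here is a sketch that wraps the takewhile version in a function and compares it with os.path.commonprefix on a few inputs, including an empty string. The test cases beyond the two from the question are made up for illustration.

```python
from itertools import takewhile
from os.path import commonprefix

def matchlen(string0, string1):
    # Count pairs from the start of both strings for as long as they are equal.
    pairs = takewhile(lambda pair: pair[0] == pair[1], zip(string0, string1))
    return sum(1 for _ in pairs)

cases = [("aaa_Hello", "aa_World"), ("another", "example"), ("", "abc"), ("same", "same")]
lengths = [matchlen(a, b) for a, b in cases]
prefix_lengths = [len(commonprefix([a, b])) for a, b in cases]
print(lengths, prefix_lengths)
```

Both approaches handle the empty string without special-casing, because zip simply yields nothing and commonprefix returns ''.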

Python - Obtain the most frequent word in a sentence, if there is a tie return the word that appears first in alphabetical order

I have written the code below. It runs without errors, but if two words in a sentence are repeated the same number of times, it does not return the one that comes first in alphabetical order. Can anyone suggest an alternative? This code is going to be evaluated in Python 2.7.
"""Quiz: Most Frequent Word"""
def most_frequent(s):
    """Return the most frequently occurring word in s."""
    """ Step 1 - The following assumptions have been made:
    - Space is the default delimiter
    - There are no other punctuation marks that need removing
    - Convert all letters into lower case"""
    word_list_array = s.split()
    """Step 2 - sort the list alphabetically"""
    word_sort = sorted(word_list_array, key=str.lower)
    """Step 3 - count the number of times word has been repeated in the word_sort array.
    create another array containing the word and the frequency in which it is repeated"""
    wordfreq = []
    freq_wordsort = []
    for w in word_sort:
        wordfreq.append(word_sort.count(w))
    freq_wordsort = zip(wordfreq, word_sort)
    """Step 4 - output the array having the maximum first index variable and output the word in that array"""
    max_word = max(freq_wordsort)
    word = max_word[-1]
    result = word
    return result
def test_run():
    """Test most_frequent() with some inputs."""
    print most_frequent("london bridge is falling down falling down falling down london bridge is falling down my fair lady") # expected: 'down' (this code prints 'falling')
    print most_frequent("betty bought a bit of butter but the butter was bitter") # output: 'butter'
if __name__ == '__main__':
    test_run()
Without messing too much around with your code, I find that a good solution can be achieved through the use of the index method.
After finding the highest frequency (stored in max_word), call the index method on wordfreq with max_word as the argument; it returns the position of the first occurrence in the list. Because word_sort is sorted alphabetically, the word at that index in word_sort is the alphabetically first of the tied words.
Code example is below (I removed the zip function as it is not needed anymore, and added two simpler examples):
"""Quiz: Most Frequent Word"""
def most_frequent(s):
    """Return the most frequently occurring word in s."""
    """ Step 1 - The following assumptions have been made:
    - Space is the default delimiter
    - There are no other punctuation marks that need removing
    - Convert all letters into lower case"""
    word_list_array = s.split()
    """Step 2 - sort the list alphabetically"""
    word_sort = sorted(word_list_array, key=str.lower)
    """Step 3 - count the number of times word has been repeated in the word_sort array.
    create another array containing the word and the frequency in which it is repeated"""
    wordfreq = []
    # freq_wordsort = []
    for w in word_sort:
        wordfreq.append(word_sort.count(w))
    # freq_wordsort = zip(wordfreq, word_sort)
    """Step 4 - output the array having the maximum first index variable and output the word in that array"""
    max_word = max(wordfreq)
    word = word_sort[wordfreq.index(max_word)] # <--- solution!
    result = word
    return result
def test_run():
    """Test most_frequent() with some inputs."""
    print(most_frequent("london bridge is falling down falling down falling down london bridge is falling down my fair lady")) # output: 'down'
    print(most_frequent("betty bought a bit of butter but the butter was bitter")) # output: 'butter'
    print(most_frequent("a a a a b b b b")) #output: 'a'
    print(most_frequent("z z j j z j z j")) #output: 'j'
if __name__ == '__main__':
    test_run()
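An alternative sketch, not part of the answer above, uses collections.Counter and resolves ties with a sort key instead of a pre-sorted list. It makes the same assumptions as the question (whitespace-separated words, no punctuation handling) and raises ValueError on an empty string, since there is no word to return.

```python
from collections import Counter

def most_frequent(s):
    """Return the most frequent word in s; ties go to the alphabetically first word."""
    counts = Counter(s.split())
    # Negating the count makes min() prefer the highest frequency,
    # and the word itself breaks ties alphabetically.
    return min(counts, key=lambda word: (-counts[word], word))

print(most_frequent("betty bought a bit of butter but the butter was bitter"))
print(most_frequent("z z j j z j z j"))
```

Counting once with Counter also avoids calling list.count for every word, which is quadratic in the number of words.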

Why is max number ignoring two-digit numbers?

At the moment I am saving a set of variables to a text file. I am doing the following to check whether my code works, but whenever the file contains a two-digit number such as 10, that number is not printed as the max.
If my text file looked like this.
tom:5
tom:10
tom:1
It would output 5 as the max number.
name = input('name')
score = 4
if name == 'tom':
    fo = open('tom.txt','a')
    fo.write('Tom: ')
    fo.write(str(score ))
    fo.write("\n")
    fo.close()
if name == 'wood':
    fo = open('wood.txt','a')
    fo.write('Wood: ')
    fo.write(str(score ))
    fo.write("\n")
    fo.close()
tomL2 = []
woodL2 = []
fo = open('tom.txt','r')
tomL = fo.readlines()
tomLi = tomL2 + tomL
fo.close
tomLL=max(tomLi)
print(tomLL)
fo = open('wood.txt','r')
woodL = fo.readlines()
woodLi = woodL2 + woodL
fo.close
woodLL=max(woodLi)
print(woodLL)
You are comparing strings, not numbers. You need to convert them into numbers before using max. For example, you have:
tomL = fo.readlines()
This contains a list of strings:
['tom:5\n', 'tom:10\n', 'tom:1\n']
Strings are ordered lexicographically (much like how words would be ordered in an English dictionary). If you want to compare numbers, you need to turn them into numbers first:
tomL_scores = [int(s.split(':')[1]) for s in tomL]
The parsing is done in the following way:
….split(':') separates the string into parts using a colon as the delimiter: 'tom:5\n' becomes ['tom', '5\n']
…[1] chooses the second element from the list: ['tom', '5\n'] becomes '5\n'
int(…) converts a string into an integer: '5\n' becomes 5
The list comprehension [… for s in tomL] applies this sequence of operations to every element of the list.
Note that int (and similarly float) is rather picky about what it accepts: the string must be in the form of a valid numeric literal or it will be rejected with an error (although leading and trailing whitespace is allowed). This is why you need ….split(':')[1] to massage the string into a form that it's willing to accept.
This will yield:
[5, 10, 1]
Now, you can apply max to obtain the largest score.
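Putting the steps above together, here is a small sketch that contrasts the string comparison with the numeric one. The list is hard-coded to mirror the example file rather than read from disk, and the variable names are illustrative.

```python
# Lines as readlines() would return them for the example file.
lines = ['tom:5\n', 'tom:10\n', 'tom:1\n']

# Comparing the raw strings is lexicographic: '5' sorts after '1',
# so 'tom:5\n' beats 'tom:10\n'.
string_max = max(lines)

# Converting the part after the colon to int gives a numeric comparison.
scores = [int(line.split(':')[1]) for line in lines]
highest = max(scores)

print(repr(string_max), highest)
```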
As a side-note, the statement
fo.close
will not close a file, since it doesn't actually call the function. To call the function you must enclose the arguments in parentheses, even if there are none:
fo.close()
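A common way to avoid forgetting the call altogether is a with block, which closes the file when the block exits. The sketch below writes the example data to a temporary file first so that it is self-contained; the temporary path is an assumption for the demo, not part of the question.

```python
import os
import tempfile

# Create a throwaway copy of the example file.
path = os.path.join(tempfile.mkdtemp(), 'tom.txt')
with open(path, 'w') as fo:
    fo.write('tom:5\ntom:10\ntom:1\n')

# The file is closed automatically when the with block ends.
with open(path) as fo:
    scores = [int(line.split(':')[1]) for line in fo]

closed_after_with = fo.closed
best = max(scores)
print(closed_after_with, best)
```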

How to find which elements of a list are subsets of a string using Python

I am a Molecular Biologist and new to programming, so excuse me for my language. I am working with python.
Example:
string = "gctatagcgttatatactagcctatagctata"
list = ["gtagctaggac", "mptalltruiworw", "12365478995", "nvncmvncmvncmvn"]
Now, my question:
I want a method that does the following:
for element in list:
    if element is subset of string (in any order)
        return element
In the above example the answer should be
gtagctaggac
Rather than spend time generating permutations, compare the number of letters in the list element and the string. Note that this code doesn't check for letters in the string that are not in the pattern.
string = "gctatagcgttatatactagcctatagctata"
list = ["gtagctaggac", "mptalltruiworw", "12365478995", "nvncmvncmvncmvn"]
from collections import defaultdict
def count_letters(string):
    counts = defaultdict(int)
    for letter in string:
        counts[letter] += 1
    return counts
sc = count_letters(string)
for element in list:
    counts = count_letters(element)
    if all([sc[letter] >= counts[letter] for letter in counts]):
        print "Found", element
As a matter of style, it's better not to use the names of built-in classes like "list" and "string".
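The same letter-counting idea can be sketched with collections.Counter, using names that do not shadow the built-ins. The names text, candidates and is_letter_subset are illustrative; the data is the example from the question.

```python
from collections import Counter

text = "gctatagcgttatatactagcctatagctata"
candidates = ["gtagctaggac", "mptalltruiworw", "12365478995", "nvncmvncmvncmvn"]

def is_letter_subset(candidate, text_counts):
    # Counter subtraction keeps only positive counts, so an empty result
    # means the text has every letter of the candidate often enough.
    return not (Counter(candidate) - text_counts)

text_counts = Counter(text)
matches = [c for c in candidates if is_letter_subset(c, text_counts)]
print(matches)
```

As in the answer above, this only checks that the candidate's letters are available in the text; it ignores letters in the text that the candidate does not use.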