Understanding Recursive Function - python-2.7

I'm working through the book NLP with Python, and I came across this example from an 'advanced' section. I'd appreciate help understanding how it works. The function computes all possibilities of a number of syllables to reach a 'meter' length n. Short syllables "S" take up one unit of length, while long syllables "L" take up two units of length. So, for a meter length of 4, the return statement looks like this:
['SSSS', 'SSL', 'SLS', 'LSS', 'LL']
The function:
def virahanka1(n):
if n == 0:
return [""]
elif n == 1:
return ["S"]
else:
s = ["S" + prosody for prosody in virahanka1(n-1)]
l = ["L" + prosody for prosody in virahanka1(n-2)]
return s + l
The part I don't understand is how the 'SSL', 'SLS', and 'LSS' matches are made, if s and l are separate lists. Also in the line "for prosody in virahanka1(n-1)," what is prosody? Is it what the function is returning each time? I'm trying to think through it step by step but I'm not getting anywhere. Thanks in advance for your help!
Adrian

Let's just build the function from scratch. That's a good way to understand it thoroughly.
Suppose then that we want a recursive function to enumerate every combination of Ls and Ss to make a given meter length n. Let's just consider some simple cases:
n = 0: Only way to do this is with an empty string.
n = 1: Only way to do this is with a single S.
n = 2: You can do it with a single L, or two Ss.
n = 3: LS, SL, SSS.
Now, think about how you might build the answer for n = 4 given the above data. Well, the answer would either involve adding an S to a meter length of 3, or adding an L to a meter length of 2. So, the answer in this case would be LL, LSS from n = 2 and SLS, SSL, SSSS from n = 3. You can check that this is all possible combinations. We can also see that n = 2 and n = 3 can be obtained from n = 0,1 and n=1,2 similarly, so we don't need to special-case them.
Generally, then, for n ≥ 2, you can derive the strings for length n by looking at strings of length n-1 and length n-2.
Then, the answer is obvious:
if n = 0, return just an empty string
if n = 1, return a single S
otherwise, return the result of adding an S to all strings of meter length n-1, combined with the result of adding an L to all strings of meter length n-2.
By the way, the function as written is a bit inefficient because it recalculates a lot of values. That would make it very slow if you asked for e.g. n = 30. You can make it faster very easily by using the new lru_cache from Python 3.3:
#lru_cache(maxsize=None)
def virahanka1(n):
...
This caches results for each n, making it much faster.

I tried to melt my brain. I added print statements to explain to me what was happening. I think the most confusing part about recursive calls is that it seems to go into the call forward but come out backwards, as you may see with the prints when you run the following code;
def virahanka1(n):
if n == 4:
print 'Lets Begin for ', n
else:
print 'recursive call for ', n, '\n'
if n == 0:
print 'n = 0 so adding "" to below'
return [""]
elif n == 1:
print 'n = 1 so returning S for below'
return ["S"]
else:
print 'next recursivly call ' + str(n) + '-1 for S'
s = ["S" + prosody for prosody in virahanka1(n-1)]
print '"S" + each string in s equals', s
if n == 4:
print '**Above is the result for s**'
print 'n =',n,'\n', 'next recursivly call ' + str(n) + '-2 for L'
l = ["L" + prosody for prosody in virahanka1(n-2)]
print '\t','what was returned + each string in l now equals', l
if n == 4:
print '**Above is the result for l**','\n','**Below is the end result of s + l**'
print 'returning s + l',s+l,'for below', '\n','='*70
return s + l
virahanka1(4)
Still confusing for me, but with this and Jocke's elegant explanation, I think I can understand what is going on.
How about you?
Below is what the code above produces;
Lets Begin for 4
next recursivly call 4-1 for S
recursive call for 3
next recursivly call 3-1 for S
recursive call for 2
next recursivly call 2-1 for S
recursive call for 1
n = 1 so returning S for below
"S" + each string in s equals ['SS']
n = 2
next recursivly call 2-2 for L
recursive call for 0
n = 0 so adding "" to below
what was returned + each string in l now equals ['L']
returning s + l ['SS', 'L'] for below
======================================================================
"S" + each string in s equals ['SSS', 'SL']
n = 3
next recursivly call 3-2 for L
recursive call for 1
n = 1 so returning S for below
what was returned + each string in l now equals ['LS']
returning s + l ['SSS', 'SL', 'LS'] for below
======================================================================
"S" + each string in s equals ['SSSS', 'SSL', 'SLS']
**Above is the result for s**
n = 4
next recursivly call 4-2 for L
recursive call for 2
next recursivly call 2-1 for S
recursive call for 1
n = 1 so returning S for below
"S" + each string in s equals ['SS']
n = 2
next recursivly call 2-2 for L
recursive call for 0
n = 0 so adding "" to below
what was returned + each string in l now equals ['L']
returning s + l ['SS', 'L'] for below
======================================================================
what was returned + each string in l now equals ['LSS', 'LL']
**Above is the result for l**
**Below is the end result of s + l**
returning s + l ['SSSS', 'SSL', 'SLS', 'LSS', 'LL'] for below
======================================================================

This function says that:
virakhanka1(n) is the same as [""] when n is zero, ["S"] when n is 1, and s + l otherwise.
Where s is the same as the result of "S" prepended to each elements in the resulting list of virahanka1(n - 1), and l the same as "L" prepended to the elements of virahanka1(n - 2).
So the computation would be:
When n is 0:
[""]
When n is 1:
["S"]
When n is 2:
s = ["S" + "S"]
l = ["L" + ""]
s + l = ["SS", "L"]
When n is 3:
s = ["S" + "SS", "S" + "L"]
l = ["L" + "S"]
s + l = ["SSS", "SL", "LS"]
When n is 4:
s = ["S" + "SSS", "S" + "SL", "S" + "LS"]
l = ["L" + "SS", "L" + "L"]
s + l = ['SSSS", "SSL", "SLS", "LSS", "LL"]
And there you have it, step by step.
You need to know the results of the other function calls in order to calculate the final value, which can be pretty messy to do manually as you can see. It is important though that you do not try to think recursively in your head. This would cause your mind to melt. I described the function in words, so that you can see that these kind of functions is are descriptions, and not a sequence of commands.
The prosody you see, that is a part of s and l definitions, are variables. They are used in a list-comprehension, which is a way of building lists. I've described earlier how this list is built.

Related

Find starting and ending index of each unique charcters in a string in python

I have a string with characters repeated. My Job is to find starting Index and ending index of each unique characters in that string. Below is my code.
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
mo = re.search(item,x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
Output :
a 0 1
b 3 4
c 7 8
Here the end index of the characters are not correct. I understand why it's happening but how can I pass the character to be matched dynamically to the regex search function. For instance if I hardcode the character in the search function it provides the desired output
x = 'aabbbbccc'
xs = set(x)
mo = re.search("[b]+",x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
output:
b 2 5
The above function is providing correct result but here I can't pass the characters to be matched dynamically.
It will be really a help if someone can let me know how to achieve this any hint will also do. Thanks in advance
String literal formatting to the rescue:
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
# for patterns better use raw strings - and format the letter into it
mo = re.search(fr"{item}+",x) # fr and rf work both :) its a raw formatted literal
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n) # fix upper limit by n-1
Output:
a 0 3 # you do see that the upper limit is off by 1?
b 3 7 # see above for fix
c 7 9
Your pattern does not need the [] around the letter - you are matching just one anyhow.
Without regex1:
x = "aaabbbbcc"
last_ch = x[0]
start_idx = 0
# process the remainder
for idx,ch in enumerate(x[1:],1):
if last_ch == ch:
continue
else:
print(last_ch,start_idx, idx-1)
last_ch = ch
start_idx = idx
print(ch,start_idx,idx)
output:
a 0 2 # not off by 1
b 3 6
c 7 8
1RegEx: And now you have 2 problems...
Looking at the output, I'm guessing that another option would be,
import re
x = "aaabbbbcc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
print(output)
Output
a 0 3
b 3 7
c 7 9
I think it'll be in the Order of N, you can likely benchmark it though, if you like.
import re, time
timer_on = time.time()
for i in range(10000000):
x = "aabbbbccc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
end = start + len(item[0])
output += (f"{item[1]} {start} {end}\n")
start = end
timer_off = time.time()
timer_total = timer_off - timer_on
print(timer_total)

nested list of lists of inegers - doing arithmetic operation

I have a list like below and need to firs add items in each list and then multiply all results 2+4 = 6 , 3+ (-2)=1, 2+3+2=7, -7+1=-6 then 6*1*7*(-6) = -252 I know how to do it by accessing indexes and it works (as below) but I also need to do it in a way that it will work no matter how many sublist there is
nested_lst = [[2,4], [3,-2],[2,3,2], [-7,1]]
a= nested_lst[0][0] + nested_lst[0][1]
b= nested_lst[1][0] + nested_lst[1][1]
c= nested_lst[2][0] + nested_lst[2][1] + nested_lst[2][2]
d= nested_lst[3][0] + nested_lst[3][1]
def sum_then_product(list):
multip= a*b*c*d
return multip
print sum_then_product(nested_lst)
I have tried with for loop which gives me addition but I don't know how to perform here multiplication. I am new to it. Please, help
nested_lst = [[2,4], [3,-2],[2,3,2], [-7,1]]
for i in nested_lst:
print sum(i)
Is this what you are looking for?
nested_lst = [[2,4], [3,-2],[2,3,2], [-7,1]] # your list
output = 1 # this will generate your eventual output
for sublist in nested_lst:
sublst_out = 0
for x in sublist:
sublst_out += x # your addition of the sublist elements
output *= sublst_out # multiply the sublist-addition with the other sublists
print(output)

Efficient way for the following code

I saw this problem on hackerrank.com, the problem is to find a 4 letter palindrome from a given string which can be a long string also.
Constraint is as follows:
where, |s| is the length of the string and a,b,c,d are the positions of the corresponding letters in the palindrome.
I found out the solution for this, but it isn't efficient enough, as in during the processing time it gives 'time out' error. The code is as follows:
s='kkkkkkz'
n=0
c_i,c_j,c_k,c_l=0,0,0,0
for i in range(len(s)):
j=0;c_i+=1
while j>=0 and j<len(s):
c_j+=1
if j>i:
k=0
while k>=0 and k<len(s):
c_k+=1
if k>j:
l=0
while l>=0 and l<len(s):
c_l+=1
if l>k:
a=s[i]+s[j]+s[k]+s[l]
if a[0]==a[3] and a[1]==a[2]: n+=1
l+=1
k+=1
j+=1
print n
I thought of noticing the number of times each loop runs, which right now is 7,49,147 and 245.
It is still better than the techniques I followed before, but I am not able to to do better than this.
Suggestions please ?
One way is to use the following, but this will still not be efficient enough. Scores 12/40 ..
import itertools
s=WHATEVERSTRING
n=0
for a in itertools.combinations(s, 4):
n += (a[0] == a[3])*(a[1]==a[2])
print(n)
A working solution is to go down the following route: create a set of unique characters in the string, and map substring pairs to a dictionary. Then count all the occurrences of pairwise pairs.
from collections import defaultdict as di
data = [x for x in s.strip()]
chars = set(data)
sum_a = 0
for c in chars:
a = 0
b = di(int)
double_pairs = 0
for d in data:
if d == c:
sum_a += double_pairs
double_pairs += b[c]
b[c]+=a
a += 1
else:
double_pairs += b[d]
b[d] += a
print(sum_a%(10**9+7))

python 2.7 iterate on list printing subsets of a list

I have this list: l = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] and I would like to iterate on it, printing something like
0: [1,2,3,4]
1: [2,3,4,5]
2: [3,4,5,6]
...
n: [17,18,19,20]
So far I made this code to print 5 elements at a time, but the last iteration prints 3:
for index, item in enumerate(l):
if index == 0 or index == 1 or index == 2:
continue
print index, l[index - 3:index + 2]
How can I solve this?
You're on the right track with your list slices. Here's a tweak to make it easier:
sub_len = 4
for i in range(len(mylist) - sub_len + 1):
print l[i:i + sub_len]
Where sub_len is the desired length of the slices you print.
demo

Extracting columns with a difference in aligned data

I have some aligned data (something bioinformatic related) as so:
reference_string = 'yearning'
string2 = 'learning'
string3 = 'aligning'
I need to extract only columns showing differences in relation to the reference data.
The output should show only positional information of the columns containing differences in relation to the reference string and the corresponding reference item.
1 2 3 4
y e a r
l
a l i g
My current code does most things okay except that it also reports columns with no difference.
string1 = 'yearning'
string2 = 'learning'
string3 = 'aligning'
string_list = [string1, string2]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
diffs = []
for i, c in enumerate(s):
if s[i] != reference[i]:
diffs.append(i)
all_diffs.add(i)
diffs_top.append(diffs)
for d in all_diffs:
print str(int(d+1)),
print
for c in reference:
print str(c),
print
for i, s in enumerate(string_list):
for j, c in enumerate(s):
if j in diffs_top[i]:
print str(c),
else:
print str(' '),
print
This code would give:
1 2 3 4
y e a r n i n g
l
a l i g
Any help appreciated.
EDIT: I have picked some section of real data to make the problem as clearer as possible and my attempt at solving it thus far:
reference_string = 'MAHEWGPQRLAGGQPQAS'
string1 = 'MAQQWSLQRLAGRHPQDS'
string2 = 'MAQRWGAHRLTGGQLQDT'
string3 = 'MAQRWGPHALSGVQAQDA'
string_list = [string1, string2, string3]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
diffs = []
for i, c in enumerate(s):
if s[i] != reference[i]:
diffs.append(i)
all_diffs.add(i)
diffs_top.append(diffs)
#print diffs_top
#print all_diffs
for d in all_diffs:
print str(int(d+1)), # retains natural positions of the reference residues
print
for d in all_diffs:
for i, c in enumerate(reference):
if i == d:
print c,
print
The print out will be an output showing the position at which there is any difference to other non-reference strings and the corresponding reference letter.
3 4 6 7 8 9 11 13 14 15 17 18
H E G P Q R A G Q P A S
Then the next step is to write a code that will process non reference strings by printing out the difference with the reference (at that position). If there is no difference it will leave blank (' ').
Doing it manually the output will be:
3 4 6 7 8 9 11 13 14 15 17 18
H E G P Q R A G Q P A S
Q Q S L R H D
Q R A H T L D T
Q R H A S V A D A
My entire code as an attempt to get to the solution above as been messy to say the least:
reference_string = 'MAHEWGPQRLAGGQPQAS'
string1 = 'MAQQWSLQRLAGRHPQDS'
string2 = 'MAQRWGAHRLTGGQLQDT'
string3 = 'MAQRWGPHALSGVQAQDA'
string_list = [string1, string2, string3]
reference = reference_string
diffs_top, diffs = [], []
all_diffs = set()
for s in string_list:
diffs = []
for i, c in enumerate(s):
if s[i] != reference[i]:
diffs.append(i)
all_diffs.add(i)
diffs_top.append(diffs)
#print diffs_top
#print all_diffs
for d in all_diffs:
print str(int(d+1)),
print
for d in all_diffs:
for i, c in enumerate(reference):
if i == d:
print c,
print
# this is my attempt to look into non-reference strings
# to check for the difference with the reference, and print an output.
for d in all_diffs:
for i, s in enumerate(string_list):
for j, c in enumerate(s):
if j == d:
print c,
else:
print str(' '),
print
Your code is working perfectly fine (as per your logic).
What is happening , is that while printing the output, when you come across the reference string, Python looks for the corresponding entry in the diffs_top list and because while storing in diff_top, you have no entry stored for the reference string, Python just prints blank spaces for your reference string.
1 2 3 4
y e a r n i n g #prints the reference string, because you've coded in that way
#prints blank as string_list[0] and reference string are the same
l
a l i g
The question here is how exactly do you define your difference for reference string.
Besides, I also found some fundamental flaws in your code implementation. If you try to run your code by setting string_list[1] as your reference string, you would get your output as :
1 2 3 4
l e a r n i n g
y
a l i g
Is this what you need? Please spend some time in properly defining difference for all cases and then try to implement you code.
EDIT:
As per you updated requirements, replace the last block in your code with this:
for i, s in enumerate(string_list):
for d in all_diffs:
if d in diffs_top[i]:
print s[d],
else:
print ' ',
print
Cheers!
I think there is a general problem in your logic. If you need to extract only columns showing difference in relation to the reference data and string1 is the reference the output should be:
1 2 3 4
l
a l i g
So, 'yearning' shouldn't show any character because it has no difference to string1.
If you delete or put the following lines in comments, you will exactly get what I expect is the right answer:
#for c in reference:
# print str(c),
#print
Consider to review your logic if this solution is not what you actually want.
Update
Here is a shorter solution which solves your task:
from itertools import compress, izip_longest
def delta(reference, string):
return [ '' if a == b else b for a, b in izip_longest(reference, string)]
ref_string = 'MAHEWGPQRLAGGQPQAS'
strings = ['MAQQWSLQRLAGRHPQDS',
'MAQRWGAHRLTGGQLQDT',
'MAQRWGPHALSGVQAQDA']
delta_strings = [delta(ref_string, string) for string in strings]
selectors = [1 if any(tup) else 0 for tup in izip_longest(*delta_strings)]
indices = [str(i+1) for i in range(len(selectors))]
output_data = [indices, ref_string] + delta_strings
for line in output_data:
print ''.join(x.rjust(3) for x in compress(line, selectors))
Explanation:
I defined a function delta(reference, string) which returns the delta between the string and the referenced string. For example: delta("ABFF", "AECF") returns the list ['', E, C, ''].
The variable delta_strings holds all the deltas between each string in the list strings and the reference string ref_string.
The variable selector is a list containing only 1 and 0 values, where 0 specifies the collumns which shouldn't be printed and vice versa.