Matching enclosing characters Python - python-2.7

I am trying to split a string in a particular way.
If I provide this string:
text (text (adsf (asdfasdfjkl) asdfjlkasdjf) stuff) (morestuff stuff)
I want to split it into:
['text', '(text (adsf (asdfasdfjkl) asdfjlkasdjf) stuff)', '(morestuff stuff)']
Code I had:
def pair_char(left, right, start, text, exclusive=False, verbose=False):
    package = []
    for e, c in enumerate(text):
        left_c = right_c = 0
        if text[e] == left:
            left_c += 1
            marker = start = e
            while text[marker+1] != right or left_c > right_c:
                marker += 1
                if verbose:
                    print left_c, right_c, text[marker], left, right, text[marker]==left, text[marker]==right
                if marker+1 >= len(text):
                    break
                if text[marker] == left_c:
                    print "left_c"
                    left_c += 1
                if text[marker] == right_c:
                    print "right_c"
                    right_c += 1
            end = marker
            if exclusive:
                package.append(text[start+1:end])
            else:
                package.append(text[start:end+1])
            e = end
    package = "".join(package)
    return package
Any suggestions?
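One approach that may be simpler than tracking separate left and right counters: walk the string once, keep a single nesting depth, and treat whitespace as a separator only when the depth is zero. A minimal sketch under that assumption (the function name split_top_level is illustrative, not from the question); it should run on both Python 2.7 and 3:

```python
def split_top_level(text, left="(", right=")"):
    """Split text on whitespace, keeping balanced left...right groups whole."""
    parts = []
    current = []
    depth = 0
    for ch in text:
        if ch == left:
            depth += 1
        elif ch == right:
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched %r" % right)
        if ch.isspace() and depth == 0:
            # whitespace outside any group ends the current token
            if current:
                parts.append("".join(current))
                current = []
        else:
            # everything else (including whitespace inside a group) is kept
            current.append(ch)
    if depth != 0:
        raise ValueError("unmatched %r" % left)
    if current:
        parts.append("".join(current))
    return parts

print(split_top_level("text (text (adsf (asdfasdfjkl) asdfjlkasdjf) stuff) (morestuff stuff)"))
```

Because left and right are parameters, the same function can split on other pairs such as "[" and "]".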

Related

Find starting and ending index of each unique charcters in a string in python

I have a string with repeated characters. My job is to find the starting index and ending index of each unique character in that string. Below is my code.
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
    mo = re.search(item,x)
    flag = item
    m = mo.start()
    n = mo.end()
    print(flag,m,n)
Output :
a 0 1
b 3 4
c 7 8
Here the end indices are not correct. I understand why this happens, but how can I pass the character to be matched dynamically to the regex search function? For instance, if I hardcode the character in the search pattern, it gives the desired output:
x = 'aabbbbccc'
xs = set(x)
mo = re.search("[b]+",x)
flag = item
m = mo.start()
n = mo.end()
print(flag,m,n)
output:
b 2 5
The code above gives the correct result, but there I can't pass the character to be matched dynamically.
It would be a great help if someone could tell me how to achieve this; any hint will do. Thanks in advance.
String literal formatting to the rescue:
import re
x = "aaabbbbcc"
xs = set(x)
for item in xs:
    # for patterns, prefer raw strings, and format the letter into it
    mo = re.search(fr"{item}+", x)  # fr and rf both work: it's a raw formatted string literal
    flag = item
    m = mo.start()
    n = mo.end()
    print(flag, m, n)  # fix the upper limit with n-1
Output:
a 0 3 # you do see that the upper limit is off by 1?
b 3 7 # see above for fix
c 7 9
Your pattern does not need the [] around the letter - you are matching just one anyhow.
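Searching once per set member only ever reports the first run of each character, so a string like "aabaa" would miss the second run of a. A sketch that instead scans the whole string with re.finditer and a backreference, reporting every run with an inclusive end index (the function name runs is illustrative):

```python
import re

def runs(s):
    """Return (char, start, end_inclusive) for every run of identical characters."""
    # (.)\1* matches one character followed by any repeats of that same character;
    # add re.DOTALL if the string may contain newlines, since '.' skips them otherwise
    return [(m.group(1), m.start(), m.end() - 1) for m in re.finditer(r"(.)\1*", s)]

print(runs("aaabbbbcc"))
```

If you do keep the per-character loop, wrapping the character in re.escape(item) protects against characters such as "." or "+" that are special in a pattern.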
Without regex [1]:
x = "aaabbbbcc"
last_ch = x[0]
start_idx = 0
# process the remainder
for idx,ch in enumerate(x[1:],1):
if last_ch == ch:
continue
else:
print(last_ch,start_idx, idx-1)
last_ch = ch
start_idx = idx
print(ch,start_idx,idx)
output:
a 0 2 # not off by 1
b 3 6
c 7 8
[1] RegEx: And now you have 2 problems...
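The same single pass can be written with itertools.groupby, which groups consecutive equal items, so there is no need to track the previous character by hand. A sketch (run_bounds is an illustrative name), again with inclusive end indices:

```python
from itertools import groupby

def run_bounds(s):
    """Yield (char, start, end_inclusive) for each run of consecutive equal characters."""
    start = 0
    for ch, group in groupby(s):
        length = sum(1 for _ in group)  # consume the group to count its length
        yield ch, start, start + length - 1
        start += length

for ch, start, end in run_bounds("aaabbbbcc"):
    print(ch, start, end)
```

An empty string simply yields nothing, whereas indexing x[0] would raise an IndexError.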
Looking at the output, I'm guessing that another option would be:
import re
x = "aaabbbbcc"
xs = re.findall(r"((.)\2*)", x)
start = 0
output = ''
for item in xs:
    end = start + len(item[0])
    output += (f"{item[1]} {start} {end}\n")
    start = end
print(output)
Output
a 0 3
b 3 7
c 7 9
I think it'll be O(N); you can benchmark it if you like:
import re, time
timer_on = time.time()
for i in range(10000000):
    x = "aabbbbccc"
    xs = re.findall(r"((.)\2*)", x)
    start = 0
    output = ''
    for item in xs:
        end = start + len(item[0])
        output += (f"{item[1]} {start} {end}\n")
        start = end
timer_off = time.time()
timer_total = timer_off - timer_on
print(timer_total)
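The standard library's timeit module can run the timing loop for you. A sketch that wraps the findall approach in a function (describe_runs is an illustrative name) and times a smaller number of calls; the exact timing will of course vary by machine:

```python
import re
import timeit

def describe_runs(x):
    """Return one 'char start end' line per run; end is exclusive, as above."""
    start = 0
    output = ''
    # findall returns (whole_run, char) tuples because the pattern has two groups
    for run, ch in re.findall(r"((.)\2*)", x):
        end = start + len(run)
        output += f"{ch} {start} {end}\n"
        start = end
    return output

# timeit.timeit calls the function `number` times and returns total seconds as a float
elapsed = timeit.timeit(lambda: describe_runs("aabbbbccc"), number=100_000)
print(f"{elapsed:.3f} s for 100,000 calls")
```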

Converting into range in Python

I want to convert a list into ranges.
a = ['Eth1/1', 'Eth1/2', 'Eth1/3', 'Eth1/4', 'Eth1/5', 'Eth1/6', 'Eth1/7', 'Eth1/8', 'Eth1/9', 'Eth1/10','Eth2/1', 'Eth2/2', 'Eth2/3', 'Eth2/4', 'Eth2/5', 'Eth2/6','Eth3/1', 'Eth3/2', 'Eth3/3', 'Eth3/4', 'Eth3/5', 'Eth3/6','Eth4/1', 'Eth4/2', 'Eth4/3', 'Eth4/4', 'Eth4/5', 'Eth4/6']
What I am trying:
fp = open('mode.txt' , 'w+')
for i in a:
    fp.write('confi ' + i + '\n mode \n')
What I am looking for:
confi Eth1/1-5
mode
confi Eth1/6-10
mode
confi Eth2/1-6
mode
confi Eth3/1-6
mode
confi Eth4/1-6
mode
Any idea how to do this?
You could write a loop that takes the current element as the start of a range. If it starts with Eth1, take the element four positions later as the end (Eth1 is split into chunks of five). Otherwise, remember the Eth_ prefix and advance through the list until the prefix changes or the list ends; the last element with that prefix is the end.
a = ['Eth1/1', 'Eth1/2', 'Eth1/3', 'Eth1/4', 'Eth1/5', 'Eth1/6', 'Eth1/7', 'Eth1/8', 'Eth1/9', 'Eth1/10','Eth2/1', 'Eth2/2', 'Eth2/3', 'Eth2/4', 'Eth2/5', 'Eth2/6','Eth3/1', 'Eth3/2', 'Eth3/3', 'Eth3/4', 'Eth3/5', 'Eth3/6','Eth4/1', 'Eth4/2', 'Eth4/3', 'Eth4/4', 'Eth4/5', 'Eth4/6']
i = 0
while i < len(a):
    start = a[i].split('/')
    if (start[0] == 'Eth1'):
        i += 5
    else:
        key = start[0]
        i += 1
        while i < len(a) and a[i].split('/')[0] == key:
            i += 1
    end = a[i-1].split('/')
    print('confi ' + start[0] + '/' + start[1] + '-' + end[1] + '\n mode\n')
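A more general variant groups the names by prefix with itertools.groupby and collapses consecutive port numbers into ranges, so it also copes with gaps in the numbering. The question's Eth1 split into 1-5 and 6-10 is not implied by the data itself, so this sketch models it with a hypothetical per-prefix cap, max_len. It assumes every name looks like prefix/port and that the list is already sorted by prefix:

```python
from itertools import groupby

def interface_ranges(names, max_len=None):
    """Collapse names such as 'Eth1/1'..'Eth1/5' into (prefix, first, last) tuples.

    max_len is a hypothetical per-prefix cap on range length, e.g. {'Eth1': 5}.
    """
    max_len = max_len or {}
    parsed = []
    for name in names:
        prefix, port = name.split('/')
        parsed.append((prefix, int(port)))
    ranges = []
    # groupby only merges *adjacent* items with the same key, hence the sorting assumption
    for prefix, items in groupby(parsed, key=lambda pair: pair[0]):
        ports = [port for _, port in items]
        cap = max_len.get(prefix, len(ports))
        first = last = ports[0]
        count = 1
        for port in ports[1:]:
            if port == last + 1 and count < cap:
                last = port
                count += 1
            else:
                # gap in the numbering or cap reached: close the current range
                ranges.append((prefix, first, last))
                first = last = port
                count = 1
        ranges.append((prefix, first, last))
    return ranges

# rebuilds the question's list: Eth1/1-10, then Eth2, Eth3, Eth4 with ports 1-6
a = (['Eth1/%d' % n for n in range(1, 11)] +
     ['Eth%d/%d' % (slot, n) for slot in (2, 3, 4) for n in range(1, 7)])

for prefix, first, last in interface_ranges(a, max_len={'Eth1': 5}):
    # to produce mode.txt, call fp.write(...) inside a `with open(...)` block instead
    print('confi %s/%d-%d\nmode' % (prefix, first, last))
```

A single isolated port comes out as a range like Eth5/3-3; whether to print that differently depends on what the target device accepts.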

Split string of digits into individual cells, including digits within parentheses/brackets

I have a column where each cell has a string of digits, ?, -, and digits in parentheses/brackets/curly brackets. A good example would be something like the following:
3????0{1012}?121-2[101]--01221111(01)1
How do I separate the string into different cells by characters, where a 'character' in this case refers to any number, ?, -, and value within the parentheses/brackets/curly brackets (including said parentheses/brackets/curly brackets)?
In essence, the string above would turn into the following (spaced apart to denote a separate cell):
3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
The amount of numbers within the parentheses/brackets/curly brackets vary. There are no letters in any of the strings.
Here you are!
RegEx method:
Sub Test_RegEx()
Dim s, col, m
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
With CreateObject("VBScript.RegExp")
.Global = True
.Pattern = "(?:\d|-|\?|\(\d+\)|\[\d+\]|\{\d+\})"
For Each m In .Execute(s)
col(col.Count) = m
Next
End With
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
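If you want to prototype or check the tokenizing pattern outside Excel, the same alternation can be tried with Python's re module. This is only a port of the pattern above for experimentation, not a replacement for the VBA: a digit, "-", "?", or a run of digits wrapped in (), [] or {}. Note that findall silently skips any character the pattern does not match:

```python
import re

# same alternatives as the VBScript pattern above
TOKEN = re.compile(r"\d|-|\?|\(\d+\)|\[\d+\]|\{\d+\}")

def split_cells(s):
    """Return the list of 'characters' (tokens) in s."""
    return TOKEN.findall(s)

print(" ".join(split_cells("3????0{1012}?121-2[101]--01221111(01)1")))
```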
Loop method:
Sub Test_Loop()
Dim s, col, q, t, k, i
s = "3????0{1012}?121-2[101]--01221111(01)1"
Set col = CreateObject("Scripting.Dictionary")
q = "_"
t = True
k = 0
For i = 1 To Len(s)
t = (t Or InStr(1, ")]}", q) > 0) And InStr(1, "([{", q) = 0
q = Mid(s, i, 1)
If t Then k = k + 1
col(k) = col(k) & q
Next
MsgBox Join(col.items) ' 3 ? ? ? ? 0 {1012} ? 1 2 1 - 2 [101] - - 0 1 2 2 1 1 1 1 (01) 1
End Sub
Something else to look at :)
Sub test()
'String to parse through
Dim aStr As String
'final string to print
Dim finalString As String
aStr = "3????0{1012}?121-2[101]--01221111(01)1"
'Loop through string
For i = 1 To Len(aStr)
'The character to look at
char = Mid(aStr, i, 1)
'Check if the character is an opening bracket, curly brace, or parenthesis
Dim result As String
Select Case char
Case "["
result = loop_until_end(Mid(aStr, i + 1), "]")
i = i + Len(result)
result = char & result
Case "("
result = loop_until_end(Mid(aStr, i + 1), ")")
i = i + Len(result)
result = char & result
Case "{"
result = loop_until_end(Mid(aStr, i + 1), "}")
i = i + Len(result)
result = char & result
Case Else
result = Mid(aStr, i, 1)
End Select
finalString = finalString & result & " "
Next
Debug.Print (finalString)
End Sub
'Scans aStr until end_char is found
'Returns the leading part of aStr up to and including end_char (or all of aStr if end_char is missing)
Function loop_until_end(aStr, end_char)
idx = 1
If (Len(aStr) <= 1) Then
loop_until_end = aStr
Else
char = Mid(aStr, idx, 1)
'Stop at the end of the string too, so a missing closing character cannot loop forever
Do Until (char = end_char) Or (idx >= Len(aStr))
idx = idx + 1
char = Mid(aStr, idx, 1)
Loop
End If
loop_until_end = Mid(aStr, 1, idx)
End Function
Assuming the data is in column A starting in row 1, and that you want the results to start in column B and go right for each row of data in column A, here is an alternate method using only worksheet formulas.
In cell B1 use this formula:
=IF(OR(LEFT(A1,1)={"(","[","{"}),LEFT(A1,MIN(FIND({")","]","}"},A1&")]}"))),IFERROR(--LEFT(A1,1),LEFT(A1,1)))
In cell C1 use this formula:
=IF(OR(MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)={"(","[","{"}),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,MIN(FIND({")","]","}"},$A1&")]}",SUMPRODUCT(LEN($B1:B1))+1))-SUMPRODUCT(LEN($B1:B1))),IFERROR(--MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1),MID($A1,SUMPRODUCT(LEN($B1:B1))+1,1)))
Copy the C1 formula right until it starts giving you blanks (there are no more items left to split out from the string in the A cell). In your example, you need to copy it right to column AA. Then you can copy the formulas down for the rest of your column A data.

How to align output values in Python

I am having trouble aligning output values.
Alist = ["1,25,999",
"123.4,56.7890,13.571",
"1,23.45,6,7.8"]
c = 0
while c < len(Alist):
r = 0
tokens = Alist[c].split(',')
while r < len(Alist[c].split(',')):
if '.' in tokens[r]:
print "%7.2f" %float(tokens[r]), " ",
else :
print "%3d" %float(tokens[r]), " ",
r += 1
print
c += 1
I want it to print:
1 25 999
123.40 56.79 13.57
1 23.45 6. 7.80
but instead it prints:
1
25
999
123.40
56.79
13.57
1
23.45
6
7.8
and I cannot figure out what is wrong with my code.
After the r += 1, you have a lone print statement. It is at the wrong indentation level: move it to the left by 4 spaces (or one tab) and it should work fine.
The bare print statement shouldn't be inside the second while loop. Just:
Alist = ["1,25,999",
"123.4,56.7890,13.571",
"1,23.45,6,7.8"]
c = 0
while c < len(Alist):
r = 0
tokens = Alist[c].split(',')
while r < len(Alist[c].split(',')):
if '.' in tokens[r]:
print "%7.2f" %float(tokens[r]), " ",
else :
print "%3d" %float(tokens[r]), " ",
r += 1
print
c += 1
import itertools
myList = ["1,25,999",
          "123.4,56.7890,13.571",
          "1,23.45,6,7.8"]
rows = [r.split(',') for r in myList]
widths = {i:max(len(c) for c in col) for i,col in enumerate(itertools.izip_longest(*rows, fillvalue=""))}
for row in rows:
    for i,val in enumerate(row):
        print " "*((widths[i] - len(val))/2), val, " "*((widths[i] - len(val))/2) if not (widths[i]-len(val))%2 else " "*((widths[i] - len(val)+1)/2),
    print
1 25 999
123.4 56.7890 13.571
1 23.45 6 7.8
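The code in the question uses Python 2 print statements. On Python 3, format specifications can do the alignment directly: right-align each token in a fixed-width column, with two decimals for floats and none for integers, mirroring the question's %7.2f and %3d. A sketch, where the column width of 8 is an arbitrary choice:

```python
rows = ["1,25,999",
        "123.4,56.7890,13.571",
        "1,23.45,6,7.8"]

def format_row(line, width=8):
    """Right-align every comma-separated token in a column `width` characters wide."""
    cells = []
    for token in line.split(','):
        if '.' in token:
            # floats: two decimal places, like %7.2f
            cells.append(f"{float(token):>{width}.2f}")
        else:
            # integers: no decimals, like %3d
            cells.append(f"{int(token):>{width}d}")
    return " ".join(cells)

for line in rows:
    print(format_row(line))
```

Using one width for every column keeps the columns lined up across rows; computing a per-column width, as the widths dictionary above does, would make the table more compact.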

How to print part of a line PART 2

Using Groovy, I wish to grab two parts of a tab-separated line. Take the example line:
one fish two fish red fish blue fish (each word separated by a tab, \t)
Suppose I want to print one and then red fish blue.
How can I do this?
Alternatively, suppose I want to print one and then a count of the number of characters (words) following red? Or between two and blue?
A previous question yielded this response for printing everything following a certain part of the line:
c = ~/.*red(.*)/
m = line =~ c
if (m) {
    println m[0][1]
}
to yield fish blue fish, but I'm not competent enough with regexes to modify this appropriately. I've tried a few iterations, inserting \t in there and modifying my capturing expression, but I haven't figured it out. This is three or four questions in one; any help is appreciated. Thanks!!
def a = [:].withDefault{[]}
def b = [:].withDefault{[]}
def c = 0
def d = 0
def e = 0
def f = 0
seuss = "one\tfish\ttwo\tfish\tred\tfish\tblue\tfish"
a = seuss.split (/\t/)
for (i = 0; i < a.size(); i++) {
    if (d != 0) {
        c = c + 1
    }
    if (a[i] == "red") {
        d = i
    }
}
println a[4] + '\t' + c
for (i = 0; i < a.size(); i++) {
    if (a[i] == "blue") {
        e = 0
    }
    if (e != 0) {
        f = f + 1
    }
    if (a[i] == "two") {
        e = i
    }
}
println a[0] + '\t' + f
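The answer above splits the line on tabs and then works with field positions instead of a regex. The same idea, rendered in Python purely for comparison (it is not Groovy, and the helper names words_after and words_between are illustrative): once the line is a list of fields, slicing prints a span of words, and list.index finds the marker positions used for counting.

```python
line = "one\tfish\ttwo\tfish\tred\tfish\tblue\tfish"
fields = line.split("\t")

def words_after(fields, marker):
    """Number of fields after the first occurrence of marker."""
    return len(fields) - fields.index(marker) - 1

def words_between(fields, first, last):
    """Number of fields strictly between first and the next occurrence of last."""
    i = fields.index(first)
    j = fields.index(last, i + 1)  # search for `last` only after `first`
    return j - i - 1

print(fields[0], " ".join(fields[4:7]))
print(fields[0], words_after(fields, "red"))
print(fields[0], words_between(fields, "two", "blue"))
```

list.index raises ValueError when a marker is missing, which is arguably safer than the counter approach silently reporting 0.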