I need to convert phone numbers to international format. I have a list of example rows:
import datetime
rows = [
    (datetime.time(20, 35, 30), '0707262078'),
    (datetime.time(20, 38, 18), '+46706332602'),
    (datetime.time(20, 56, 35), '065017063'),
    (datetime.time(21, 45, 1), '+46730522807'),
    (datetime.time(22, 13, 47), '0046733165812'),
]
I need to replace the prefix of each number: numbers starting with, for example, 07 should start with +467, 06 with +466, and 00 with +. For the example above, 0707262078 should become +46707262078, 065017063 should become +4665017063, and 0046733165812 should become +46733165812.
I don't know whether this is possible with a regex alone or whether I need other code.
I have been trying re.sub combined with a lambda. My idea is to make a dictionary of the matching replacements, like this:
repl_dict = {
    '01': '+461',
    '02': '+462',
    '03': '+463',
    '04': '+464',
    '05': '+465',
    '06': '+466',
    '07': '+467',
    '08': '+468',
    '09': '+469',
    '00': '+'
}
My attempt so far:
import re
for row in rows:
    regex = re.compile(r'^\d{1}[0-9](\d*)'), re.S
    DialedNumber = regex.sub(lambda match: repl_dict.get(match.group(), row[1]), row[1], row[1])

Your regex, ending in \d*, will match the entire number, so no entry is found in the dict. There also seems to be an unmatched parenthesis, and one row[1] too many in the call to sub.
You can simplify your regex to ^00? and your replacement dict to {'00': '+', '0': '+46'}. The regex checks whether the number starts with one or two zeros, which makes the replacement dict much simpler and less repetitive.
rows = [(datetime.time(20, 35, 30), '0707262078',), (datetime.time(20, 38, 18), '+46706332602Ring via Mitel ',), (datetime.time(20, 56, 35), '065017063'), (datetime.time(21, 45, 1), '+46730522807Ring via Mitel ',), (datetime.time(22, 13, 47), '0046733165812')]
repl_dict = {'00': '+', '0': '+46'}
regex = re.compile(r'^00?')
for date, number in rows:
    print(regex.sub(lambda match: repl_dict.get(match.group()), number))
Output:
+46707262078
+46706332602Ring via Mitel
+4665017063
+46730522807Ring via Mitel
+46733165812
If you only want the numeric part, you can pre- or postprocess the numbers with a second regex like [0-9+]*.
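A minimal sketch of how the two steps could be combined: first keep only the leading run of digits and plus signs, then rewrite the prefix. The function name normalize and the constant names are illustrative and not part of the answer above.

```python
import re

# Same replacement table as in the answer: two zeros become "+", one zero becomes "+46".
PREFIXES = {'00': '+', '0': '+46'}
LEADING_NUMBER = re.compile(r'[0-9+]*')   # numeric part at the start of the string
LEADING_ZEROS = re.compile(r'^00?')       # one or two leading zeros


def normalize(number):
    """Drop trailing non-numeric text, then convert the prefix to international format."""
    digits = LEADING_NUMBER.match(number).group()
    return LEADING_ZEROS.sub(lambda m: PREFIXES[m.group()], digits)


for raw in ['0707262078', '+46706332602Ring via Mitel ', '0046733165812']:
    print(normalize(raw))
```

Numbers that already start with + are left untouched by the second regex, so only the trailing text is removed from them.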

This is the naive approach, based on repl_dict as given in your question.
def repl(match):
    return repl_dict[match.group()]
pat = '^(' + '|'.join(repl_dict) + ')'
new_rows = [(tm, re.sub(pat, repl, ph)) for (tm, ph) in rows]
tobias_k's answer gives a better approach by improving your repl_dict and pattern.

Regex: ^[0-9]{2}
^ Asserts position at start of a line
[] Match a single character present in the list
{n} Matches exactly n times
Python code:
As @tobias_k points out, you can use repl_dict.get(m.group(), m.group()) instead of repl_dict.get(m.group()) or m.group().
regex = re.compile(r'^[0-9]{2}')
for i in range(len(rows)):
    rows[i] = (rows[i][0], regex.sub(lambda m: repl_dict.get(m.group()) or m.group(), rows[i][1]))
[(datetime.time(20, 35, 30), '+46707262078'), (datetime.time(20, 38, 18), '+46706332602Ring via Mitel '), (datetime.time(20, 56, 35), '+4665017063'), (datetime.time(21, 45, 1), '+46730522807Ring via Mitel '), (datetime.time(22, 13, 47), '+46733165812')]

You can do it without a regex:
for row in rows:
    for repl in repl_dict:
        if row[1].startswith(repl):
            print(repl_dict[repl] + row[1][len(repl):])


How to add symbol in regular expression by condition?

I need to add a leading 0 when a matched number is less than 10 in a regular-expression replacement:
My expressions:
searching string:
"createdAt": "($1)-($2)-($3)T($4):($5):($6)"
Input data:
"createdAt": [
Expected output data:
Actual output data (4 instead of 04, and 7 instead of 07):
So, how do I add a condition in this case?
If you're ok with manipulating the array:
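const json = {
  "createdAt": [2022, 4, 15, 13, 43, 7] // assumed values, chosen to match the date string in the regex variant
}
const formattedJson = json.createdAt.map(n => n.toString().padStart(2, '0'))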
Or using regex on the date string:
const date = "2022-4-15T13:43:7"
const formattedDate = date.replace(/\b(\d)\b/g, '0$1')
Breakdown of /\b(\d)\b/g
\b - Word boundary. In this case we're wrapping a boundary around any single digit.
\d - Any digit 0-9
() - Capturing group. This allows us to reference anything between the parentheses at a later point.
'0$1' - The replacement. $1 is how we reference the capturing group. If we had multiple capturing groups, we would reference the second one with $2, the third with $3, etc.
In this case we are saying: look for any single digit and replace it with a 0 followed by the digit itself.
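For comparison, a short sketch of the same word-boundary technique in Python's re module. The function name pad_single_digits is made up for this illustration. Python writes the group reference as \g<1> rather than $1.

```python
import re


def pad_single_digits(text):
    """Prefix every stand-alone single digit with a zero."""
    # \b(\d)\b matches a digit with no letter, digit or underscore on either side,
    # so the digits inside "2022", "15" or "13" are left alone.
    return re.sub(r'\b(\d)\b', r'0\g<1>', text)


print(pad_single_digits("2022-4-15T13:43:7"))  # 2022-04-15T13:43:07
```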

Flutter - Dart - remove hidden character from String input

I'm trying to compare two strings, but my comparison always fails.
For reference, one of the strings is a filename I'm getting from the phone's storage. It looks like it ends with an apostrophe, although the apostrophe is not visible anywhere.
Please consider the following dart code:
import 'dart:convert';
void main() {
  const Utf8Codec utf8 = Utf8Codec();
  String input = 'chatnum.txt';
  String stringwithapostrophe = 'chatnum.txt\'';
  String compInput = utf8.encode(input).toString();
  String compComp = utf8.encode(stringwithapostrophe).toString();
  print(compInput);
  print(compComp);
  if (compInput == compComp) {
    print('Yes it matches');
  } else {
    print('No it does not');
  }
}
This outputs:
[99, 104, 97, 116, 110, 117, 109, 46, 116, 120, 116]
[99, 104, 97, 116, 110, 117, 109, 46, 116, 120, 116, 39]
No it does not
So how can I remove that last apostrophe from the String?
I've tried .removeAt and .removeLast. But I just can't crack this.
I applied regex to it. That sorted it:
String filenametosend = (basename(f.toString()))
"]"), '');
This way too:
final apostrophe = '\'';
final length = stringwithapostrophe.length;
if (length > 0 && stringwithapostrophe[length - 1] == apostrophe) {
  stringwithapostrophe = stringwithapostrophe.substring(0, length - 1);
}
Or this way (remove all):
final apostrophe = '\'';
stringwithapostrophe = stringwithapostrophe.replaceAll(apostrophe, '');
Remove (any) last:
final length = stringwithapostrophe.length;
stringwithapostrophe = length > 0
? stringwithapostrophe.substring(0, length - 1)
: stringwithapostrophe;

Value between opening and closing bracket [duplicate]

I'm trying to match a mathematical-expression-like string, that have nested parentheses.
import re
p = re.compile(r'\(.+\)')
s = '(((1+0)+1)+1)'
print(p.findall(s))
I wanted it to match all the enclosed expressions, such as (1+0), ((1+0)+1)...
I don't even care if it matches unwanted ones like (((1+0), I can take care of those.
Why isn't it doing that already, and how can I do it?
As others have mentioned, regular expressions are not the way to go for nested constructs. I'll give a basic example using pyparsing:
import pyparsing # make sure you have this installed
thecontent = pyparsing.Word(pyparsing.alphanums) | '+' | '-'
parens = pyparsing.nestedExpr( '(', ')', content=thecontent)
Here's a usage example:
>>> parens.parseString("((a + b) + c)")
(                          # all of str
 [
  (                        # ((a + b) + c)
   [
    (                      # (a + b)
     ['a', '+', 'b'], {}
    ),                     # (a + b) [closed]
    '+',
    'c'
   ], {}
  )                        # ((a + b) + c) [closed]
 ], {}
)                          # all of str [closed]
(With newlining/indenting/comments done manually)
Edit: Modified to eliminate unnecessary Forward, as per Paul McGuire's suggestions.
To get the output in nested list format:
res = parens.parseString("((12 + 2) + 3)")
print(res.asList())
[[['12', '+', '2'], '+', '3']]
There is a new regular expression engine module being prepared to replace the existing one in Python. It introduces a lot of new functionality, including recursive calls.
import regex
s = 'aaa(((1+0)+1)+1)bbb'
result = regex.search(r'''
(?<rec>      # capturing group rec
 \(          # open parenthesis
 (?:         # non-capturing group
  [^()]++    # anything but parentheses, one or more times, without backtracking
  |          # or
  (?&rec)    # recursive substitute of group rec
 )*
 \)          # close parenthesis
)
''', s, flags=regex.VERBOSE)
print(result.captures('rec'))
['(1+0)', '((1+0)+1)', '(((1+0)+1)+1)']
Regular expression languages aren't powerful enough to match arbitrarily nested constructs. For that you need a push-down automaton (i.e., a parser). There are several such tools available, such as PLY.
Python also provides a parser library for its own syntax, which might do what you need. The output is extremely detailed, however, and takes a while to wrap your head around. If you're interested in this angle, the following discussion tries to explain things as simply as possible.
>>> import parser, pprint
>>> pprint.pprint(parser.st2list(parser.expr('(((1+0)+1)+1)')))
[7, '('],
[7, '('],
[8, ')']]]]],
[14, '+'],
[318, [2, '1']]]]]]]]]]]]]]]],
[8, ')']]]]]]]]]]]]]]]],
[4, ''],
[0, '']]
You can ease the pain with this short function:
def shallow(ast):
    if not isinstance(ast, list): return ast
    if len(ast) == 2: return shallow(ast[1])
    return [ast[0]] + [shallow(a) for a in ast[1:]]
>>> pprint.pprint(shallow(parser.st2list(parser.expr('(((1+0)+1)+1)'))))
[318, '(', [314, [318, '(', [314, '1', '+', '0'], ')'], '+', '1'], ')'],
The numbers come from the Python modules symbol and token, which you can use to build a lookup table from numbers to names:
import symbol, token
map = dict(token.tok_name.items() + symbol.sym_name.items())
You could even fold this mapping into the shallow() function so you can work with strings instead of numbers:
def shallow(ast):
    if not isinstance(ast, list): return ast
    if len(ast) == 2: return shallow(ast[1])
    return [map[ast[0]]] + [shallow(a) for a in ast[1:]]
>>> pprint.pprint(shallow(parser.st2list(parser.expr('(((1+0)+1)+1)'))))
['atom', '(', ['arith_expr', '1', '+', '0'], ')'],
The regular expression tries to match as much of the text as possible, thereby consuming your entire string. It doesn't look for additional matches of the regular expression within parts of that match. That's why you only get one answer.
The solution is to not use regular expressions. If you are actually trying to parse math expressions, use a real parsing solution. If you really just want to capture the pieces within parentheses, loop over the characters, incrementing a counter when you see ( and decrementing it when you see ).
A stack is the best tool for the job:
import re
def matches(line, opendelim='(', closedelim=')'):
    stack = []
    for m in re.finditer(r'[{}{}]'.format(opendelim, closedelim), line):
        pos = m.start()
        if pos > 0 and line[pos-1] == '\\':
            # skip escape sequence
            continue
        c = line[pos]
        if c == opendelim:
            # remember the position just after the opening delimiter
            stack.append(pos+1)
        elif c == closedelim:
            if len(stack) > 0:
                prevpos = stack.pop()
                # print("matched", prevpos, pos, line[prevpos:pos])
                yield (prevpos, pos, len(stack))
            else:
                # error
                print("encountered extraneous closing quote at pos {}: '{}'".format(pos, line[pos:] ))
    if len(stack) > 0:
        for pos in stack:
            print("expecting closing quote to match open quote starting at: '{}'"
                  .format(line[pos-1:]))
In the client code, since the function is written as a generator, simply use a for loop to unroll the matches:
line = '(((1+0)+1)+1)'
for openpos, closepos, level in matches(line):
    print(line[openpos:closepos], level)
This test code produces the following on my screen. Note that the second value in each printed line indicates the depth of the parentheses.
1+0 2
(1+0)+1 1
((1+0)+1)+1 0
From a linked answer:
From the LilyPond convert-ly utility (and written/copyrighted by myself, so I can show it off here):
def paren_matcher (n):
    # poor man's matched paren scanning, gives up
    # after n+1 levels. Matches any string with balanced
    # parens inside; add the outer parens yourself if needed.
    # Nongreedy.
    return r"[^()]*?(?:\("*n+r"[^()]*?"+r"\)[^()]*?)*?"*n
convert-ly tends to use this as paren_matcher (25) in its regular expressions which is likely overkill for most applications. But then it uses it for matching Scheme expressions.
Yes, it breaks down after the given limit, but the ability to just plug it into regular expressions still beats the "correct" alternatives supporting unlimited depth hands-down in usability.
Balanced pairs (of parentheses, for example) is an example of a language that cannot be recognized by regular expressions.
What follows is a brief explanation of the math for why that is.
Regular expressions are a way of defining finite state machines (abbreviated FSM). Such a device has a finite amount of possible state in which to store information. How that state can be used is not particularly restricted, but it does mean that there is an absolute maximum number of distinct positions it can recognize.
For example, the state can be used for counting, say, unmatched left parentheses. But because the amount of state for that kind of counting must be completely bounded, a given FSM can count to a maximum of n-1, where n is the number of states the FSM can be in. If n is, say, 10, then the maximum number of unmatched left parentheses the FSM can track is 9; one more and it breaks. Since it's perfectly possible to have one more left parenthesis, there is no possible FSM that can correctly recognize the complete language of matched parentheses.
So what? Suppose you just pick a really large n? The problem is that, as a way of describing FSMs, regular expressions basically describe all of the transitions from one state to another. Since for any n an FSM would need 2 state transitions (one for matching a left parenthesis, and one for matching a right parenthesis), the regular expression itself must grow by at least a constant factor multiple of n.
By comparison, the next better class of languages, (context free grammars) can solve this problem in a totally compact way. Here's an example in BNF
expression ::= `(` expression `)` expression
| nothing
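As a rough illustration of how compactly that grammar translates into code, here is a small recursive-descent recognizer in Python. The function names are invented for this sketch, and it accepts only strings made of parentheses, exactly as the grammar above does; it is not a parser for arithmetic expressions.

```python
def parse_expression(text, pos=0):
    """Consume one `expression` starting at pos and return the position after it."""
    # expression ::= "(" expression ")" expression | nothing
    # The trailing `expression` of the first alternative is handled by the loop.
    while pos < len(text) and text[pos] == "(":
        pos = parse_expression(text, pos + 1)      # the nested expression
        if pos >= len(text) or text[pos] != ")":
            raise ValueError("missing closing parenthesis at position %d" % pos)
        pos += 1                                   # consume the ")"
    return pos                                     # the "nothing" alternative


def is_balanced(text):
    """Return True if the whole string is derivable from the grammar."""
    try:
        return parse_expression(text) == len(text)
    except ValueError:
        return False


print(is_balanced("(()())"), is_balanced("(()"))
```

The call stack plays the role of the unbounded counter that a finite state machine lacks, although in practice Python's recursion limit still caps the nesting depth.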
I believe this function may suit your needs. I threw it together quickly, so feel free to clean it up a bit. When dealing with nesting, it's easy to think of it backwards and work from there =]
def fn(string, endparens=False):
    exp = []
    idx = -1
    for char in string:
        if char == "(":
            idx += 1
            exp.append("")
        elif char == ")":
            idx -= 1
            if idx != -1:
                exp[idx] = "(" + exp[idx+1] + ")"
        else:
            exp[idx] += char
    if endparens:
        exp = ["("+val+")" for val in exp]
    return exp
You can use regexps, but you need to do the recursion yourself. Something like the following does the trick (if you only need to find, as your question says, all the expressions enclosed into parentheses):
import re
def scan(p, string):
    found = p.findall(string)
    # iterate over a copy, because found is extended inside the loop
    for substring in list(found):
        stripped = substring[1:-1]
        found.extend(scan(p, stripped))
    return found
p = re.compile(r'\(.+\)')
string = '(((1+0)+1)+1)'
all_found = scan(p, string)
print(all_found)
This code, however, does not match the 'correct' parentheses. If you need to do that you will be better off with a specialized parser.
Here is a demo for your question. It is clumsy, but it works:
import re
s = '(((1+0)+1)+1)'
def getContectWithinBraces(x, *args, **kwargs):
    ptn = r'[%(left)s]([^%(left)s%(right)s]*)[%(right)s]' % kwargs
    Res = []
    res = re.findall(ptn, x)
    while res != []:
        Res = Res + res
        xx = x.replace('(%s)' % Res[-1], '%s')
        res = re.findall(ptn, xx)
        if res != []:
            res[0] = res[0] % ('(%s)' % Res[-1])
    return Res
getContectWithinBraces(s, left=r'\(\[\{', right=r'\)\]\}')
My solution: define a function that extracts the content within the outermost parentheses, then call that function repeatedly until you get the content within the innermost parentheses.
def get_string_inside_outermost_parentheses(text):
    content_p = re.compile(r"(?<=\().*(?=\))")
    r = content_p.search(text)
    return r.group()
def get_string_inside_innermost_parentheses(text):
    while '(' in text:
        text = get_string_inside_outermost_parentheses(text)
    return text
You should write a proper parser for parsing such expression (e.g. using pyparsing).
Regular expressions are not an appropriate tool for writing decent parsers.
Many posts suggest that for nested braces, regular expressions are not the way to go and that you should simply count the braces.
For example, see: Regular expression to detect semi-colon terminated C++ for & while loops
Here is a complete python sample to iterate through a string and count braces:
# decided for nested braces to not use regex but brace-counting
import re, string
texta = r'''
nonexistent.\note{Richard Dawkins, \textit{Unweaving the Rainbow: Science, Delusion
and the Appetite for Wonder} (Boston: Houghton Mifflin Co., 1998), pp. 302, 304,
306-309.} more text and more.
This is a statistical fact, not a
guess.\note{Zheng Wu, \textit{Cohabitation: An Alternative Form
of Family Living} (Ontario, Canada: Oxford University Press,
2000), p. 149; \hbox{Judith} Treas and Deirdre Giesen, ``Title
and another title,''
\textit{Journal of Marriage and the Family}, February 2000,
more and more text.capitalize
pos = 0
foundpos = 0
openBr = 0 # count open braces
while foundpos != -1:
    openBr = 0
    foundpos = texta.find(r'\note', pos)
    # print('foundpos', foundpos)
    pos = foundpos + 5
    # print(texta[pos])
    result = ""
    while foundpos > -1 and openBr >= 0:
        pos = pos + 1
        if texta[pos] == "{":
            openBr = openBr + 1
        if texta[pos] == "}":
            openBr = openBr - 1
        result = result + texta[pos]
    result = result[:-1] # drop the last } found.
    result = result.replace('\n', ' ') # replace new line with space
    print(result)

Dictionary: Alphabetize the elements of a list and count its occurences

Hi so I've been trying to count the elements in the list that I have made, and when I do it
The result should be:
a 2
above 2
across 1
and etc..
Here's what I've got:
word = []
with open('Lateralus.txt', 'r') as my_file:
for line in my_file:
temporary_holder = line.split()
for i in temporary_holder:
for i in range(0,len(word)): word[i] = word[i].lower()
for count in word:
if count in word:
word[count] = word[count] + 1
word[count] = 1
for (word,many) in word.items():
@Kimberly, as I understand from your code, you want to read a text file of alphabetic characters.
You also want to ignore the case of the characters in the file. Finally, you want to count the occurrences of each unique letter in the text file.
I suggest using a dictionary for this. I have written sample code for this task which
satisfies the following 3 conditions (please comment with inputs and expected outputs if you want a different result, and I will update my code accordingly):
It reads the text file and builds a single line of text by removing any spaces in between.
It converts upper case letters to lower case letters.
Finally, it creates a dictionary containing the unique letters with their frequencies.
» Lateralus.txt
ab abc ab c
» Code
import json
char_occurences = {}
with open('Lateralus.txt', 'r') as file:
    all_lines_combined = ''.join([line.replace(' ', '').strip().lower() for line in file.readlines()])
print(all_lines_combined) # abcdefghijkabcdefgkjhiihdcabefgkjmkmkmkmkmoopkdpkdpkdababcdfqababcabc
print(len(all_lines_combined)) # 69 (7 lines of 11 characters, 8 spaces => 77-8 = 69)
while all_lines_combined:
    ch = all_lines_combined[0]
    char_occurences[ch] = all_lines_combined.count(ch)
    all_lines_combined = all_lines_combined.replace(ch, '')
# Pretty printing char_occurences dictionary containing occurences of
# alphabetic characters in a text file
print(json.dumps(char_occurences, indent=4))
{
    "a": 8,
    "c": 6,
    "b": 8,
    "e": 3,
    "d": 7,
    "g": 3,
    "f": 4,
    "i": 3,
    "h": 3,
    "k": 10,
    "j": 3,
    "m": 5,
    "o": 2,
    "q": 1,
    "p": 3
}

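The question's expected result lists whole words in alphabetical order with their counts, while the answer above counts letters. A minimal sketch of the word-based variant, assuming words are separated by whitespace and case should be ignored; the function name count_words and the sample text are made up for the illustration.

```python
from collections import Counter


def count_words(text):
    """Count how often each whitespace-separated word occurs, ignoring case."""
    return Counter(text.lower().split())


sample = "Above a cloud across a sky above"
# sorted() on the (word, count) pairs orders them alphabetically by word
for word, many in sorted(count_words(sample).items()):
    print(word, many)
```

To use it on a file, pass the result of reading the file (for example my_file.read()) instead of the sample string.
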
VBScript - Access - Clean Text Escape Regex

I am using a VBScript file (.vbs extension) to insert pieces of text into an Access database.
I need to be able to insert whatever characters occur in the text without it causing problems.
I am using this:
Function CleanUp (input)
    Dim objRegExp, outputStr
    Set objRegExp = New Regexp
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "[^\w+]"
    outputStr = objRegExp.Replace(input, " ")
    CleanUp = outputStr
End Function
But this strips out a lot of special characters. I just want to be able to insert the most commonly used characters, such as brackets, percent signs, dots, commas, etc., safely.
Can you suggest a better regex?
Help with a parameter query:
I am using a .vbs file to perform my insert: a script that I run on my system to load text files into an Access .mdb file.
Dim objConn,objRS,strSQL,rsins
Set objConn = CreateObject("ADODB.Connection")
Set objRS = CreateObject("ADODB.Recordset")
filenpath = "D:\MDBFILES\"
filenname = "test.mdb"
objConn.Open("DRIVER={Microsoft Access Driver (*.mdb)}; DBQ="& filenpath & filenname)
strSQL = "insert into [mytable] (F1,F2,F3Date,F4,F5Integer,F6Double) values " & _
    "('" & rdoc & "','" & rtype & "','" & CDate(rdate) & "','" & _
    CleanUp(Trim(arrCells(0))) & "','" & CDbl(arrCells(1)) & "','" & _
    CDbl(Trim(arrCells(2))) & "')"
set rsins = objConn.Execute(strSQL)
This works perfectly for me. The insert statement is within a loop , where the values are updated continuously.
Please advise how to create a parameter query and set the parameters with each execution.
Some notes on a parameter query:
Set cmd = CreateObject("ADODB.Command")
cmd.ActiveConnection = con ''A connection
cmd.CommandType = 4 ''adCmdStoredProc =4, A stored query will be used
cmd.CommandText = "TheNameOfThequery"
''adInteger=3, adVarWChar = 202
''Parameters are in the same order in which they occur in the query
cmd.Parameters.Append cmd.CreateParameter("@param1", 3, 1, , param1)
cmd.Parameters.Append cmd.CreateParameter("@param2", 202, 1, 50, param2)
''Action query, so execute
cmd.Execute recs
Edit re new information
strSQL = "insert into [mytable] (F1,F2,F3Date,F4,F5Integer,F6Double) "
strSQL = strSQL & " Values (@1,@2,@3,@4,@5,@6)"
Set cmd = CreateObject("ADODB.Command")
cmd.ActiveConnection = objConn
cmd.CommandType = 1 ''adCmdStoredProc =4, adCmdText=1
cmd.CommandText = strSQL
''adInteger=3, adVarWChar = 202, adDate = 7
''Parameters are in the same order in which they occur in the query
cmd.Parameters.Append cmd.CreateParameter("@1", 202, 1, 50, rdoc)
cmd.Parameters.Append cmd.CreateParameter("@2", 202, 1, 50, rtype)
''Not sure about this, because you have quotes on your date, so it may be text
cmd.Parameters.Append cmd.CreateParameter("@3", 7, 1, , CDate(rdate))
cmd.Parameters.Append cmd.CreateParameter("@4", 202, 1, 50, Trim(arrCells(0)))
cmd.Parameters.Append cmd.CreateParameter("@5", 202, 1, 50, Trim(arrCells(1)))
cmd.Parameters.Append cmd.CreateParameter("@6", 202, 1, 50, Trim(arrCells(2)))
''Action query, so execute
cmd.Execute recs
''msgbox "updated " & recs
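To illustrate why a parameter query makes the CleanUp step unnecessary, here is a small sketch of the same idea in Python. It uses the standard sqlite3 module and an in-memory database as a stand-in for Access, and the reduced column list is an assumption made for the example; the point is only that values passed as parameters are never spliced into the SQL text, so quotes, brackets and percent signs need no escaping.

```python
import sqlite3


def insert_rows(conn, rows):
    """Insert rows with placeholders; the driver handles quoting of each value."""
    conn.executemany(
        "INSERT INTO mytable (F1, F2, F4) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# In-memory stand-in database with a simplified version of the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (F1 TEXT, F2 TEXT, F4 TEXT)")

insert_rows(conn, [("doc1", "typeA", "O'Brien (50%), etc.")])
stored = conn.execute("SELECT F4 FROM mytable").fetchone()[0]
print(stored)
```

The statement is prepared once and executed for every row, which mirrors setting the parameter values inside the loop of the VBScript version.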
You can update text into Access in one statement, but you would need a schema.ini, because you have a non-standard delimiter, for example Handle TransferText Errors
As an aside, I would be inclined to use:
objConn.Open("Provider=Microsoft.ACE.OLEDB.12.0;Data Source="& filenpath & filenname)
or:
objConn.Open("Provider=Microsoft.Jet.OLEDB.4.0;Data Source="& filenpath & filenname)
Common practice for sanitizing input is to define a list of valid characters and replace all non-matching characters with a safe character. Spaces are usually not considered safe; it's better to use underscores instead.
objRegExp.Global = True
objRegExp.Pattern = "[^a-zA-ZäÄöÖüÜ0-9.,()_-]"
outputStr = objRegExp.Replace(input, "_")
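The same allow-list approach, sketched in Python for comparison. The character class is copied from the pattern above; the function name sanitize is made up for the illustration.

```python
import re

# Everything outside this allow-list is replaced; "-" is last so it is taken literally.
UNSAFE = re.compile(r'[^a-zA-ZäÄöÖüÜ0-9.,()_-]')


def sanitize(text):
    """Replace every character that is not explicitly allowed with an underscore."""
    return UNSAFE.sub('_', text)


print(sanitize("50% off (today)!"))  # 50__off_(today)_
```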