I am not getting the expected result from my Python regex. Please check.

Expected Result = ['(444)333-4444', '444-555-3424']
Actual Result = [('(444)333-4444', '(444)', '', '333', '-', '4444', '', '', ''), ('444-555-3424', '444', '-', '555', '-', '3424', '', '', '')]
import re

tell_op = re.compile(r'''(
(\d{3}|\(\d{3}\))? # area code
(\s|-|\.)? # separator
(\d{3}) # first 3 digits
(\s|-|\.) # separator
(\d{4}) # last 4 digits
(\s*(ext|x|ext.)\s*(\d{2,5}))? # extension
)''', re.VERBOSE)
oo = tell_op.findall('this is my phone number (444)333-4444, 444-555-3424')
print(oo)

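re.findall returns a list of tuples when the pattern contains more than one capturing group, with one tuple element per group. To get only the full match, make all the inner groups non-capturing: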
tell_op = re.compile(r'''(
(?:\d{3}|\(\d{3}\))? # area code
(?:\s|-|\.)? # separator
(?:\d{3}) # first 3 digits
(?:\s|-|\.) # separator
(?:\d{4}) # last 4 digits
(?:\s*(?:ext|x|ext.)\s*(?:\d{2,5}))? # extension
)''', re.VERBOSE)
This will give you
['(444)333-4444', '444-555-3424']
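If you would rather keep the capturing groups (for example to read the area code later), a possible alternative is re.finditer together with match.group(0), which is always the whole match no matter how many groups the pattern has. This is only a sketch: the names phone_re and numbers are made up for the example, and the extension part is written as ext\.?|x, which is an assumption about what the unescaped ext. in the question was meant to do.

```python
import re

# Same structure as the pattern in the question, capturing groups kept.
# Assumption: the extension alternation is rewritten with an escaped dot.
phone_re = re.compile(r'''(
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    (\d{3})                       # first 3 digits
    (\s|-|\.)                     # separator
    (\d{4})                       # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?  # extension
    )''', re.VERBOSE)

text = 'this is my phone number (444)333-4444, 444-555-3424'

# group(0) is the full match, independent of the capturing groups.
numbers = [m.group(0) for m in phone_re.finditer(text)]
print(numbers)
```

The individual parts stay available through m.group(2), m.group(4) and so on if you need them.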

Related

Python: My regular expression doesn't match on some string cases

I am trying to create a regular expression, and so far I have this:
(-?['0|1']{1}.[00000000e+00| ]?){1}\s(-?['0|1']{1}.[00000000e+00| ]?){1}
My goal is to detect the pattern {string pattern}{blank}{string pattern}.
These are the possible string patterns:
'0'
'-0.'
'1.'
'-1.'
'1.00000000e+00'
'0.00000000e+00'
'-0.00000000e+00'
'-1.00000000e+00'
'0. ' (followed by 1 to 8 blanks)
'-0. ' (followed by 1 to 8 blanks)
'1. ' (followed by 1 to 8 blanks)
'-1. ' (followed by 1 to 8 blanks)
My code succeeds on most test cases, but it fails on some of them
(e.g. those containing '00000000e+00' or trailing blanks ' ').
The blanks are especially difficult for me, because there can be anywhere from 1 to 8 blank (' ') characters.
These are my test cases:
['0. 0.']
['0. 1.']
['1. 0.']
['1. 1.']
['-0. -0.']
['-0. 0.']
['0. -0.']
['1. -0.']
['1. -1.']
['-1. 1.']
['-1. -1.']
['-1.00000000e+00 0.'] # Fail
['0. -1. '] # Fail
['0. 0. '] # Fail
['-0.00000000e+00 1.00000000e+00'] # Fail
['-0. 1.00000000e+00'] # Fail
Please give me some advice.
You could use
(-?[01]\.(?:00000000e\+00| {1,8})?)\s(-?[01]\.(?:00000000e\+00| {1,8})?)
The pattern matches:
( Capture group 1
-?[01]\. Match an optional -, then either 0 or 1, then a literal . (note the escaped dot)
(?: Non-capturing group for the alternation
00000000e\+00| {1,8} Match either 00000000e+00 or 1 to 8 spaces
)? Close the non-capturing group and make it optional
) Close group 1
\s Match a single whitespace character
(-?[01]\.(?:00000000e\+00| {1,8})?) Capture group 2, the same pattern as capture group 1
Note that \s could also match a newline, and if you want the match only you can omit the capture groups.
If your regex engine supports subroutine calls (PCRE does; Python's built-in re module does not), you can shorten the pattern by recursing into the first subpattern, since the pattern uses the same part twice.
(-?[01]\.(?:0{8}e\+00| {1,8})?)\s(?1)
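Python's built-in re module has no (?1) subroutine call, so a rough Python equivalent is to keep the repeated part in a variable and interpolate it twice. This is a minimal sketch, not a definitive solution: NUM, PAIR, is_pair and the sample strings are illustrative names and data, not taken from the question.

```python
import re

# One number: optional sign, 0 or 1, a literal dot, then optionally either
# the exponent tail or 1 to 8 padding spaces.
NUM = r'-?[01]\.(?:0{8}e\+00| {1,8})?'

# Two such numbers separated by one whitespace character.
PAIR = re.compile(rf'({NUM})\s({NUM})')


def is_pair(s):
    """Return True if the whole string is two numbers separated by whitespace."""
    return PAIR.fullmatch(s) is not None


samples = ['0. 0.', '-1.00000000e+00 0.', '0.    -1.   ',
           '-0.00000000e+00 1.00000000e+00', 'x y', '2. 0.']
for s in samples:
    print(repr(s), is_pair(s))
```

fullmatch is used here so that leftover characters at either end make the test fail; use search instead if the pair may appear inside a longer string.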
Would you please try the following:
import re
l = ['0. 0.',
'0. 1.',
'1. 0.',
'1. 1.',
'-0. -0.',
'-0. 0.',
'0. -0.',
'1. -0.',
'1. -1.',
'-1. 1.',
'-1. -1.',
'-1.00000000e+00 0.',
'0. -1. ',
'0. 0. ',
'-0.00000000e+00 1.00000000e+00',
'-0. 1.00000000e+00']
for s in l:
    if re.match(r'-?[01]\.?(?:0{8}e\+00|\s{1,8})?\s-?[01]\.?(?:0{8}e\+00|\s{1,8})?$', s):
        print("match")
    else:
        print("no match")
Explanation of regex -?[01]\.?(?:0{8}e\+00|\s{1,8})?:
-? matches an optional dash character (zero or one occurrence)
[01]\.? matches 0 or 1 followed by an optional dot character (note that [0|1] would also match a literal |)
0{8}e\+00 matches the substring 00000000e+00
\s{1,8} matches between 1 and 8 whitespace characters
(?:0{8}e\+00|\s{1,8})? optionally matches either of the two regexes above
Apparently you have two false impressions.
You seem to think of [ ] as a group construct while it denotes a character class.
You seem to think you'd have to include the string delimiting quotes in the pattern.
Since one could interpret your question to the effect that you want to test for two numbers of -1, 0 or 1, and others already gave regex answers, here's a regex-free alternative for that problem:
test = ['0. 0.', '0. 1.', '1. 0.', '1. 1.', '-0. -0.', '-0. 0.', '0. -0.', '1. -0.',
'1. -1.', '-1. 1.', '-1. -1.', '-1.00000000e+00 0.', '0. -1. ', '0. 0. ',
'-0.00000000e+00 1.00000000e+00', '-0. 1.00000000e+00', 'x y', '-1 0 1']
for t in test:
    print([t], end='\t')
    s = t.split()
    try:
        if len(s) != 2: raise ValueError
        for f in s:
            g = float(f)
            if g!=-1 and g!=0 and g!=1: raise ValueError
    except ValueError:
        print('Fail')
    else:
        print('Pass')

String Formatting in Python2.7 with result from Net commands

I created a batch file that writes user names to a file. It works perfectly and cleans up net user and writes the user names to a file so it would look like this:
Administrator Michael Guest
Pianoman Billy George
I don't know how many user names there will be or how long they are, so I can't know how many spaces will separate them. How can I clean up the whitespace between an undetermined number of names?
My Python program is supposed to read these names from a file and turn them into a list. I was planning on just using .split(" "), so ideally someone could suggest a way to reduce the gap between names to a single space. I already looked at the .format method, and it doesn't seem to be up to the task. I'm also open to a reasonably readable way (doubtful) of doing this in the batch file.
BTW: I considered simply redirecting the output of dir /B C:\Users, but this doesn't work in my situation.
Use .split() without the sep argument:
string.split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second
argument sep is absent or None, the words are separated by
arbitrary strings of whitespace characters (space, tab, newline,
return, formfeed). If the second argument sep is present and not
None, it specifies a string to be used as the word separator. The
returned list will then have one more item than the number of
non-overlapping occurrences of the separator in the string. If
maxsplit is given, at most maxsplit number of splits occur, and
the remainder of the string is returned as the final element of the
list (thus, the list will have at most maxsplit+1 elements). If
maxsplit is not specified or -1, then there is no limit on the
number of splits (all possible splits are made).
The behavior of split on an empty string depends on the value of
sep. If sep is not specified, or specified as None, the result
will be an empty list. If sep is specified as any string, the result
will be a list containing one element which is an empty string.
Example:
>>> x='Administrator CLIENT1 Guest'
>>> x.split(' ')
['Administrator', '', '', '', '', '', '', '', '', '', '', '', 'CLIENT1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '','Guest']
>>> x.split()
['Administrator', 'CLIENT1', 'Guest']
>>>
Another approach (Python 2 only; the string.split function was removed in Python 3):
>>> import string
>>> x='Administrator CLIENT1 Guest'
>>> string.split(x,' ')
['Administrator', '', '', '', '', '', '', '', '', '', '', '', 'CLIENT1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '','Guest']
>>> string.split(x)
['Administrator', 'CLIENT1', 'Guest']
>>>
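Applied to the whole file, this could look roughly like the sketch below. The sample text stands in for the file contents and its exact spacing is an assumption; with a real file you would read the text with open(...).read() instead.

```python
# Hypothetical stand-in for the batch file's output; the column spacing
# is an assumption, the real file may be padded differently.
raw = """Administrator            Michael                  Guest
Pianoman                 Billy                    George
"""

# split() with no argument splits on any run of whitespace, including
# newlines, so the whole text becomes one flat list of names.
names = raw.split()
print(names)
```

Note that this only works as long as no user name itself contains a space.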

javascript made '', after split regex

Hello, I'm trying to process a string with a regex.
I just want to remove the numbers. I have this string:
'dfmaks1,4412klaikd33,442'
var re = new RegExp("[0-9\,]");
var test = 'dfmaks1,4412klaikd33,442';
console.log(test.split(re));
The result of the code above is
[ 'dfmaks', '', '', '', '', '', 'klaikd', '', '', '', '', '', '' ]
Why does it produce the empty strings ('')?
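split cuts the string at every single digit or comma, so each pair of adjacent separator characters leaves an empty string between them. You may "invert" your regex to match any char other than a digit or comma (i.e. the negated character class [^0-9,]), repeated one or more times (add + after the character class), using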
var test = 'dfmaks1,4412klaikd33,442';
console.log(test.match(/[^0-9,]+/g));
String#match, when used with a regex that has the global modifier (/g), returns an array of all non-overlapping match values.
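The question is about JavaScript, but the behaviour is not specific to it. As a rough illustration in Python (where re.split treats adjacent separators the same way), the sketch below shows where the empty strings come from and two ways to avoid them. The variable names are made up for the example.

```python
import re

test = 'dfmaks1,4412klaikd33,442'

# Splitting on a single digit or comma: every adjacent pair of
# separator characters yields an empty string between them.
pieces = re.split(r'[0-9,]', test)
print(pieces)

# Adding + merges each run of separators into one split point,
# but a trailing run still leaves one empty string at the end.
merged = re.split(r'[0-9,]+', test)
print(merged)

# Matching the wanted parts instead of splitting avoids empties entirely.
words = re.findall(r'[^0-9,]+', test)
print(words)
```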

Regex: how to separate strings by apostrophes in certain cases only

I am looking to capitalize the first letter of words in a string. I've managed to put together something by reading examples on here. However, I'm trying to get any names that start with O' to separate into 2 strings so that each gets capitalized. I have this so far:
\b([^\W_\d](?!')[^\s-]*) *
which omits selecting the X' from any string X'XYZ. That works for capitalizing the part after the ', but doesn't capitalize the X'. Furthermore, i'm becomes i'M since the pattern is not specific to O'. To state the goal:
o'malley should go to O'Malley
o'malley's should go to O'Malley's
don't should go to Don't
i'll should go to I'll
(as an aside, I want to omit any strings that start with numbers, like 23F, that seems to work with what I have)
How can I make it specific to strings that start with O'? Thanks.
If you use the following pattern:
([oO])'([\w']+)|([\w']+)
then you can access the parts of each word like this:
match[0] == 'o' and match[1] == 'name' # if the word is "o'name"
match[2] == 'word' # if the word is "word"
Whichever alternative matched, the groups of the other one will be blank, i.e. if word == "word" then
match[0] == match[1] == ""
since there is no o' prefix.
Test Example:
>>> import re
>>> string = "o'malley don't i'm hello world"
>>> match = re.findall(r"([oO])'([\w']+)|([\w']+)",string)
>>> match
[('o', 'malley', ''), ('', '', "don't"), ('', '', "i'm"), ('', '', 'hello'), ('', '', 'world')]
NOTE: This is for python. This MIGHT not work for all engines.
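To go from the matched groups to the capitalized text, one option is re.sub with a replacement function. The following is a minimal sketch built on the pattern above; the helper names _capitalize and capitalize_words are hypothetical, and it assumes that only the first letter of each part should change.

```python
import re

# Pattern from the answer: groups 1 and 2 for o'-prefixed words, group 3 otherwise.
WORD = re.compile(r"([oO])'([\w']+)|([\w']+)")


def _capitalize(m):
    if m.group(1):  # the word starts with o' or O'
        rest = m.group(2)
        return m.group(1).upper() + "'" + rest[0].upper() + rest[1:]
    word = m.group(3)
    # upper() leaves digits unchanged, so words like 23F are not altered.
    return word[0].upper() + word[1:]


def capitalize_words(text):
    """Capitalize each word, treating an o' prefix as its own part."""
    return WORD.sub(_capitalize, text)


print(capitalize_words("o'malley o'malley's don't i'll 23F"))
```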

regexp rule which returns column entries of text database

Given a simple delimiter-separated text database, I want to construct a regexp rule which returns the column / field entries.
Given the following two example lines
entry1 = '|123|some|string |101112 |'
entry2 = '|123|some| |101112 |'
I want to get the following output:
values1 = '123', 'some', 'string', '101112'
values2 = '123', 'some', '', '101112'
So far I'm using the following regexp and regexprep combination:
values = regexp(regexprep(entry, '[\s]', ''), '\|', 'split')
which unfortunately returns the following:
values1 = '' '123' 'some' 'string' '101112' ''
values2 = '' '123' 'some' '' '101112' ''
but I want to get (no extra '' before the '123' and no extra '' after the '101112'):
values1 = '123', 'some', 'string', '101112'
values2 = '123', 'some', '', '101112'
Given my regexp rule, why do I get the '' at the beginning and the end? How do I have to change my regexp rule to only return the field values?
I am not sure it is exactly what you are asking for, but you can use strread:
strread(entry1(2:end),'%d','delimiter','|')
ans =
123
456
789
101112
The empty strings are there because you tell MATLAB to split at | characters, and splitting means that you cut there. If there is nothing before a |, you'll get an empty string. For example, splitting this (the intermediate result after regexprep):
'|123|456|789|101112|'
results in (imagine cutting the string at |):
'', '123', '456', '789', '101112', ''
So, either split the string between the first and the last |:
nospaces = regexprep(entry, '\s', '')
betweenpipes = nospaces(2:size(nospaces,2)-1)
values = regexp(betweenpipes, '\|', 'split')
...or don't use split at all and just search for the required pattern:
regexp(entry, '(?<=\|)(?:\s*)(\d+)(?:\s*)(?=\|)', 'match')
Regexp explained:
look behind for |, but don't remember it: (?<=\|)
skip possible whitespace but don't remember it: (?:\s*)
match a number: (\d+)
skip possible whitespace but don't remember it: (?:\s*)
look ahead for |, but don't remember it: (?=\|)
I'm writing this from memory as I don't have matlab here, so there may be some bugs..
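The question is about MATLAB, but the first approach above (drop the outer | characters, then split) is easy to check in another language. Here is a small Python sketch of the same idea; the function name fields is hypothetical, and it assumes every entry starts and ends with a | character.

```python
def fields(entry):
    """Return the stripped field values of a |-delimited entry."""
    # Drop the leading and trailing pipe (assumed to be present), so that
    # split() does not produce an empty string at either end.
    inner = entry[1:-1]
    return [f.strip() for f in inner.split('|')]


entry1 = '|123|some|string |101112 |'
entry2 = '|123|some| |101112 |'
print(fields(entry1))
print(fields(entry2))
```

Stripping each field after the split, instead of deleting all whitespace first, keeps any spaces inside a field value intact.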