Regex match string where symbol is not repeated

I have strings like these:
group items % together into% FALSE
characters % that can match any single TRUE
How can I match sentences in which the symbol % is not repeated?
I tried this pattern, but it also matches the first sentence, because [%]{1} matches any single %:
[%]{1}

You may use this regex in Python to fail the match for lines that contain more than one %:
^(?!([^%]*%){2}).+
(?!([^%]*%){2}) is a negative lookahead that fails the match if % occurs at least twice after the start of the line.
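A minimal sketch of applying that pattern line by line in Python. The two sample strings are the ones from the question; the loop and the `verdicts` dictionary are illustrative additions, not part of the original answer.

```python
import re

# Pattern from the answer: the negative lookahead rejects any string
# in which % occurs at least twice; .+ then consumes the whole line.
pattern = re.compile(r'^(?!([^%]*%){2}).+')

lines = [
    'group items % together into%',
    'characters % that can match any single',
]

# re.match returns None when the lookahead fails
verdicts = {line: bool(pattern.match(line)) for line in lines}
for line, ok in verdicts.items():
    print(line, ok)
```

Testing each line separately matters here: [^%] also matches newline characters, so if the pattern is run with re.MULTILINE over one multi-line string, the lookahead can count % signs on later lines. In that case [^%\n] keeps the lookahead within the current line.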

You could use re.search as follows:
import re

items = ['group items % together into%', 'characters % that can match any single']
for item in items:
    output = item
    # two or more % anywhere in the string -> FALSE
    if re.search(r'^.*%.*%.*$', item):
        output = output + ' FALSE'
    else:
        output = output + ' TRUE'
    print(output)
This prints:
group items % together into% FALSE
characters % that can match any single TRUE

Just count them (Python):
>>> s = 'blah % blah %'
>>> s.count('%') == 1
False
>>> s = 'blah % blah'
>>> s.count('%') == 1
True
With regex:
>>> import re
>>> re.match('[^%]*%[^%]*$','gfdg%fdgfgfd%')
>>> re.match('[^%]*%[^%]*$','blah % blah % blah')
>>> re.match('[^%]*%[^%]*$','blah % blah blah')
<re.Match object; span=(0, 16), match='blah % blah blah'>
re.match only matches at the start of the string. If you use re.search, which can match anywhere in the string, add ^ to anchor the match at the start:
>>> re.search('^[^%]*%[^%]*$','gfdg%fdgfgfd%')
>>> re.search('^[^%]*%[^%]*$','gfdg%fdgfgfd')
<re.Match object; span=(0, 12), match='gfdg%fdgfgfd'>
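Both anchors can also be expressed with re.fullmatch (Python 3.4+), which succeeds only if the whole string matches. The sketch below wraps the same pattern in a helper; the name `has_exactly_one` is hypothetical, and re.escape is used (an addition not in the answers) so the helper also works for a character that is special in a regex.

```python
import re

def has_exactly_one(s: str, ch: str) -> bool:
    """Return True if the single character `ch` occurs exactly once in `s`."""
    c = re.escape(ch)
    # fullmatch anchors both ends, so no ^ or $ is needed;
    # [^c]* also matches newlines, so multi-line strings are handled too
    return re.fullmatch(f'[^{c}]*{c}[^{c}]*', s) is not None

print(has_exactly_one('gfdg%fdgfgfd%', '%'))
print(has_exactly_one('gfdg%fdgfgfd', '%'))
print(has_exactly_one("Shubham's", "'"))
```

For a plain count check, s.count(ch) == 1 gives the same answer without a regex.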

I am assuming that "sentence" in your question is the same as a line in the input text. With that assumption, you can use the following:
^[^%\r\n]*(%[^%\r\n]*)?$
This, along with the multi-line and global flags, will match all lines in the input string that contain 0 or 1 '%' symbols.
^ matches the start of a line
[^%\r\n]* matches 0 or more characters that are not '%' or a new line
(...)? matches 0 or 1 instance of the contents in parentheses
% matches '%' literally
$ matches the end of a line
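Python has no global flag; re.findall (or re.finditer) returns every match instead. A sketch, assuming the input is one string with one sentence per line (the third sample line is made up). The group is made non-capturing, (?:...), because when a pattern has exactly one capturing group, re.findall returns that group's contents rather than the whole match.

```python
import re

# Same pattern as above, with (?:...) instead of (...) so that
# re.findall returns whole lines rather than the group's contents.
pattern = re.compile(r'^[^%\r\n]*(?:%[^%\r\n]*)?$', re.MULTILINE)

text = (
    "group items % together into%\n"
    "characters % that can match any single\n"
    "no percent sign here"
)

lines_ok = pattern.findall(text)
print(lines_ok)
```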

Related

Trouble sorting a list after using regex

The code below parses data from this text sample:
rf-Parameters-v1020
supportedBandCombination-r10: 128 items
Item 0
BandCombinationParameters-r10: 1 item
Item 0
BandParameters-r10
bandEUTRA-r10: 2
bandParametersUL-r10: 1 item
Item 0
CA-MIMO-ParametersUL-r10
ca-BandwidthClassUL-r10: a (0)
bandParametersDL-r10: 1 item
Item 0
CA-MIMO-ParametersDL-r10
ca-BandwidthClassDL-r10: a (0)
supportedMIMO-CapabilityDL-r10: fourLayers (1)
I am having trouble replacing the first 'a' from the "ca-BandwidthClassUL-r10" line with 'u' and placing it before 'm' in the final output: [2 a(0) u m]
import re
regex = r"bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*\r?\nca-BandwidthClassUL-r10*: *(\w.*)(" \
r"?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *(" \
r"\w.*)\nsupportedMIMO-CapabilityDL-r10: *(.*) "
regex2 = r"^.*bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*\r?\nca-BandwidthClassUL-r10*: *(\w.*)(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *(\w.*)\nsupportedMIMO-CapabilityDL-r10: *(.*)(?:\r?\n(?!bandEUTRA-r10:).*)*\r?\nbandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *(\w.*)\nsupportedMIMO-CapabilityDL-r10: *(.*)"
with open("files.txt", "r") as my_file:
    content = my_file.read().replace("fourLayers", 'm').replace("twoLayers", " ")
#print(content)
#if 'BandCombinationParameters-r10: 1 item' in content:
result = ["".join(m) for m in re.findall(regex, content, re.MULTILINE)]
print(result)
You might make the ca-BandwidthClassUL-r10 part optional and capture it in group 2.
Then you can print group 3 followed by u if group 2 took part in the match; otherwise print only group 3.
As you are already matching the text in the regex, you don't need the separate replace() calls; you can put the text you want (such as m) in the output format itself.
bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*(?:\r?\n(ca-BandwidthClassUL-r10)?: *(\w.*))(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *\w.*\nsupportedMIMO-CapabilityDL-r10:
For example
import re

regex = r"bandEUTRA-r10: *(\d+)(?:\r?\n(?!ca-BandwidthClassUL-r10:).*)*(?:\r?\n(ca-BandwidthClassUL-r10)?: *(\w.*))(?:\r?\n(?!ca-BandwidthClassDL-r10:).*)*\r?\nca-BandwidthClassDL-r10*: *\w.*\nsupportedMIMO-CapabilityDL-r10:"
s = "here the example data with and without ca-BandwidthClassUL-r10"
matches = re.finditer(regex, s, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    # group(2) is None when the optional ca-BandwidthClassUL-r10 label is absent
    result = "{0}{1} m".format(
        match.group(1),
        match.group(3) + " u" if match.group(2) else match.group(3)
    )
    print(result)
Output
2a (0) u m
2a (0) m
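The core idea, an optional group whose presence decides the output, can be seen on a smaller input. The sample text and the simplified pattern below are assumptions for illustration (one field per line, no indentation), not the asker's real file. Here group 2 captures the UL value rather than the label, and Match.group returns None for a group that did not take part in the match.

```python
import re

# Hypothetical sample: the UL line is present in the first block
# and missing in the second.
text = """bandEUTRA-r10: 2
ca-BandwidthClassUL-r10: a (0)
ca-BandwidthClassDL-r10: a (0)
bandEUTRA-r10: 4
ca-BandwidthClassDL-r10: a (0)
"""

# Group 2 (the UL line) is optional; group 3 captures the DL class.
pattern = re.compile(
    r'^bandEUTRA-r10: *(\d+)\n'
    r'(?:ca-BandwidthClassUL-r10: *(.*)\n)?'
    r'ca-BandwidthClassDL-r10: *(.*)$',
    re.MULTILINE,
)

outputs = []
for m in pattern.finditer(text):
    # m.group(2) is None when the optional UL line was not matched
    suffix = ' u m' if m.group(2) else ' m'
    outputs.append(m.group(1) + m.group(3) + suffix)

print(outputs)
```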

matching two or more characters that are not the same

Is it possible to write a regex pattern to match abc, where each letter is not literal but stands for a distinct character, so that text like xyz (but not xxy) would be matched? I can get as far as (.)(?!\1), which matches a in ab, but then I am stumped.
After getting the answer below, I was able to write a routine that generates this pattern. Using raw re patterns is much faster than converting both the pattern and the text to canonical form and then comparing them.
import re


def pat2re(p, know=None, wild=None):
    r"""return a compiled re pattern that will find pattern `p`
    in which each different character should find a different
    character in a string. Characters to be taken literally
    or that can represent any character should be given as
    `know` and `wild`, respectively.

    EXAMPLES
    ========

    Characters in the pattern denote different characters to
    be matched; characters that are the same in the pattern
    must be the same in the text:

    >>> pat = pat2re('abba')
    >>> assert pat.search('maccaw')
    >>> assert not pat.search('busses')

    The underlying pattern of the re object can be seen
    with the pattern property:

    >>> pat.pattern
    '(.)(?!\\1)(.)\\2\\1'

    If some characters are to be taken literally, list them
    as known; do the same if some characters can stand for
    any character (i.e. are wildcards):

    >>> a_ = pat2re('ab', know='a')
    >>> assert a_.search('ad') and not a_.search('bc')
    >>> ab_ = pat2re('ab*', know='ab', wild='*')
    >>> assert ab_.search('abc') and ab_.search('abd')
    >>> assert not ab_.search('bad')
    """
    # make a canonical "hash" of the pattern
    # with ints representing pattern elements that
    # must be unique and strings for wild or known
    # values
    m = {}
    j = 1
    know = know or ''
    wild = wild or ''
    for c in p:
        if c in know:
            # escape literal characters that are special in a regex (e.g. '.')
            m[c] = re.escape(c)
        elif c in wild:
            m[c] = '.'
        elif c not in m:
            m[c] = j
            j += 1
            assert j < 100  # backreferences are limited to groups 1-99
    h = tuple(m[i] for i in p)
    # build pattern
    out = []
    last = 0
    for i in h:
        if isinstance(i, int):
            if i <= last:
                # already-seen element: refer back to its group
                out.append(r'\%s' % i)
            else:
                if last:
                    # new element: must differ from every earlier group
                    ors = '|'.join(r'\%s' % k for k in range(1, last + 1))
                    out.append('(?!%s)(.)' % ors)
                else:
                    out.append('(.)')
                last = i
        else:
            out.append(i)
    return re.compile(''.join(out))
You may try:
^(.)(?!\1)(.)(?!\1|\2).$
Here is an explanation of the regex pattern:
^ from the start of the string
(.) match and capture any first character (no restrictions so far)
(?!\1) then assert that the second character is different from the first
(.) match and capture any (legitimate) second character
(?!\1|\2) then assert that the third character does not match first or second
. match any valid third character
$ end of string
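A quick check of the pattern in Python; the sample strings other than xyz and xxy are made up for illustration.

```python
import re

# Pattern from the answer: three characters, each different from
# the ones before it.
pattern = re.compile(r'^(.)(?!\1)(.)(?!\1|\2).$')

samples = ['xyz', 'xxy', 'xyx', 'xyy', 'abc']
verdicts = {s: bool(pattern.match(s)) for s in samples}
for s, ok in verdicts.items():
    print(s, ok)
```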

Regex to match a string based on particular character count [duplicate]

This question already has answers here:
Regular expression to match exact number of characters?
This is not a duplicate question: the other questions are about repeating a regex, whereas my question is how to limit the count of a particular character in a regex for validation. I am looking for a regex that matches a string only when the character ' occurs in it exactly once.
Example:
import re

patt = ...  # placeholder: I don't know what pattern to use here
s = "Shubham's"
if re.match(patt, s):
    print("The string has only ONE '")
else:
    print("The String has either less or more than ONE ' count")
I guess what you are looking for is this:
import re
pat = "^[^\']*\'[^\']*$"
print (re.match(pat, "aeh'3q4'bl;5hkj5l;ebj3'"))
print (re.match(pat, "aeh3q4'bl;5hkj5l;ebj3'"))
print (re.match(pat, "aeh3q4bl;5hkj5l;ebj3'"))
print (re.match(pat, "'"))
print (re.match(pat, ""))
Which gives output:
None
None
<_sre.SRE_Match object; span=(0, 21), match="aeh3q4bl;5hkj5l;ebj3'">
<_sre.SRE_Match object; span=(0, 1), match="'">
None
What does "^[^\']*\'[^\']*$" do?
^ matches the beginning of the string
[^\']* - * matches 0 or more characters from the set defined in []. Here the set is negated using ^, and it contains a single character, ', which is escaped so it looks like \'. Altogether, this part matches any number of any characters except '
\' - matches exactly one ' character
$ - matches the end of the string. Without it, a partial match would be possible even if the string contains more ' characters. Compare with the first example above:
print (re.match("^[^\']*\'[^\']*", "aeh'3q4'bl;5hkj5l;ebj3'"))
<_sre.SRE_Match object; span=(0, 7), match="aeh'3q4">
Why not just use .count()?
s = "Shubham's"
if s.count("\'") == 1:
    print("The string has only ONE '")
else:
    print("The String has either less or more than ONE ' count")

Python 2.7 RE Search by condition

I have a problem when using re.search.
For example:
a = '<span class="chapternum">1 </span>abc,def.</span>'
How can I search the number '1'?
Or, how can I search for digits that come after ">" and end with whitespace?
I tried:
test = re.search('(^>)(\d+)(\s$)', a)
print test
None
It fails to get the number "1".
^ and $ indicate the beginning and the end of the string. If you get rid of them you have your answer:
>>> test = re.search(r'(>)(\d+)(\s)', a)
>>> test.groups()
('>', '1', ' ')
You probably don't need the first and last groups, though (the parentheses capture them):
>>> a = '<span class="chapternum">23 </span>abc,def.</span>'
>>> test = re.search(r'>(\d+)\s', a)
>>> test.group(1)
'23'
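An alternative, not from the original answer, is to check the surrounding > and whitespace with lookarounds so that they are not part of the match at all. A sketch in Python 3 syntax:

```python
import re

a = '<span class="chapternum">23 </span>abc,def.</span>'

# Lookbehind (?<=>) and lookahead (?=\s) check the neighbouring characters
# without consuming them, so group(0) contains only the digits.
m = re.search(r'(?<=>)\d+(?=\s)', a)
number = m.group(0) if m else None
print(number)
```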

Incrementing the last digit in a Python string

I'd like to increment the last digit of user provided string in Python 2.7.
I can replace the first digit like this:
import re

def increment_hostname(name):
    try:
        number = re.search(r'\d+', name).group()
    except AttributeError:
        # re.search returned None: there are no digits in the name
        return False
    number = int(number) + 1
    number = str(number)
    return re.sub(r'\d+', number, name)
I can match all the digits with re.findall then increment the last digit in the list but I'm not sure how to do the replace:
numbers = re.findall(r'\d+', name)
number = numbers[-1]
number = int(number) + 1
number = str(number)
Use a negative lookahead to make sure the matched digit is not followed by another digit, pass a function as the re.sub() replacement argument, and increment the digit inside it:
>>> import re
>>> s = "foo 123 bar"
>>> re.sub(r'\d(?!\d)', lambda x: str(int(x.group(0)) + 1), s)
'foo 124 bar'
You may also want to handle 9 in a special way, for example, replace it with 0:
>>> def repl(match):
...     digit = int(match.group(0))
...     return str(digit + 1 if digit != 9 else 0)
...
>>> s = "foo 789 bar"
>>> re.sub(r'\d(?!\d)', repl, s)
'foo 780 bar'
Update (handling the new example):
>>> import re
>>> s = "f.bar-29.domain.com"
>>> re.sub(r'(\d+)(?!\d)', lambda x: str(int(x.group(0)) + 1), s)
'f.bar-30.domain.com'
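Note that (\d+)(?!\d) matches every run of digits, so a name such as host1-29 would become host2-30. To change only the last number, the lookahead can instead require that no digit follows anywhere later in the string. A minimal sketch; `increment_last_number` is a hypothetical name modeled on the question's increment_hostname:

```python
import re

# A run of digits with no other digit anywhere after it,
# i.e. the last number in the string.
LAST_NUMBER = re.compile(r'\d+(?!.*\d)')

def increment_last_number(name):
    """Increment the last number in `name`; return False if it has none."""
    if not LAST_NUMBER.search(name):
        return False
    return LAST_NUMBER.sub(lambda m: str(int(m.group(0)) + 1), name, count=1)

print(increment_last_number('host1-29'))
print(increment_last_number('f.bar-29.domain.com'))
```

Leading zeros are not preserved (host007 becomes host8); using str(int(m.group(0)) + 1).zfill(len(m.group(0))) in the lambda would keep the original width, if that is wanted.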