Regex for matching groups but excluding a specific combination of groups

I'm trying to match two groups in an expression, where each group represents a single letter of the initials in a name. For example, in George R. R. Martin the first group would match the first R and the second group would match the second R. I have something like this:
\b([a-zA-Z])[\.{0,1} {0,1}]{1,2}([a-zA-Z])\b
However, I'd like to exclude a specific combination of those groups, say when the first group matches the letter d and the second group matches the letter r.
Is that possible?

You may restrict matches with a negative lookahead:
\b(?![dD]\.? ?[rR]\b)([a-zA-Z])\.? ?([a-zA-Z])\b
  ^^^^^^^^^^^^^^^^^^^
See the regex demo
Note:
The (?![dD]\.? ?[rR]\b) lookahead is best placed right after the word boundary, so that the check is only triggered at a word boundary and not at every location in the string
The lookahead is negative: it fails the match if the pattern inside it matches the text
It matches: a d or D with [dD], then an optional literal dot with \.?, an optional space with " ?" (a space followed by ?), an r or R with [rR] and a trailing word boundary \b.
The main pattern is more generic, \b([a-zA-Z])\.? ?([a-zA-Z])\b. With the lookahead added, the full pattern reads:
\b - leading word boundary
(?![dD]\.? ?[rR]\b) - the negative lookahead
([a-zA-Z]) - Group 1 capturing an ASCII letter
\.? - an optional dot
" ?" (a space followed by ?) - an optional space
([a-zA-Z]) - Group 2 capturing an ASCII letter
\b - a trailing word boundary
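
As a quick check of the lookahead approach, here is a minimal Python sketch. Python is an assumption (the question names no language), and the helper name find_initials is hypothetical:

```python
import re

# Pattern from the answer above: two initials, but never "d" followed by "r".
INITIALS = re.compile(r"\b(?![dD]\.? ?[rR]\b)([a-zA-Z])\.? ?([a-zA-Z])\b")


def find_initials(text):
    """Return a list of (group 1, group 2) letter pairs found in text."""
    return INITIALS.findall(text)


if __name__ == "__main__":
    for sample in ("George R. R. Martin", "Dr. Who", "D. R. Smith"):
        print(sample, "->", find_initials(sample))
```

The lookahead rejects both "Dr" and "D. R" because the dot and the space inside it are optional, exactly as in the main pattern.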

Related

Regex to add underscore between number and unit (or replace whitespace with underscore between number and unit)

I have a long text that contains data like:
23cm,
23m,
60 cm,
60 m,
So sometimes there is a space between number and unit. Sometimes there isn't one.
How to add an underscore in each case, so the result would be:
23_cm,
23_m,
60_cm,
60_m
The search pattern for a part of it is probably (\d) (?:cm|m), but I can't figure out the rest.

We can use capturing groups. The following example uses \2 and \3 for the capturing groups. Some languages would use $2 and $3.
See https://regex101.com/r/KxYyrb/1
input string
23cm, 23m, 60 cm, 60 m,
pattern
((\d+)\s?(m|cm))
replace using
\2_\3
output
23_cm, 23_m, 60_cm, 60_m,

You can use
(\d)\s?(c?m)\b
The replacement pattern is $1_$2.
See the regex demo.
Details:
(\d) - Capturing group 1: a digit
\s? - an optional whitespace char
(c?m) - Capturing group 2: an optional c and an m
\b - a word boundary (else, the regex will match m in men, for example).
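
A minimal sketch of this replacement in Python, assuming Python's re module (where the replacement is written \1_\2 rather than $1_$2). The function name underscore_units is hypothetical:

```python
import re

# Digit, optional whitespace, then "cm" or "m" as a whole unit.
UNIT = re.compile(r"(\d)\s?(c?m)\b")


def underscore_units(text):
    """Insert an underscore between a number and its unit (cm or m)."""
    return UNIT.sub(r"\1_\2", text)


if __name__ == "__main__":
    print(underscore_units("23cm, 23m, 60 cm, 60 m,"))
```

Thanks to the trailing \b, text such as "5 men" is left unchanged.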

I suggest replacing matches of
(?<=\d) ?(?=c?m,)
with an underscore. If a space is present it is matched; else the (zero-width) location between the last digit and 'cm' or 'm' is matched.
Demo
The regular expression can be broken down as follows. (I have enclosed the space in a character class to make it visible to the reader.)
(?<= # begin a positive lookbehind
\d # match a digit
) # end positive lookbehind
[ ]? # optionally match a space
(?= # begin a positive lookahead
c?m, # optionally match a 'c' followed by 'm,'
) # end positive lookahead
If the comma is not always present replace (?=c?m,) with (?=c?m\b), \b being a word boundary.
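
The lookaround variant can be sketched the same way. This assumes Python 3.7 or later and uses the \b form of the lookahead; the function name is hypothetical:

```python
import re

# Match an optional space (or the empty gap) between a digit and "cm"/"m".
GAP = re.compile(r"(?<=\d) ?(?=c?m\b)")


def underscore_units_lookaround(text):
    """Replace the gap between a number and its unit with an underscore."""
    return GAP.sub("_", text)


if __name__ == "__main__":
    print(underscore_units_lookaround("23cm, 23m, 60 cm, 60 m,"))
```

Because only the gap itself is matched, no capturing groups are needed in the replacement.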

Regex match specific strings

I want to capture all of the strings below from multi-line data. Here is the expected result, and here is my pattern, which does not work.
Pattern: ^XYZ/[0-9|ALL|P] I'm lost with this part. Can anyone help?
Result
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
Changed to
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/A12345 after the slash limited to 6 alphanumeric chars
XYZ/LH-1234567890 after the /LH- limited to 10 numeric chars

The pattern could be:
^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$
The pattern in parts matches:
^ Start of string
XYZ\/ Match XYZ/ (You don't have to escape the / depending on the pattern delimiters)
(?: Outer non capture group for the alternatives
ALL Match literally
| Or
P? Match an optional P
[0-9]+(?:-[0-9]+)? Match 1+ digits with an optional - and 1+ digits
(?: Non capture group to match as a whole
,[0-9]+(?:-[0-9]+)? Match , and 1+ digits and an optional - and 1+ digits
)* Close the non capture group and optionally repeat it
) Close the outer non capture group
$ End of string
Regex demo
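
To try the pattern against multi-line data, here is a small Python sketch. Python is an assumption, since the question names no language; the / needs no escaping there, and re.MULTILINE makes ^ and $ match at line boundaries. The helper name matching_lines is hypothetical:

```python
import re

XYZ = re.compile(
    r"^XYZ/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$",
    re.MULTILINE,
)


def matching_lines(data):
    """Return every line of data that the pattern accepts."""
    return [m.group(0) for m in XYZ.finditer(data)]


if __name__ == "__main__":
    sample = "XYZ/1\nXYZ/1,2-5\nXYZ/ALL\nXYZ/P4,5-7\nXYZ/A12345\nXYZ/1,,2\nXYZ/P"
    print(matching_lines(sample))
```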

You can use this regex pattern to match those lines
^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$
Use the global g and multiline m flags.
Btw, [P|ALL] doesn't match the word "ALL".
It only matches a single character that's a P or A or L or |.
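
The difference between a character class and an alternation can be checked with a short Python sketch (Python and the function names are assumptions made for illustration):

```python
import re

# The class from the question: any ONE of the characters 0-9, |, A, L, P.
CHAR_CLASS = re.compile(r"[0-9|ALL|P]")
# An alternation: a digit, or the word ALL, or the letter P.
ALTERNATION = re.compile(r"(?:[0-9]|ALL|P)")


def class_matches(s):
    """True if the whole string s is matched by the character class."""
    return CHAR_CLASS.fullmatch(s) is not None


def alternation_matches(s):
    """True if the whole string s is matched by the alternation."""
    return ALTERNATION.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ("A", "|", "ALL"):
        print(repr(s), class_matches(s), alternation_matches(s))
```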

What is the proper regex for capturing everything after "String" and between two delimiters ('=' and non-alphanumeric)?

Details={
AwsEc2SecurityGroup={GroupName=m.com-rds, OwnerId=123, VpcId=vpc-123,
IpPermissions=[{FromPort=3306, ToPort=3306, IpProtocol=tcp, IpRanges=[{CidrIp=1.1.1.1/32}, {CidrIp=2.2.2.2/32}, {CidrIp=0.0.0.0/0}, {CidrIp=3.3.3.3/32}],
UserIdGroupPairs=[{UserId=123, GroupId=sg-123abc}]}], IpPermissionsEgress=[{IpProtocol=-1, IpRanges=[{CidrIp=0.0.0.0/0}]}], GroupId=sg-123abc}},
Region=us-east-1, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]
}
I want to capture exactly arn:aws:ec2:us-east-1:123:security-group/sg-123abc in this example. Generically, I want to capture the value of Id regardless of placement. My current solution is /Details={.*Id=(.*\w)/, but this only works if it's the last object in the data. How can I take into account the following potential scenario:
Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]

Your pattern uses .* twice. The first .* matches up to the end of the line or string (depending on whether the dot matches a newline) and then backtracks until Id=(.*\w) can match, which gives you the last occurrence of Id=. The second .* is greedy as well, so the group extends to the last word character it can reach.
If you want to use a capture group, you can make the format and the allowed characters a bit more specific:
\bId=(\w+(?:[:\/-]\w+)+)
The pattern in parts
\b A word boundary to prevent a partial word match
Id= Match literally
( Capture group 1
\w+ Match 1+ word chars
(?:[:\/-]\w+)+ Repeat 1+ times either : / - and 1+ word chars
) Close group 1
Regex demo
Or if you know that it starts with Id=arn:
\bId=(arn:[\w:\/-]+)
Regex demo
Note that you only have to escape the / as \/ when the delimiters of the regex are forward slashes, but there is no language tagged.
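
As an illustration, the capture-group pattern could be used like this in Python. Python is an assumption because no language is tagged, and the helper name extract_id is hypothetical:

```python
import re

# Word chars, then one or more ":", "/" or "-" separated word-char runs.
ID_VALUE = re.compile(r"\bId=(\w+(?:[:/-]\w+)+)")


def extract_id(text):
    """Return the value of the standalone Id= key, or None if absent."""
    match = ID_VALUE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "GroupId=sg-123abc}}, Region=us-east-1, "
        "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]"
    )
    print(extract_id(sample))
```

The \b keeps keys such as GroupId= or OwnerId= from matching, since there is no word boundary in front of their Id part.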

You can use look-behind to check that there is the Id= prefix, and then match anything that is not a space, comma or closing brace:
(?<=\bId=)[^,}\s]*
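
A sketch of the look-behind variant in Python, which is an assumption. Python's re module requires fixed-width look-behinds, which \bId= satisfies. The helper name is hypothetical:

```python
import re

# Everything after a standalone "Id=" up to a comma, brace or whitespace.
ID_LOOKBEHIND = re.compile(r"(?<=\bId=)[^,}\s]*")


def extract_id_lookbehind(text):
    """Return the text following a standalone Id=, or None if absent."""
    match = ID_LOOKBEHIND.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = "GroupId=sg-123abc, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]"
    print(extract_id_lookbehind(sample))
```

Here the whole match is the value, so no capture group is needed.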

Negating duplicate words pattern

I am new to regex and have the following pattern that detects duplicate words separated with dashes
\b(\w+)-+\1\b
// matches: hey-hey
// not matches: hey-hei
What I really need is a negated version of this pattern.
I've tried negative lookahead, but no good.
(?!\b(\w+)-+\1\b)

You can use
\b(\w+)-+(?!\1\b)\w+
See the regex demo. Details:
\b - a word boundary
(\w+) - Group 1: one or more word chars
-+ - one or more hyphens
(?!\1\b)\w+ - one or more word chars that are not equal to the first capturing group value.
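
A minimal Python sketch of the negated check (Python is an assumption, as the question's // comments suggest another language; the function name is hypothetical):

```python
import re

# Two dash-separated words where the second is NOT a repeat of the first.
NON_DUPLICATE = re.compile(r"\b(\w+)-+(?!\1\b)\w+")


def has_non_duplicate_pair(text):
    """True if text contains word-word where the two words differ."""
    return NON_DUPLICATE.search(text) is not None


if __name__ == "__main__":
    for sample in ("hey-hey", "hey-hei", "hey-heyy"):
        print(sample, has_non_duplicate_pair(sample))
```

The \b inside the lookahead matters: without it, "hey-heyy" would be rejected because it merely starts with "hey".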

Working with regex for alphanumeric

I'm trying to write a regex for an alphanumeric string of length 7 (with positions 1, 3, 4 as letters and positions 2, 5, 6, 7 as digits).
[a-zA-Z]|[0-9]|[a-zA-Z]|[a-zA-Z]|[0-9]|[0-9]|[0-9]
Can someone help me?

The sequence "character, digit, character, character, digit, digit, digit" is expressed in regex as
[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}

If you're working in PCRE (with say, PHP):
^([a-zA-Z])([0-9])(?1){2}(?2){3}$
Breakdown:
^ - from the start of the string
([a-zA-Z]) - match and capture a single character in the ranges given: a-z, A-Z
([0-9]) - match and capture a single character in the ranges given: 0-9
(?1){2} - redo the regex in the first group twice (recursive subpattern)
(?2){3} - redo the regex in the second group 3 times (recursive subpattern)
$ - the end of the string
If you want to match this in the middle of a sentence, exchange ^ and $ for \b - which will match a word boundary
See the demo
If you're not using PCRE:
^[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}$
Which does the same thing, but has some copy-paste involved
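
For example, the non-PCRE version could be checked in Python as follows. Python is an assumption, and its re module has no (?1) syntax. The sketch uses fullmatch in place of ^ and $, and the function name is hypothetical:

```python
import re

# Letter, digit, two letters, three digits.
CODE = re.compile(r"[a-zA-Z][0-9][a-zA-Z]{2}[0-9]{3}")


def is_valid_code(s):
    """True if the whole string s has the letter/digit layout above."""
    return CODE.fullmatch(s) is not None


if __name__ == "__main__":
    for sample in ("A1bc234", "A1b2345", "A1bc2345"):
        print(sample, is_valid_code(sample))
```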
