Regex: Match partial string

I need some help - my skills here fall short :) (and I don't know if this is possible with pure regex)
Case: I have some text inputs in the form of:
input1: "abc,clutter,01;xyz,clutter,02;" (should match)
input2: "abc,clutter,02;zyz,clutter,01;" (no match)
input3: "abc,clutter,02;abc,txt,txt,01;xyz,clutter,01" (should match)
The match should:
Start with abc (anywhere in the input)
Contain everything in between - unless ,02; is in between
End with ,01;
So something like:
abc(.*)(?!,02;),01;
.. but this also matches input2, and that was not the intention :)

You might use for example a repeating pattern matching all chars except , and ;
\babc(?:,(?!02,)[^,;\n]+)*,01;
\babc A word boundary, match abc
(?: Non capture group
,(?!02,)[^,;\n]+ Match a comma, assert with a negative lookahead that 02, does not follow, then match 1+ chars other than , ; or a newline
)* Close the group and optionally repeat
,01; Match literally
Regex demo
If abc should only be matched once, you can also add that to the negative lookahead
\babc(?:,(?!(?:02|abc),)[^,;\n]+)*,01;
Regex demo
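Both patterns can be checked against the three sample inputs with a short script. This is only a minimal sketch: the question names no language, so using Python's re module is an assumption, and the names PATTERN, PATTERN_SINGLE_ABC and find_match are made up for illustration.

```python
import re

# Pattern from the answer: abc, then comma-separated fields that never
# cross a ';', then the closing ,01;
PATTERN = re.compile(r"\babc(?:,(?!02,)[^,;\n]+)*,01;")
# Variant that also refuses a second abc field inside the same match
PATTERN_SINGLE_ABC = re.compile(r"\babc(?:,(?!(?:02|abc),)[^,;\n]+)*,01;")


def find_match(pattern, text):
    """Return the matched substring, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


input1 = "abc,clutter,01;xyz,clutter,02;"
input2 = "abc,clutter,02;zyz,clutter,01;"
input3 = "abc,clutter,02;abc,txt,txt,01;xyz,clutter,01"

for text in (input1, input2, input3):
    print(find_match(PATTERN, text), find_match(PATTERN_SINGLE_ABC, text))
```

The match cannot run past a ; because the character class [^,;\n] excludes it, which is what keeps input2 from matching.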

Related

Match with optional positive lookahead

I've got 2 strings in the format:
Some_thing_here_1234 Match Me 1 & 1234 Match Me 1_1
In both cases I want the resultant match to be 1234 Match Me 1
So far I've got (?<=^|_)\d{4}\s.+ which works but in the case of string 2 also captures the _1 at the end. I thought I could use a lookahead at the end with an optional such as (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) but it always seems to revert to the second option and so the _1 gets through.
Any help would be great

You can use
(?<=^|_)\d{4}\s[^_]+
See the regex demo.
Details:
(?<=^|_) - a positive lookbehind that matches a location that is immediately preceded with either start of string or a _ char (equal to (?<![^_]))
\d{4} - four digits
\s - a whitespace
[^_]+ - one or more chars other than _.
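As a rough check of this pattern on the two strings from the question, here is a small Python sketch. Python is an assumption (the question names no language), and because Python's re module only accepts fixed-width lookbehinds, the sketch uses the equivalent form (?<![^_]) mentioned in the details instead of (?<=^|_).

```python
import re

# (?<![^_]) means "not preceded by a char other than _", i.e. the position
# is at the start of the string or right after an underscore.
PATTERN = re.compile(r"(?<![^_])\d{4}\s[^_]+")

samples = ["Some_thing_here_1234 Match Me 1", "1234 Match Me 1_1"]
results = [PATTERN.search(s).group(0) for s in samples]
print(results)
```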

Your second pattern (?<=^|_)\d{4}\s.+(?=_\d{1}$|$) is greedy, and at the end of the string the second alternative |$ will match, so you keep matching the whole line.
Note that you can omit {1}.
If you want to use an optional part in the lookahead, you can make the match non-greedy and optionally match _\d in the lookahead, followed by the end of the string.
(?<=^|_)\d{4}\s.+?(?=(?:_\d)?$)
See a regex demo.
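The difference between the greedy and the non-greedy version can be seen side by side in a short Python sketch. Python is an assumption here, and the lookbehind is written as (?<![^_]) (start of string or after _) so that it compiles under Python's re module; the names GREEDY and LAZY are illustrative only.

```python
import re

# Lookbehind rewritten as (?<![^_]) for Python's fixed-width requirement;
# this substitution is an assumption of the sketch, not part of the answer.
GREEDY = re.compile(r"(?<![^_])\d{4}\s.+(?=_\d$|$)")
LAZY = re.compile(r"(?<![^_])\d{4}\s.+?(?=(?:_\d)?$)")

text = "1234 Match Me 1_1"
greedy_match = GREEDY.search(text).group(0)  # .+ runs to the end, where |$ succeeds
lazy_match = LAZY.search(text).group(0)      # .+? stops as soon as _\d$ can follow
print(greedy_match)
print(lazy_match)
```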

What is the proper regex for capturing everything after "String" and between two delimiters ('=' and a non-alphanumeric character)?

Details={
AwsEc2SecurityGroup={GroupName=m.com-rds, OwnerId=123, VpcId=vpc-123,
IpPermissions=[{FromPort=3306, ToPort=3306, IpProtocol=tcp, IpRanges=[{CidrIp=1.1.1.1/32}, {CidrIp=2.2.2.2/32}, {CidrIp=0.0.0.0/0}, {CidrIp=3.3.3.3/32}],
UserIdGroupPairs=[{UserId=123, GroupId=sg-123abc}]}], IpPermissionsEgress=[{IpProtocol=-1, IpRanges=[{CidrIp=0.0.0.0/0}]}], GroupId=sg-123abc}},
Region=us-east-1, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]
}
I want to capture exactly arn:aws:ec2:us-east-1:123:security-group/sg-123abc in this example. Generically, I want to capture the value of Id regardless of placement. My current solution is /Details={.*Id=(.*\w)/, but this only works if it's the last object in the data. How can I take into account the following potential scenario:
Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]

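Your pattern contains .* twice. The first .* matches to the end of the line or string (depending on whether the dot matches a newline) and then backtracks until the rest of the pattern, Id=(.*\w), can match, which finds the last occurrence of Id=. The second .* is greedy as well, so the group runs on to the last word character it can reach instead of stopping at the end of the value.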
If you want to use a capture group, you can make the format and the allowed characters a bit more specific:
\bId=(\w+(?:[:\/-]\w+)+)
The pattern in parts
\b A word boundary to prevent a partial word match
Id= Match literally
( Capture group 1
\w+ Match 1+ word chars
(?:[:\/-]\w+)+ Repeat 1+ times either : / - and 1+ word chars
) Close group 1
Regex demo
Or if you know that it starts with Id=arn:
\bId=(arn:[\w:\/-]+)
Regex demo
Note that you only have to escape the / as \/ when the delimiters of the regex are forward slashes, but there is no language tagged.
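A minimal Python sketch of both capture-group patterns follows. Python is an assumption (no language is tagged), so the / is left unescaped, and the data string is a shortened, hypothetical version of the sample in which Id is followed by another key.

```python
import re

# Shortened, hypothetical input: Id is not the last key here.
data = (
    "Details={AwsEc2SecurityGroup={GroupName=m.com-rds, OwnerId=123, "
    "VpcId=vpc-123, GroupId=sg-123abc}, Region=us-east-1, "
    "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]}"
)

SPECIFIC = re.compile(r"\bId=(\w+(?:[:/-]\w+)+)")
ARN_ONLY = re.compile(r"\bId=(arn:[\w:/-]+)")

# \b keeps OwnerId=, VpcId= and GroupId= from matching.
specific_id = SPECIFIC.search(data).group(1)
arn_id = ARN_ONLY.search(data).group(1)
print(specific_id)
print(arn_id)
```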

You can use look-behind to check that there is the Id= prefix, and then match anything that is not a space, comma or closing brace:
(?<=\bId=)[^,}\s]*
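The lookbehind variant can be tried on both placements of Id with a few lines of Python. This is a sketch under the assumption that Python's re module is used; the two input strings are fragments taken from the question.

```python
import re

ID_VALUE = re.compile(r"(?<=\bId=)[^,}\s]*")

# Id as the last key, and Id followed by another key.
last = "Region=us-east-1, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]"
middle = "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]"

values = [ID_VALUE.search(s).group(0) for s in (last, middle)]
print(values)
```

Here the whole match is the value, so no capture group is needed.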

Match group followed by group with different ending

For example, let's say I have a list of words:
words.txt
accountable
accountant
accountants
accounted
I want to match "accountant\naccountants"
I've tried /(\n\w+){2}s/, but \w+ seems to be perfectly matching different things.
My RegEx also matches the following undesirable texts:
action
actionables
actionable
actions
Am I reaching out too far in what regex can do?

You could for example use a capture group, and match a newline followed by a backreference to the same captured text and an s char.
If the first word can also be at the start of the string, instead of being preceded by a newline, you can use an anchor ^ instead.
^(\w+)\n\1s$
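^ Start of string (or of a line, when the multiline flag is enabled)
(\w+) Capture group 1, match 1+ word chars
\n\1s Match a newline, the backreference \1 to match the same text as group 1, and an s char
$ End of string (or of a line, when the multiline flag is enabled)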
Regex demo
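A small Python sketch of the backreference approach, run against the word list and against the undesirable sample from the question. Python and the re.MULTILINE flag are assumptions: the flag is needed here because the pair sits in the middle of a longer list, so ^ and $ have to anchor at line boundaries.

```python
import re

words = "accountable\naccountant\naccountants\naccounted"
undesired = "action\nactionables\nactionable\nactions"

# re.MULTILINE lets ^ and $ match at the start and end of every line.
PAIR = re.compile(r"^(\w+)\n\1s$", re.MULTILINE)

hit = PAIR.search(words)
miss = PAIR.search(undesired)
print(repr(hit.group(0)))
print(miss)
```

Because \1 must repeat exactly what group 1 captured, the second line can differ from the first only by the trailing s.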

How to exclude non-numeric character in regex

I have a string which goes like this
Section 78(1) of the blabla
These are my regex
\b\s(?!\b(\d{1,3}|\d{1,2}[a-zA-Z]|\d{5,})\b)\b\S*
Expected output is: of the blabla
This regex works but it does not exclude "of" because of the (). Can anyone help me? Thank you

Try this pattern: .+\d\)?
Explanation:
.+ - match any character one or more times
\d - match a digit
\)? - match ) zero or one time
Because + is greedy, it matches up to the last digit; if that digit is in brackets, the closing bracket is matched too.
Demo
Alternatively use \d+(?:\(\d+\))?(.+)
Then desired output is in first capturing group.
Demo
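Both suggestions can be tried in a few lines of Python. This is a sketch only: the question does not say which language is used, and removing the match with re.sub is one assumed way of turning the first pattern into the expected output.

```python
import re

s = "Section 78(1) of the blabla"

# First suggestion: .+\d\)? matches the prefix up to the last digit (plus an
# optional closing bracket), so deleting that match leaves the remainder.
remainder = re.sub(r".+\d\)?", "", s, count=1)

# Alternative: capture what follows the section number in group 1.
captured = re.search(r"\d+(?:\(\d+\))?(.+)", s).group(1)

print(repr(remainder))
print(repr(captured))
```

Both results keep the space that follows the closing bracket; calling strip() on them would remove it.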

It seems all you need to change is to remove the \b before \S* and replace the \S* with .+ or .* (if the match can be an empty string).
\s(?!\b(?:\d{1,3}|\d{1,2}[a-zA-Z]|\d{5,})\b)(.+)
See the regex demo, grab Group 1 value. Note I turned the first group matching digits in the negative lookahead into a non-capturing group to avoid clutter in the resulting match list.
VB.NET demo:
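Imports System.Text.RegularExpressions
' Group 1 captures everything after the first whitespace that is not followed by a section number
Dim r As New Regex("\s(?!\b(?:\d{1,3}|\d{1,2}[a-zA-Z]|\d{5,})\b)(.+)")
Dim s As String = "Section 78(1) of the blabla"
For Each m As Match In r.Matches(s)
    Console.WriteLine(m.Groups(1).Value)
Next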
Result: of the blabla.

Regexp matching a string - positive lookahead

Regexp: (?=(\d+))\w+\1
String: 456x56
Hi,
I am not getting the concept, how this regex matches "56x56" in the string "456x56".
1) The lookaround, (?=(\d+)), captures 456 and puts it into \1, for (\d+)
2) The word character, \w+, matches the whole string ("456x56")
3) \1, which is 456, should be followed by \w+
4) After backtracking through the string, it should not find a match, as there is no "456" preceded by a word character
However the regexp matches 56x56.

5) The regex engine concludes that it cannot find a match if it starts searching from 4, so it skips one character and searches again. This time, it captures two digits into \1 and ends up matching 56x56
If you want to match only whole strings, use ^(?=(\d+))\w+\1$
^ matches beginning of string
$ matches end of string
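The behaviour described here can be reproduced with a short Python sketch. Python is an assumption, since the question does not name an engine; the names UNANCHORED and ANCHORED are illustrative.

```python
import re

s = "456x56"
UNANCHORED = re.compile(r"(?=(\d+))\w+\1")
ANCHORED = re.compile(r"^(?=(\d+))\w+\1$")

m = UNANCHORED.search(s)
anchored_result = ANCHORED.search(s)

# Whole match, content of group 1, and (start, end) offsets of the match
print(m.group(0), m.group(1), m.span())
print(anchored_result)
```

The unanchored search starts its successful attempt at index 1, while the anchored pattern is forced to start at index 0 and finds nothing.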

You don't anchor your regex, as has been said. Another problem is that \w also matches digits... Now look at how the regex engine proceeds to match with your input:
# begin
regex: |(?=(\d+))\w+\1
input: |456x56
# lookahead (first group = '456')
regex: (?=(\d+))|\w+\1
input: |456x56
# \w+
regex: (?=(\d+))\w+|\1
input: 456x56|
# \1 cannot be satisfied: backtrack on \w+
regex: (?=(\d+))\w+|\1
input: 456x5|6
# And again, and again... Until the beginning of the input: \1 cannot match
# Regex engine therefore decides to start from the next character:
regex: |(?=(\d+))\w+\1
input: 4|56x56
# lookahead (first group = '56')
regex: (?=(\d+))|\w+\1
input: 4|56x56
# \w+
regex: (?=(\d+))\w+|\1
input: 456x56|
# \1 cannot be satisfied: backtrack
regex: (?=(\d+))\w+|\1
input: 456x5|6
# \1 cannot be satisfied: backtrack
regex: (?=(\d+))\w+|\1
input: 456x|56
# \1 satisfied: match
regex: (?=(\d+))\w+\1|
input: 4<56x56>
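The start positions in the trace above can be imitated in Python, assuming the re module behaves like the engine being traced: Pattern.match(s, pos) only tries a match that begins exactly at pos, so calling it for each index mimics the engine's outer search loop. The dictionary name attempts is made up for this sketch.

```python
import re

PATTERN = re.compile(r"(?=(\d+))\w+\1")
s = "456x56"

# Try every start position separately, as the engine's search loop does.
attempts = {}
for pos in range(len(s)):
    m = PATTERN.match(s, pos)
    # Store (group 1, whole match) for a success, None for a failure.
    attempts[pos] = (m.group(1), m.group(0)) if m else None
    print(pos, attempts[pos])

first_hit = PATTERN.search(s)
```

A plain search reports the attempt at the lowest start position that succeeds, which is why the result begins at the 5 and not at the 4.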

The points you listed are almost entirely, but not quite, wrong!
1) The group (?=(\d+)) matches a sequence of one or more digits, not necessarily 456
2) \w matches any word character (letters, digits and underscore), so \w+ can consume digits as well
3) \1 is a backreference to the text captured by the group
So the whole expression means: find a sequence of digits that begins a run of word characters, where that run is followed by the same digit sequence again. Hence the match 56x56.

Well that's what makes it a positive lookahead
(?=(\d+))\w+\1
You are correct when you say the first \d+ will match 456, so \1 must also be 456, but in that case the expression cannot match the string.
The only characters common to both sides of the x are 56, so that is what the engine settles on to get a match.

The + quantifier is greedy and backtracks as necessary. The lookahead (?=(\d+)) will capture 456 on the first attempt, 56 on the second attempt if the first fails, and 6 on the third if that fails too. First attempt: 456. It matches, and group 1 contains 456. Then we have \w+, which is greedy and takes 456x56; there is nothing left, but we still have to match \1, i.e. 456. Thus: failure. Then \w+ backtracks one step at a time until we get back to the beginning of the regex. And it still fails.
The engine then moves one character forward in the string. The next attempt starts at 5, where the lookahead matches the substring 56, so group 1 contains 56. \w+ matches until the end of the string and gets 56x56, and then we try to match 56: failure. So \w+ backtracks until 56 is left in the string, and then the whole regex matches.
You should try it with RegexBuddy's debug mode.
