Regex match specific strings - regex

I want to capture all the strings from multi lines data. Supposed here the result and here’s my code which does not work.
Pattern: ^XYZ/[0-9|ALL|P] I’m lost with this part anyone can help?
Result
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
Changed to
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/A12345 after the slash limited to 6 alphanumeric chars
XYZ/LH-1234567890 after the /LH- limited to 10 numeric chars

The pattern could be:
^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$
The pattern in parts matches:
^ Start of string
XYZ\/ Match XYX/ (You don't have to escape the / depending on the pattern delimiters)
(?: Outer on capture group for the alternatives
ALL Match literally
| Or
P? Match an optional P
[0-9]+(?:-[0-9]+)? Match 1+ digits with an optional - and 1+ digits
(?: Non capture group to match as a whole
,[0-9]+(?:-[0-9]+)? Match ,and 1+ digits and optional - and 1+ digits
)* Close the non capture group and optionally repeat it
) Close the outer non capture group
$ End of string
Regex demo

You can use this regex pattern to match those lines
^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$
Use the global g and multiline m flags.
Btw, [P|ALL] doesn't match the word "ALL".
It only matches a single character that's a P or A or L or |.

Related

Regex lookaround to find anything up to an already searched group

I'm trying to analyze search queries of a particular pattern.
The pattern is:
How many/much _____ is/are _____.
Given this pattern, the blanks are unknown to me but I want to extract any statement that follows this pattern above.
My challenge is finding a way to do a lookaround on is/are up to but not including many/much and anything after but not including is/are.
Here's my regex so far:
(([hH]ow many?)|([hH]ow much?))|(?<=is)|(are)|(i|s|n|a|o|f){1,2}|((\")|(\“)|(\/)|(\'))
If you use this regex with the i flag to match case insensitive
^how\s+(?:much|many)\s+(.*?)\s(?:is|are)\s+(.*?)[.?]?$
Then it'll match these strings
How much bla is blabla.
How many bla are blablabla?
And the bla's will be in capture group 1 and 2.
Try this:
/(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s)(.+)(?=\sis|\sare)|(?<=is\s|are\s)(.+)/g
Review it at regex101
Lookarounds are placed behind and/or ahead of your capture group:
1st Capture Group
(?<=[Hh]ow\smany\s|[Hh]ow\smuch\s) /* "(H|h)ow"\space"many"\space OR
"(H|h)ow"\space"much"\space
must be before capture group */
(.+) /* capture group one or more of anything */
(?=\sis|\sare) /* \space"is" OR \space"are" must be after capture group */
| // OR
2nd Capture Group
(?<=is\s|are\s) /* "is"\space OR "are\space must be before capture group */
(.+) /* capture group one or more of anything */
If both parts for the _____ are mandatory you could use 2 capture groups:
\b[Hh]ow\s+(?:much|many)\s+(\S.*?)\s+(?:is|are)\s+([^.]+)\.
The pattern matches:
\b A word boundary to prevent a partial word match
[Hh]ow\s+ Match How or how and 1+ whitespace chars
(?:much|many)\s+ Match either much or many and 1+ whitespace chars
(\S.*?)\s+ Capture in group 1 at least a single non whitespace char, then as least as possible chars and match 1+ whitespace chars
(?:is|are)\s+ Match either is or are and 1+ whitespace chars
([^.]+) Capture in group 2 1+ chars other than a dot
\. Match a dot
See a regex demo.

What is the proper regex for capturing everything after "String" and between two delimeters ('=' and and non alphanumeric))

Details={
AwsEc2SecurityGroup={GroupName=m.com-rds, OwnerId=123, VpcId=vpc-123,
IpPermissions=[{FromPort=3306, ToPort=3306, IpProtocol=tcp, IpRanges=[{CidrIp=1.1.1.1/32}, {CidrIp=2.2.2.2/32}, {CidrIp=0.0.0.0/0}, {CidrIp=3.3.3.3/32}],
UserIdGroupPairs=[{UserId=123, GroupId=sg-123abc}]}], IpPermissionsEgress=[{IpProtocol=-1, IpRanges=[{CidrIp=0.0.0.0/0}]}], GroupId=sg-123abc}},
Region=us-east-1, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]
}
I want to capture exactly arn:aws:ec2:us-east-1:123:security-group/sg-123abc in this example. Generically, I want to capture the value of Id regardless of placement. My current solution is /Details={.*Id=(.*\w)/, but this only works if it's the last object in the data. How can I take into account the following potential scenario:
Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]
You have a pattern with 2 times .* which will first match till the end of the line/string (depending on if the dot matches a newline) and it will backtrack to match the last occurrence where this part of the pattern Id=(.*\w) can match.
If you want to use a capture group, you can make the format and the allowed characters a bit more specific:
\bId=(\w+(?:[:\/-]\w+)+)
The pattern in parts
\b A word boundary to prevent a partial word match
Id= Match literally
( Capture group 1
\w+ Match 1+ word chars
(?:[:\/-]\w+)+ Repeat 1+ times either : / - and 1+ word chars
) Close group 1
Regex demo
Or if you know that it starts with Id=arn:
\bId=(arn:[\w:\/-]+)
Regex demo
Note that you don't have to escape the \/ only when the delimiters of the regex are forward slashes, but there is no language tagged.
You can use look-behind to check that there is the Id= prefix, and then match anything that is not a space, comma or closing brace:
(?<=\bId=)[^,}\s]*

Regex - add a zero after second period

I have the following example of numbers, and I need to add a zero after the second period (.).
1.01.1
1.01.2
1.01.3
1.02.1
I would like them to be:
1.01.01
1.01.02
1.01.03
1.02.01
I have the following so far:
Search:
^([^.])(?:[^.]*\.){2}([^.].*)
Substitution:
0\1
but this returns:
01 only.
I need the 1.01. to be captured in a group as well, but now I'm getting confuddled.
Does anyone know what I am missing?
Thanks!!
You may try this regex replacement with 2 capture groups:
Search:
^(\d+\.\d+)\.([1-9])
Replacement:
\1.0\2
RegEx Demo
RegEx Details:
^: Start
(\d+\.\d+): Match 1+ digits + dot followed by 1+ digits in capture group #1
\.: Match a dot
([1-9]): Match digits 1-9 in capture group #2 (this is to avoid putting 0 before already existing 0)
Replacement: \1.0\2 inserts 0 just before capture group #2
You could try:
^([^.]*\.){2}\K
Replace with 0. See an online demo
^ - Start line anchor.
([^.]*\.){2} - Negated character 0+ times (greedy) followed by a literal dot, matched twice.
\K - Reset starting point of reported match.
EDIT:
Or/And if \K meta escape isn't supported, than see if the following does work:
^((?:[^.]*\.){2})
Replace with ${1}0. See the online demo
^ - Start line anchor.
( - Open 1st capture group;
(?: - Open non-capture group;
`Negated character 0+ times (greedy) followed by a literal dot.
){2} - Close non-capture group and match twice.
) - Close capture group.
Using your pattern, you can use 2 capture groups and prepend the second group with a dot in the replacement like for example \g<1>0\g<2> or ${1}0${2} or $10$2 depending on the language.
^((?:[^.]*\.){2})([^.])
^ Start of string
((?:[^.]*\.){2}) Capture group 1, match 2 times any char except a dot, then match the dot
([^.].*) Capture group 2, match any char except a dot
Regex demo
A more specific pattern could be matching the digits
^(\d+\.\d+\.)(\d)
^ Start of string
(\d+\.\d+\.) Capture group 1, match 2 times 1+ digits and a dot
(\d) Capture group 2, match a digit
Regex demo
For example in JavaScript
const regex = /^(\d+\.\d+\.)(\d)/;
[
"1.01.1",
"1.01.2",
"1.01.3",
"1.02.1",
].forEach(s => console.log(s.replace(regex, "$10$2")));
Obviously, there will be tons of solutions for this, but if this pattern holds (i.e. always the trailing group that is a single digit)... \.(\d)$ => \.0\1 would suffice - to merely insert a 0, you don't need to match the whole thing, only just enough context to uniquely identify the places targeted. In this case, finding all lines ending in a . followed by a single digit is enough.

Regex start with any number and it should end without zeros

I am trying to create a Regex with groups that should group 1234.0500- to 1234.05-.
What I have tried is:
^([0-9]+)(\.)([1-9]*)0*(-?)$
but it does not match 1234.0500-. Here is the example https://regex101.com/r/koSZoB/1. The regex should also group
1234.0000
0.9000
to
1234
0.9
In your pattern, this part ([1-9]*)0*(-?)$ matches optional digits 1-9 followed by optional zeroes and then an optional hyphen at the end of the string. It will succeed until the first zero:
0500
^
But the match will fail as it can not match (-?)$
You could use 3 capturing groups and use those in the replacement.
After group 1, you could either match a dot followed by only zeroes which should be removed, or capture in group 2 matching from the dot till the lats digits 1-9 and remove the trailing zeroes.
^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$
Explanation
^ Start of string
(\d+) Capture group 1, match 1+ digits
(?: Non capture group, match either
\.0+ Match a . and 1+ zeroes
| Or
(\.\d*[1-9])0+ Capture ., 0+ digits followed by a digit 1-9 and match the following 1+ zeroes to be removed
) Close group
(-?) Capture optional -
$ End of string
Regex demo
There is no language tagged, but for example in Javascript
const pattern = /^(\d+)(?:\.0+|(\.\d*[1-9])0+)(-?)$/;
[
"1234.0500-",
"1234.05500-",
"1234.0550588500-",
"1234.0000",
"0.9000",
"12.1222",
"12.1222-",
].forEach(s => console.log(s.replace(pattern, "$1$2$3")));
The third capture group doesn't include zeroes meaning that the 0 in 05 is making the match fail.
I would suggest making the third capture group non-greedy by adding a ?: ^([0-9]+)(\.)([0-9]*?)0*(-?)$ This will make it match the minimum amount of zeroes possible instead of the maximum. With the last group being greedy it should work.

How can I extract non digit characters and digit characters in the end of a string?

I have a string that has the following structure:
digit-word(s)-digit.
For example:
2029 AG.IZTAPALAPA 2
I want to extract the word(s) in the middle, and the digit at the end of the string.
I want to extract AG.IZTAPALAPA and 2 in the same capture group to extract like:
AG.IZTAPALAPA 2
I managed to capture them as individual capture groups but not as a single:
town_state['municipality'] = town_state['Town'].str.extract(r'(\D+)', expand=False)
town_state['number'] = town_state['Town'].str.extract(r'(\d+)$', expand=False)
Thank you for your help!
Yo can use a single capturing group for the example string to match a single "word" that consists of uppercase chars A-Z with an optional dot in the middle which can not be at the start or end followed by 1 or more digits.
\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b
Explanation
\b A word boundary
\d+
( Capture group 1
[A-Z]+ Match 1+ occurrences of an uppercase char A-Z
(?:\.[A-Z]+)* \d+ Repeat 0+ times matching a dot and a char A-Z followed by matching 1+ digits
) Close group 1
\b A word boundary
Regex demo
Or you can make the pattern a bit broader matching either a dot or a word character
\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b
Regex demo
You can use the following simple regex:
[0-9]+\s([A-Z]+.[A-Z]+(?: [0-9]+)*)
Note:
(?: [0-9]+)* will make it the last digital optional.