I have the following regular expressions that extract everything after first two alphabets
^[A-Za-z]{2})(\w+)($) $2
now I want to the extract nothing if the data doesn't start with alphabets.
Example:
AA123 -> 123
123 -> ""
Can this be accomplished by regex?
Introduce an alternative to match any one or more chars from start to end of string if your regex does not match:
^(?:([A-Za-z]{2})(\w+)|.+)$
See the regex demo. Details:
^ - start of string
(?: - start of a container non-capturing group:
([A-Za-z]{2})(\w+) - Group 1: two ASCII letters, Group 2: one or more word chars
| - or
.+ - one or more chars other than line break chars, as many as possible (use [\w\W]+ to match any chars including line break chars)
) - end of a container non-capturing group
$ - end of string.
Your pattern already captures 1 or more word characters after matching 2 uppercase chars. The $ does not have to be in a group, and this $2 should not be in the pattern.
^[A-Za-z]{2})(\w+)$
See a regex demo.
Another option could be a pattern with a conditional, capturing data in group 2 only if group 1 exist.
^([A-Z]{2})?(?(1)(\w+)|.+)$
^ Start of string
([A-Z]{2})? Capture 2 uppercase chars in optional group 1
(? Conditional
(1)(\w+) If we have group 1, capture 1+ word chars in group 2
| Or
.+ Match the whole line with at least 1 char to not match an empty string
) Close conditional
$ End of string
Regex demo
For a match only, you could use other variations Using \K like ^[A-Za-z]{2}\K\w+$ or with a lookbehind assertion (?<=^[A-Za-z]{2})\w+$
Related
I want to capture all the strings from multi lines data. Supposed here the result and here’s my code which does not work.
Pattern: ^XYZ/[0-9|ALL|P] I’m lost with this part anyone can help?
Result
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/ALL
XYZ/P1
XYZ/P2,3
XYZ/P4,5-7
XYZ/P1-4,5-7,8-9
Changed to
XYZ/1
XYZ/1,2-5
XYZ/5,7,8-9
XYZ/2-4,6-8,9
XYZ/A12345 after the slash limited to 6 alphanumeric chars
XYZ/LH-1234567890 after the /LH- limited to 10 numeric chars
The pattern could be:
^XYZ\/(?:ALL|P?[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*)$
The pattern in parts matches:
^ Start of string
XYZ\/ Match XYX/ (You don't have to escape the / depending on the pattern delimiters)
(?: Outer on capture group for the alternatives
ALL Match literally
| Or
P? Match an optional P
[0-9]+(?:-[0-9]+)? Match 1+ digits with an optional - and 1+ digits
(?: Non capture group to match as a whole
,[0-9]+(?:-[0-9]+)? Match ,and 1+ digits and optional - and 1+ digits
)* Close the non capture group and optionally repeat it
) Close the outer non capture group
$ End of string
Regex demo
You can use this regex pattern to match those lines
^XYZ\/(?:P|ALL|[0-9])[0-9,-]*$
Use the global g and multiline m flags.
Btw, [P|ALL] doesn't match the word "ALL".
It only matches a single character that's a P or A or L or |.
I have the following test strings:
Battery Bank 1
Dummy 32 Segment 12
System
Modbus 192.168.0.1 Group
I need a regex that can match and group these as follows:
Group 1: Battery Bank
Group 2: 1
Group 1: Dummy 32 Segment
Group 2: 12
Group 1: System
Group 2: null
Group 1: Modbus 192.168.0.1 Group
Group 2: null
Basically, capture everything (including numbers) into group 1 unless the string ends with a whitespace followed by 1 or more digits. If it does, capture this number into group 2.
This regex is not doing what I need as everything is captured into the first group.
([\w ]+)( \d+)?
https://regex101.com/r/GEtb5G/1/
Basically, capture everything (including numbers) into group 1 unless the string ends with a whitespace followed by 1 or more digits. If it does, capture this number into group 2.
You may use this group that allows an empty match in 2nd capture group:
^(.+?) *(\d+|)$
Updated RegEx Demo
RegEx Details:
^: Start
(.+?): Match 1+ of any character (lazy) in capture group #1
*: Match 0 or more spaces
(\d+|): Match 1+ digits or nothing in 2nd capture group
$: End
You can use
^\s*(.*[^\d\s])(?:\s*(\d+))?\s*$
See the regex demo (note \s are replaced with spaces since the test string in the demo is a single multiline string).
If the regex is to be used with a multiline flag to match lines in a longer multiline text, you can use
^[^\S\r\n]*(.*[^\d\s])(?:[^\S\r\n]*(\d+))?[^\S\r\n]*$
See the regex demo.
Details:
^ - start of a string
\s* - zero or more whitespaces
(.*[^\d\s]) - Group 1: any zero or more chars other than line break chars as many as possible and then a char other than a digit and whitespace
(?:\s*(\d+))? - an optional sequence of
\s* - zero or more whitespaces
(\d+) - Group 2: one or more digits
\s* - zero or more whitespaces
$ - end of string.
In the second regex, [^\S\r\n]* matches any zero or more whitespaces other than LF and CR chars.
I need one regex to capture a string up to a :, but the problem is that the : is not always there.
At this moment I am able to capture the groups when I have the : but not when I dont.
Not sure what I am doing wrong.
strings to capture
XXX 1 A:B (working)
XXX 1 A: (working)
XXX A (not working)
My regex:
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*)(?=\:)(?:.)*$
You can use
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>.*?)(?::.*)?$
See the regex demo. Details:
^ - start of string
(?P<grp1>[A-Z]{3,10}) - Group "grp1": three to ten uppercase letters
\s - a whitespace
(?P<grp2>.*?) - Group "grp2": any zero or more chars other than line break chars, as few as possible
(?::.*)? - an optional group matching any zero or more chars other than line break chars as many as possible
$- end of string.
Optionally match a single : after it
^(?P<grp1>[A-Z]{3,10})\s(?P<grp2>[^:\r\n]*)(?::[^:\r\n]*)?$
^ Start of string
(?P<grp1>[A-Z]{3,10}) Group grp1
\s Match a whitspace char
(?P<grp2>[^:\r\n]*) Group 2 grp2 Match any char except : or a newline
(?::[^:\r\n]*)? Optionally match a single : between optional chars other than : or a newline
$ End of string
Regex demo
I have a regex that captures the following expression
XPT 123A
Now I need to add "something" to my regex to capture the remaining string as a group
XPT 123A I AM VERY HAPPY
So XPT would be group 1, 123A group 2, and I AM VERY HAPPY group 3.
Here is my regex (also here http://regexr.com/4mocf):
^([A-Z]{2,4}).((?=\d)[a-zA-Z\d]{0,4})
EDIT:
I dont want to name my groups (editing b/c some people thought it was a dup of another question)
Assuming Group 3 is optional, you may use
^([A-Z]{2,4}) (\d[a-zA-Z\d]{0,3})(?: (.*))?$
^([A-Z]{2,4})\s+(\d[a-zA-Z\d]{0,3})(?:\s+(.*))?$
The \s+ matches any 1+ whitespace chars.
See the regex demo.
Details
^ - start of string
([A-Z]{2,4}) - Group 1: two, three or four uppercase ASCII letters
\s+ - 1+ whitespaces
(\d[a-zA-Z\d]{0,3}) - Group 2: a digit followed with 0 or more alphanumeric chars
(?:\s+(.*))? - an optional non-capturing group matching 1 or 0 occurrences of:
\s+ - 1+ whitespaces
(.*) - Group 3: any 0+ chars other than line break chars as many as possible
$ - end of string
Just add the following suffix to your regex to capture the rest of the line:
(?<rest>.+)?$
I know that ^. is first character and (\d+)(?!.*\d) is last number. I've tried using | between these and have been trying to find code for the second character, but with no success.
This is in R.
Take for example:
'ABCD some random words and spaces 1234' should output 'A4' when I do
sub([regex here], "", 'ABCD some random words and spaces 1234')
If you used ^.|(\d+)(?!.*\d), the pattern would only match the first char and remove it with sub, and would remove the first char and the last 1+ digits if used with gsub without backreferences in the replacement pattern. See this pattern demo.
You can use
sub("^(.).*(\\d).*$", "\\1\\2", "ABCD some random words and spaces 1234")
See the R demo and the regex demo.
This TRE regex pattern matches:
^ - start of string
(.) - Group 1 capturing any char
.* - 0+ any chars as many as possible up to the last...
(\\d) - Group 2 capturing a digit
.* - the rest of the string
$ - end of string.
The \\1\\2 replacement pattern re-inserts the values captured with Group 1 and Group 2 back to the result.