I have a regex that captures the following expression
XPT 123A
Now I need to add "something" to my regex to capture the remaining string as a group
XPT 123A I AM VERY HAPPY
So XPT would be group 1, 123A group 2, and I AM VERY HAPPY group 3.
Here is my regex (also here http://regexr.com/4mocf):
^([A-Z]{2,4}).((?=\d)[a-zA-Z\d]{0,4})
EDIT:
I don't want to name my groups (editing because some people thought this was a duplicate of another question).
Assuming Group 3 is optional, you may use
^([A-Z]{2,4}) (\d[a-zA-Z\d]{0,3})(?: (.*))?$
^([A-Z]{2,4})\s+(\d[a-zA-Z\d]{0,3})(?:\s+(.*))?$
The second variant uses \s+ instead of a literal space; \s+ matches one or more whitespace chars.
See the regex demo.
Details
^ - start of string
([A-Z]{2,4}) - Group 1: two, three or four uppercase ASCII letters
\s+ - 1+ whitespaces
(\d[a-zA-Z\d]{0,3}) - Group 2: a digit followed by zero to three ASCII alphanumeric chars
(?:\s+(.*))? - an optional non-capturing group matching 1 or 0 occurrences of:
\s+ - 1+ whitespaces
(.*) - Group 3: any 0+ chars other than line break chars as many as possible
$ - end of string
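As a quick check, the \s+ variant can be tried with Python's re module. This is only a minimal sketch; the helper name split_code is made up for illustration.

```python
import re

# The \s+ variant from the answer; group 3 is optional.
PATTERN = re.compile(r"^([A-Z]{2,4})\s+(\d[a-zA-Z\d]{0,3})(?:\s+(.*))?$")

def split_code(text):
    """Return (group 1, group 2, group 3), or None if the text does not match."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(split_code("XPT 123A"))                  # ('XPT', '123A', None)
print(split_code("XPT 123A I AM VERY HAPPY"))  # ('XPT', '123A', 'I AM VERY HAPPY')
```

When the trailing text is absent, group 3 does not participate in the match, so Python reports it as None rather than an empty string.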
Just add the following suffix to your regex to capture the rest of the line:
(?<rest>.+)?$
Related
I am trying to do regex parsing and matching and optionally discard the rest of the string.
My strings are of type:
[GROUP 1][delimiter 1][GROUP 2][delimiter 2][GROUP 3][delimiter 3 - optional][REST OF THE STRING - optional]
For example:
07. Neospace - Into The Night (Chris Van Buren)
13. Atomic Space Orchestra - Starfleet
I am trying to capture GROUP 1, GROUP 2 and GROUP 3 while ignoring REST OF THE STRING
The following regex works well if [delimiter 3] is present:
(\d+)\. (.*) - (.*)(?: \()
I am getting "07", "Neospace" and "Into The Night".
But for the second string, there is no match, because my last non-capturing group is mandatory.
When I try to make the last group optional like this:
(\d+)\. (.*) - (.*)(?: \()?
the non-capturing group stops working and I get "Into The Night (Chris Van Buren)" for GROUP 3, which is NOT what I want.
If the 3rd group has ( as a delimiter, you can use a negated character class to exclude matching a ( char.
Note that using * as a quantifier can also match an empty string between the delimiters.
If the match should be at the start of the string, you can prepend the pattern with ^
(\d+)\. (.*?) - ([^(\n]*)
Explanation
(\d+) Capture group 1, match 1+ digits
\. Match .
(.*?) Capture group 2, match 0+ times any character, as few as possible
- (with a space on each side) Match literally
([^(\n]*) Capture group 3, match 0+ times any character except ( or a newline
See a regex demo.
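A small Python sketch of this pattern applied to the two sample lines, with ^ prepended as suggested above. The helper name parse_track is hypothetical. Because group 3 stops right before the (, it can keep the space that preceded it, so the sketch strips trailing whitespace.

```python
import re

# Pattern from the answer, anchored at the start of the string.
TRACK = re.compile(r"^(\d+)\. (.*?) - ([^(\n]*)")

def parse_track(line):
    """Return (number, artist, title), or None if the line does not match."""
    m = TRACK.match(line)
    if m is None:
        return None
    number, artist, title = m.groups()
    # Group 3 ends just before "(", so drop the space left in front of it.
    return number, artist, title.rstrip()

print(parse_track("07. Neospace - Into The Night (Chris Van Buren)"))
# ('07', 'Neospace', 'Into The Night')
print(parse_track("13. Atomic Space Orchestra - Starfleet"))
# ('13', 'Atomic Space Orchestra', 'Starfleet')
```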
I have the following regular expression that extracts everything after the first two letters
^([A-Za-z]{2})(\w+)($) $2
Now I want to extract nothing if the data doesn't start with letters.
Example:
AA123 -> 123
123 -> ""
Can this be accomplished by regex?
Introduce an alternative to match any one or more chars from start to end of string if your regex does not match:
^(?:([A-Za-z]{2})(\w+)|.+)$
See the regex demo. Details:
^ - start of string
(?: - start of a container non-capturing group:
([A-Za-z]{2})(\w+) - Group 1: two ASCII letters, Group 2: one or more word chars
| - or
.+ - one or more chars other than line break chars, as many as possible (use [\w\W]+ to match any chars including line break chars)
) - end of a container non-capturing group
$ - end of string.
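In a replacement workflow, the alternation can be exercised like this in Python. It is a sketch under the assumption of Python 3.5 or later, where a group that did not participate in the match is substituted as an empty string. The function name after_two_letters is made up for illustration.

```python
import re

# Pattern from the answer: either two letters plus word chars, or anything else.
PATTERN = re.compile(r"^(?:([A-Za-z]{2})(\w+)|.+)$")

def after_two_letters(value):
    """Replace the whole string with group 2 (empty if the fallback branch matched)."""
    # \2 expands to "" when the ".+" alternative matched (Python 3.5+ behaviour).
    return PATTERN.sub(r"\2", value)

print(repr(after_two_letters("AA123")))  # '123'
print(repr(after_two_letters("123")))    # ''
```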
Your pattern already captures 1 or more word characters after matching 2 ASCII letters. The $ does not have to be in a group, and the $2 should not be in the pattern.
^([A-Za-z]{2})(\w+)$
See a regex demo.
Another option could be a pattern with a conditional, capturing data in group 2 only if group 1 exist.
^([A-Z]{2})?(?(1)(\w+)|.+)$
^ Start of string
([A-Z]{2})? Capture 2 uppercase chars in optional group 1
(? Conditional
(1)(\w+) If we have group 1, capture 1+ word chars in group 2
| Or
.+ Match the whole line with at least 1 char to not match an empty string
) Close conditional
$ End of string
Regex demo
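Python's re module also accepts this conditional syntax, so the idea can be sketched as follows. The helper name tail_if_prefixed is hypothetical. Note that this variant uses [A-Z], so a lowercase prefix falls into the .+ branch.

```python
import re

# Conditional from the answer: group 2 is only captured when group 1 matched.
CONDITIONAL = re.compile(r"^([A-Z]{2})?(?(1)(\w+)|.+)$")

def tail_if_prefixed(value):
    """Return group 2 if the two-letter prefix matched, else an empty string."""
    m = CONDITIONAL.match(value)
    if m is None:
        return None
    # group(2) is None when the ".+" branch was taken.
    return m.group(2) or ""

print(repr(tail_if_prefixed("AA123")))  # '123'
print(repr(tail_if_prefixed("123")))    # ''
```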
For a match only, you could use other variations: using \K, like ^[A-Za-z]{2}\K\w+$, or with a lookbehind assertion, (?<=^[A-Za-z]{2})\w+$
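Which variation works depends on the engine. As far as I know, Python's built-in re module does not support \K, but the lookbehind form is fixed-width and is accepted. A minimal sketch, with the hypothetical helper name match_only:

```python
import re

# Lookbehind variant: the match itself contains only the part after the two letters.
LOOKBEHIND = re.compile(r"(?<=^[A-Za-z]{2})\w+$")

def match_only(value):
    """Return the text after a two-letter prefix, or "" if there is no such prefix."""
    m = LOOKBEHIND.search(value)
    return m.group(0) if m else ""

print(repr(match_only("AA123")))  # '123'
print(repr(match_only("123")))    # ''
```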
I need to take only a number (a float number) from a text, but I can't remove the whitespaces...
** Update
I have a problem with this method: as a rule, I only need to consider digits and ',' between '- EUR' and 'Fee'.
You can use
- EUR\W*(.*?)\W*Fee
See the regex demo.
Variations of the regex that might work in different regex engines:
- EUR\W*\K.*?(?=\W*Fee)
(?<=- EUR\W*).*?(?=\W*Fee)
Details:
- EUR - literal text
\W* - zero or more non-word chars
(.*?) - Group 1: any zero or more chars other than line break chars as few as possible
\W* - zero or more non-word chars
Fee - a string.
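The capturing-group version can be sketched in Python as below. The question's input text is not shown, so the sample strings are assumptions, as is the helper name amount_text.

```python
import re

# Capturing-group version from the answer.
BETWEEN = re.compile(r"- EUR\W*(.*?)\W*Fee")

def amount_text(text):
    """Return the text between '- EUR' and 'Fee', without surrounding non-word chars."""
    m = BETWEEN.search(text)
    return m.group(1) if m else None

# Hypothetical sample inputs, not taken from the question.
print(amount_text("Total - EUR 12,50 Fee"))  # 12,50
print(amount_text("Total - EUR: 7   Fee"))   # 7
```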
You could also match the number format in capture group 1
- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b
- EUR\b Match - EUR and a word boundary
\D* Match 0+ times any char except a digit
( Capture group 1
\d+(?:,\d+)? Match 1+ digits with an optional decimal part
) Close group 1
\s+Fee\b Match 1+ whitespace chars, Fee and a word boundary
Regex demo
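Since this pattern pins down the number format, the captured text can be converted to a float directly. The sketch below assumes the comma is a decimal separator; the sample strings and the helper name parse_amount are hypothetical.

```python
import re

# Number-format version from the answer.
AMOUNT = re.compile(r"- EUR\b\D*(\d+(?:,\d+)?)\s+Fee\b")

def parse_amount(text):
    """Return the amount between '- EUR' and 'Fee' as a float, or None."""
    m = AMOUNT.search(text)
    if m is None:
        return None
    # Assumption: "," is the decimal separator, so swap it for "." before converting.
    return float(m.group(1).replace(",", "."))

# Hypothetical sample inputs, not taken from the question.
print(parse_amount("Total - EUR 12,50 Fee"))  # 12.5
print(parse_amount("Total - EUR   7   Fee"))  # 7.0
```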
This works: I removed the , from (.) in the test string.
Regex example - working
I have the following test strings:
Battery Bank 1
Dummy 32 Segment 12
System
Modbus 192.168.0.1 Group
I need a regex that can match and group these as follows:
Group 1: Battery Bank
Group 2: 1
Group 1: Dummy 32 Segment
Group 2: 12
Group 1: System
Group 2: null
Group 1: Modbus 192.168.0.1 Group
Group 2: null
Basically, capture everything (including numbers) into group 1 unless the string ends with a whitespace followed by 1 or more digits. If it does, capture this number into group 2.
This regex is not doing what I need as everything is captured into the first group.
([\w ]+)( \d+)?
https://regex101.com/r/GEtb5G/1/
You may use this group that allows an empty match in 2nd capture group:
^(.+?) *(\d+|)$
Updated RegEx Demo
RegEx Details:
^: Start
(.+?): Match 1+ of any character (lazy) in capture group #1
*: Match 0 or more spaces (the quantifier applies to the literal space before it)
(\d+|): Match 1+ digits or nothing in 2nd capture group
$: End
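Here is a short Python sketch running this pattern over the four test strings from the question. The helper name split_trailing_number is made up. With this pattern, a missing number shows up as an empty string in group 2 rather than as an unmatched group.

```python
import re

# Pattern from the answer: lazy group 1, then an optional trailing number.
TRAILING_NUMBER = re.compile(r"^(.+?) *(\d+|)$")

def split_trailing_number(text):
    """Return (group 1, group 2), or None if the text does not match."""
    m = TRAILING_NUMBER.match(text)
    return m.groups() if m else None

for sample in ("Battery Bank 1", "Dummy 32 Segment 12", "System", "Modbus 192.168.0.1 Group"):
    print(split_trailing_number(sample))
# ('Battery Bank', '1')
# ('Dummy 32 Segment', '12')
# ('System', '')
# ('Modbus 192.168.0.1 Group', '')
```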
You can use
^\s*(.*[^\d\s])(?:\s*(\d+))?\s*$
See the regex demo (note \s are replaced with spaces since the test string in the demo is a single multiline string).
If the regex is to be used with a multiline flag to match lines in a longer multiline text, you can use
^[^\S\r\n]*(.*[^\d\s])(?:[^\S\r\n]*(\d+))?[^\S\r\n]*$
See the regex demo.
Details:
^ - start of a string
\s* - zero or more whitespaces
(.*[^\d\s]) - Group 1: any zero or more chars other than line break chars as many as possible and then a char other than a digit and whitespace
(?:\s*(\d+))? - an optional sequence of
\s* - zero or more whitespaces
(\d+) - Group 2: one or more digits
\s* - zero or more whitespaces
$ - end of string.
In the second regex, [^\S\r\n]* matches any zero or more whitespaces other than LF and CR chars.
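The multiline variant can be tried in Python with re.MULTILINE, treating the four test strings as one block of text. This is a sketch; parse_lines and SAMPLE are names introduced here for illustration. Unlike a pattern with an empty alternative, the optional group leaves group 2 as None when there is no trailing number.

```python
import re

# Multiline variant from the answer; ^ and $ match at line boundaries.
LINE = re.compile(
    r"^[^\S\r\n]*(.*[^\d\s])(?:[^\S\r\n]*(\d+))?[^\S\r\n]*$",
    re.MULTILINE,
)

def parse_lines(text):
    """Return a (group 1, group 2) tuple for every matching line."""
    return [m.groups() for m in LINE.finditer(text)]

SAMPLE = "Battery Bank 1\nDummy 32 Segment 12\nSystem\nModbus 192.168.0.1 Group"
for pair in parse_lines(SAMPLE):
    print(pair)
# ('Battery Bank', '1')
# ('Dummy 32 Segment', '12')
# ('System', None)
# ('Modbus 192.168.0.1 Group', None)
```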
I have a set of strings with fairly inconsistent naming, that should be structured enough to be divided into groups though.
Here's an excerpt:
test test 1970-2020 w15.txt
test 1970-2020 w15.csv
test 1990-99 q1 .txt
test 1981 w15 .csv
test test w15.csv
I am trying to extract information by groups (test-name, (year)?, suffix, type) using the following RegEx:
(.*)\s+([0-9]+(\-[0-9]+)?\s+)?((w|q)[0-9]+(\s+)?)(\..*)$
It works except for the optional group matching the years (a range of years, a single year or no year at all).
What am I missing to make the pattern work?
Here's also a link to RegEx101 for testing:
https://regex101.com/r/wG3aM3/817
You could make the pattern a bit more specific and make the content of the year optional
^(.*?)\s+((?:\d{4}(?:-(?:\d{4}|\d{2}))?)?)\s+([wq][0-9]+)\s*(\.\w+)$
Explanation
^ Start of string
(.*?) Capture group 1, match 0+ times any char except a newline, non-greedy
\s+ Match 1+ whitespace chars
( Capture group 2
(?: Non capture group
\d{4}(?:-(?:\d{4}|\d{2}))? Match 4 digits and optionally - and 2 or 4 digits
)? Close non capture group and make the year optional
) Close group 2
\s+ Match 1+ whitespace chars
([wq][0-9]+) Capture group 3, match either w or q and 1+ digits 0-9
\s* Match 0+ whitespace chars
(\.\w+) Capture group 4, match a dot and 1+ word characters
$ End of string
Regex demo
Note that \s could also match a newline.
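A Python sketch of this pattern on names shaped like the question's examples (parse_name is a hypothetical helper). One thing to be aware of: when the year is absent, group 2 matches the empty string between the two \s+ parts, so at least two whitespace chars are needed between the name and the w/q suffix. A year-less name with a single space does not match as written.

```python
import re

# Pattern from the answer: name, optional year or year range, suffix, extension.
FILENAME = re.compile(
    r"^(.*?)\s+((?:\d{4}(?:-(?:\d{4}|\d{2}))?)?)\s+([wq][0-9]+)\s*(\.\w+)$"
)

def parse_name(name):
    """Return (name, year, suffix, extension), or None if the name does not match."""
    m = FILENAME.match(name)
    return m.groups() if m else None

print(parse_name("test test 1970-2020 w15.txt"))  # ('test test', '1970-2020', 'w15', '.txt')
print(parse_name("test 1990-99 q1 .txt"))         # ('test', '1990-99', 'q1', '.txt')
print(parse_name("test 1981 w15 .csv"))           # ('test', '1981', 'w15', '.csv')
print(parse_name("test test  w15.csv"))           # two spaces: ('test test', '', 'w15', '.csv')
print(parse_name("test test w15.csv"))            # one space: None
```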