Regex to extract values from look behind groups along with subsequent repetitions - regex

In a JAVA program, I need to match a text input with a regular expression pattern. Simplistically, the text input looks like this: "100|200|123,124,125".
The output from the above match should find three matches, where all matches will return the two fixed subgroups - 100 and 200 and the variable repeating sub-group 123/124/125.
Match 1 - 123
Match 2 - 124
Match 3 - 125.
Each of these match output should also include 100 and 200 in two separate groups.
So basically, matches will target extracting patterns such as '100|200|123', '100|200|124', '100|200|125'.
I have used this regex: (?<=(?:(?<first>\d+)\|(?<second>\d+)\|)|,)(?<vardata>\d+)(?=,|$).
But I get this error: + A quantifier inside a look-behind makes it non-fixed width

As stated in comments above, you cannot use variable length assertions in lookbehind in Java regex.
However you can use this regex based on \G:
(?:(\d+)\|(\d+)\||(?<!^)\G,)(\d+)
RegEx Demo
RegEx Details:
\G asserts position at the end of the previous match or the start of the string for the first match.
You will get comma separated numbers in group(3) in a loop while group(1) and group(2) will give you first 2 numbers from input string.

Related

Regular Expression to Validate Monaco Number Plates

I would like to have an expression to validate the plates of monaco.
They are written as follows:
A123
123A
1234
I started by doing:
^[a-zA-Z0-9]{1}?[0-9]{2}?[a-zA-Z0-9]{1}$
But the case A12A which is false is possible with that.
You can use
^(?!(?:\d*[a-zA-Z]){2})[a-zA-Z\d]{4}$
See the regex demo. Details:
^ - start of string
(?!(?:\d*[a-zA-Z]){2}) - a negative lookahead that fails the match if there are two occurrences of any zero or more digits followed with two ASCII letters immediately to the right of the current location
[a-zA-Z\d]{4} - four alphanumeric chars
$ - end of string.
You can write the pattern using 3 alternatives specifying all the allowed variations for the example data:
^(?:[a-zA-Z][0-9]{3}|[0-9]{3}[a-zA-Z]|[0-9]{4})$
See a regex demo.
Note that you can omit {1} and
To not match 2 chars A-Z you can write the alternation as:
^(?:[a-zA-Z]\d{3}|\d{3}[a-zA-Z\d]|\d[a-zA-Z\d][a-zA-Z\d]\d)$
See another regex demo.
So it needs 3 connected digits and 1 letter or digit.
Then you can use this pattern :
^(?=.?[0-9]{3})[A-Za-z0-9]{4}$
The lookahead (?=.?[0-9]{3}) asserts the 3 connected digits.
Test on Regex101 here

Regular Expression: Find a specific group within other groups in VB.Net

I need to write a regular expression that has to replace everything except for a single group.
E.g
IN
OUT
OK THT PHP This is it 06222021
This is it
NO MTM PYT Get this content 111111
Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space Charachter
I need a way to:
Replace Group 1,2,4 with String.Empty
OR
Get Group 3, ONLY
You don't need 4 groups, you can use a single group 1 to be in the replacement and match 6-8 digits for the last part instead of only 6.
Note that this \w{0,2} will also match an empty string, you can use \w{1,2} if there has to be at least a single word char.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match 3 times word characters with a quantifier and a whitespace in between
(.*?) Capture group 1 match any char as least as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Regex demo
Example code
Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)
Output
This is it
My approach does not work with groups and does use a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this works also if we have more than 6 digits because regex stops searching after it has found 6 digits and doesn't care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 ocurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
string result = Regex.Match(#"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value;

Match sequence at the very beginning of a string along with other sequences in the middle

I'm trying to find a way to match 2 patterns in the same regular expression, but I can't seem to be able to combine both. These are a few strings I'm trying to match against:
0 800-204-4000
0800-204 4000
0 800 -204 - 4000
800 204 4000
What I'm trying to do is find a regular expression that matches a zero at the beginning of a string if it exists and all subsequent white spaces and dashes. So I was able to match the first zero if it exists using /^0?/ and match all empty spaces and dashes using /[\s-]*/g but how exactly can I combine both into the same expression?
Edit:
So I want to match the first 0 IF it exists and all following spaces and dashes. So in the examples above, what should be matched is in brackets:
[0 ]800[-]204[-]4000
[0]800[-]204[ ]4000
[0 ]800[ -]204[ - ]4000
800[ ]204[ ]4000
The regex provided in the answers do not work. Check it out: https://regex101.com/r/dMN6xR/1
(^0|[\s-])[\s-]*
To avoid empty matches, set up a match group () that matches starting zero ^0 or | whitespace/- [\s-], and then match rest of whitespace/- [\s-]*
To allow whitespace before the zero, just add that right after the start anchor like this
(^[\s-]*0|[\s-])[\s-]*.
You combine the regex by using the or operator:
/(^0?)|[\s-]*/g
When simplified, it would be
/(^0)?[\s-]+/g
Drawing on the others' answers, you can use /^0[-\s]*|[-\s]+/g. I've used [- ] in the demo because I don't like how difficult it is to read when newlines are removed.

Regex - Remove the final character

I have the following Regex
(?:(?:zero|one|two|three|four|five|six|seven|eight|nine|\[0-9‌​\])\s*){4,}
As you can see, it matches numbers with whitespace.
Question
How do I stop it from matching the final whitespace character?
For example:
1 2 3 4 5<whitespace>
should rather be:
1 2 3 4 5
The way you wrote the regex, trailing whitespaces will always be a part of a match, and there is no way to get rid of them. You need to rewrite the pattern repeating the number matching part inside a group that you need to assign the limiting quantifier with the min value decremented.
Schematically, it looks like
<NUMPATTERN>(?:\s+<NUMPATTERN>){3,}
See the regex demo.
In PCRE and Ruby, you may repeat capture group patterns with (?n) syntax (to shorten the pattern):
(zero|one|two|three|four|five|six|seven|eight|nine|[0-9])(?:\s+\g<1>){3,}
See the regex demo

Regular expression not working

I want to extract from the following regex (?<=^\d+\s*).*?\t trying to extract from the following text just the resources\blahblah:
10 _Resources\index.test FAIL
11 _Resources\index.test FAIL
12 Resources\index.test FAIL
13set\Relicensing Statement.test FAIL
but it captures the following text:
0 _Resources\index.test
1 _Resources\index.test
2 Resources\index.test
3set\Relicensing Statement.test
I just want the lines like Resources\index.test and not the starting numbers, no spaces, why is failing? If I just execute ^\d+\s*and matches with the any number of digits and space, but do not works with prefix.
Since you commented you were using Notepad++, how about matching ^\d+\s*([^\t]*).*$ and replacing by \1 ?
From NSRegularExpression (I saw it was tagged):
Look-behind assertion. True if the parenthesized pattern matches text
preceding the current input position, with the last character of the
match being the input character just before the current position. Does
not alter the input position. The length of possible strings matched
by the look-behind pattern must not be unbounded (no * or +
operators.)
The same problem holds in most of the languages.
Can't you extract $1 from (?:^\d+\s*)(.*?\t)?