Match repeated groups after keyword using regex

VB2010. Using regex, I can't seem to get this seemingly easy pattern to work. I first look for the line that starts with the keyword TRIPS, which holds my data, and then from that line I want to extract repeated groups of data, each made up of an alpha code followed by a number.
MODES 1 0 0
OVERH X 28 H 0 Z 198
TRIPS X 23 D 1 Z 198
ITEMSQ 1 0 0
COSTU P 16 E 180
CALLS 0 0
I have
^TRIPS (?<grp>[A-Z]\s{1,4}\d{1,3})
Which gives me one match and the first group "X 23". So I extend it by allowing it to match up to 4 groups.
^TRIPS (?<grp>[A-Z]\s{1,4}\d{1,3}){0,4}
but I get one match with still only one group.

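You aren't allowing for whitespace between the groups. You need something like this:
^TRIPS ((?<grp>[A-Z]\s{1,4}\d{1,3})\s+){0,4}
Note that the trailing \s+ means every group, including the last one, must be followed by whitespace (a line break counts). In .NET, a repeated group keeps only its last match in Value; the earlier repetitions are available through the group's Captures collection.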
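For readers outside .NET, here is a minimal Python sketch of the same extraction. Python's re module keeps only the last repetition of a repeated group, so this sketch swaps in a two-step approach: isolate the TRIPS line first, then pull the code/number pairs out of it. The name extract_trips and the use of [ \t] instead of \s (to stay on one line) are assumptions for illustration, not part of the original answer.

```python
import re

SAMPLE = """\
MODES 1 0 0
OVERH X 28 H 0 Z 198
TRIPS X 23 D 1 Z 198
ITEMSQ 1 0 0
COSTU P 16 E 180
CALLS 0 0
"""

# Step 1: the TRIPS line, with up to four "code number" groups after the keyword.
LINE_RE = re.compile(
    r"^TRIPS((?:[ \t]+[A-Z][ \t]{1,4}\d{1,3}){0,4})", re.MULTILINE
)
# Step 2: a single "code number" pair.
PAIR_RE = re.compile(r"([A-Z])[ \t]{1,4}(\d{1,3})")


def extract_trips(text):
    """Return (code, number) pairs from the TRIPS line, or [] if there is none."""
    line = LINE_RE.search(text)
    if line is None:
        return []
    return [(code, int(number)) for code, number in PAIR_RE.findall(line.group(1))]


print(extract_trips(SAMPLE))  # [('X', 23), ('D', 1), ('Z', 198)]
```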

Related

Regular Expression for parsing a sports score

I'm trying to validate that a form field contains a valid score for a volleyball match. Here's what I have, and I think it works, but I'm not an expert on regular expressions, by any means:
r'^ *([0-9]{1,2} *- *[0-9]{1,2})((( *[,;] *)|([,;] *)|( *[,;])|[,;]| +)[0-9]{1,2} *- *[0-9]{1,2})* *$'
I'm using python/django, not that it really matters for the regex match. I'm also trying to learn regular expressions, so a more optimal regex would be useful/helpful.
Here are rules for the score:
1. There can be one or more valid set (set=game) results included
2. Each result must be of the form dd-dd, where 0 <= dd <= 99
3. Each additional result must be separated by any of [ ,;]
4. Allow any number of sets >=1 to be included
5. Spaces should be allowed anywhere except in the middle of a number
So, the following are all valid:
25-10 or 25 -0 or 25- 9 or 23 - 25 (could be one or more spaces)
25-10,25-15 or 25-10 ; 25-15 or 25-10 25-15 (again, spaces allowed)
25-1 2 -25, 25- 3 ;4 - 25 15-10
Also, I need each result as a separate unit for parsing. So in the last example above, I need to be able to separately work on:
25-1
2 -25
25- 3
4 - 25
15-10
It'd be great if I could strip the spaces from within each result. I can't just strip all spaces, because a space is a valid separator between result sets.
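I think this solves your problem. Since you are using Python, use re.sub (str.replace does not understand regular expressions) and write the backreferences as \1 and \2, where s is the string to clean up:
re.sub(r"(\d{1,2})\s*-\s*(\d{1,2})", r"\1-\2", s)
How it works:
(\d{1,2}) captures a group of 1 or 2 digits.
\s* matches 0 or more whitespace characters.
- matches a literal -.
\1 is replaced with the content of capture group 1.
\2 is replaced with the content of capture group 2.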
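The rules in the question can be sketched in Python as two separate patterns: one that validates the whole field and one that picks out each set result with the inner spaces stripped. This is a minimal sketch, not the asker's code; the names FIELD_RE, SET_RE and parse_score are hypothetical.

```python
import re

# One set result: two 1-2 digit numbers around a dash, spaces allowed around the dash.
SET_RE = re.compile(r"(\d{1,2})\s*-\s*(\d{1,2})")

# Whole field: set results separated by a comma, a semicolon, or plain whitespace.
FIELD_RE = re.compile(
    r"\s*\d{1,2}\s*-\s*\d{1,2}"
    r"(?:(?:\s*[,;]\s*|\s+)\d{1,2}\s*-\s*\d{1,2})*\s*"
)


def parse_score(field):
    """Return normalized 'dd-dd' strings, or None if the field is not a valid score."""
    if FIELD_RE.fullmatch(field) is None:
        return None
    return [f"{a}-{b}" for a, b in SET_RE.findall(field)]


print(parse_score("25-1 2 -25, 25- 3 ;4 - 25 15-10"))
# ['25-1', '2-25', '25-3', '4-25', '15-10']
```

Keeping validation and extraction apart avoids the problem that a repeated capture group only remembers its last repetition.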

Lookaround backtracks right before closing bracket

Note: this question grew out of another answer whose comments have since all been removed.
When a lookaround construct is used within a regex, a backtrack (or something like one) takes place right before the closing bracket. As far as I know, this backtrack shows up in the output of both Perl and PCRE debuggers.
The question is: what is this backtrack, why is it there, and why is it interpreted as a backtrack?
The backtrack is a lie.
It's just a consequence of how the regex101 debugger is implemented. It uses a PCRE feature (flag) called PCRE_AUTO_CALLOUT. This flag tells the PCRE engine to invoke a user-defined function at every step of matching. This function receives the current match status as input.
The catch is that PCRE doesn't tell the callout when it really backtracks. Regex101 has to infer that from the match status.
As you can see, in the step before the "backtrack" occurs, the current matched text is a_, and just after you get out of the lookahead, it's reverted to a. Regex101 notices the matched text is shorter and therefore it infers that a backtrack must have happened, with the confusing outcome you noticed.
For reference, here's the internal PCRE representation of the pattern with auto-callout enabled:
$ pcretest
PCRE version 8.38 2015-11-23
re> /a(?=_)_b/DC
------------------------------------------------------------------
0 59 Bra
3 Callout 255 0 1
9 a
11 Callout 255 1 5
17 17 Assert
20 Callout 255 4 1
26 _
28 Callout 255 5 0
34 17 Ket
37 Callout 255 6 1
43 _
45 Callout 255 7 1
51 b
53 Callout 255 8 0
59 59 Ket
62 End
------------------------------------------------------------------
Capturing subpattern count = 0
Options:
First char = 'a'
Need char = 'b'
As you can see, there's no branching opcode there, just an Assert.
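To make the explanation concrete, here is a small Python sketch. The first part shows that a lookahead consumes nothing, which is why the matched text shrinks from a_ back to a on leaving it. The second part is a hypothetical stand-in for the inference described above (the function infer_backtracks and the snapshot list are made up for illustration and are not regex101's actual code or real callout data).

```python
import re

# The lookahead (?=_) checks for "_" without consuming it.
inside = re.match(r"a(?=_)", "a_b")
print(inside.group(0))  # a

whole = re.match(r"a(?=_)_b", "a_b")
print(whole.group(0))  # a_b


def infer_backtracks(snapshots):
    """Flag step i when the matched text is shorter than at step i - 1."""
    return [
        i
        for i in range(1, len(snapshots))
        if len(snapshots[i]) < len(snapshots[i - 1])
    ]


# Illustrative matched-text snapshots for /a(?=_)_b/ against "a_b":
# inside the lookahead the text reads "a_", after it the text is "a" again.
steps = ["", "a", "a_", "a", "a_", "a_b"]
print(infer_backtracks(steps))  # [3]
```

A heuristic of this shape flags step 3 even though the engine never took an alternative branch there.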

Select text in regex between 2 strings

I have the following line :
3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E27 SON PIA PLUS
I'd like to get the string : EI AMANDINE MRV SHP 70 W. So I decided to select the strings between 1 (can also be 2, 3 or 99) and 0 (can also be 1, 2, 3, 4 or 5).
I tried :
(0|1|2|3|99)(.*)(0|1|2|3|4|5)
But I have this result :
EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E
that is not what I want to obtain.
Do you have an idea in regex to make that selection work ?
Thanks !
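You were pretty close! Try this:
\b(?:[0-3]|99) ([^0-39].*?) [0-5]\b
Inside a character class, | is a literal character and every digit stands for itself, so a class cannot express alternatives such as 99; a range like [0-3] is the shorter way to list single digits.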
I think that you want to match "words" 4 to 9?
Your desired match will be in group 2 (group 1 only holds the last of the three skipped words).
^(\S+\s){3}((\S+\s){6})
Enable the multiline option if you have a whole file of subject strings.
You can try with:
\s(?:[0-3]|99)\s([A-Z].*?)\b(?:[0-5])\b
and get the string from group 1 ($1). Or, if your language supports lookaround, try:
(?<=\s[0-3]\s|99)[A-Z].+?(?=\s[0-5]\s)
to get the match directly.
Another solution that is based on matching all initial space + digit sequences:
\b(?:(?:[0-3]|99)\b\s*)+(.*?)\s*\b(?:[0-5])\b
The result is in Group 1.
With \b(?:(?:[0-3]|99)\b\s*)+ the rightmost number from the allowed leading set is picked.
You can use the following regex:
(?:(?:[0-3]|99)\s)+(.*?)\s(?:[0-5])\s
See demo https://regex101.com/r/iX6oE1/6
Also note that to match a range of digits you can use a character class instead of a chain of alternatives.
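As a quick check, the sketch below runs the question's greedy pattern and the last pattern above against the sample line in Python. It is only an illustration; the variable names are made up.

```python
import re

LINE = (
    "3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 "
    "SHP 70W OVOIDE AI E27 SON PIA PLUS"
)

# Original attempt: greedy .* runs to the last digit in 0-5 on the line.
greedy = re.search(r"(0|1|2|3|99)(.*)(0|1|2|3|4|5)", LINE)

# Lazy version: skip the leading numbers, then stop at the first " digit ".
lazy = re.search(r"(?:(?:[0-3]|99)\s)+(.*?)\s[0-5]\s", LINE)

print(lazy.group(1))  # EI AMANDINE MRV SHP 70 W
```

The greedy match starts at the very first 3 and ends inside E27, which is the overlong result reported in the question.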

Regex to match numbers and commas, but not numbers starting with 0 unless it's 0,

Well I tried to sum it up in the title.
I need a reg ex to match numbers and commas, but not numbers starting with 0 unless it's 0,number
My users enter hours in a field, so they have to be able to enter 0,3 hours, but they are not allowed to write 002 or 09.
I have this reg ex
^[0-9]*\,?[0-9]+$
How can I extend it so that a number cannot start with 0 unless the 0 is followed by a comma?
Another one :)
^(0|[1-9]\d*(|,\d+)|0,\d+)$
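This one should suit your needs:
^(?:0,\d*[1-9]|[1-9]\d*)$
The group is needed so that ^ and $ apply to both alternatives; without it, ^ only anchors the first branch and $ only the second.
either 0,\d*[1-9]: a 0, followed by a comma, followed by zero or more digits, followed by one digit between 1 and 9
or [1-9]\d*: a digit between 1 and 9, followed by zero or more digits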
Matches:
0,3
0,03
3
30
Doesn't match:
0
0,0
0,30
03
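The lists above can be checked with a short Python sketch. It compares the pattern with the alternation wrapped in a group against the same pattern without the group, where ^ binds only to the first branch and $ only to the second. The names GROUPED, UNGROUPED and accepted are hypothetical.

```python
import re

UNGROUPED = re.compile(r"^0,\d*[1-9]|[1-9]\d*$")
GROUPED = re.compile(r"^(?:0,\d*[1-9]|[1-9]\d*)$")

SHOULD_MATCH = ["0,3", "0,03", "3", "30"]
SHOULD_NOT_MATCH = ["0", "0,0", "0,30", "03"]


def accepted(pattern, samples):
    """Return the samples in which the pattern finds a match."""
    return [text for text in samples if pattern.search(text)]


print(accepted(GROUPED, SHOULD_NOT_MATCH))    # []
print(accepted(UNGROUPED, SHOULD_NOT_MATCH))  # ['0,30', '03']
```

Without the group, 0,30 is accepted through its prefix 0,3 and 03 through its trailing 3.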
You don't need to force everything into a single regex to do this.
It will be far clearer if you use multiple regexes, each one making a specific check.
if ( /^(?:0|[1-9][0-9]*),[0-9]+$/ || /^[1-9][0-9]*$/ )
Here we are making two different checks. "Either this one matches, or the other one matches", and then you don't have to jam both conditions into one regex.
Let the expressive form of your host language be used, rather than trying to cram logic into a regex.
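The same two-check idea might look like this in Python. This is a sketch under assumptions: the decimal check is written so that the part before the comma is either 0 or has no leading zero, a bare 0 is rejected as in the integer check above, and the names DECIMAL_RE, INTEGER_RE and is_valid_hours are made up.

```python
import re

# "0,5" or "12,25": integer part is 0 or has no leading zero.
DECIMAL_RE = re.compile(r"(?:0|[1-9][0-9]*),[0-9]+")
# "3" or "30": whole hours without a leading zero.
INTEGER_RE = re.compile(r"[1-9][0-9]*")


def is_valid_hours(text):
    """True if text passes either the decimal check or the integer check."""
    return bool(DECIMAL_RE.fullmatch(text) or INTEGER_RE.fullmatch(text))


for sample in ["0,3", "1,5", "30", "002", "09", "09,5"]:
    print(sample, is_valid_hours(sample))
```

Each pattern stays short enough to read at a glance, and fullmatch removes the need for explicit ^ and $ anchors.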

I'm trying to do a search/replace using regex for mass replacing on Notepad++

I need to add a parameter for each code and name. I tried using (.+) or (.*) for each number, but it didn't work. Each run of spaces separates one number from the next, and the runs are not all the same width. Example, from this:
Abanda CDP 192 129 58 0 0 0 2 3 3
2.998 0.013 33.091627 -85.527029 2582661
To this:
Abanda CDP |code1=192 |code2=129 |code3=58 |code4=0 |code5=0 |code6=0 |code7=2 |code8=3 |code9=3
|code10=2.998 |code11=0.013 |code12=33.091627 |code13=-85.527029 |code14=2582661
Try ([0-9.-]+). The reason .+ doesn't work is that . matches whitespace as well, so a single .+ swallows several numbers at once. The reason you can't just use \S+ (non-spaces) is that it would also match the words, and you only want to match the numbers.
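A plain search/replace cannot generate the increasing code numbers by itself, so outside Notepad++ the numbering could be scripted. Below is a minimal Python sketch, assuming the numbering runs continuously across lines and that every whitespace-delimited numeric token should be labeled. The name label_numbers and the stricter token pattern (used here instead of [0-9.-]+ so that hyphens or dots inside names are left alone) are assumptions.

```python
import re
from itertools import count

# A whitespace-delimited number: optional minus sign, digits, optional fraction.
NUMBER_RE = re.compile(r"(?<!\S)-?\d+(?:\.\d+)?(?!\S)")


def label_numbers(text, start=1):
    """Prefix each number with |codeN=, numbering from start, and collapse space runs."""
    counter = count(start)
    text = re.sub(r"[ \t]+", " ", text)  # unequal-width gaps become one space
    return NUMBER_RE.sub(lambda m: f"|code{next(counter)}={m.group(0)}", text)


source = (
    "Abanda CDP 192   129 58 0 0 0 2 3 3\n"
    "2.998 0.013  33.091627 -85.527029 2582661"
)
print(label_numbers(source))
```

The replacement callback draws the next number from a shared counter each time it is called, which is the piece a fixed replacement string cannot provide.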