Select text between two strings with a regex

I have the following line:
3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E27 SON PIA PLUS
I'd like to get the string EI AMANDINE MRV SHP 70 W. So I decided to select the text between 1 (which can also be 2, 3 or 99) and 0 (which can also be 1, 2, 3, 4 or 5).
I tried:
(0|1|2|3|99)(.*)(0|1|2|3|4|5)
But I get this result:
EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W OVOIDE AI E
which is not what I want.
How can I write a regex that makes this selection work?
Thanks!

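You were pretty close! Try this:
\b(?:0|1|2|3|99) ([^0|1|2|3|99].*?) (?:0|1|2|3|4|5)\b
The result is in group 1. The lazy .*? stops at the first following number instead of the last one. Note that [^0|1|2|3|99] is a character class, not a negated alternation: it matches any single character except 0, 1, 2, 3, 9 and |, which here stops the capture from starting at the 3 in 1 3.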
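To see why the question's pattern overshoots and the answer's does not, here is a minimal Python sketch (Python's re is used only for illustration; the line is the one from the question). The greedy version reproduces the question's result in group 2, while the lazy version stops at the first qualifying number:

```python
import re

line = ("3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W "
        "OVOIDE AI E27 SON PIA PLUS")

# The question's pattern: the greedy (.*) runs to the end of the line and then
# backtracks only as far as the LAST digit in 0-5 (the "2" of "E27").
greedy = re.search(r"(0|1|2|3|99)(.*)(0|1|2|3|4|5)", line)
print(greedy.group(2))  # the over-long result shown in the question

# The answer's pattern: spaces around the numbers plus a lazy (.*?) make the
# capture stop at the FIRST qualifying number after the text.
lazy = re.search(r"\b(?:0|1|2|3|99) ([^0|1|2|3|99].*?) (?:0|1|2|3|4|5)\b", line)
print(lazy.group(1))  # EI AMANDINE MRV SHP 70 W
```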

It looks like you want to match words 4 to 9.
Your desired match will be in group 1 (with a trailing space):
^(?:\S+\s){3}((?:\S+\s){6})
Enable the multiline option if you have a whole file of subject strings.
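A minimal Python sketch of the word-position idea, assuming the wanted text always occupies words 4 to 9 of each line. Non-capturing groups keep the result in group 1:

```python
import re

line = ("3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W "
        "OVOIDE AI E27 SON PIA PLUS")
pattern = r"^(?:\S+\s){3}((?:\S+\s){6})"

# Skip three whitespace-terminated "words", then capture the next six.
m = re.match(pattern, line)
words = m.group(1).rstrip()  # drop the trailing space the pattern captures
print(words)  # EI AMANDINE MRV SHP 70 W

# For a whole file, re.MULTILINE lets ^ match at the start of every line.
text = line + "\n" + line
all_rows = [s.rstrip() for s in re.findall(pattern, text, re.MULTILINE)]
print(all_rows)
```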

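You can try:
\s(?:[0-3]|99)\s([A-Z].*?)\b(?:[0-5])\b
and get the string from group 1 ($1). Or, if your regex flavor supports lookaround, try:
(?<=\s[0-3]\s|99)[A-Z].+?(?=\s[0-5]\s)
to get the match directly. This lookbehind has alternatives of different lengths (3 and 2 characters), which PCRE accepts but some engines, such as Python's re, reject.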
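A Python sketch of both variants (illustration only). It also checks the lookbehind caveat: Python's re refuses the mixed-width lookbehind, while a fixed-width version that only handles a single leading digit works:

```python
import re

line = ("3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W "
        "OVOIDE AI E27 SON PIA PLUS")

# Capture-group version: read group 1 and trim the trailing space it includes.
m = re.search(r"\s(?:[0-3]|99)\s([A-Z].*?)\b(?:[0-5])\b", line)
print(m.group(1).strip())  # EI AMANDINE MRV SHP 70 W

# Python's re only accepts fixed-width lookbehinds, so the mixed-width
# alternation (?<=\s[0-3]\s|99) fails to compile here.
try:
    re.compile(r"(?<=\s[0-3]\s|99)[A-Z].+?(?=\s[0-5]\s)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)  # False

# A fixed-width variant (single leading digit only) compiles and matches.
direct = re.search(r"(?<=\s[0-3]\s)[A-Z].+?(?=\s[0-5]\s)", line)
print(direct.group(0))  # EI AMANDINE MRV SHP 70 W
```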

Another solution, based on matching all the leading number-plus-space sequences:
\b(?:(?:[0-3]|99)\b\s*)+(.*?)\s*\b(?:[0-5])\b
The result is in group 1.
Because \b(?:(?:[0-3]|99)\b\s*)+ consumes every consecutive number from the allowed leading set, the capture starts right after the rightmost one.
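A quick Python check of this pattern (illustration only), showing that the repeated group swallows both leading numbers, 1 and 3, before the capture begins:

```python
import re

line = ("3EAM7A 1 3 EI AMANDINE MRV SHP 70 W 0 SH3-A1 1 SHP 70W "
        "OVOIDE AI E27 SON PIA PLUS")

# The repeated group consumes "1 " and "3 "; the lazy capture then runs up to
# the first standalone digit 0-5 (here the "0").
pattern = r"\b(?:(?:[0-3]|99)\b\s*)+(.*?)\s*\b(?:[0-5])\b"
m = re.search(pattern, line)
print(m.group(0))  # 1 3 EI AMANDINE MRV SHP 70 W 0
print(m.group(1))  # EI AMANDINE MRV SHP 70 W
```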

You can use the following regex:
(?:(?:[0-3]|99)\s)+(.*?)\s(?:[0-5])\s
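See the demo: https://regex101.com/r/iX6oE1/6
Also note that to match a range of digits you can use a character class such as [0-3] instead of an alternation like 0|1|2|3. A class matches a single character, so a two-digit number such as 99 still needs its own alternative.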

Related

Regular Expression for parsing a sports score

I'm trying to validate that a form field contains a valid score for a volleyball match. Here's what I have, and I think it works, but I'm not an expert on regular expressions, by any means:
r'^ *([0-9]{1,2} *- *[0-9]{1,2})((( *[,;] *)|([,;] *)|( *[,;])|[,;]| +)[0-9]{1,2} *- *[0-9]{1,2})* *$'
I'm using Python/Django, not that it really matters for the regex match. I'm also trying to learn regular expressions, so a more optimal regex would be helpful.
Here are the rules for the score:
1. One or more valid set results (set = game) can be included
2. Each result must be of the form dd-dd, where 0 <= dd <= 99
3. Each additional result must be separated by any of [ ,;]
4. Allow any number of sets >=1 to be included
5. Spaces should be allowed anywhere except in the middle of a number
So, the following are all valid:
25-10 or 25 -0 or 25- 9 or 23 - 25 (could be one or more spaces)
25-10,25-15 or 25-10 ; 25-15 or 25-10 25-15 (again, spaces allowed)
25-1 2 -25, 25- 3 ;4 - 25 15-10
Also, I need each result as a separate unit for parsing. So in the last example above, I need to be able to separately work on:
25-1
2 -25
25- 3
4 - 25
15-10
It'd be great if I could strip the spaces from within each result. I can't just strip all spaces, because a space is a valid separator between result sets.
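I think this solves your problem:
re.sub(r"(\d{1,2})\s*-\s*(\d{1,2})", r"\1-\2", score)
where score is the input string. How it works:
(\d{1,2}) is a capture group of 1 or 2 digits.
\s* matches zero or more whitespace characters.
- matches a literal hyphen.
\1 in the replacement inserts the text captured by group 1.
\2 in the replacement inserts the text captured by group 2.
Python's re.sub writes backreferences in the replacement as \1 and \2 (not $1 and $2), and str.replace does not interpret regexes at all.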
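A minimal Python sketch that first validates the whole field and then extracts each result with the inner spaces removed. The constant names and the parse_scores helper are hypothetical, and the separator rule (a comma or semicolon with optional spaces, or whitespace alone) is one reading of rule 3:

```python
import re

# One result: 1-2 digits, a hyphen, 1-2 digits, optional spaces around the hyphen.
SCORE = r"\d{1,2}\s*-\s*\d{1,2}"
# Separator: a comma or semicolon with optional spaces, or whitespace alone.
SEP = r"(?:\s*[,;]\s*|\s+)"
# The whole field: one or more results, optionally padded with spaces.
VALID = re.compile(rf"\s*{SCORE}(?:{SEP}{SCORE})*\s*")
# Same result pattern with both numbers captured, for extracting each set.
PAIR = re.compile(r"(\d{1,2})\s*-\s*(\d{1,2})")


def parse_scores(text):
    """Validate a score field and return each set result without inner spaces."""
    if VALID.fullmatch(text) is None:
        raise ValueError(f"invalid score field: {text!r}")
    return [f"{a}-{b}" for a, b in PAIR.findall(text)]


print(parse_scores("25-1 2 -25, 25- 3 ;4 - 25 15-10"))
# ['25-1', '2-25', '25-3', '4-25', '15-10']
```

Because PAIR has two groups, findall returns one tuple per result, so rebuilding each as a-b strips the spaces inside a result while the separators are simply skipped.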

How can I use Regex to capture a certain set of ages?

I have a set of data like the following:
1
2
3
4
5
6
7
8
9
10
1,1
1,2
1,3
2,12
11,13,15
7,8,12
And so on. I am trying to use a regex to target ages between 1 and 7, but I am also getting matches on any two-digit number that contains one of these digits. My regex is currently:
/^(1)|(2)|(3)|(4)|(5)|(6)|(7)|$/g
My current matches include 1, 2, 3, 4, 5, 6 and 7, which is perfect. However, it also matches the lines 11,13,15 and 7,8,12, which is not what I want.
Any advice on how to resolve this? Thanks in advance; I am still trying to fix it myself.
You can use word boundaries:
\b[1-7]\b
As pointed out by @Quantic, this matches the numbers 1 to 7 wherever they appear in a line.
If you only want lines that consist of a single number between 1 and 7, you'll need anchors:
^[1-7]$
Or, if you want to capture the number:
^([1-7])$
With anchors, you'll need the multiline flag so that ^ and $ match at the start and end of each line.
(?<!\d)[1-7](?!\d)
This looks for any digit from 1 to 7 that does not have another digit on either side of it, using a negative lookbehind and a negative lookahead.
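A Python sketch on the question's data (illustration only). It also shows why the original pattern misbehaves: | has the lowest precedence, so ^ applies only to (1), and the final empty branch anchored at $ matches any line:

```python
import re

data = "\n".join(["1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
                  "1,1", "1,2", "1,3", "2,12", "11,13,15", "7,8,12"])

# The question's pattern: ^(1) is just one branch, so it finds the first
# digit of "11".
original = r"^(1)|(2)|(3)|(4)|(5)|(6)|(7)|$"
print(re.search(original, "11,13,15").group(0))  # 1

# Whole lines that are a single age from 1 to 7 (needs re.MULTILINE).
whole_lines = re.findall(r"^[1-7]$", data, re.MULTILINE)
print(whole_lines)  # ['1', '2', '3', '4', '5', '6', '7']

# Standalone digits 1-7 anywhere in a line, never part of a longer number.
standalone = re.compile(r"(?<!\d)[1-7](?!\d)")
print(standalone.findall("11,13,15"))  # []
print(standalone.findall("7,8,12"))    # ['7']
```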

Convert a regular expression to Erlang's re syntax?

I am having a hard time converting the following regular expression to Erlang syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this:
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
But I get weird match results when I convert it to Erlang. Here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about Erlang, but I will try to explain. With your regex:
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
{match,[{0, 28},{0,3},{8,1},{16,1},{25,3}]}
In each {Start, Length} tuple, the first number is the starting index of the match and the second is the number of matched characters from that index.
Reason for more than four groups
The first entry always describes the whole substring matched by the complete regex (group 0); the rest are the four capture groups you want. So there are five entries in total:
([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
Group 0: the entire match of the complete regex (here, the whole string)
Group 1: ([\\d,]+)
Group 2: (\\d+)
Group 3: (\\d)
Group 4: (\\d+.\\d+)
How to get the desired answer
We want every group except the first one (the entire match), so we can use the all_but_first capture option to skip it:
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
If you are unsure what the string actually contains, you can print it and check:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of the issue, see the great answer by rock321987.
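For comparison, a Python sketch of the same match (Python stands in for Erlang here; the code mirrors re:run's output rather than calling it). It rebuilds the {Start, Length} pairs from match offsets, and the raw string shows the escaping difference: Python's r"..." keeps backslashes literal, whereas the Erlang string literal needs \\d to produce \d:

```python
import re

subject = "1,2 ==> 3 #SUP: 1 #CONF: 1.0"

# Raw string: each backslash is literal, so no doubling is needed.
pattern = re.compile(r"([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)")
m = pattern.search(subject)

# Erlang-style {Start, Length} pairs: group 0 is the whole match,
# groups 1-4 are the parenthesised captures.
pairs = [(m.start(i), m.end(i) - m.start(i)) for i in range(pattern.groups + 1)]
print(pairs)       # [(0, 28), (0, 3), (8, 1), (16, 1), (25, 3)]

# Analogue of {capture, all_but_first, list}: only the four captures.
print(m.groups())  # ('1,2', '3', '1', '1.0')
```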

Regex to match numbers and commas, but not numbers starting with 0 unless it's 0,number

Well, I tried to sum it up in the title.
I need a regex that matches numbers and commas, but not numbers starting with 0 unless the number is 0,number.
My users enter hours in a field, so they must be able to enter 0,3 hours, but they are not allowed to write 002 or 09.
I have this regex:
^[0-9]*\,?[0-9]+$
How can I extend it so that a number cannot start with 0 unless the 0 is followed by a comma?
Another one :)
^(0|[1-9]\d*(|,\d+)|0,\d+)$
This one should suit your needs:
^(?:0,\d*[1-9]|[1-9]\d*)$
Either 0,\d*[1-9]: a 0, followed by a comma, followed by zero or more digits, followed by one digit between 1 and 9;
or [1-9]\d*: a digit between 1 and 9, followed by zero or more digits.
The non-capturing group makes ^ and $ apply to both alternatives.
Matches:
0,3
0,03
3
30
Doesn't match:
0
0,0
0,30
03
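The grouping matters because | has the lowest precedence: in ^A|B$ the ^ applies only to A and the $ only to B. A small Python check (illustration only; re.search scans the string like most find/test functions) compares an ungrouped and a grouped version on the sample inputs:

```python
import re

ungrouped = r"^0,\d*[1-9]|[1-9]\d*$"    # ^ binds to branch 1 only, $ to branch 2 only
grouped = r"^(?:0,\d*[1-9]|[1-9]\d*)$"  # both anchors apply to both branches

samples = ["0,3", "0,03", "3", "30", "0", "0,0", "0,30", "03"]
results = {s: (bool(re.search(ungrouped, s)), bool(re.search(grouped, s)))
           for s in samples}

for s, (u, g) in results.items():
    print(f"{s:>5}  ungrouped={u}  grouped={g}")
```

Only 0,30 and 03 differ: the ungrouped pattern accepts them because one branch matches a prefix ("0,3") or a suffix ("3") without the opposite anchor.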
You don't need to force everything into a single regex to do this.
It will be far clearer if you use multiple regexes, each one making a specific check.
if ( /^0,[0-9]+$/ || /^[1-9][0-9]*(,[0-9]+)?$/ )
Here we are making two different checks. "Either this one matches, or the other one matches", and then you don't have to jam both conditions into one regex.
Let the expressive form of your host language be used, rather than trying to cram logic into a regex.
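The same two-check idea as a Python sketch. The is_valid_hours helper is hypothetical, and it assumes that a lone 0 is rejected and that decimals such as 12,5 should be accepted; adjust the second pattern if that reading is wrong:

```python
import re


def is_valid_hours(text):
    """Accept 0,<digits>, or a number not starting with 0 with an optional decimal comma."""
    # Two small checks instead of one combined regex; fullmatch anchors both ends.
    return (re.fullmatch(r"0,[0-9]+", text) is not None
            or re.fullmatch(r"[1-9][0-9]*(?:,[0-9]+)?", text) is not None)


for s in ["0,3", "12", "12,5", "002", "09", "0"]:
    print(s, is_valid_hours(s))
```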

What is wrong with this Regular Expression?

I am a beginner and have some problems with regular expressions.
The input text is: something idUser=123654; nick="Tom" something
I need to extract the value of idUser -> 123654
I tried this:
//idUser is always an 8-digit number
MatchCollection matchsID = Regex.Matches(pk.html, @"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but the output is idUser=123654, and I need only the number.
The second problem is with nick="Tom": how can I get only the text Tom from this expression?
You don't show your output code, where you get the group from your match collection.
Hint: you need group 1, not group 0, if you only want what is inside the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is idUser=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
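A Python sketch of the alternation above (Python's re stands in for .NET here; in C# you would read Match.Groups instead of m.group). The second pattern illustrates the named-group variation, assuming nick follows idUser after a semicolon and optional spaces. Python spells named groups (?P<name>...), while .NET uses (?<name>...):

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# Alternation: each match fills either group 1 (idUser) or group 2 (nick).
found = [(m.group(0), m.group(1) or m.group(2))
         for m in re.finditer(r'\bidUser=(\d{3,8})\b|\bnick="(\w+)"', text)]
print(found)  # [('idUser=123654', '123654'), ('nick="Tom"', 'Tom')]

# Named groups, matching both values at once when nick follows idUser.
named = re.search(r'\bidUser=(?P<id>\d+);\s*nick="(?P<nick>[^"]*)"', text)
print(named.group("id"), named.group("nick"))  # 123654 Tom
```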
API links
Match.Groups property: this is how you get what the individual groups captured in a match.
Use lookaround:
(?<=idUser=)\d{1,8}(?=(;|$))
To fix the number of digits at 6, use (?<=idUser=)\d{6}(?=($|;))
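The lookaround version in Python (illustration only). Because lookbehind and lookahead are zero-width, the whole match is just the value, so no group is needed; the nick pattern is a hypothetical extension of the same idea:

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# Zero-width lookbehind/lookahead: the match itself is only the digits.
user_id = re.search(r"(?<=idUser=)\d{1,8}(?=;|$)", text).group(0)
print(user_id)  # 123654

# Same idea for nick: everything between nick=" and the closing quote.
nick = re.search(r'(?<=nick=")[^"]*(?=")', text).group(0)
print(nick)  # Tom
```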