Regex: How to make a range match the minimum possible length?

Regex: [0-9]{6-8}([0-9]{4})
Test Strings:
sfad 123456781234 afd sadfa fdads
sfd 12345671234 24312 fasdfa dsfafd
221 1234561234 safd safd23 34
Expected:
I need the end part 1234 captured in a group on each line.
Actual: No matches. :(
I would like [0-9]{6-8} to match as few characters as possible so that all three strings match. How do I make it lazy? It seems to be greedy now.
I need only regex solutions as this is part of a bigger solution. Here's a link to play with it: https://regex101.com/r/eF5pA9/1

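The range quantifier needs a comma, not a hyphen: {6-8} is not a valid quantifier, and engines such as PCRE treat it as literal text. [0-9]{6,8}([0-9]{4}) with the g modifier matches all three lines.
Thanks @anubhava and @Anderson Pimentel.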
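A minimal Python sketch of why the comma version works, using the three test strings from the question. Python's re module is an assumption here (the question used regex101), and the variable names are only illustrative. The greedy {6,8} first tries 8 digits and backtracks when fewer than 4 digits remain for the group, so a lazy quantifier is not needed.

```python
import re

lines = [
    "sfad 123456781234 afd sadfa fdads",
    "sfd 12345671234 24312 fasdfa dsfafd",
    "221 1234561234 safd safd23 34",
]

fixed = re.compile(r"[0-9]{6,8}([0-9]{4})")   # comma: 6 to 8 repetitions
broken = re.compile(r"[0-9]{6-8}([0-9]{4})")  # hyphen: "{6-8}" is literal text

# findall returns the contents of the single capture group for each match.
captured = [fixed.findall(line) for line in lines]
unmatched = [broken.findall(line) for line in lines]

print(captured)   # one captured group per line
print(unmatched)  # the hyphen version finds nothing
```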

Related

Is it possible to negate a group in a regular expression?

Let's say that we have this text:
2020-09-29
2020-09-30
2020-10-01
2020-10-02
2020-10-12
2020-10-16
2020-11-12
2020-11-23
2020-11-15
2020-12-01
2020-12-11
2020-12-30
I want to do something like this:
\d\d\d\d-(NOT10)-(30)
So I want to get all dates of any year, except those in the 10th month, and the day must be 30.
I tried hard to do this using negative lookahead assertions, but I could not come up with a working regex.
You can use negative lookaheads:
\d\d\d\d-(?!10)\d\d-30
The part (?!10) ensures that no 10 follows at the point where it is inserted into the regex. Notice that you still need to match the following digits afterwards, hence the \d\d part.
Generally speaking, you cannot (to my knowledge) negate a part that then also matches parts of the string. But with negative lookaheads you can simulate this, as I did above. The generalized idea looks something like:
(?!<special-exclusion-pattern>)<general-inclusion-pattern>
Where the special-exclusion-pattern matches a subset of the general-inclusion-pattern. In the above case the general inclusion pattern is \d\d and the special exclusion pattern is 10.
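As a rough illustration of the exclusion-then-inclusion idea, here is a small Python sketch. The sample list is a shortened version of the dates in the question, plus a 2020-10-30 line that is not in the original list and is added only to show the exclusion at work. The ^ and $ anchors with re.MULTILINE are also an addition, to keep each match on its own line.

```python
import re

# 2020-10-30 is a hypothetical extra line, not part of the question's data.
dates = """2020-09-29
2020-09-30
2020-10-01
2020-10-30
2020-12-30"""

# (?!10) rejects month 10; \d{2} then consumes whichever month is present.
pattern = re.compile(r"^\d{4}-(?!10)\d{2}-30$", re.MULTILINE)
matches = pattern.findall(dates)

print(matches)
```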
Try:
/20\d{2}-(?:0[1-9]|1[12])-30/
Explanation:
20\d{2} matches 20XX
(?:0[1-9]|1[12]) matches 01 to 09, 11 or 12
30 matches 30
Demo: https://regex101.com/r/O2F1eV/1
It's easiest to simply convert the substring (if present) that matches /^\d{4}-10-30$/ to an empty string, then split the resulting string on one or more newlines.
If your string were
2020-10-16
2020-10-30
2020-11-12
2020-11-23
and was held by the variable str, then in Ruby, for example,
str.sub(/^\d{4}-10-30$/,'')
#=> "2020-10-16\n\n2020-11-12\n2020-11-23\n"
so
str.sub(/^\d{4}-10-30$/,'').split
#=> ["2020-10-16", "2020-11-12", "2020-11-23"]
Whatever language you are using undoubtedly has similar methods.
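For comparison, a Python version of the same remove-then-split idea might look like the sketch below. It assumes the same four-line string as the Ruby example. Note that Python's re.sub replaces every occurrence by default, whereas Ruby's sub replaces only the first one.

```python
import re

text = "2020-10-16\n2020-10-30\n2020-11-12\n2020-11-23\n"

# Blank out any line that is an October 30th date, then split on whitespace,
# which also drops the empty line left behind.
remaining = re.sub(r"^\d{4}-10-30$", "", text, flags=re.MULTILINE).split()

print(remaining)
```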

How can I use Regex to capture a certain set of ages?

I have a set of data, like below;
1
2
3
4
5
6
7
8
9
10
1,1
1,2
1,3
2,12
11,13,15
7,8,12
And so on... I am trying to use regex to target a certain set of ages between 1 and 7, but I am also getting matches on any double-digit number that contains one of these digits. My regex is currently as below:
/^(1)|(2)|(3)|(4)|(5)|(6)|(7)|$/g
My current matches include 1,2,3,4,5,6,7 - perfect. However, it matches the line with 11,13,15 and 7,8,12 - not what I wanted.
Any advice on how to resolve this would be appreciated. Thanks in advance; I am continuing to try to correct it.
You can use word boundaries:
\b[1-7]\b
See a demo on regex101.com.
As pointed out by @Quantic, this matches numbers from 1-7 regardless of where they are.
If you only want lines that consist of a single number between 1 and 7, you'll need to use anchors:
^[1-7]$
Or if you want to capture the number:
^([1-7])$
With this, you'll need the multiline flag; see a demo on regex101.com as well.
(?<!\d)[1-7](?!\d)
This looks for any digit 1-7 that does not have another digit on either side of it. (using negative lookbehind/lookahead)
regex101 test
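To see how the three suggestions differ, here is a short Python sketch run over a few of the sample lines from the question. The variable names are made up for illustration, and re.fullmatch stands in for the ^...$ anchors with the multiline flag.

```python
import re

lines = ["1", "7", "8", "10", "1,2", "11,13,15", "7,8,12"]

# Whole line must be a single age from 1 to 7.
whole_line = [l for l in lines if re.fullmatch(r"[1-7]", l)]

# Any standalone 1-7 anywhere in the line (word boundaries).
boundary = {l: re.findall(r"\b[1-7]\b", l) for l in lines}

# Same idea, expressed with negative lookbehind/lookahead on digits.
lookaround = {l: re.findall(r"(?<!\d)[1-7](?!\d)", l) for l in lines}

print(whole_line)
print(boundary["7,8,12"], boundary["11,13,15"])
```

On this data the word-boundary and look-around versions agree; they still match the 7 in 7,8,12, so the anchored form is the one to use if such lines should be skipped entirely.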

RegEx multiple capture groups replaced in a string

I have a string of data...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
I want to take the following string and make 5 strings.
"123456712J456","D","TEST1"
"123456712J456","D","TEST2"
"123456712J456","D","TEST3"
"123456712J456","D","TEST4"
"123456712J456","D","TEST5"
I currently have the following regex...
//In a program like Textpad
<FIND> "\(.\{13\}\)","D","\([^~]*\)~\(.*\)
<REPLACE> "\1","D","\2"\n"\1","D","\3
//On the regex101 site
"(.{13})","D","([^~]*)~(.*)
Now if I run this 5 times it would work fine. The problem is there is an unknown number of lines to be made. For example...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
"123456712J457","D","TEST1~TEST2~TEST3"
"123456712J458","D","TEST1~TEST2"
"123456712J459","D","TEST1~TEST2~TEST3~TEST4"
I was hoping to be able to use a MULTI capture group to make this work. I found this PAGE talking about the common mistake between repeating a capturing group and capturing a repeated group. I need to capture a repeated group. For some reason I just could not make mine work right though. Anyone else have an idea?
RESOURCES:
http://www.regular-expressions.info/captureall.html
http://regex101.com/
Try this. See the demo. Just combine match 1 with each of the remaining matches.
http://regex101.com/r/yR3mM3/17
RegEx:
(.*,)|([^"~]+)
Example:
"1234567123456","T","TEST1~TEST2~TEST3~TEST4~TEST5"
Results:
MATCH 1
1. [0-20] `"1234567123456","T",`
MATCH 2
2. [21-26] `TEST1`
MATCH 3
2. [27-32] `TEST2`
MATCH 4
2. [33-38] `TEST3`
MATCH 5
2. [39-44] `TEST4`
MATCH 6
2. [45-50] `TEST5`
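The "combine match 1 with the rest" step could be scripted roughly as below. This is only a sketch in Python (the question used Textpad), the function name expand is hypothetical, and it assumes the last field never contains commas, double quotes, or tildes inside a value.

```python
import re

# Group 1: everything up to and including the last comma (the fixed prefix).
# Group 2: each run of characters that is neither a quote nor a tilde.
TOKEN = re.compile(r'(.*,)|([^"~]+)')

def expand(line: str) -> list[str]:
    """Return one output row per ~-separated value in the last field."""
    prefix = ""
    rows = []
    for m in TOKEN.finditer(line.strip()):
        if m.group(1) is not None:
            prefix = m.group(1)          # remember the shared prefix
        else:
            rows.append(f'{prefix}"{m.group(2)}"')
    return rows

for row in expand('"123456712J457","D","TEST1~TEST2~TEST3"'):
    print(row)
```

Because the loop runs once per match, it copes with any number of values per line without repeating the replace by hand.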

Regex: Match all, but ignore a specific word

Sample 1 String:
Aquaman Figure, XL DC Comics
Sample 2 String:
Rocket Raccoon, Mini Marvel
Regex:
/(DC Comics|Marvel)/
Match Sample 1:
DC Comics
Match Sample 2:
Marvel
Works perfectly in Regex101
How do I reverse this?
I want to match Aquaman Figure, XL and Rocket Raccoon, Mini only.
Edit:
/(.+)(?=Marvel)/ seems to do the job. It excludes Marvel from Rocket Raccoon! How do I make this also work with DC Comics?
/(.+)(?=Marvel)/ (or /(.+)(?=DC Comics|Marvel)/ for both) isn't going to work for something like:
John Marvel Bob
For which I presume you want the result to be:
John Bob
You'll only get John in the first match, and you'll get Marvel Bob in the second match (since look-ahead doesn't consume the looked-ahead characters).
Or something that doesn't contain either of the strings (since you require that the next characters match some given characters to get a match).
The easiest solution is probably just replacing the two desired sub-strings with empty strings. Replace:
DC Comics|Marvel
with:
(empty string)
Or you can repeatedly search for:
/(.*?)(DC Comics|Marvel|$)/
And just extract the first group (which will correspond to what matches .*?, which is everything starting from the end of the last match up to just before "DC Comics", "Marvel" or the end of the string).
The reluctant quantifier ? is needed to prevent the .* from matching John Marvel Bob, rather than just John, in John Marvel Bob Marvel.
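Both suggestions can be tried side by side in Python; this is a sketch under a few assumptions. The helper names strip_brands and pieces are invented for the example, and the second group of the search pattern is written as non-capturing so that findall returns plain strings rather than tuples.

```python
import re

BRANDS = r"DC Comics|Marvel"

def strip_brands(text: str) -> str:
    """Replace each brand name with an empty string."""
    return re.sub(BRANDS, "", text)

def pieces(text: str) -> str:
    """Repeatedly match lazily up to a brand name or the end, keeping group 1."""
    return "".join(re.findall(rf"(.*?)(?:{BRANDS}|$)", text))

samples = [
    "Aquaman Figure, XL DC Comics",
    "Rocket Raccoon, Mini Marvel",
    "John Marvel Bob",
    "John Marvel Bob Marvel",
]
for s in samples:
    print(repr(strip_brands(s)), repr(pieces(s)))
```

Either way the surrounding spaces are left in place, so a final strip() (or a whitespace cleanup) is probably wanted.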
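import re
re.findall(r"(.*)(?=DC Comics|Marvel)", text)
This should do what you are looking for. It's in Python; text is your string. The lookahead uses the full name DC Comics so that DC is not left at the end of the result, and the first element of the returned list is the text before the brand name.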

What is wrong with this Regular Expression?

I am a beginner and have some problems with regexp.
Input text is: something idUser=123654; nick="Tom" something
I need to extract the value of idUser -> 123654
I tried this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, @"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but in the output I get idUser=123654; I need only the number.
The second problem is with nick="Tom": how can I get only the text Tom from this expression?
You don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you only want what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is idUser=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match
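The question is about .NET, but the group logic carries over to other engines. As a hedged illustration, here is the same alternation in Python with named groups; the names user_id and nick are chosen for the example, and Python spells named groups (?P<name>...) where .NET uses (?<name>...).

```python
import re

text = 'something idUser=123654; nick="Tom" something'

pattern = re.compile(r'\bidUser=(?P<user_id>\d{3,8})\b|\bnick="(?P<nick>\w+)"')

found = {}
for m in pattern.finditer(text):
    # m.group(0) is the whole match, e.g. 'idUser=123654';
    # the named groups hold only the captured part.
    for name, value in m.groupdict().items():
        if value is not None:
            found[name] = value

print(found)
```

Each match fills only one of the two groups, which is why the other one comes back as None and is skipped.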
Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))
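With look-around the whole match is already just the number, so no group lookup is needed. A quick Python check of that idea follows; it assumes the sample input from the question, and the inner capturing group of the lookahead is dropped here since it is not needed.

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# The lookbehind and lookahead assert context without consuming it.
m = re.search(r"(?<=idUser=)\d{1,8}(?=;|$)", text)
user_id = m.group(0) if m else None

print(user_id)
```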