What is wrong with this Regular Expression? - regex

I am beginner and have some problems with regexp.
Input text is : something idUser=123654; nick="Tom" something
I need extract value of idUser -> 123456
I try this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, #"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but on output i get idUser=123654, I need only number
The second problem is with nick="Tom", how can I get only text Tom from this expresion.

you don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.

.*?idUser=([0-9]+).*?
That regex should work for you :o)

Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is User=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match

Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))

Related

Regex for for Phone Numbers allowing for only 6 to 20 characters

Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (ie. 111111)
I managed to build two of the following pieces but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all possible phone numbers (thanks #CAustin for the suggested improvements):
lst_phone_numbers = re.findall('[0-9+()-]{6,20}',your_text)
And then filtering out the ones that do not comply with statement 5 using whatever programming language you're most comfortable.
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
(?: creates a non-capturing group
(\d|[()+-]) creates a group to match a digit, parenthesis, +, or -
(?!\1+$) this will not return a match if it matches the value found from #2 one or more times until the end of the string
{6,20} requires 6-20 matches from the non-capturing group in #1
Try this :
((?:([0-9()+\-])(?!\2{5})){6,20})
So , this part ?!\2{5} means how many times is allowed for each one from the pattern to be repeated like this 22222 and i put 5 as example and you could change it as you want .

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break down the name into groups which are separated by a underscore. Which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so go.
Now I need to extract characters from one of the group for example in group 2 I need the first 3 and 8 decimal ( keep mind they could be characters too ).
So I had try something like this :
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn’t work but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It will pull the PH into a group but not the 38 ? So I’m lost at this point.
Any help would be great
Try the below Regex to match any first 3 char/decimal and one decimal
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Try the below Regex to match any first 3 char/decimal and one decimal/char
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If your first two letter is a constant like "PH" then try the below
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
I am assuming that you are trying to match group2 starting with numbers. If that is the case then you have change the source string such as
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't help to show how in your language, but most languages employ a contains or indexOf method that can be used to determine whether or not a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)

RegEx multiple capture groups replaced in a string

I have a string of data...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
I want to take the following string and make 5 strings.
"123456712J456","D","TEST1"
"123456712J456","D","TEST2"
"123456712J456","D","TEST3"
"123456712J456","D","TEST4"
"123456712J456","D","TEST5"
I currently have the following regex...
//In a program like Textpad
<FIND> "\(.\{13\}\)","D","\([^~]*\)~\(.*\)
<REPLACE> "\1","D","\2"\n"\1","D","\3
//On the regex101 site
"(.{13})","D","([^~]*)~(.*)
Now if I run this 5 times it would work fine. The problem is there is an unknown number of lines to be made. For example...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
"123456712J457","D","TEST1~TEST2~TEST3"
"123456712J458","D","TEST1~TEST2"
"123456712J459","D","TEST1~TEST2~TEST3~TEST4"
I was hoping to be able to use a MULTI capture group to make this work. I found this PAGE talking about the common mistake between repeating a capturing group and capturing a repeated group. I need to capture a repeated group. For some reason I just could not make mine work right though. Anyone else have an idea?
RESOURCES:
http://www.regular-expressions.info/captureall.html
http://regex101.com/
Try this.See demo.Just club match1 and rest of the matches.
http://regex101.com/r/yR3mM3/17
RegEx:
(.*,)|([^"~]+)
Example:
"1234567123456","T","TEST1~TEST2~TEST3~TEST4~TEST5"
Results:
MATCH 1
1. [0-20] `"1234567123456","T",`
MATCH 2
2. [21-26] `TEST1`
MATCH 3
2. [27-32] `TEST2`
MATCH 4
2. [33-38] `TEST3`
MATCH 5
2. [39-44] `TEST4`
MATCH 6
2. [45-50] `TEST5`

How to match a one of a set of numbers?

I am trying to match a group of numbers in regex that consist of one of the following:
1,2,3,4,5,6,7,8,9,10,11
But I am having trouble figuring out the regex.
For single digits this pattern worked fine "0|1|2|3|4|5|6|7|8|9" but it fails on double digit numbers. For example 12 passes as ok due to the regex finding the 1 in 12.
You can use begin and end anchors to force the whole string to be matched:
^(0|1|2|3|4|5|6|7|8|9|10|11)$
Which can be shortened to:
^(\d|10|11)$
This will work if you want to check if just one number is between 0 and 11.
^[0-9]$|^1?[0-1]$
If you want to match a string like:
1,2,3,12,32,5,1,6,8, 11
and match 0-11 then you can use the following:
(?<=,|^)([0-9]|1?[0-1])(?=,|$)
use this regex ^(0|1|2|3|4|5|6|7|8|9|(10)|(11))$

How can I get a list of regex matches for a group?

I have a group which can occur any number of times in the input string. I need to get a list of all the matching items.
For example, for input:
example repeattext 1 anything here repeattext 2 anything repeattext 3
My regex is:
(repeattext \d)
I want to get the list of matches for the group. Is it possible to use regex here or do I need to parse it myself?
Yes, you can use regex here. Your existing regex will do fine.
See http://rubular.com/r/fS8c9C61rG for it in use on your example.
If numbers will ever become 10 or higher, consider this regex:
(repeattext \d+)
^
|
`- matches 1 or more repeating of previous
Use
result = subject.scan(/repeattext \d+/)
=> ["repeattext 1", "repeattext 2", "repeattext 3"]
See the docs for the .scan() method.