Regex - capture all repeated iterations

I have a variable like this:
var = "!123abcabc123!"
I'm trying to capture every '123' and 'abc' in this variable.
The regex (abc|123) retrieves what I want, but...
My question is: when I try the regex !(abc|123)*! it captures only the last iteration. What should I do to get this output?
MATCH 1
1. [1-4] `123`
MATCH 2
1. [4-7] `abc`
MATCH 3
1. [7-10] `abc`
MATCH 4
1. [10-13] `123`
https://regex101.com/r/mD4vM8/3
Thank you!!

If your regex flavor supports \G and \K, you can use this:
(?:!|\G(?!^))\K(abc|123)(?=(?:abc|123)*!)
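Python's built-in re module supports neither \G nor \K, so the pattern above will not run there unchanged. As a rough alternative, here is a minimal two-step sketch: first isolate the run between the two ! delimiters, then iterate over the tokens inside it. The function name repeated_items is made up for illustration.

```python
import re

def repeated_items(text):
    # Step 1: isolate the run of tokens between the two '!' delimiters.
    m = re.search(r"!((?:abc|123)*)!", text)
    if m is None:
        return []
    # Step 2: walk the run and report each token with its span,
    # shifted so the offsets refer to the original string.
    base = m.start(1)
    return [(t.group(), base + t.start(), base + t.end())
            for t in re.finditer(r"abc|123", m.group(1))]

print(repeated_items("!123abcabc123!"))
# [('123', 1, 4), ('abc', 4, 7), ('abc', 7, 10), ('123', 10, 13)]
```

The spans line up with the [1-4], [4-7], [7-10], [10-13] positions listed in the question.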

Related

How to use a selective regex to perform replace in a pandas series?

I would like to use a regex when applying pandas.Series.str.replace. I am aware that it takes in regex, but my output is not as intended. Here is a simple example. Suppose I have
ser = pd.Series(['asd3', 'qwe3', 'asd4', 'zxc'])
I would like to turn the 'asd3' and 'asd4' into 'asd'. That is, simply removing any integer at the end. I am using the code:
ser.str.replace('asd([0-9])','')
Note that I am using the ([0-9]) notation, which I interpret as saying: for any element of the series, if it looks like 'asd([0-9])', then replace the [0-9] with `` (that is, remove it). But what I get is
0
1 qwe3
2
3 zxc
whereas what I would like to get is:
0 asd
1 qwe3
2 asd
3 zxc
This is a simple example, and my real regex string is uglier than that, but I hope it conveys what I intend to do.
In your case, .replace('asd([0-9])','') just removes asd and the digit after it, because the whole match is replaced, not only the captured group.
Use
ser.str.replace(r'asd[0-9]+', 'asd', regex=True)
or
ser.str.replace(r'(asd)[0-9]+', r'\1', regex=True)
The first call replaces asd and the 1+ digits after it with asd. In the second call, the asd substring is captured into Group 1 (due to the capturing parentheses), 1+ digits are matched, and the whole match is replaced with the \1 placeholder that holds the value of Group 1 (that is, asd). Passing regex=True is needed on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.
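To see the difference between the three patterns without pandas, here is a small sketch that uses the standard library's re.sub on a plain list. It is only an approximation of what Series.str.replace does element by element, and the variable names are illustrative.

```python
import re

values = ["asd3", "qwe3", "asd4", "zxc"]

# Original attempt: the whole match ("asd" plus the digit) is replaced by "".
removed = [re.sub(r"asd([0-9])", "", v) for v in values]

# Fix 1: put the literal "asd" back in the replacement.
literal = [re.sub(r"asd[0-9]+", "asd", v) for v in values]

# Fix 2: capture "asd" and re-insert it with the \1 backreference.
backref = [re.sub(r"(asd)[0-9]+", r"\1", v) for v in values]

print(removed)  # ['', 'qwe3', '', 'zxc']
print(literal)  # ['asd', 'qwe3', 'asd', 'zxc']
print(backref)  # ['asd', 'qwe3', 'asd', 'zxc']
```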

Convert a regex expression to erlang's re syntax?

I am having a hard time converting the following regular expression into Erlang syntax.
What I have is a test string like this:
1,2 ==> 3 #SUP: 1 #CONF: 1.0
And the regex that I created with regex101 is this (link below):
([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)
But I am getting weird match results when I convert it to Erlang. Here is my attempt:
{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
Also, I get more than four matches. What am I doing wrong?
Here is the regex101 version:
https://regex101.com/r/xJ9fP2/1
I don't know much about Erlang, but I will try to explain. With your regex:
>{ok, M} = re:compile("([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)").
>re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M).
{match,[{0,28},{0,3},{8,1},{16,1},{25,3}]}
Each tuple is {Start, Length}: the first number is the starting index of the match, and the second is the total number of matched characters from that starting index.
Reason for more than four groups
The first tuple always describes the entire string matched by the complete regex; the rest are the four captured groups you want. So there are 5 groups in total.
([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)
Zeroth group: the whole regex, which matches the entire string and is the first match you are getting
First group: ([\\d,]+)
Second group: (\\d+)
Third group: (\\d)
Fourth group: (\\d+.\\d+)
How to find desired answer
Here we want everything except the zeroth group (the entire match), so we can pass all_but_first to skip it:
> re:run("1,2 ==> 3 #SUP: 1 #CONF: 1.0", M, [{capture, all_but_first, list}]).
{match,["1,2","3","1","1.0"]}
More info can be found here
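The same whole-match-versus-groups distinction exists in most regex APIs. As an illustration outside Erlang, here is a minimal Python sketch that recomputes the {Start, Length} pairs for the same input. The pattern is copied unchanged from the question, and the variable names are illustrative.

```python
import re

# Pattern copied as-is from the question (including the unescaped dot).
pattern = re.compile(r"([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)")
m = pattern.search("1,2 ==> 3 #SUP: 1 #CONF: 1.0")

# Group 0 is the entire match; groups 1..4 are the parenthesised captures.
whole = (m.start(0), m.end(0) - m.start(0))
offsets = [(m.start(i), m.end(i) - m.start(i)) for i in range(1, 5)]
captured = list(m.groups())  # groups() already skips group 0

print(whole)     # (0, 28)
print(offsets)   # [(0, 3), (8, 1), (16, 1), (25, 3)]
print(captured)  # ['1,2', '3', '1', '1.0']
```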
If you are in doubt about what the string actually contains, you can print it and check:
1> RE = "([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)".
"([\\d,]+).*==>\\s*(\\d+)\\s*#SUP:\\s*(\\d)\\s*#CONF:\\s*(\\d+.\\d+)"
2> io:format("RE: /~s/~n", [RE]).
RE: /([\d,]+).*==>\s*(\d+)\s*#SUP:\s*(\d)\s*#CONF:\s*(\d+.\d+)/
For the rest of the issue, there is a great answer by rock321987.

RegEx multiple capture groups replaced in a string

I have a string of data...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
I want to take the string above and make 5 strings from it.
"123456712J456","D","TEST1"
"123456712J456","D","TEST2"
"123456712J456","D","TEST3"
"123456712J456","D","TEST4"
"123456712J456","D","TEST5"
I currently have the following regex...
//In a program like Textpad
<FIND> "\(.\{13\}\)","D","\([^~]*\)~\(.*\)
<REPLACE> "\1","D","\2"\n"\1","D","\3
//On the regex101 site
"(.{13})","D","([^~]*)~(.*)
Now if I run this 5 times it would work fine. The problem is there is an unknown number of lines to be made. For example...
"123456712J456","D","TEST1~TEST2~TEST3~TEST4~TEST5"
"123456712J457","D","TEST1~TEST2~TEST3"
"123456712J458","D","TEST1~TEST2"
"123456712J459","D","TEST1~TEST2~TEST3~TEST4"
I was hoping to be able to use a MULTI capture group to make this work. I found this PAGE talking about the common mistake between repeating a capturing group and capturing a repeated group. I need to capture a repeated group. For some reason I just could not make mine work right though. Anyone else have an idea?
RESOURCES:
http://www.regular-expressions.info/captureall.html
http://regex101.com/
Try this. See the demo. Just combine match 1 with each of the remaining matches.
http://regex101.com/r/yR3mM3/17
RegEx:
(.*,)|([^"~]+)
Example:
"1234567123456","T","TEST1~TEST2~TEST3~TEST4~TEST5"
Results:
MATCH 1
1. [0-20] `"1234567123456","T",`
MATCH 2
2. [21-26] `TEST1`
MATCH 3
2. [27-32] `TEST2`
MATCH 4
2. [33-38] `TEST3`
MATCH 5
2. [39-44] `TEST4`
MATCH 6
2. [45-50] `TEST5`
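The "combine match 1 with the rest" step could look roughly like this in Python. This is a sketch under a few assumptions: the input is processed one line at a time, the fixed columns contain no extra structure beyond what the examples show, and expand is a hypothetical helper name.

```python
import re

# Same pattern as above: group 1 is the fixed prefix up to the last comma,
# group 2 is each ~-separated item.
ROW = re.compile(r'(.*,)|([^"~]+)')

def expand(line):
    prefix = ""
    rows = []
    for m in ROW.finditer(line):
        if m.group(1) is not None:
            prefix = m.group(1)  # e.g. "123456712J456","D",
        else:
            rows.append(f'{prefix}"{m.group(2)}"')
    return rows

data = '''"123456712J457","D","TEST1~TEST2~TEST3"
"123456712J458","D","TEST1~TEST2"'''

for line in data.splitlines():
    for row in expand(line):
        print(row)
```

Splitting into lines first matters here: a newline is neither " nor ~, so on multi-line input the second alternative would otherwise match the line breaks as well.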

How can I get a list of regex matches for a group?

I have a group which can occur any number of times in the input string. I need to get a list of all the matching items.
For example, for input:
example repeattext 1 anything here repeattext 2 anything repeattext 3
My regex is:
(repeattext \d)
I want to get the list of matches for the group. Is it possible to use regex here or do I need to parse it myself?
Yes, you can use regex here. Your existing regex will do fine.
See http://rubular.com/r/fS8c9C61rG for it in use on your example.
If the numbers can ever reach 10 or higher, consider this regex:
(repeattext \d+)
The + matches 1 or more repetitions of the preceding \d.
Use
result = subject.scan(/repeattext \d+/)
=> ["repeattext 1", "repeattext 2", "repeattext 3"]
See the docs for the .scan() method.
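For comparison, a rough Python counterpart of the .scan() call is re.findall, which also returns every non-overlapping match as a list. A minimal sketch with illustrative variable names:

```python
import re

subject = "example repeattext 1 anything here repeattext 2 anything repeattext 3"

# With no capturing group in the pattern, findall returns the full matches.
result = re.findall(r"repeattext \d+", subject)

print(result)  # ['repeattext 1', 'repeattext 2', 'repeattext 3']
```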

What is wrong with this Regular Expression?

I am a beginner and have some problems with regexps.
The input text is: something idUser=123654; nick="Tom" something
I need to extract the value of idUser -> 123654
I tried this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, @"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but in the output I get idUser=123654; I need only the number.
The second problem is with nick="Tom": how can I get only the text Tom from this expression?
You don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want only what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is idUser=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match
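The question is about .NET, but the group logic is the same in other flavors. As a hedged illustration, here is a minimal Python sketch of the alternation pattern above, followed by the "match both at once with named groups" variation. The group names id and nick are my own choice, not something from the question.

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# Alternation: two separate matches, each filling only one of the two groups.
alt = re.compile(r'\bidUser=(\d{3,8})\b|\bnick="(\w+)"')
matches = [(m.group(0), m.group(1), m.group(2)) for m in alt.finditer(text)]
print(matches)
# [('idUser=123654', '123654', None), ('nick="Tom"', None, 'Tom')]

# Variation: nick always follows idUser, so match both at once with named groups.
both = re.compile(r'\bidUser=(?P<id>\d{3,8})\b.*?\bnick="(?P<nick>\w+)"')
fields = both.search(text).groupdict()
print(fields)
# {'id': '123654', 'nick': 'Tom'}
```

Group 0 is the whole match in both cases, which is why reading the match value alone returns idUser=123654 rather than just the number.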
Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix the number of digits at exactly 6, use (?<=idUser=)\d{6}(?=($|;))
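With look-around the prefix and the terminator are checked but not consumed, so the match value itself is already just the number. A quick Python check of both patterns on the sample input (variable names are illustrative):

```python
import re

text = 'something idUser=123654; nick="Tom" something'

# Lookbehind requires "idUser=" before the digits; lookahead requires ";" or end of input after them.
any_len = re.search(r"(?<=idUser=)\d{1,8}(?=(;|$))", text)
six = re.search(r"(?<=idUser=)\d{6}(?=($|;))", text)

print(any_len.group(0))  # 123654
print(six.group(0))      # 123654
```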