Perl regular expressions - regex

I'm reading some code that involves regular expression and having some trouble.
Can someone please explain it and give an example of text it would parse?
if(/\|\s*STUFF(\d+)\s*\|\s*STUFF(\d+)/)
{
$a = $1;
$b = $2;
}

One string it matches against is |STUFF1|STUFF2.
YAPE::Regex::Explain
(?-imsx:\|\s*STUFF(\d+)\s*\|\s*STUFF(\d+))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
STUFF 'STUFF'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
STUFF 'STUFF'
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

\|\s*STUFF(\d+)\s*\|\s*STUFF(\d+)
\| look for a literal pipe character |.
\s* look for any number (zero or more) whitespace characters.
STUFF look for the string STUFF
(\d+) look for any number of digits (one or more), and save them to $1.
\s* look for any number of whitespace characters (zero or more)
then repeat once, and save the next digit sequence in $2.
If the regex matches, we know that $1 and $2 must be defined (i.e. they have a value).
In that case, we assign $1 to the variable $a and $2 to $b.
As no explicit string to match against is provided, the $_ variable is implicitly used.
Example text:
foo bar |STUFF123|STUFF456 baz bar foo
and
foo |
STUFF0
|STUFF1234567890bar

Related

Atom regexp: discarding multiline text around blocks

Assume I have this text:
blah blah Bob Loblaw Law blah
keep1 { i want this } blop
blah blob keep2 { and
this too } blaw blat
etc...
And I want to end up with
keep1 { i want this }
keep2 { and
this too }
or perhaps:
keep1 { i want this }
keep2 { and this too }
I haven't figured out how to get Atom's regexp find/replace mechanism to discard everything across multiple lines outside of a specific matching string. Hints?
update:
Of the many things I've tried, this gets me closest:
[\S\s]+?(keep\d\s+\{[\S\s]+?\})
which results in:
keep1 { i want this }
keep2 { and
this too }
blaw blat
etc...
This is probably good enough -- I can edit the trailing shards -- but it would be useful to know how to trim those as well.
You may use this simple regex replace in Atom for this task:
\b(keep\d+\s*{[^}]*})|.+?
Replace it with: $1
RegEx Demo
RegEx Details:
\b: Word boundary
(keep\d+\s*{[^}]*}): In capture group #1 match a string that starts with keep followed by 1+ digits followed by 0+ whitespaces followed by any text that is inside {...} spanning across the lines as well. This assumes { and } are balanced and there is no escaping of { and }.
|: OR
.+?: Lazily match 1+ of anything
PS: If you want to remove leading line break then use:
\n?\b(keep\d+\s*{[^}]*})|.+?
Atom Editor Demo
Before replacement:
After replacement:
Use
[\s\S]*?(keep\d\s+\{[^{}]*\})|(?:(?!keep\d\s+\{[^{}]*\})[\s\S])+$
See proof.
EXPLANATION
--------------------------------------------------------------------------------
[\s\S]*? any character of: whitespace (\n, \r, \t,
\f, and " "), non-whitespace (all but \n,
\r, \t, \f, and " ") (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
keep 'keep'
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\{ '{'
--------------------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\} '}'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
keep 'keep'
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ")
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\{ '{'
--------------------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\} '}'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[\s\S] any character of: whitespace (\n, \r,
\t, \f, and " "), non-whitespace (all
but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
)+ end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Regular expression to extract fields with different whitespace

I'm processing some PDFs (each with a big table inside) with pdftotext and my output is a plain text file with different lines.
From the file, which I would like to read line by line, I want to extract only those lines with the following formatting and ignoring the rest:
YYYYYYYY A ZZZ BBBBBB BBB BBBBBB ZZZZZZZ C ZZZ DDDDDD DDD DD ZZZZZ E ZZZ FFFFFF FFF FFF ZZZZZ G (ZZZZ HHHH HHH HHHHH)
^ ^ ^ ^ ^ ^ ^ ^
Y = whitespace or text => IGNORED
A = 1 digit => CAPTURED
B = text with spaces => CAPTURED
C = 1 digit => CAPTURED
D = text with spaces => CAPTURED
E = 1 digit => CAPTURED
F = text with spaces => CAPTURED
G = 1 digit => CAPTURED
H = text with spaces => CAPTURED, if present
Z = whitespace (variable length >= 3 chars) => IGNORED
() = this part may or may not be present
I tried with the following regex:
^.+(\d)\s+(.{3,}?)\s{3,}(\d)\s+(.{3,}?)\s{3,}(\d)\s+(.{3,}?)\s{3,}(\d)\s+(.{3,}|)$
It works but RegexBuddy says that it leads to a "catastrophic backtracking". So what would be the correct way to handle it? I need to capture the groups A..H to do my processing.
EDIT: Adding a couple of testing lines with fictional words:
01/11/2020 03/11/2020
N. First header N. Secondo header N. Third head N. Last optional header
1 COURAGE AND STOCKS EVERYTHING TO 1 SHE SAYS SHE HAS THE ABILITY 1 CHILD BY THE TO 1 JITTERY X) CLASSIC - A6
2 SHE LIKES ALL THE SAME THINGS 2 CROWD YELLS AND SCREMS MORE THEN MEMESU 2 WEDGES PROBABLY ARE NOT 2 FRAGILE Y) BOILINGE - A6
3 SOUNDTRACK OF ME 3 DROPPING ALL THE MONTHS 3 THE BEST FOR RELATIONSHIPS AT 3 NATURAL Z) ENVIOUSLYER - A6
Nothere 4 GOING AND HE COULD HEAR THUNDER IN KUNE 4 WONDERED IF HER LIE 4 CHANGED FOREV LT.200/250 4
5 MUSTER N.2 GROUNDKEE OF MY. 230 5 LEMONADE SHOWERS OF 5 FOCUS JUST MOST 5
Expected results, line by line:
Fail (there are two dates)
Fail (empty line)
Fail (doesn't have numbers, format mismatch)
Fail (empty line)
Match
Fail (empty line)
Match
Match
Match (the "Nothere" text should be ignored of course)
Match
Use
^.*?\s(\d)\s+(\S.*?)\s+(\d)\s+(\S.*?)\s+(\d)\s+(\S.*?)\s+(\d)(?:\s+(\S.*))?$
See proof.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of \5
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \6:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \6
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \7:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
) end of \7
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \8:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t,
\f, and " ")
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \8
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

What regex to use in this case

Correct code:
"key1=val1;key2=val2;key3=val3" -- Correct as each pair is having ";" at the end except the last pair
Incorrect code:
"key1=val1;key2=val2; key3=val3;" -- Invalid as last pair is having ";" at the end
"key1=val1;;;key2=val2;;;key3=val3" -- Invalid as there are multiple ";" in the middle
I got the regex below from some old link in stackoverflow, but it is not working in the above case:
^(?:\s*\w+\s*=\s*[^;]*;)+$
You might use
^\w+\s*=\s*\w+(?:;\s*\w+\s*=\s*\w+)*$
Explanation
^ Start of string
\w+\s*=\s*\w+ Match 1+ word chars, = and 1+ word chars with optional whitespace chars
(?: Non capture group
;\s*\w+\s*=\s*\w+ Match ; and the same patter as mentioned above
)* Close the group and repeat 0+ times
$ End of string
Regex demo
With the doubled backslashes
^\\w+\\s*=\\s*\\w+(?:;\\s*\\w+\\s*=\\s*\\w+)*$
Also, a shorter one:
^(?:\s*\w+\s*=\s*\w+(?:;(?!\s*$)|\s*$))+\s*$
See proof
Explanation
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0
or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
; ';'
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ")
(0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end
of the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ")
(0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
)+ end of grouping
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
Try below:
.*\w;\w.*\w;\w.*[^;]$
Test here
Explanation:
.* --> matches any character
\w --> matches any word character
[^;]$ --> Will exclude any line ending with ;
I find things like this much easier without regex. For eample with JavaScript:
function isValid(string) {
return string.split(/;/).map(e => e.split(/=/)).every(e => e.length === 2);
}

How to match this pattern (with emoji)?

I have a text file of a few thousand entries built like that:
11111111111: text text text text text :: word11111111: text text text text :: word111111111:
Where:
11111111 is a big number
text text text text can be anything including emoji
word is one of 8 words
the second 111111111 is another number, but different.
I tried, but just couldn't match it.
I don't know how to treat the emoji, and another problem is the spaces are not consistent, sometimes is a whitespace, sometimes tab, and so on.
Description
^([0-9]+):\s*((?:(?!\s::).)*)\s::\s*([^:]+)\s*:\s*((?:(?!\s::).)*)\s::\s*([^:]+):$
This regular expression will do the following:
Capture the leading 11111111
Match the :
Capture the text text text text text which may contain emojis.
Match the ::
Capture the word11111111
match the :
Capture the text text text text text which may contain emojis.
Match the ::
Capture the word11111111
Match the :
Allow the : or :: to be delimiters
Do not include the spaces surrounding the delimiters to be included in the matches.
To see the image better, you can right click it and select open in new window
Example
Live Demo
https://regex101.com/r/qG7uZ7/1
Sample text
11111111111: text text text text text :: word11111111: text text text text :: word111111111:
Capture Groups from match
0. 11111111111: text text text text text :: word11111111: text text text text :: word111111111:
1. `11111111111`
2. `text text text text text`
3. `word11111111`
4. `text text text text`
5. `word111111111`
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
:: '::'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------

match the sequence between a word and the last occurding space

I am looking to extract specific set of characters matching after a particular word till the last space that occurs in the sequence.
Example:
FAILED on portal HTTP (10.1.1.1)
FAILED on portal TELNET 0 SSH (10.1.1.1)
I want the O/P to be:
HTTP
TELNET 0 SSH
Currently using the following RegEX and working on it:
.+((?<=portal)[^\s]]+
Will be helpful if anyone of you can help me out on this :)
Updated from comment:
Text:
1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION User authentication attempt FAILED on portal TELNET 0 SSH (10.1.2.8:64940)
Regex:
^(\d+).* (\S+\d) ([\w\s]+) (\w* ?AUTHENTICATION:SESSION) (.+) (([\w.]+):(\d+)).*
Typically the groups I would like to have from my sample string would be:
#1 - 1368028793000
#2 - 10.3.1.4
#3 - CISCO X
#4 - AUTHENTICATION:SESSION
#5 - User authentication attempt FAILED on portal
#6 - TELNET 0 SSH
#7 - 10.1.2.8
#8 - 6940
You could maybe try this:
(?<=portal\s)(.+)\s\(
Note that you have a missing closing bracket ) and a missing opening square bracket [, which I assume was a typo. And you need to escape the opening bracket which marks the start of the (10.1.1.1) bit.
All change according to new requirement.
Have a try with:
^(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+(AUTHENTICATION:SESSION)\s+(.+?portal)\s(.+?)\(([\d.]+)(?::(\d+))?\)$
Here is a perl script that runs it:
my $re = qr/^(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+(AUTHENTICATION:SESSION)\s+(.+?portal)\s(.+?)\(([\d.]+)(?::(\d+))?\)$/;
while(<DATA>) {
chomp;
my #l = ($_ =~ $re);
dump#l;
}
__DATA__
1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION User authentication attempt FAILED on portal HTTP (10.1.1.1)
1368028793000 10.3.1.4 CISCO X AUTHENTICATION:SESSION User authentication attempt FAILED on portal TELNET 0 SSH (10.1.2.8:64940)
output:
(
1368028793000,
"10.3.1.4",
"CISCO X",
"AUTHENTICATION:SESSION",
"User authentication attempt FAILED on portal",
"HTTP ",
"10.1.1.1",
undef,
)
(
1368028793000,
"10.3.1.4",
"CISCO X",
"AUTHENTICATION:SESSION",
"User authentication attempt FAILED on portal",
"TELNET 0 SSH ",
"10.1.2.8",
64940,
)
regex explain:
The regular expression:
(?-imsx:^(\d+)\s+([\d.]+)\s+([\w\s]+?)\s+(AUTHENTICATION:SESSION)\s+(.+?portal)\s(.+?)\(([\d.]+)(?::(\d+))?\)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[\d.]+ any character of: digits (0-9), '.' (1
or more times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[\w\s]+? any character of: word characters (a-z,
A-Z, 0-9, _), whitespace (\n, \r, \t,
\f, and " ") (1 or more times (matching
the least amount possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \4:
----------------------------------------------------------------------
AUTHENTICATION:SES 'AUTHENTICATION:SESSION'
SION
----------------------------------------------------------------------
) end of \4
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \5:
----------------------------------------------------------------------
.+? any character except \n (1 or more times
(matching the least amount possible))
----------------------------------------------------------------------
portal 'portal'
----------------------------------------------------------------------
) end of \5
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
( group and capture to \6:
----------------------------------------------------------------------
.+? any character except \n (1 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \6
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
( group and capture to \7:
----------------------------------------------------------------------
[\d.]+ any character of: digits (0-9), '.' (1
or more times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \7
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
( group and capture to \8:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \8
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
You can use this regex
(?<=portal).+(?=\s)
.+ is greedy so it would match till end and then backtrack if necessary..