Optional element between .*? - regex

I'm trying to extract an optional element via PCRE from the following example.
I need to pull out the xxxx-xxx-xxxx-xxxx-xxxxx if ActivityID exists.
I'm guessing I need to use lookaheads or the like but I can't quite wrap my head around it.
</Level><Task>...<Correlation ActivityID='{xxxx-xxx-xxxx-xxxx-xxxxx}'/><Execution...</Channel>
This works if the element exists, saving to taco64:
<\/level>(?<taco16>.*?)ActivityID='{(?<taco64>.*)}'(?<taco32>.*?)<Computer>
Making that part optional drops everything into taco32:
<\/level>(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}')?(?<taco32>.*?)<Computer>

Use
<\/level>(?:(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}'))?(?<taco32>.*?)<Computer>
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
level> 'level>'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?<taco16> group and capture to taco16:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of taco16
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
ActivityID='{ 'ActivityID=\'{'
--------------------------------------------------------------------------------
(?<taco64> group and capture to taco64:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of taco64
--------------------------------------------------------------------------------
}' '}\''
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
(?<taco32> group and capture to taco32:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of taco32
--------------------------------------------------------------------------------
<Computer> '<Computer>'
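As a rough check of the pattern above, here is a minimal Python sketch. It is not the asker's PCRE setup: Python's re module spells named groups (?P<name>...), so the groups are rewritten accordingly. The two event strings are hypothetical, modelled on the elided sample in the question, and re.IGNORECASE is an assumption made because the sample has </Level> while the pattern has </level>.

```python
import re

# Python's re module spells named groups (?P<name>...), so the answer's
# (?<name>...) groups are rewritten here; the rest of the pattern is unchanged.
# re.IGNORECASE is an assumption: the sample has </Level>, the pattern </level>.
PATTERN = re.compile(
    r"<\/level>(?:(?P<taco16>.*?)(ActivityID='{(?P<taco64>.*)}'))?(?P<taco32>.*?)<Computer>",
    re.IGNORECASE,
)


def extract_activity_id(event):
    """Return the ActivityID value, or None when the attribute is absent."""
    match = PATTERN.search(event)
    return match.group("taco64") if match else None


# Hypothetical events, modelled on the question's elided sample.
with_id = (
    "</Level><Task>0</Task>"
    "<Correlation ActivityID='{xxxx-xxx-xxxx-xxxx-xxxxx}'/><Execution/><Computer>"
)
without_id = "</Level><Task>0</Task><Execution/><Computer>"

print(extract_activity_id(with_id))
print(extract_activity_id(without_id))
```

Because taco16 and the ActivityID part sit inside one optional group, the engine only falls back to taco32 when no ActivityID can be found at all.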

Related

Regex matches from right instead of left

I don't know if I diagnosed the issue correctly, but it seems the analyzer starts from right to left, instead of from left to right.
My regex is: \/root\/(.+)\/(.+)\/(.+)\/(.+)$, and my trial sentence is irrelevant/root/user/course/exercise/custom/folder/file.txt
The match is correct, but not the groups. I want to get:
group 1: user
group 2: course
group 3: exercise
group 4: custom/folder/file.txt (basically everything that goes after the previous groups).
I'm running this on TS and playing with it on regex101.com (ES set)
Use
\/root\/([^/]+)\/([^/]+)\/([^/]+)\/(.*)$
See proof.
EXPLANATION
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
root 'root'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
$ the end of the string (Perl-style engines also
match before a final \n; ECMAScript does not)
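To see why the original groups looked as if they were matched "from the right", the sketch below runs both patterns on the question's sample path. It uses Python's re module as a stand-in for the asker's TypeScript environment, on the assumption that both engines treat these particular patterns the same way; the variable names are illustrative.

```python
import re

path = "irrelevant/root/user/course/exercise/custom/folder/file.txt"

# Original pattern: each (.+) is greedy and may cross '/' characters, so the
# first group grabs as much as it can and leaves one segment per later group.
greedy = re.search(r"\/root\/(.+)\/(.+)\/(.+)\/(.+)$", path)

# Suggested pattern: [^/]+ cannot cross a '/', so each of the first three
# groups is limited to exactly one path segment.
segmented = re.search(r"\/root\/([^/]+)\/([^/]+)\/([^/]+)\/(.*)$", path)

print(greedy.groups())
print(segmented.groups())
```

The engine still scans left to right in both cases; the difference comes only from how far each greedy group is allowed to extend.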

Regex that matches everything before some square brackets and after

I want to create an expression that captures everything before and after some square brackets.
Such that:
Test - ho-server-01[IWM]/Memory Usage
Would capture:
Test - ho-server-01
Memory Usage
A few more examples:
Test - ho-server-01[IWM]/Memory Usage
IMWS Test - ho-server-01 [IWM]/Memory Usage
So far I have this: ([^[]*)
It sounds like you want something like this:
^([^[]+)\[[^]]+\](.*)$
See it in action here: https://regex101.com/r/dtvekU/1
Use
(.*)\[[^\]\[]*\](.*)
See proof.
Explanation
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
[^\]\[]* any character except: '\]', '\[' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
\] ']'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \2
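A small Python sketch of the second pattern, run on the question's two examples. Note that the second group as written still includes the leading "/", and the first group can keep a trailing space; trimming both is an extra step assumed here to get the output the asker described, not something the pattern does itself. The function name is hypothetical.

```python
import re

PATTERN = re.compile(r"(.*)\[[^\]\[]*\](.*)")


def split_around_brackets(text):
    """Return (before, after) around the bracketed part, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    before, after = match.groups()
    # Assumed post-processing: drop the space before '[' and the '/' after ']'.
    return before.rstrip(), after.lstrip("/")


raw = PATTERN.search("Test - ho-server-01[IWM]/Memory Usage").groups()
print(raw)
print(split_around_brackets("Test - ho-server-01[IWM]/Memory Usage"))
print(split_around_brackets("IMWS Test - ho-server-01 [IWM]/Memory Usage"))
```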

Regex help - match one string but not another

I have been using this:
~^\/student-accommodation\/(?:[^\/]+?)\/([^\/]+)\/$
to match for URLs like
/student-accommodation/manchester/ropemaker-court-manchester/
But now I need to edit this regex so it also matches URLs like the one below. All these new URLs will follow the same pattern and add a string that starts with #utm_source. Importantly, they won't have another / in them.
/student-accommodation/manchester/ropemaker-court-manchester/#utm_source=afs&utm_medium=email&utm_campaign=ropemakercourt_afs_dec20
But then I don't want the regex to match for URLs like the below:
/student-accommodation/manchester/ropemaker-court-manchester/en-suite/
Can anyone help? I am a novice at regex! Thanks
Use
^\/student-accommodation\/[^\/]+\/([^\/]+)\/(?:#utm_source.*)?$
See proof
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
student-accommodation 'student-accommodation'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^\/]+ any character except: '\/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
#utm_source '#utm_source'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
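The three URLs from the question can be checked against the pattern with a short Python sketch. The asker's original pattern starts with a ~ delimiter, which suggests PHP; Python is used here only as a convenient stand-in, and the function name is hypothetical.

```python
import re

PATTERN = re.compile(
    r"^\/student-accommodation\/[^\/]+\/([^\/]+)\/(?:#utm_source.*)?$"
)


def property_slug(url):
    """Return the captured property segment, or None when the URL is rejected."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


plain = "/student-accommodation/manchester/ropemaker-court-manchester/"
tracked = (
    "/student-accommodation/manchester/ropemaker-court-manchester/"
    "#utm_source=afs&utm_medium=email&utm_campaign=ropemakercourt_afs_dec20"
)
deeper = "/student-accommodation/manchester/ropemaker-court-manchester/en-suite/"

for url in (plain, tracked, deeper):
    print(property_slug(url))
```

The extra path segment in the last URL is rejected because the only thing allowed after the final "/" is the optional #utm_source part.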

Sort Email:Pass List by first occurrence of each email

I want to filter an email:password list so that only the first occurrence of each email is kept.
Example list:
email#example.com:passsword1
email#example.com:passsword2
email#example.com:passsword3
email1#example.com:passsword1
email1#example.com:passsword2
email1#example.com:passsword2
So only
email#example.com:passsword1
email1#example.com:passsword1
should be kept as result.
With my limited Regex skills I worked out this one but I guess I misunderstand something:
^(.*)(\r?\n\1)+(?=:)
Use
^((.*:).*)(?:\r?\n\2.*)+
See proof, use g and m flags.
Explanation
--------------------------------------------------------------------------------
^ the beginning of a line (with the m flag)
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\r? '\r' (carriage return) (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
\2 what was matched by capture \2
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)+ end of grouping
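The answer gives only the search pattern. The sketch below assumes each match is then replaced with group 1 (the first line of the run), which appears to be the intent. It uses Python, where re.sub already replaces every match (the g flag) and re.MULTILINE stands in for the m flag; the sample list is copied from the question.

```python
import re

listing = "\n".join([
    "email#example.com:passsword1",
    "email#example.com:passsword2",
    "email#example.com:passsword3",
    "email1#example.com:passsword1",
    "email1#example.com:passsword2",
    "email1#example.com:passsword2",
])

# Group 2 is the "email:" prefix; the non-capturing group swallows every
# following line that starts with the same prefix. Replacing the whole match
# with group 1 is an assumption about how the pattern is meant to be used.
deduplicated = re.sub(
    r"^((.*:).*)(?:\r?\n\2.*)+",
    r"\1",
    listing,
    flags=re.MULTILINE,
)

print(deduplicated)
```

This only collapses consecutive lines with the same email, so it relies on the list being grouped by email as in the example.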

Making some info optional in regex

I am writing a regex for the pattern below: sftp://user:password#host[:port]/path
I have written the following, sftp://(.+):(.+)#(.+):(\d+)/(.*), which matches the pattern: group 1 matches the user, group 2 the password, group 3 the host name, group 4 the port number, and group 5 the path.
However, the port number is optional. I tried the regex below, where the port group is followed by a ?:
sftp://(.+):(.+)#(.+)(:(\d+))?\/(.*)
Here group 3 matches host:port, which is not what is expected.
How can I make the port optional in the regex?
Use
sftp://([^/#]+):([^/#]+)#([^/]+?)(?::(\d+))?/(.*)
See proof
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
sftp:// 'sftp://'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^/#]+ any character except: '/', '#' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[^/#]+ any character except: '/', '#' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[^/]+? any character except: '/' (1 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \5
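A quick Python sketch of the pattern with and without a port. The URLs are hypothetical and follow the question's user:password#host[:port]/path layout, including its "#" separator; the function name is illustrative.

```python
import re

PATTERN = re.compile(r"sftp://([^/#]+):([^/#]+)#([^/]+?)(?::(\d+))?/(.*)")


def parse_sftp(url):
    """Return (user, password, host, port, path); port is None when omitted."""
    match = PATTERN.match(url)
    return match.groups() if match else None


print(parse_sftp("sftp://user:password#host:22/path"))
print(parse_sftp("sftp://user:password#host/path"))
```

The host group is lazy, so it stops as soon as the optional ":port" and the following "/" can match, which keeps the port out of group 3.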