Making some info optional in regex - regex

I am writing regex for below pattern: sftp://user:password#host[:port]/path
I have written the following sftp://(.+):(.+)#(.+):(\d+)/(.*) which matches the pattern, where group1 matches user, group2 matches password, group3 matches host name and group4 matches port number and group5 matches path
However, the port number can be optional parameter, I have tried the below regex where port group is followed by a ?.
sftp://(.+):(.+)#(.+)(:(\d+))?\/(.*)
Here group3 matches with host:port which is not what is expected.
How to make the regex where the port param is optional ?

Use
sftp://([^/#]+):([^/#]+)#([^/]+?)(?::(\d+))?/(.*)
See proof
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
sftp:// 'sftp://'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^/#]+ any character except: '/', '#' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[^/#]+ any character except: '/', '#' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[^/]+? any character except: '/' (1 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
( group and capture to \5:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \5

Related

Regex matches from right instead of left

I don't know if I diagnosed the issue correctly, but it seems the analyzer starts from right to left, instead of from left to right.
My regex is: \/root\/(.+)\/(.+)\/(.+)\/(.+)$, and my trial sentence is irrelevant/root/user/course/exercise/custom/folder/file.txt
The match is correct, but not the groups. I want to get:
group 1: user
group 2: course
group 3: exercise
group 4: custom/folder/file.txt (basically everything that goes after the previous groups).
I'm running this on TS and playing with it on regex101.com (ES set)
Use
\/root\/([^/]+)\/([^/]+)\/([^/]+)\/(.*)$
See proof.
EXPLANATION
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
root 'root'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
[^/]+ any character except: '/' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Optional element between .*?

I'm trying to extract an optional element via PCRE from the following example.
I need to pull out the xxxx-xxx-xxxx-xxxx-xxxxx if ActivityID exists.
I'm guessing I need to use lookaheads or the like but I can't quite wrap my head around it.
</Level><Task>...<Correlation ActivityID='{xxxx-xxx-xxxx-xxxx-xxxxx}'/><Execution...</Channel>
This works if the element exists, saving to taco64:
<\/level>(?<taco16>.*?)ActivityID='{(?<taco64>.*)}'(?<taco32>.*?)<Computer>
Being optional drops everything into taco32.
<\/level>(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}')?(?<taco32>.*?)<Computer>
Use
<\/level>(?:(?<taco16>.*?)(ActivityID='{(?<taco64>.*)}'))?(?<taco32>.*?)<Computer>
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
< '<'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
level> 'level>'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
(?<taco16> group and capture to taco16:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
--------------------------------------------------------------------------------
) end of taco16
--------------------------------------------------------------------------------
(?<taco64> group and capture to taco64:
--------------------------------------------------------------------------------
ActivityID='{ 'ActivityID=\'{'
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of taco64
--------------------------------------------------------------------------------
}' '}\''
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
(?<taco32> group and capture to taco32:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of taco32
--------------------------------------------------------------------------------
<Computer> '<Computer>'

How to remove the hash tag from this regex

My current regex:
(\d.{17})[^#]*(\D+)(\d+)gr(\d+)
In group 2, they are still having the hashtag, I want to remove it from there. What should I change from my current regex?
201223E0MWJPJD2230#AdeSaputra290gr99000
2101023CNV6TT1109J#Fefe430gr142000
2101183EDTFPSA0128#Jessica500gr112000
201221E2QKWRY11413#EssyYosita880gr233500
2101123G9XQ7R41705#Meily1120gr329000
201228ECEWTJT50859#WidyaNatali1720gr457230
201227EEBX1K9K1020#Excelio112gr58900
2101112N4YNFB12016#DebyNath520gr156220
2101072R8A0QB22347#AlycieHandoTan700gr85000
Output:
group 1: 201223E0MWJPJD2230
group 2: #AdeSaputra
group 3: 290
group 4: 99000
remove [^] from your regex
(\d.{17})#*(\D+)(\d+)gr(\d+)
see this
\D matches any non-digits and it matches # in your input.
Add # to the pattern instead of the opposite [^#] so as to match it.
Use
^(\d.{17})#(\D+)(\d+)gr(\d+)$
See proof it works. Adding anchors to match entire strings.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
.{17} any character except \n (17 times)
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
# '#'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\D+ non-digits (all but 0-9) (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
gr 'gr'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Sort Email:Pass List by first occurrence of each email

I want to sort a email:password by first occurrence of each email.
Example list:
email#example.com:passsword1
email#example.com:passsword2
email#example.com:passsword3
email1#example.com:passsword1
email1#example.com:passsword2
email1#example.com:passsword2
So only
email#example.com:passsword1
email1#example.com:passsword1
should be kept as result.
With my limited Regex skills I worked out this one but I guess I misunderstand something:
^(.*)(\r?\n\1)+(?=:)
Use
^((.*:).*)(?:\r?\n\2.*)+
See proof, use g and m flags.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?: group, but do not capture (1 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
\r? '\r' (carriage return) (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
\n '\n' (newline)
--------------------------------------------------------------------------------
\2 what was matched by capture \2
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)+ end of grouping

Regex for URLs shows ".finance" as invalid/error

I got this regex for my laravel URL validator and if i put a ".finance" domain it shows it's against the regex. What is wrong? All other tested domain endings work so far.
$regex = '/^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/';
Replace [a-z]{2,5} with [a-z]{2,} to allow any two or more letters in the TLD.
Remove {1}, these are always unnecessary.
(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)? is way to repeptitive, shorten it with optional groups / chars to (?:https?:\/\/(?:www\.)?)?.
Use
/^(?:https?:\/\/(?:www\.)?)?[a-z0-9]+(?:[-.][a-z0-9]+)*\.[a-z]{2,}(?::[0-9]{1,5})?(\/.*)?$/
See proof.
Explanation
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
http 'http'
--------------------------------------------------------------------------------
s? 's' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
www 'www'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
[a-z0-9]+ any character of: 'a' to 'z', '0' to '9'
(1 or more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
[-.] any character of: '-', '.'
--------------------------------------------------------------------------------
[a-z0-9]+ any character of: 'a' to 'z', '0' to '9'
(1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
[a-z]{2,} any character of: 'a' to 'z' (at least 2
times (matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
[0-9]{1,5} any character of: '0' to '9' (between 1
and 5 times (matching the most amount
possible))
--------------------------------------------------------------------------------
)? end of grouping
--------------------------------------------------------------------------------
( group and capture to \1 (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
\/ '/'
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)? end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string