regex match URL path only with specific chars?

I am looking for a regex in PHP that matches a simple URL path containing only specific characters and nothing more.
My regex doesn't work exactly (the 'gm' flags are only for testing; the working code should run without 'g' to be more exact):
/^\/[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?$/gm
URL path examples with comments:
#match: YES
/
/trip-001
/trip-001/
/trip-001/summer-2019
/trip-001/summer-2019/
/trip-001/summer-2019/ibiza-001/
/trip-001/summer-2019/ibiza-001/PICT-001
#match: NO
//
trip-001
trip-001/
trip-001/summer-2019
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001
trip-001//
//trip-001/summer-2019
//trip-001//summer-2019
trip-001//summer-2019
//trip-001/summer-2019/
//trip-001//summer-2019//
trip-001//summer-2019/
trip-001/summer-2019//
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
//trip-001/summer-2019/ibiza-001/
//trip-001//summer-2019/ibiza-001/
//trip-001/summer-2019//ibiza-001/
//trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001/summer-2019/ibiza-001/PICT-001
# and similar
/trip-001/summer-2019/ibiza-001/PICT-001/
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
trip-001/summer-2019/ibiza-001/PICT-001/
trip-001/summer-2019/ibiza-001/whatever-987/PICT001
trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
I have no idea how to make it work with {n}.
Only this character set is allowed: A-Z a-z 0-9 - / and nothing else. Please do not use \d for digits.
It's for a !preg_match() in PHP.
EDIT: A leading slash is required. Two or more consecutive slashes are not allowed. A trailing slash is optional.

It appears the URL should only be valid if it contains fewer than 5 slashes.
You may adjust your pattern as
^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$
See regex demo
Details
^ - start of string
(?!(?:[^\/]*\/){5}) - a negative lookahead that fails the match if there are 5 or more / chars in the string
(?: - start of the non-capturing group:
(?:\/[A-Za-z0-9-]+){1,4}\/? - 1 to 4 occurrences of a / and 1+ ASCII alphanumeric or - chars and then an optional / char
| - or
\/ - a single / char in the string
) - end of the non-capturing group
$ - end of string.
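
To check the pattern against the examples from the question, here is a minimal sketch. It uses Python's re module as a stand-in for PHP's preg_match (both support the lookahead used here); the helper name is_valid_path and the /trip_001 sample are only for illustration and are not from the question.

```python
import re

# Pattern copied from the answer above. The escaped \/ is kept as written;
# an unescaped / would behave the same in Python.
PATH_RE = re.compile(r"^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$")


def is_valid_path(path: str) -> bool:
    """Hypothetical helper: True if the path matches the pattern above."""
    return PATH_RE.match(path) is not None


should_match = [
    "/",
    "/trip-001",
    "/trip-001/",
    "/trip-001/summer-2019/ibiza-001/",
    "/trip-001/summer-2019/ibiza-001/PICT-001",
]
should_fail = [
    "//",
    "trip-001",
    "//trip-001",
    "/trip-001//summer-2019",
    "/trip-001/summer-2019/ibiza-001/PICT-001/",
    "/trip-001/summer-2019/ibiza-001/whatever-987/PICT001",
    "/trip_001",  # illustrative extra case: underscore is outside the charset
]

for path in should_match + should_fail:
    print(f"{path!r}: {is_valid_path(path)}")
```

In PHP the same pattern would go between delimiters in the preg_match() call, without the 'g' flag, as the question requests.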

Related

Regex Pattern that has to include something after /

Using Regex, I want to match any URL that includes /it-jobs/ and has something after the final /.
To be a match, the URL must have /it-jobs/ plus characters after the trailing /; otherwise it should not match. Please refer to the examples below.
Example: www.website.com/it-jobs/ - is not a match
www.website.com/it-jobs/java-developer - is a match
www.website.com/it-jobs/php - is a match
www.website.com/it-jobs/angular-developer - is a match

You can use
/it-jobs/[^/\s]+$
To match the whole string, add .* at the pattern start:
.*/it-jobs/[^/\s]+$
See the regex demo.
Details:
.* - zero or more chars other than line break chars as many as possible
/it-jobs/ - a literal string
[^/\s]+ - any one or more chars other than / and whitespaces
$ - end of string.
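
As a quick check of both variants, here is a small sketch in Python (an assumption; the question does not name a language). The function name has_job_slug is made up for the example.

```python
import re

# Both patterns are taken from the answer above.
TAIL_RE = re.compile(r"/it-jobs/[^/\s]+$")     # matches only the /it-jobs/... tail
WHOLE_RE = re.compile(r".*/it-jobs/[^/\s]+$")  # consumes the whole string


def has_job_slug(url: str) -> bool:
    """Hypothetical helper: True if something non-empty follows /it-jobs/."""
    return TAIL_RE.search(url) is not None


urls = [
    "www.website.com/it-jobs/",
    "www.website.com/it-jobs/java-developer",
    "www.website.com/it-jobs/php",
    "www.website.com/it-jobs/angular-developer",
]

for url in urls:
    print(url, has_job_slug(url))
```

The first pattern needs a search-style call because it is not anchored at the start; the second one can be used with a match-from-start call.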

Google Analytics exclude string apart from one variation using Regex

I'm trying to configure a Google Analytics filter that would exclude all pages matching ^/app$ and ^/app/.* apart from the following: /app/business-signup
It looks like Google doesn't support negative lookahead. I have found the following 2 relevant discussions, but I haven't managed to make it work: "Google Analytics Regex include and exclude string without negative lookahead" and "RegExp alternative to negative lookahead match for Google Analytics".
So far the following include filter is showing the most accurate results, but it's still excluding URLs that shouldn't be excluded.
^(/$|/app/business\-signup|/[^a][^p][^p][^/])
Expected excluded URLs:
/app
/app/
/app/abc
/app/abc?test=1
...
Expected included URLs:
/app/business-signup
/
/?test=1
/about
/about/abc
...

I think you can use:
^/(?:app/business-signup|(?:[^a\n].?.?|.?[^p\n].?|.?.?[^p\n])(?:/.*|$)|...[^/\n].*)?$
See an online demo
^/ - Start-line anchor followed by a literal forward slash;
(?: - Open non-capture group;
app/business-signup - Match the one /app page you want to keep out of the exclusion;
| - Or;
(?:[^a\n].?.?|.?[^p\n].?|.?.?[^p\n])(?:/.*|$) - Two consecutive non-capture groups. The 1st matches up to three characters while ruling out the 'a', the 1st 'p' and the 2nd 'p' in turn, so that the word 'app' itself cannot match; the 2nd matches a literal forward slash followed by 0+ more characters, or the end-line anchor;
| - Or;
...[^/\n].* - Match three characters followed by any character other than forward slash followed by 0+ characters;
)? - Close non-capture group and make it optional to allow a single forward slash;
$ - End-line anchor.
Note: you may just remove the newline characters from the pattern if need be.
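
To see how this pattern sorts the URLs from the question, here is a rough sketch. It runs the pattern through Python's re module, which is an assumption: Google Analytics uses its own regex engine, so this only checks the logic of the pattern, not the filter itself. The name is_included is made up for the example.

```python
import re

# Include-filter pattern copied from the answer above.
INCLUDE_RE = re.compile(
    r"^/(?:app/business-signup"
    r"|(?:[^a\n].?.?|.?[^p\n].?|.?.?[^p\n])(?:/.*|$)"
    r"|...[^/\n].*)?$"
)


def is_included(path: str) -> bool:
    """Hypothetical helper: True if the include filter would keep this path."""
    return INCLUDE_RE.match(path) is not None


expected_excluded = ["/app", "/app/", "/app/abc", "/app/abc?test=1"]
expected_included = ["/app/business-signup", "/", "/?test=1", "/about", "/about/abc"]

for path in expected_excluded + expected_included:
    print(f"{path}: {'included' if is_included(path) else 'excluded'}")
```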

You can use
^(/$|/app/business-signup|/(?:[^a]..|.[^p].|..[^p]).*)
See the regex demo. Details:
^ - start of string
(/$|/app/business-signup|/(?:[^a]..|.[^p].|..[^p]).*) - Group 1:
/$ - / at the end of string
| - or
/app/business-signup - a /app/business-signup fixed string
| - or
/(?:[^a]..|.[^p].|..[^p]).* - a /, then either any char other than a and then any two chars, or any char + a char other than p + any char, or any two chars and then any char other than p, and the rest of the line.

Regex - count number of characters to validate match

My goal is to validate instagram profile links via a regular expression.
So for example this one is valid:
https://www.instagram.com/test.profile/
This one is not:
https://www.instagram.com/explore/tags/test/
Using this regex
(?:(?:http|https):\/\/)?(?:www\.)?(?:instagram\.com|instagr\.am)\/([A-Za-z0-9-_\.]+)
On this text: https://www.instagram.com/explore/tags/test/
produces a match https://www.instagram.com/explore, but I want to avoid and discard this match.
LIVE DEMO HERE
My question: is it possible to add additional syntax to the regex so that a match is valid ONLY if the string contains exactly 4 slashes (/)?

You can make the / char obligatory if you add \/ at the end:
^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com|instagr\.am)\/([\w.-]+)\/$
Note that [a-zA-Z0-9_] can most probably be replaced with \w (especially, if it is JavaScript, PHP, Java or Ruby) to make the pattern shorter. It won't hurt even in those regex flavors where \w is Unicode-aware by default (Python re, .NET).
See the regex demo. Details:
^ - start of string
(?:https?:\/\/)? - an optional http:// or https://
(?:www\.)? - an optional www. string
(?:instagram\.com|instagr\.am) - instagram.com or instagr.am
\/ - a / char
([\w.-]+) - Group 1: one or more letters, digits, _, . or - chars
\/ - a / char
$ - end of string.
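
A short sketch of how the anchored pattern behaves on the two links from the question, written in Python as an assumption (the question does not say which language is used). The function name extract_profile is made up for the example.

```python
import re
from typing import Optional

# Pattern copied from the answer above.
PROFILE_RE = re.compile(
    r"^(?:https?:\/\/)?(?:www\.)?(?:instagram\.com|instagr\.am)\/([\w.-]+)\/$"
)


def extract_profile(url: str) -> Optional[str]:
    """Hypothetical helper: return the profile name (Group 1), or None if the link is not a profile."""
    match = PROFILE_RE.match(url)
    return match.group(1) if match else None


print(extract_profile("https://www.instagram.com/test.profile/"))
print(extract_profile("https://www.instagram.com/explore/tags/test/"))
```

Because the character class excludes /, the single captured segment must be followed directly by the final slash and the end of the string, so longer paths are rejected without counting slashes explicitly.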

Regex to properly match urls with a particular domain and also if there is a subdomain added

I have the following regex:
(^|^[^:]+:\/\/|[^\.]+\.)hello\.net
Which seems to work for most cases such as these:
http://hello.net
https://hello.net
http://www.hello.net
https://www.hello.net
http://domain.hello.net
https://solutions.hello.net
hello.net
www.hello.net
However it still matches this which it should not:
hello.net.domain.com
You can see it here:
https://regex101.com/r/fBH112/1
I am basically trying to check if a URL is part of hello.net, so hello.net and any subdomains such as sub.hello.net should all match.
It should also match hello.net/bye, so anything after hello.net is irrelevant.

You may fix your pattern by adding (?:\/.*)?$ at the end:
(^|^[^:]+:\/\/|[^.]+\.)hello\.net(?:\/.*)?$
See the regex demo. The (?:\/.*)?$ matches an optional sequence of / and any 0 or more chars and then the end of string.
You might consider a "cleaner" pattern like
^(?:\w+:\/\/)?(?:[^\/.]+\.)?hello\.net(?:\/.*)?$
See the regex demo. Details:
^ - start of string
(?:\w+:\/\/)? - an optional occurrence of 1+ word chars, and then a :// char sequence
(?:[^\/.]+\.)? - an optional occurrence of any 1 or more chars other than / and . and then .
hello\.net - hello.net
(?:\/.*)?$ - an optional occurrence of / and then any 0+ chars and then end of string
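
Both patterns can be tried against the URLs from the question with a sketch like the following. Python is an assumption here, and the names FIXED_RE, CLEAN_RE and is_hello_net are made up for the example.

```python
import re

# The question's pattern with the suggested (?:\/.*)?$ added at the end.
FIXED_RE = re.compile(r"(^|^[^:]+:\/\/|[^.]+\.)hello\.net(?:\/.*)?$")
# The "cleaner" fully anchored alternative.
CLEAN_RE = re.compile(r"^(?:\w+:\/\/)?(?:[^\/.]+\.)?hello\.net(?:\/.*)?$")


def is_hello_net(url: str) -> bool:
    """Hypothetical helper: True if the cleaner pattern accepts the URL."""
    return CLEAN_RE.match(url) is not None


should_match = [
    "http://hello.net",
    "https://hello.net",
    "http://www.hello.net",
    "https://www.hello.net",
    "http://domain.hello.net",
    "https://solutions.hello.net",
    "hello.net",
    "www.hello.net",
    "hello.net/bye",
]
should_fail = ["hello.net.domain.com"]

for url in should_match + should_fail:
    # FIXED_RE is not anchored at the start in every branch, so it needs search().
    print(url, FIXED_RE.search(url) is not None, is_hello_net(url))
```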

Issue matching exact word

I am building a website validator regex that can match a url.
Thing is, it 90% works! It goes in and out of my string match which is where the issue is.
My regex: (http(s?)://www.|www.|http(s?)://)+[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(/.)?
My string to test with:
1)(This should fail, but it passes) https://www.xy
2)(This should pass, which it does) https://www.xy.com
It keeps going into my group (http(s?)://) instead of the group (http(s?)://www.)
Any idea on how to solve this?
URL i want to pass:
http://www.test.com
http://test.com
https://test.com
https://www.test.com
URL i want to fail:
http://www.bla
https://www.ggg
So, if it matches https://www. or http://www., it should use the correct group and then apply the rest of the regex, where it checks that the remainder contains test.com or similar.

You may use
^(?:https?:\/\/)?(?!www\.[^.]+$)(?:www\.)?[a-z0-9]+(?:[-.][a-z0-9]+)*\.[a-z]{2,5}(?::[0-9]{1,5})?(\/.*)?$
See the regex demo
Details
^ - start of string
(?:https?:\/\/)? - an optional http:// or https://
(?!www\.[^.]+$) - a negative lookahead that fails the match if immediately to the right of the current position there is www. and then any 1+ chars other than dot to the end of the string
(?:www\.)? - an optional www.
[a-z0-9]+ - 1+ lowercase letters or digits
(?:[-.][a-z0-9]+)* - 0 or more repetitions of - or . and then 1+ lowercase letters and digits
\. - a .
[a-z]{2,5} - two to five lowercase letters
(?::[0-9]{1,5})? - an optional sequence of : and 1 to 5 digits
(\/.*)? - an optional sequence of / and the rest of the line
$ - end of the string.
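
Here is a small sketch that runs the pattern over the pass and fail lists from the question. It is written in Python as an assumption (the question does not name a language), and is_valid_site is a made-up helper name.

```python
import re

# Pattern copied from the answer above.
SITE_RE = re.compile(
    r"^(?:https?:\/\/)?(?!www\.[^.]+$)(?:www\.)?"
    r"[a-z0-9]+(?:[-.][a-z0-9]+)*\.[a-z]{2,5}(?::[0-9]{1,5})?(\/.*)?$"
)


def is_valid_site(url: str) -> bool:
    """Hypothetical helper: True if the URL passes the validator pattern."""
    return SITE_RE.match(url) is not None


should_pass = [
    "http://www.test.com",
    "http://test.com",
    "https://test.com",
    "https://www.test.com",
    "https://www.xy.com",
]
should_fail = ["http://www.bla", "https://www.ggg", "https://www.xy"]

for url in should_pass + should_fail:
    print(url, is_valid_site(url))
```

The negative lookahead is what rejects https://www.xy: without it, www would be accepted as the host name and xy as the top-level domain.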
