Issue matching exact word - regex

I am building a regex to validate website URLs.
It works about 90% of the time, but it sometimes matches through the wrong alternative, which is where the issue is.
My regex: (http(s?)://www.|www.|http(s?)://)+[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(/.)?
My string to test with:
1)(This should fail, but it passes) https://www.xy
2)(This should pass, which it does) https://www.xy.com
It keeps going into my group (http(s?)://) instead of the group (http(s?)://www.)
Any idea how to solve this?
URLs I want to pass:
http://www.test.com
http://test.com
https://test.com
https://www.test.com
URLs I want to fail:
http://www.bla
https://www.ggg
So, if it matches https://www. or http://www., it should use the correct group and then apply the rest of the regex, which checks that a domain such as test.com follows.

You may use
^(?:https?:\/\/)?(?!www\.[^.]+$)(?:www\.)?[a-z0-9]+(?:[-.][a-z0-9]+)*\.[a-z]{2,5}(?::[0-9]{1,5})?(\/.*)?$
See the regex demo
Details
^ - start of string
(?:https?:\/\/)? - an optional http:// or https://
(?!www\.[^.]+$) - a negative lookahead that fails the match if immediately to the right of the current position there is www. and then any 1+ chars other than dot to the end of the string
(?:www\.)? - an optional www.
[a-z0-9]+ - 1+ lowercase letters and digits
(?:[-.][a-z0-9]+)* - 0 or more repetitions of - or . and then 1+ lowercase letters and digits
\. - a .
[a-z]{2,5} - two to five lowercase letters
(?::[0-9]{1,5})? - an optional sequence of : and 1 to 5 digits
(\/.*)? - an optional sequence of / and the rest of the line
$ - end of the string.
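As a quick check, here is a minimal Python sketch that runs the pattern above against the URLs from the question. The helper name is_valid_url and the two lists are only illustrative, and Python is used here just for convenience, since the question does not name a language.

```python
import re

# Pattern copied from the answer above (split over two string literals for readability).
URL_RE = re.compile(
    r"^(?:https?:\/\/)?(?!www\.[^.]+$)(?:www\.)?[a-z0-9]+(?:[-.][a-z0-9]+)*"
    r"\.[a-z]{2,5}(?::[0-9]{1,5})?(\/.*)?$"
)

def is_valid_url(url: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return URL_RE.match(url) is not None

# URLs taken from the question.
should_pass = [
    "http://www.test.com",
    "http://test.com",
    "https://test.com",
    "https://www.test.com",
    "https://www.xy.com",
]
should_fail = [
    "http://www.bla",
    "https://www.ggg",
    "https://www.xy",
]

for url in should_pass + should_fail:
    print(f"{url!r}: {is_valid_url(url)}")
```

The negative lookahead is what rejects https://www.xy: once the scheme is consumed, the rest of the string is www. followed by dot-free text up to the end, so the match is abandoned.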

Related

Regex to properly match urls with a particular domain and also if there is a subdomain added

I have the following regex:
(^|^[^:]+:\/\/|[^\.]+\.)hello\.net
Which seems to work for most cases such as these:
http://hello.net
https://hello.net
http://www.hello.net
https://www.hello.net
http://domain.hello.net
https://solutions.hello.net
hello.net
www.hello.net
However it still matches this which it should not:
hello.net.domain.com
You can see it here:
https://regex101.com/r/fBH112/1
I am basically trying to check if a URL is part of hello.net, so hello.net and any subdomains such as sub.hello.net should all match.
It should also match hello.net/bye, so anything after hello.net is irrelevant.
You may fix your pattern by adding (?:\/.*)?$ at the end:
(^|^[^:]+:\/\/|[^.]+\.)hello\.net(?:\/.*)?$
See the regex demo. The (?:\/.*)?$ matches an optional sequence of / and any 0 or more chars and then the end of string.
You might consider a "cleaner" pattern like
^(?:\w+:\/\/)?(?:[^\/.]+\.)?hello\.net(?:\/.*)?$
See the regex demo. Details:
^ - start of string
(?:\w+:\/\/)? - an optional occurrence of 1+ word chars, and then a :// char sequence
(?:[^\/.]+\.)? - an optional occurrence of any 1 or more chars other than / and . and then .
hello\.net - hello.net
(?:\/.*)?$ - an optional occurrence of / and then any 0+ chars and then end of string
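A minimal Python sketch for trying the "cleaner" pattern against the inputs from the question. The helper name is_hello_net is hypothetical, and Python is an assumption, since the question does not name a language.

```python
import re

# The "cleaner" pattern from the answer above.
HELLO_RE = re.compile(r"^(?:\w+:\/\/)?(?:[^\/.]+\.)?hello\.net(?:\/.*)?$")

def is_hello_net(url: str) -> bool:
    """Hypothetical helper: True if url is hello.net or one subdomain label of it."""
    return HELLO_RE.match(url) is not None

# Inputs taken from the question.
should_match = [
    "http://hello.net",
    "https://hello.net",
    "http://www.hello.net",
    "https://www.hello.net",
    "http://domain.hello.net",
    "https://solutions.hello.net",
    "hello.net",
    "www.hello.net",
    "hello.net/bye",
]
should_not_match = ["hello.net.domain.com"]

for url in should_match + should_not_match:
    print(f"{url!r}: {is_hello_net(url)}")
```

Note that (?:[^\/.]+\.)? allows only a single subdomain label, so an input like a.b.hello.net is rejected by this pattern.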

regex match URL path only with specific chars?

I am looking for a regex in PHP that matches a simple URL path made up of specific characters and nothing more.
My regex doesn't work exactly as intended (the 'gm' flags are only for testing; in production it should run without 'g' to be more exact):
/^\/[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?$/gm
URL path Examples with comment:
#match: YES
/
/trip-001
/trip-001/
/trip-001/summer-2019
/trip-001/summer-2019/
/trip-001/summer-2019/ibiza-001/
/trip-001/summer-2019/ibiza-001/PICT-001
#match: NO
//
trip-001
trip-001/
trip-001/summer-2019
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001
trip-001//
//trip-001/summer-2019
//trip-001//summer-2019
trip-001//summer-2019
//trip-001/summer-2019/
//trip-001//summer-2019//
trip-001//summer-2019/
trip-001/summer-2019//
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
//trip-001/summer-2019/ibiza-001/
//trip-001//summer-2019/ibiza-001/
//trip-001/summer-2019//ibiza-001/
//trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001/summer-2019/ibiza-001/PICT-001
# and similar
/trip-001/summer-2019/ibiza-001/PICT-001/
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
trip-001/summer-2019/ibiza-001/PICT-001/
trip-001/summer-2019/ibiza-001/whatever-987/PICT001
trip-001/summer-2019/ibiza-001/whatever-987/PICT001/
I have no idea whether this can work with {n}.
Only this charset is allowed: A-Z a-z 0-9 - / and nothing else. Please no \d for digits.
It's for a !preg_match() in PHP.
EDIT: A leading slash is mandatory. Two or more consecutive slashes are not allowed. A trailing slash is optional.
It appears the URL path should only be valid if it contains fewer than 5 slashes (that is, at most 4 path segments).
You may adjust your pattern as
^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$
See regex demo
Details
^ - start of string
(?!(?:[^\/]*\/){5}) - a negative lookahead that fails the match if there are 5 or more / chars in the string
(?: - start of the non-capturing group:
(?:\/[A-Za-z0-9-]+){1,4}\/? - 1 to 4 occurrences of a / and 1+ ASCII alphanumeric or - chars and then an optional / char
| - or
\/ - a single / char in the string
) - end of the non-capturing group
$ - end of string.
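The question is about PHP's preg_match(), but the pattern itself is easy to try out elsewhere. Below is a minimal Python sketch (an assumption made for illustration, not the asker's environment) that runs the pattern against a sample of the paths from the question; the helper name is_valid_path is hypothetical.

```python
import re

# Pattern copied from the answer above.
PATH_RE = re.compile(r"^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$")

def is_valid_path(path: str) -> bool:
    """Hypothetical helper: True if path matches the pattern."""
    return PATH_RE.match(path) is not None

# A sample of the "match: YES" paths from the question.
should_match = [
    "/",
    "/trip-001",
    "/trip-001/",
    "/trip-001/summer-2019",
    "/trip-001/summer-2019/",
    "/trip-001/summer-2019/ibiza-001/",
    "/trip-001/summer-2019/ibiza-001/PICT-001",
]
# A sample of the "match: NO" paths from the question.
should_not_match = [
    "//",
    "trip-001",
    "trip-001/",
    "//trip-001",
    "trip-001//summer-2019",
    "/trip-001/summer-2019/ibiza-001/PICT-001/",
    "/trip-001/summer-2019/ibiza-001/whatever-987/PICT001",
]

for path in should_match + should_not_match:
    print(f"{path!r}: {is_valid_path(path)}")
```

The last two rejected paths each contain 5 slashes, so they are ruled out by the lookahead rather than by the character classes.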

problems with regex url validator

I'm trying to create a regex to test if a url is valid or not. I had a good example to work off of, but I had to tweak it a bit to make it fit my purpose:
^(https?:\/\/)(www\.)?(\w*\.)+([\w\-_~:/?#[\]#!$&'()*+,;=.])*$
It works fine for the most part, but it matches the following, which drives me nuts:
http://www..example..com
I tried forever and I just can't get the magical combination of characters to get it to ignore the above use case. What am I doing wrong?
Here's a list of things I want the regex to match (all of them are matched):
http://www.example.com
https://www.example.com
https://www.example.com/
https://example.com/
https://blog.example.com/
https://my.blog.example.com/
https://my.blog.example.co.uk/
https://www.example.com/#test
https://www.example.com#test
https://www.example.com/test.php
https://www.example.com/test.php?test=yes&testmore=yesevenmore
https://www.example.com/test.php#test
https://www.example.com/test.php?test=yes&testmore2=yesevenmore&whatnumber=42#test
https://www.example.com/test
https://www.example.com/test/
https://www.example.com/test/?test=yes&testmore2=yesevenmore&whatnumber=42
https://www.example.com/test/#test
https://www.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.blog.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://www.my.blog.example.com/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
https://my.blog.example.co.uk/?test=yes&testmore=yesevenmore&whatnumber=42#test
http://255.255.255.255
http://www.example.com:8008
http://www.example.com:8008/test/?test=yes&testmore=yesevenmore&whatnumber=42#test
Here's a list of things I DON'T want it to match:
www.example.com
example.com
*http://www.blog..example..com
*http://www..example.com
*http://www...example.com
*http://www..example..com
http://www.example.com | not valid
http://www.example.com|
255.255.255.255
* still matched
How can I prevent regex from matching the multidots?
Your pattern allows consecutive dots in two places: in (\w*\.)+ the \w* can match zero characters, so the repeated group can match one dot after another, and the trailing character class also contains a dot.
You could shorten the character class, as some parts do not have to be escaped and \w already matches _
Using the characters from your character class that you consider valid, you could exclude the dot from the class, repeat a group that matches 1+ of those chars followed by a single dot, and then match 1+ of those chars at the end:
^https?:\/\/(?:[-\w~:/?#[\]#!$&'()*+,;=]+\.)*[-\w~:/?#[\]#!$&'()*+,;=]+$
That will match
^ Start of string
https?:\/\/ Match http:// or https://
(?: Non capturing group
[-\w~:/?#[\]#!$&'()*+,;=]+\. Match 1+ times any of listed, then match a .
)* Close group and repeat 0+ times
[-\w~:/?#[\]#!$&'()*+,;=]+ Match any of the listed 1+ times (note that there is no .)
$ End of string
Regex demo
A more specific variant:
^https?:\/\/\w+(?:\.\w+)*(?:[/#:][-\w~:/?#[\]#!$&'()*+,;=.]*)?$
Regex demo
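Both patterns can be tried with a short Python sketch like the one below. Python is an assumption here (the question does not name a language), the names GENERAL_RE, SPECIFIC_RE and the sample lists are hypothetical, and only a subset of the question's URLs is included. Keep in mind that \w in Python 3 also matches non-ASCII word characters unless re.ASCII is passed.

```python
import re

# First pattern from the answer: the dot is removed from the character class.
GENERAL_RE = re.compile(
    r"^https?:\/\/(?:[-\w~:/?#[\]#!$&'()*+,;=]+\.)*[-\w~:/?#[\]#!$&'()*+,;=]+$"
)
# The "more specific variant" from the answer.
SPECIFIC_RE = re.compile(
    r"^https?:\/\/\w+(?:\.\w+)*(?:[/#:][-\w~:/?#[\]#!$&'()*+,;=.]*)?$"
)

# A subset of the URLs the question wants to match.
should_match = [
    "http://www.example.com",
    "https://www.example.com/",
    "https://my.blog.example.co.uk/",
    "https://www.example.com#test",
    "https://www.example.com/test.php?test=yes&testmore=yesevenmore",
    "http://255.255.255.255",
    "http://www.example.com:8008/test/?test=yes&testmore=yesevenmore&whatnumber=42#test",
]
# URLs the question wants to reject.
should_not_match = [
    "www.example.com",
    "example.com",
    "http://www.blog..example..com",
    "http://www..example.com",
    "http://www...example.com",
    "http://www..example..com",
    "http://www.example.com | not valid",
    "http://www.example.com|",
    "255.255.255.255",
]

for name, regex in (("general", GENERAL_RE), ("specific", SPECIFIC_RE)):
    for url in should_match + should_not_match:
        print(f"{name}: {url!r}: {regex.match(url) is not None}")
```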

Why doesn't the regex work when entering 1 letter after the optional character?

I have a custom regex pattern to check for a valid username in a URL:
^[#](?:[a-z][a-z0-9_]*[a-z0-9])?$
This pattern works when I enter usernames like:
#username
#username_16
#username16
But it does not work when I enter:
#u
First part of the question:
How can I rewrite this pattern so that it works for #u?
Second part of the question:
How can I control the character limit or length after the # symbol?
The [a-z] and [a-z0-9] are obligatory patterns inside the optional group, hence if there is something after #, there must be two chars at least.
Besides, your regex also matches a string that equals #.
To fix all these issues you may use
^#[a-z](?:[a-z0-9_]*[a-z0-9])?$
See the regex demo.
Now, to restrict the length of a string after # symbol, you may insert a (?=.{x,m}$) positive lookahead right after #. Say, to only match 3 or 4 chars after #, use:
^#(?=[a-z0-9_]{3,4}$)[a-z](?:[a-z0-9_]*[a-z0-9])?$
  ^^^^^^^^^^^^^^^^^^^
Or, since the consuming pattern will validate the rest
^#(?=.{3,4}$)[a-z](?:[a-z0-9_]*[a-z0-9])?$
  ^^^^^^^^^^^
See this regex demo
Details
^ - start of string
# - a # char
(?=.{3,4}$) - a positive lookahead that requires any 3 or 4 chars other than line break chars up to the end of the string immediately to the right of the current location (i.e. right after the # here)
[a-z] - a lowercase ASCII letter
(?:[a-z0-9_]*[a-z0-9])? - an optional non-capturing group matching 1 or 0 occurrences of
[a-z0-9_]* - 0+ lowercase ASCII letters, digits or _
[a-z0-9] - a lowercase ASCII letter or digits
$ - end of string.
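A minimal Python sketch of both patterns, assuming Python purely for illustration (the question does not say which language is used). The names USERNAME_RE and USERNAME_LEN_RE are hypothetical.

```python
import re

# Fixed pattern: one mandatory letter, then an optional tail that must not end with "_".
USERNAME_RE = re.compile(r"^#[a-z](?:[a-z0-9_]*[a-z0-9])?$")

# Same pattern with the lookahead limiting the part after "#" to 3 or 4 chars.
USERNAME_LEN_RE = re.compile(r"^#(?=.{3,4}$)[a-z](?:[a-z0-9_]*[a-z0-9])?$")

for name in ["#u", "#username", "#username_16", "#username16", "#", "#u_", "#1abc"]:
    print(f"{name!r}: {USERNAME_RE.match(name) is not None}")

for name in ["#ab", "#abc", "#a_b", "#abcd", "#ab_", "#abcde"]:
    print(f"{name!r} (3-4 chars): {USERNAME_LEN_RE.match(name) is not None}")
```

The lookahead only checks the length. The consuming part after it still enforces that the name starts with a letter and does not end with an underscore.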

Regex match if certain string is contained after last occurrence of specific character

For example, I want to check if the web url contains 'foo' after last slash, and match the entire url. So the following url should be a match:
https://www.facebook.com/messages/new/foobar
https://www.facebook.com/messages/t/barfoo
https://www.facebook.com/bfooar
https://foobar.com
https://foobar.com/foo
But the following shouldn't:
https://random.com/random
https://foobar.com/something
https://foobar.com/foo/bar
My approach is ((\\.*)*\\.*foo.*), but it doesn't seem to work for URLs that contain foo before the last slash. Is this pattern even doable in regex? Or do I have to use something like split('\') in the code to get the behavior I want?
Thanks
You can use this regex:
^.*/[^/]*foo[^/]*$
RegEx Demo
Breakup:
^ - Start
.* - Match 0 or more characters (greedy)
/ - Match a /
[^/]* - Match 0 or more non-/ characters
foo - match foo
[^/]* - Match 0 or more non-/ characters
$ - End
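A minimal Python sketch that checks the pattern against the URLs from the question. Python is an assumption (the question does not name a language), and the helper name has_foo_after_last_slash is hypothetical.

```python
import re

# Pattern copied from the answer above.
FOO_RE = re.compile(r"^.*/[^/]*foo[^/]*$")

def has_foo_after_last_slash(url: str) -> bool:
    """Hypothetical helper: True if 'foo' occurs after the last '/' in url."""
    return FOO_RE.match(url) is not None

# URLs taken from the question.
should_match = [
    "https://www.facebook.com/messages/new/foobar",
    "https://www.facebook.com/messages/t/barfoo",
    "https://www.facebook.com/bfooar",
    "https://foobar.com",
    "https://foobar.com/foo",
]
should_not_match = [
    "https://random.com/random",
    "https://foobar.com/something",
    "https://foobar.com/foo/bar",
]

for url in should_match + should_not_match:
    print(f"{url!r}: {has_foo_after_last_slash(url)}")
```

Because [^/]* cannot cross a slash and $ anchors the end, the foo has to sit in the final segment; a foo earlier in the URL, as in https://foobar.com/something, is not enough.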