Validating URL using regex

I am trying to validate a URL with just a scheme and domain name (something like http://www.domainname.com). I am using this regex:
/^(http|https):\/\/[\w.\-]+\.[A-Za-z]{2,6}/
When I type http://www.ab followed by more letters, it returns true while the last part is up to 6 characters long; beyond that length it returns false. How can I tackle this situation?

You can use a regex like this: https?:\/\/www\..*?\.(com|uk|in) (you have to list at the end every ending you want to match).
demo here

Try this one:
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
Test it here: https://regex101.com/r/xR0oV9/1
Let me also correct your pattern a bit, just for information.
Instead of (http|https), the shorter (https?) does the same job, because the http part is common to both cases and the s is optional.
Instead of [A-Za-z], you can use just lowercase letters, [a-z], and add the i modifier to the end of your pattern (after the last slash /), which makes the match case-insensitive.
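For illustration, here is a minimal Python sketch of the two simplifications above applied to the question's pattern. The names URL_RE and is_scheme_and_domain are made up for this example, and using fullmatch to anchor both ends is my assumption about what "just a scheme and domain name" should mean; it is not part of the original pattern.

```python
import re

# Assumed adaptation of the question's pattern: "https?" replaces "(http|https)",
# and [a-z] plus IGNORECASE replaces [A-Za-z]. Python has no /.../ delimiters,
# so the forward slashes need no escaping.
URL_RE = re.compile(r"https?://[\w.\-]+\.[a-z]{2,6}", re.IGNORECASE)


def is_scheme_and_domain(url: str) -> bool:
    """Return True if url consists of a scheme and a domain name only."""
    # fullmatch anchors both ends, so anything after the last label is rejected.
    return URL_RE.fullmatch(url) is not None


print(is_scheme_and_domain("http://www.domainname.com"))       # True
print(is_scheme_and_domain("HTTPS://WWW.Example.ORG"))         # True
print(is_scheme_and_domain("http://www.domainname.com/path"))  # False
```

Because of the {2,6} limit, a last label longer than six letters is still rejected, so raise that bound if you need to accept longer endings.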

This one from diegoperini is a bit longer, but in return it's nearly perfect (at least in my eyes).
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
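If you want to use it in C#, you have to change it slightly: drop the _..._iuS delimiters and modifiers (so case-insensitivity has to be requested through the regex options instead) and write \x{00a1} as \u00a1. I did this some time ago: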
^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?#)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$

Related

Regex: Non fixed-width look around assertions?

My colleague asked me to provide him with a regex that only matches if the test string ends with
.rar or .part1.rar or .part01.rar or .part001.rar (and so on).
Should match:
foo.part1.rar
xyz.part01.rar
archive.rar
part3_is_the_best.rar
Should not match:
foo.r61
bar.part03.rar
test.sfv
I immediately came up with the regex \.(part0*1\.)?rar$, but this also matches bar.part03.rar.
Next I tried to add a negative lookbehind assertion: .*(?<!part\d*)\.(part\0*1\.)?rar$ That didn't work either, because lookbehind assertions need to be fixed-width.
Then I tried using a regex-conditional. But that didn't work either.
So my question: Can this even be solved by using pure regex?
An answer should either contain a link to regex101.com providing a working solution, or explain why it can't work by using pure regex.
You could use a negative lookahead to rule out the one case that breaks your original regex (a .rar name whose .part number isn't 0*1):
^(?!.*\.part0*[^1]\.rar$).*\.(part0*1\.)?rar$
See it in action
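A quick way to check the lookahead pattern is to run it over the question's test cases, for example in Python. One thing worth checking is multi-digit part numbers: [^1] only inspects a single character, so a name like foo.part19.rar slips through. STRICT_RE below is a hypothetical variant of my own (not from the answer above) that rejects any .partN.rar ending whose number is not 0*1; treat it as a sketch.

```python
import re

# Pattern from the answer above.
LOOKAHEAD_RE = re.compile(r"^(?!.*\.part0*[^1]\.rar$).*\.(part0*1\.)?rar$")

# Hypothetical variant (not from the original answer): reject every
# ".part<digits>.rar" ending unless the digits are exactly 0*1.
STRICT_RE = re.compile(r"^(?!.*\.part(?!0*1\.rar$)\d+\.rar$).*\.rar$")

SHOULD_MATCH = ["foo.part1.rar", "xyz.part01.rar", "archive.rar", "part3_is_the_best.rar"]
SHOULD_NOT_MATCH = ["foo.r61", "bar.part03.rar", "test.sfv"]


def matches(pattern: re.Pattern, name: str) -> bool:
    """Return True if the pattern accepts the file name."""
    return pattern.search(name) is not None


for name in SHOULD_MATCH + SHOULD_NOT_MATCH + ["foo.part19.rar"]:
    print(name, matches(LOOKAHEAD_RE, name), matches(STRICT_RE, name))
```

Both patterns agree on the seven names from the question; they differ only on foo.part19.rar, which the first accepts and the variant rejects.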
This is an old question, but here's another approach:
(?:\.part0*1\.rar|^(?<!\.)\w+\.rar)$
The idea is to match either:
A string that ends with .part0*1.rar (i.e. foo.part01.rar, foo.part1.rar, bar.part001.rar), OR
A string that ends with .rar and doesn't contain any other dots (.) before that.
Works on all your test cases, plus your extra foo.part19.rar.
https://regex101.com/r/EyHhmo/2
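If you prefer to verify this offline, the alternation pattern can be exercised the same way in Python. The function name is_first_or_only_volume is made up for this sketch; the pattern itself is copied unchanged from the answer.

```python
import re

# Pattern from the answer above, unchanged. The (?<!\.) directly after ^ can
# never fail, because nothing precedes the start of the string.
ALT_RE = re.compile(r"(?:\.part0*1\.rar|^(?<!\.)\w+\.rar)$")


def is_first_or_only_volume(filename: str) -> bool:
    """True for names ending in .part0*1.rar, or dot-free names ending in .rar."""
    return ALT_RE.search(filename) is not None


for name in ["foo.part1.rar", "archive.rar", "bar.part03.rar", "foo.part19.rar"]:
    print(name, is_first_or_only_volume(name))
```

The second alternative relies on \w+ not matching a dot, which is what keeps bar.part03.rar and foo.part19.rar out.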

Why /^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i does not work as expected

I have this regex for email validation (assume only x#y.com, abc#defghi.org, something#anotherthing.edu are valid)
/^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$/i
But #abc.edu and abc#xyz.eduorg are both accepted by the regex above. Can anyone explain why that is?
My approach:
there should be at least one character or number before #
then there comes #
there should be at least one character or number after # and before .
the string should end with either edu, com, or org.
Try this
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
and it should become clear - you need to group those alternatives, otherwise you can match any string that has 'edu' in it, or any string that ends with org. To put it another way, your version matches any of these patterns
^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)
(edu)
(org)$
It's worth pointing out that the original poster is using this as a regex learning exercise. This would be a terrible regex for actual production use! It's a thorny problem - see Using a regular expression to validate an email address for a lot more depth.
Your grouping parentheses are incorrect:
/^[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)$/i
You can also use just one case, since you're using the i modifier:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
N.B. you were also missing a + from the second set, I assume this was just a typo...
What you have written is the equivalent of matching something that:
begins with [a-zA-Z0-9]+#[a-zA-Z0-9]\.com,
contains edu,
or ends with org.
What you were looking for was:
/^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$/i
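To see the difference between the ungrouped and the grouped alternation, here is a small Python sketch. The names UNGROUPED_RE, GROUPED_RE and accepted are made up for this example, and the # separator is kept exactly as written in the question.

```python
import re

# The question's pattern: the top-level | splits it into three alternatives.
UNGROUPED_RE = re.compile(r"^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$", re.IGNORECASE)

# The corrected pattern: a single group holds all three endings.
GROUPED_RE = re.compile(r"^[a-z0-9]+#[a-z0-9]+\.(com|edu|org)$", re.IGNORECASE)


def accepted(pattern: re.Pattern, address: str) -> bool:
    # search() looks for a match anywhere in the string, like JavaScript's test().
    return pattern.search(address) is not None


for address in ["x#y.com", "abc#defghi.org", "#abc.edu", "abc#xyz.eduorg"]:
    print(address, accepted(UNGROUPED_RE, address), accepted(GROUPED_RE, address))
```

The ungrouped pattern accepts all four strings (abc#defghi.org only because it ends with org), while the grouped one accepts just the first two.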
Your regex looks ok.
I guess you are using a find (search) function instead of a match function.
Without knowing what you use it is a bit difficult to say, but in Python you would write
import re
pattern = re.compile(r'^[a-zA-Z0-9]+#[a-zA-Z0-9]\.(com)|(edu)|(org)$')
pattern.match('#abc.edu')   # returns None: match() only tries at the start of the string; use this to validate an input
pattern.search('#abc.edu')  # matches: search() scans the whole string and finds the edu
Try this:
[a-zA-Z0-9]+#[a-zA-Z0-9]+\.(com|edu|org)+$
You forgot the + quantifier, which you need if you want to catch any combination of (com|edu|org).
Update: as I see it, you also missed the + after the second [a-zA-Z0-9].

regex for several domains

I am using a regular expression to determine when to fire a tracking tag or not.
If a visitor to one of the sites is on one of these three domains the tag should fire:
- www.grousemountainlodge.com
- www.glacierparkinc.com
- reserveglacierdenali.com
I actually have a regular expression that works. But I'm not confident and wanted to bounce it off the folk on this board.
This is what I have. Is there a simpler, more elegant or more robust regex to use for matching the 3 domains?
^(www\.)?((glacierparkinc|grousemountainlodge)\.com)$|(^reserveglacierdenali\.com)$
Following some answers: this regex should exclude other domains, e.g. cats.glacierparkinc.com or similar.
I'm not sure whether glacierparkinc.com should match without the www. prefix: judging from your list it should not, but your regex does match it.
In either case I guess you can simplify it a bit:
^(?:www\.(?:glacierparkinc|grousemountainlodge)|reserveglacierdenali)\.com$
Note the use of (?:) instead of just (): this is a non-capturing group, which groups without capturing. It's good practice not to capture when you don't need to, as it saves time and memory.
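As a rough check of the simplified pattern, here is a Python sketch. The names TAG_DOMAIN_RE and should_fire_tag are invented for the example, and lower-casing the hostname first is my own assumption about the input, not something from the question.

```python
import re

# Simplified pattern from above: www. is required for the first two domains
# and not allowed for reserveglacierdenali.com.
TAG_DOMAIN_RE = re.compile(
    r"^(?:www\.(?:glacierparkinc|grousemountainlodge)|reserveglacierdenali)\.com$"
)


def should_fire_tag(hostname: str) -> bool:
    """Return True if the tracking tag should fire for this hostname."""
    # Hostnames are case-insensitive, so normalise before matching (assumption).
    return TAG_DOMAIN_RE.fullmatch(hostname.lower()) is not None


for host in ["www.glacierparkinc.com", "reserveglacierdenali.com", "cats.glacierparkinc.com"]:
    print(host, should_fire_tag(host))
```

With this version glacierparkinc.com without www. does not fire the tag; make the www\. part optional again if it should.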
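The match must start at the beginning of the string, with or without www., so:
^(?:www\.)?(?:glacierparkinc|grousemountainlodge|reserveglacierdenali)\.
If it matches, then do something. Note that this pattern stops at the dot, so it does not check that the name ends in com.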
Regex live here.
Hope it helps.

Simple email-regexp doesn't allow hyphen before and after #

I found this simple regexp (I know it's probably not perfect) somewhere online to validate an email address.
/^(?:\w+\.?)*\w+#(?:\w+\.)+\w+$/
The problem is that this regexp doesn't allow for the following cases:
myname#test-domain.com
my-name#test-domain.com
Any ideas?
P.S. I'm using this regexp within JavaScript.
If you simply want to add hyphens you can change the regexp to:
/^(?:\w+[\-\.])*\w+#(?:\w+[\-\.])*\w+\.\w+$/
To add other special characters, e.g. an underscore, just put them in the first (not the second) pair of square brackets, i.e. change [\-\.] to [\-\._].
Also have a look at this question and its answer.
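To compare the two patterns side by side, here is a Python sketch rather than JavaScript. The names ORIGINAL_RE and HYPHEN_RE are made up, and re.ASCII is used on the assumption that \w should behave as it does in JavaScript, i.e. [A-Za-z0-9_] only.

```python
import re

# re.ASCII restricts \w to [A-Za-z0-9_], mirroring JavaScript's \w.
ORIGINAL_RE = re.compile(r"^(?:\w+\.?)*\w+#(?:\w+\.)+\w+$", re.ASCII)
HYPHEN_RE = re.compile(r"^(?:\w+[\-\.])*\w+#(?:\w+[\-\.])*\w+\.\w+$", re.ASCII)

ADDRESSES = [
    "myname#test.com",
    "myname#test-domain.com",
    "my-name#test-domain.com",
]

for address in ADDRESSES:
    # Print whether each pattern accepts the address.
    print(address, bool(ORIGINAL_RE.match(address)), bool(HYPHEN_RE.match(address)))
```

The original pattern accepts only the first address, while the hyphen-aware one accepts all three. Each hyphen or dot must still sit between word characters, so my--name#test.com is rejected.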

How to only match before the first dot?

I have the following regex.
^((?!example).)*$#Subdomain is reserved (example).
I would like to validate <subdomain>.example.org. However, since the domain name contains example, a match is occurring.
The validation should not match when the address is www.example.org
The validation should match when the address is example.example.org
Looks like you're missing the escape character from the period
^(example)\..*$
should work
It seems that a simple
^example\.
is enough. Or use string methods, depending on your language:
url.indexOf('example.') === 0
If input such as example.org is also possible, you can use
^example\..+\.
to force the appearance of two dots. But this would still fail for example.co.uk. It depends on your input.
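The same checks can be sketched in Python; startswith plays the role of the indexOf test. All names below are invented for the example, and the hosts are the ones discussed above.

```python
import re

FIRST_LABEL_RE = re.compile(r"^example\.")    # reserved word as the first label
TWO_DOTS_RE = re.compile(r"^example\..+\.")   # same, but requires a second dot


def starts_with_reserved_label(host: str) -> bool:
    """Regex version: True if the host begins with 'example.'."""
    return FIRST_LABEL_RE.match(host) is not None


def starts_with_reserved_label_str(host: str) -> bool:
    """String-method version of the same check."""
    return host.startswith("example.")


for host in ["example.example.org", "www.example.org", "example.org", "example.co.uk"]:
    print(host, starts_with_reserved_label(host), bool(TWO_DOTS_RE.match(host)))
```

The two-dot pattern no longer flags example.org, but it still flags example.co.uk, which is the caveat mentioned above.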
A simple way might be to break it up into two patterns:
1) ^.+\.example\.org$
2) ^(www)?\.example\.org$
If 1) matches and 2) does not, it's a subdomain of example.org; otherwise, it's not. (Although www is technically a subdomain too, but you get the idea.)
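The two-pattern test could look like this in Python; the names ANY_SUBDOMAIN_RE, WWW_RE and is_real_subdomain are hypothetical, chosen only for this sketch.

```python
import re

ANY_SUBDOMAIN_RE = re.compile(r"^.+\.example\.org$")  # something before .example.org
WWW_RE = re.compile(r"^(www)?\.example\.org$")        # www (or nothing) before .example.org


def is_real_subdomain(host: str) -> bool:
    """True if the first pattern matches and the second does not."""
    return ANY_SUBDOMAIN_RE.match(host) is not None and WWW_RE.match(host) is None


for host in ["example.example.org", "www.example.org", "example.org"]:
    print(host, is_real_subdomain(host))
```

Splitting the check keeps each pattern simple, at the cost of running two matches per host.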