Regexp to get domain from URL - regex

Does anyone know a regexp for AS3 that gets the domain name from a URL?
E.g. "http://www.google.com/translate" should give "http://www.google.com".

There is a very complete utility class for dealing with URIs in as3corelib. You might want to use it instead of rolling your own.
import com.adobe.net.URI;
var uri:URI = new URI("http://www.google.com/translate");
trace(uri.authority); // traces www.google.com
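For readers outside AS3, the same "parse it, don't regex it" idea can be sketched in Python with the standard urllib.parse module. The helper name origin is made up for this illustration, and the sketch assumes an absolute URL that includes a scheme.

```python
from urllib.parse import urlsplit

def origin(url: str) -> str:
    """Return scheme://host for an absolute URL (hypothetical helper)."""
    parts = urlsplit(url)
    # netloc plays the role of uri.authority in the as3corelib example
    return f"{parts.scheme}://{parts.netloc}"

print(origin("http://www.google.com/translate"))
```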

This should do the work: http(s?)://([\w]+\.){1}([\w]+\.?)+
You can try this in GSkinner RegExr

Try this one:
.*\/\/([^\/:]+).*
Surround the regexp with your language's preferred delimiters.
It skips everything up to a //, then captures the characters that follow until the next / or :, which is the host name.
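A minimal sketch of that pattern in Python, to show what the capture group returns. The names HOST_RE and host are assumptions made for this example, not part of the answer.

```python
import re

# Same pattern as above; in a Python raw string the slashes need no escaping.
HOST_RE = re.compile(r".*//([^/:]+).*")

def host(url):
    """Return the captured host name, or None if the URL has no '//'."""
    m = HOST_RE.match(url)
    return m.group(1) if m else None

print(host("http://www.google.com/translate"))
print(host("https://example.com:8080/a"))
```

Because the leading .* is greedy, a URL that contains a second // later in its path would be captured from that later position, so this is best used on URLs known not to contain one.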

The regexes above missed some URL types I needed, so for the benefit of Googlers, I use
(http://|https://)?(www\.)?[A-Za-z]*(\.com|\.co\.uk|\.us|\.org|\.net|\.mobi)
in C#:
var regexMatch = Regex.Match(input, @"(http://|https://)?(www\.)?[A-Za-z]*(\.com|\.co\.uk|\.us|\.org|\.net|\.mobi)", RegexOptions.Multiline);
string domain = regexMatch.Value;
See example at RegExr.
You can remove the "(...)?" parts to stop them appearing in the match, for example to drop the http prefix that I require.
Probably not perfect, but works for me. This site is an excellent reference for building up expressions.

The regex below gets the domain from the window.location.href property. It modifies the regex answer given earlier by OXMO456 to make it correct and to cover more scenarios.
domainRegex = /(http(?:s?):\/\/(?:[\w]+(?:\.|\:)){1}(?:[\w]+\.?)+)/gi;
Regexr link: https://regexr.com/4vnnc

Related

Use Regex to match beginning and end part of URL in Google Analytics

I'm looking for a regex to use in a goal for Google Analytics.
Consider this URL: /dagje-uit/....variable part..../contact/vpv/bedankt
The regex should match when the beginning of the URL matches /dagje-uit/ and the end part contains /contact/vpv/bedankt. Everything in the middle can vary.
I've tried the following without success:
(?=^/dagje-uit/.*)(?=.*/bedankt$).*
(?=^dagje-uit.*)(?=.*bedankt$).*
Thanks in advance!
Regards,
Pim
Forgive me if Google Analytics has some regex standards which I am overlooking but is it possible that your regex is failing because it does not account for the start of the whole of the URL? Adding .* to either end of your regex may help.
It also looks like your regex is more complex than the conditions you describe require. Could a simpler match be:
.*/dagje-uit/.*/contact/vpv/bedankt.*
or
http(s)?://.*/dagje-uit/.*/contact/vpv/bedankt.*
if you want to be a little more confident that it is a valid URL.
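The logic of the simpler pattern can be checked outside Google Analytics with a quick Python sketch. This only exercises the pattern in Python's re module; whether Google Analytics treats it identically is an assumption. The anchored variant and the sample paths are invented for illustration.

```python
import re

# Anchored variant: must start with /dagje-uit/ and end with /contact/vpv/bedankt
GOAL_RE = re.compile(r"^/dagje-uit/.*/contact/vpv/bedankt$")

def is_goal(path):
    """True when the path has the fixed start and end with anything between."""
    return GOAL_RE.match(path) is not None

print(is_goal("/dagje-uit/amsterdam/museum/contact/vpv/bedankt"))
print(is_goal("/other/amsterdam/contact/vpv/bedankt"))
```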

Matching URLs in text except for pre existing <a href=''...> links

I have the following regex:
var URLREGETX1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
that captures the protocol and URL. It works well, but I would like to make sure that it does not match URLs that are already inside <a href="..."> links.
I tried fiddling with (?!href=\") without much success.
https://regex101.com/r/fE7pY9/1
I made this work using a negative lookbehind:
(?<!RegExpThatShouldNotBeAPrefix)RegExpToMatch
According to regex101, however, JavaScript does not support this, so test it in your target environment.
I made it work by switching to Python.
https://regex101.com/r/tU1fS3/1
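A minimal sketch of the lookbehind approach in Python, assuming the URL sits directly after href=" or href='. The name URL_RE and the sample text are made up for this example. Python's re module requires fixed-width lookbehinds, which is why each quote style gets its own lookbehind.

```python
import re

# Skip URLs that directly follow href=" or href='
URL_RE = re.compile(
    r"""(?<!href=")(?<!href=')\b(?:https?|ftp)://"""
    r"""[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]""",
    re.IGNORECASE,
)

text = 'Visit http://example.com or <a href="http://linked.example.org/page">foo</a>.'
print(URL_RE.findall(text))
```

This only guards the href attribute itself; a URL used as the visible link text between the tags would still be matched.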

Regex for youtube URL

I am using the following regex to validate YouTube video share URLs.
var valid = /^(http\:\/\/)?(youtube\.com|youtu\.be)+$/;
alert(valid.test(url));
return false;
I want the regex to support the following URL formats:
http://youtu.be/cCnrX1w5luM
http://youtube/cCnrX1w5luM
www.youtube.com/cCnrX1w5luM
youtube/cCnrX1w5luM
youtu.be/cCnrX1w5luM
I tried different regexes but could not find a suitable one for share links. Can anyone help me solve this?
Here's a regex I use to match and capture the important bits of YouTube URLs with video codes:
^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]+)(\S+)?$
Works with the following URLs:
https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured
https://www.youtube.com/watch?v=DFYRQ_zQ-gk
http://www.youtube.com/watch?v=DFYRQ_zQ-gk
//www.youtube.com/watch?v=DFYRQ_zQ-gk
www.youtube.com/watch?v=DFYRQ_zQ-gk
https://youtube.com/watch?v=DFYRQ_zQ-gk
http://youtube.com/watch?v=DFYRQ_zQ-gk
//youtube.com/watch?v=DFYRQ_zQ-gk
youtube.com/watch?v=DFYRQ_zQ-gk
https://m.youtube.com/watch?v=DFYRQ_zQ-gk
http://m.youtube.com/watch?v=DFYRQ_zQ-gk
//m.youtube.com/watch?v=DFYRQ_zQ-gk
m.youtube.com/watch?v=DFYRQ_zQ-gk
https://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
http://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
//www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
https://www.youtube.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube.com/embed/DFYRQ_zQ-gk
http://www.youtube.com/embed/DFYRQ_zQ-gk
//www.youtube.com/embed/DFYRQ_zQ-gk
www.youtube.com/embed/DFYRQ_zQ-gk
https://youtube.com/embed/DFYRQ_zQ-gk
http://youtube.com/embed/DFYRQ_zQ-gk
//youtube.com/embed/DFYRQ_zQ-gk
youtube.com/embed/DFYRQ_zQ-gk
https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
//www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
https://youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://youtube-nocookie.com/embed/DFYRQ_zQ-gk
//youtube-nocookie.com/embed/DFYRQ_zQ-gk
youtube-nocookie.com/embed/DFYRQ_zQ-gk
https://youtu.be/DFYRQ_zQ-gk?t=120
https://youtu.be/DFYRQ_zQ-gk
http://youtu.be/DFYRQ_zQ-gk
//youtu.be/DFYRQ_zQ-gk
youtu.be/DFYRQ_zQ-gk
https://www.youtube.com/HamdiKickProduction?v=DFYRQ_zQ-gk
The captured groups are, in order:
1. protocol
2. subdomain
3. domain
4. -nocookie suffix (set only for youtube-nocookie.com URLs)
5. path
6. video code
7. query string
https://regex101.com/r/vHEc61/1
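As a rough illustration of how the groups can be read back, here is a Python sketch. The names YOUTUBE_RE and video_code are assumptions for this example, and the dot in youtu.be is escaped here, which the pattern above does not do. Counting every opening parenthesis, including the one around -nocookie, the video code lands in group 6.

```python
import re

# Pattern from the answer above, with the dot in "youtu.be" escaped.
YOUTUBE_RE = re.compile(
    r"^((?:https?:)?//)?((?:www|m)\.)?"
    r"((?:youtube(-nocookie)?\.com|youtu\.be))"
    r"(/(?:[\w\-]+\?v=|embed/|v/)?)([\w\-]+)(\S+)?$"
)

def video_code(url):
    """Return the video code (group 6), or None if the URL does not match."""
    m = YOUTUBE_RE.match(url)
    return m.group(6) if m else None

print(video_code("https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured"))
print(video_code("youtu.be/DFYRQ_zQ-gk"))
```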
You're missing www in your regex
The second \. should be optional if you want to match both youtu.be and youtube (but I didn't change this since just youtube isn't actually a valid domain - see note below)
+ in your regex allows for one or more of (youtube\.com|youtu\.be), not one or more wild-cards.
You need to use a . to indicate a wild-card, and + to indicate you want one or more of them.
Try:
^(https?\:\/\/)?(www\.youtube\.com|youtu\.be)\/.+$
Live demo.
If you want it to match URLs with or without the www., just make it optional:
^(https?\:\/\/)?((www\.)?youtube\.com|youtu\.be)\/.+$
Live demo.
Invalid alternatives:
If you want www.youtu.be/... to also match (at the time of writing, this doesn't appear to be a valid URL format), put the optional www. outside the brackets:
^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/.+$
youtube/cCnrX1w5luM (with or without http://) isn't a valid URL, but the question explicitly mentions that the regex should support that. To include this, replace youtu\.be with youtu\.?be in any regex above. Live demo.
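The difference between the strict pattern and the youtu\.?be variant can be checked with a short Python sketch against the URLs from the question. The names STRICT and LENIENT are invented for this example; the patterns are the ones above with the redundant escapes dropped.

```python
import re

# Optional www. on youtube.com only; youtu.be must contain the dot
STRICT = re.compile(r"^(https?://)?((www\.)?youtube\.com|youtu\.be)/.+$")
# Same, but the dot in youtu.be is optional so plain "youtube/..." passes
LENIENT = re.compile(r"^(https?://)?((www\.)?youtube\.com|youtu\.?be)/.+$")

for url in ["http://youtu.be/cCnrX1w5luM", "www.youtube.com/cCnrX1w5luM",
            "youtube/cCnrX1w5luM"]:
    print(url, bool(STRICT.match(url)), bool(LENIENT.match(url)))
```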
I know I'm about 2 years late to the party, but I needed to write something up anyway, and this seems to fit every test case I can throw at it. You should be able to reference the first capture group ($1) to get the ID. It matches http, https, www and non-www, youtube.com, youtu.be, /watch? and /watch.php? on youtube.com (youtu.be does not use these), and it still matches when there are other variables in the URL string (?t= for time, ?list= for playlists, etc.).
(?:https?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]+)
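A small Python sketch of pulling the ID out of the first capture group with this pattern. The names ID_RE and youtube_id are assumptions for illustration. The pattern is not anchored, so the sketch uses re.search rather than re.match.

```python
import re

# Pattern from the answer above, written as a Python raw string
ID_RE = re.compile(
    r"(?:https?://)?(?:youtu\.be/|(?:www\.|m\.)?youtube\.com/"
    r"(?:watch|v|embed)(?:\.php)?(?:\?.*v=|/))([a-zA-Z0-9_-]+)"
)

def youtube_id(url):
    """Return the video ID from group 1, or None when nothing matches."""
    m = ID_RE.search(url)
    return m.group(1) if m else None

print(youtube_id("https://www.youtube.com/watch?v=DFYRQ_zQ-gk&t=10"))
print(youtube_id("https://youtu.be/DFYRQ_zQ-gk?t=120"))
```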
The format of YouTube video URLs has changed. This regex works for all cases:
^(http(s)??\:\/\/)?(www\.)?((youtube\.com\/watch\?v=)|(youtu.be\/))([a-zA-Z0-9\-_])+
Tests here.
Building on several other regexes, this is the best I have come up with:
((http(s)?:\/\/)?)(www\.)?((youtube\.com\/)|(youtu.be\/))[\S]+
Test:
http://regexr.com/3bga2
Try this:
((http://)?)(www\.)?((youtube\.com/)|(youtu\.be)|(youtube)).+
http://regexr.com?36o7a
I took one of the answers from here and added support for a few edge cases that I noticed in my dataset. This should work for pretty much any valid url.
^(?:https?:)?(?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]{7,15})(?:[\?&][a-zA-Z0-9\_-]+=[a-zA-Z0-9\_-]+)*(?:[&\/\#].*)?$
I tried this one and it works fine for me.
(?:http(?:s)?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user)\/))([^\?&\"'<> #]+)
You can check here https://regex101.com/r/Kvk0nB/1
https://regexr.com/62kgd
^((http|https)\:\/\/)?(www\.youtube\.com|youtu\.?be)\/((watch\?v=)?([a-zA-Z0-9]{11}))(&.*)*$
https://www.youtube.com/watch?v=YPz9zqakRbk
https://www.youtube.com/watch?v=YPz9zqakRbk&t=11
http://youtu.be/cCnrX1w5luM&y=12
http://youtu.be/cCnrX1w5luM
http://youtube/cCnrXswsluM
www.youtube.com/cCnrX1w5luM
youtube/cCnrX1w5luM
Check this pattern instead:
r'(?i)(http.//|https.//)*[A-Za-z0-9._%+-]+\.\w+'

Perl/lighttpd regex

I'm using regex in lighttpd to rewrite URLs, but I can't write an expression that does what I want (which I thought was pretty basic, apparently not, I'm probably missing something).
Say I have this URL: /page/variable_to_pass OR /page/variable_to_pass/
I want to rewrite the URL to this: /page.php?var=variable_to_pass
I've already got rules like ^/login/(.*?)$ to handle specific pages, but I wanted to make one that can match any page without needing one expression per page.
I tried this: ^/([^.?]*) but it matches the whole /page/variable_to_pass/ instead of just page.
Any help is appreciated, thanks!
This regexp should do what you need
/([^\/]+)/(.+)
The first capture group is the page name, and the second is the variable value.
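The intended rewrite can be tried out in Python before it goes into the server config. This is only a sketch of the regex logic, not lighttpd syntax. The names RULE and rewrite are made up, and the pattern is a slightly tightened variant that assumes neither segment contains a slash and that the trailing slash is optional.

```python
import re

# Two slash-free segments, optional trailing slash
RULE = re.compile(r"^/([^/]+)/([^/]+)/?$")

def rewrite(path):
    """Map /page/value/ to /page.php?var=value; leave other paths alone."""
    return RULE.sub(r"/\1.php?var=\2", path)

print(rewrite("/page/variable_to_pass/"))
print(rewrite("/page/variable_to_pass"))
```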
Try:
/([^/.?]+)/([^/.?]+)/
That should give you two capture groups.

how to detect links in a string using RegEx in as3?

I am trying to find the generic links in strings. I've found a very handy regex on RegExr, in the community expressions:
(https?://)?(www\.)?([a-zA-Z0-9_%]*)\b\.[a-z]{2,4}(\.[a-z]{2})?((/[a-zA-Z0-9_%]*)+)?(\.[a-z]*)?(:\d{1,5})?
I tried to use it and it returns null, although the same string tested on RegExr works fine:
var linkRegEx:RegExp = new RegExp("(https?://)?(www\.)?([a-zA-Z0-9_%]*)\b\.[a-z]{2,4}(\.[a-z]{2})?((/[a-zA-Z0-9_%]*)+)?(\.[a-z]*)?(:\d{1,5})?","g");
var link:String = 'generic links: www.google.com http://www.google.com google.com';
trace(linkRegEx.exec(link));//traces null
Is there anything I'm missing ?
You need to double the backslashes when you're using new RegExp with a string. You might want to use the literal syntax, which doesn't impose that requirement (assuming AS3 supports this syntax; I only know JS).
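The same pitfall exists in other languages with string escapes, so it can be illustrated in Python, assuming AS3 string literals treat \b the same way: in an ordinary string "\b" is a backspace character, and the regex engine never sees a word boundary. The variable names below are invented for this sketch.

```python
import re

text = "generic links: www.google.com"

single = "\bgoogle\b"     # "\b" becomes a backspace character in the string
doubled = "\\bgoogle\\b"  # doubled backslashes reach the regex engine intact
raw = r"\bgoogle\b"       # raw string: Python's closest analogue to a regex literal

print(re.search(single, text))            # no match: the text has no backspaces
print(re.search(doubled, text).group(0))  # matches the word "google"
print(raw == doubled)                     # both spellings give the same pattern
```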
Looks like maybe you're trying to match the wrong variable? In line linkRegEx.exec(formattedStatus), formattedStatus isn't defined.