regex for url without protocol - regex

I need a regex to check a url with this format: "www.stackoverflow.com".
I do not want to allow http or https: "http://www.stackoverflow.com"
I've been looking for a good 45 minutes and can't find anything. only regex that allows both or that require "http".
The closest I've seen is "^([a-zA-Z0-9]+(.[a-zA-Z0-9]+)+.*)$" but this allows anything as long as it includes "."
Acceptable expression: www.example.com
Unacceptable expression: http://www.example.com, example.com etc.
Basically something that makes sure it starts with "www.". If possible I also want to make sure it ends with ".something". And all the other URL regex attributes like not allowing "!" etc.

try using this pattern
^(?!https?).*$
with i modifier for case insensitive.
Demo
Per comment below use this pattern
^(?!https?)www\..*$
or simply
^www\..*$

Related

Regex on domain and negation against language subfolders

Let's say my domains are:
www.test.com
www.test.com/en-gb
www.test.com/cn-cn
These are language sites, the first is the main US English site. In Google Analytics I want to set up a filter to only show me traffic of the first (US) domain. I could do this, I think:
^\/(en-gb|cn-cn).*$
If I EXCLUDE my Request URI with that filter pattern, then I should get a view for the en-US domain. However, I'm interested in understanding regex better so here is some test data and code which I am trying out on http://www.regextester.com/
Regular expression:
^\/(en-gb|cn-cn).*$
Test String
/cn-cn/about
/cn-cn/about/
/cn-cn
/cn-cn/about/test
/en-gb/
/en-gb
/en-gb-test/
/en-gb/aboutus/
/en-gb?q=1
/en-gb/?q=1
/about-us
/test?q=1
/aword/me/
/three
/about/en-gb/
/about/en-gb-test/
/test-yes/
/test/me/
/hello/world/
My questions:
If you try this out, you'll notice that /en-gb-test/ is actually matched with the Regex. How do I avoid this?
Also, let's say I wanted to have a rule to NEGATE this whole option. So rather than telling Google Analytics to "exclude", I am curious how I could write the opposite of this same rule. So basically, catch all URLs that are not in /en-gb and /cn-cn sub-folders.
Thanks in advance!
You may stop the regex from matching en-gb-test by making sure you may / or ? after it or the end of the string
^\/(en-gb|cn-cn)([\/?]|$)
See the regex demo. If you really need to get the rest of the string, add .* after [\/?]: ^\/(en-gb|cn-cn)([\/?]|$).
Details:
^ - start of string
\/ - a / (note that you do not need to escape / in GA regex)
(en-gb|cn-cn) - a capturing group with 2 alternatives, either en-gb or cn-cn
([\/?]|$) - a capturing group with two alternatives: a ? or / OR the end of the string.
In RE2 regex, you cannot use lookaheads that are crucial when you need to match something other than something else. It would look like ^(?!\/(en-gb|cn-cn)([\/?]|$)).*, but it is not possible with RE2.

Regex to look for url start value and end value

I'm using using regex to look for URL that starts with http or https and with a specific value.
^http|https\:\/\/www
This regex looks at the http/https in a URL and this works.
/[\/]\bvalue?\b[\/]/g
This regex looks for "value" in a url and this currently matches with
http://www.test.co.uk/value/
http://www.test.co.uk/folder/value/
Is there a possibility to put those two regex together? Basically I need to display URLs that doesn't contain http/https or /value/ in the URL path
You're looking to do this: /(?=^(https|http))|(\bvalue\b)/g
First half: (?=^(https|http)) which will look first for https and then for http. My personal opinion however is to reduce the code to look only for http, since by matching for http you can also match for https. You may think this behavior is not going to work, but logically it does. You can try that if you like and see what happens.
Second half: (\bvalue\b). You can be more specific such as it being between forward and back slashes, or not. I used the \b delimiter to avoid it being part of another string and it worked quite well.
The important part here is to unite them, so use the | operator and it yields the above result.
Test strings:
http://www.helloworldvalue/value/values/
https://www.helloworldvalue/values/svalue/value/value/vaaluevalue/
Try it and let me know if you have any questions in the comments below.

RegEx for URL ending with a query string

I've been using Ensighten for tag management. The way it manages conditions (which pages the tracking scripts are deployed onto) is by using RegEx for the protocol, host and path separately.
Right now, my current condition looks a bit like this:
Protocol: ^(https?)$
Host: ^((www|www-qa)\.example\.com)$
Path: ^(/section-one/page/?|/section-two/page/?|/section-three/page/?)$
This works fine. However, I've been asked to add a URL ending with a query string, and that's where I'm having an issue.
Essentially, I need to also target a URL with the following format:
http://www.example.com/section-one/page?&var123=456
How do I edit my RegEx for the URL path to include this path?
/section-one/page?&var[any numbers, letters, symbols]
Note that for this /section-one/, I only want to target /page or /page + a query string, no subpages. I don't want to target a specific query string. I also want the other pages already in my RegEx to remain included.
How do I write this expression? I have to stick to the "must match this RegEx" single-expression format.
Thanks!
Improvement of what you've got:
^/section-(one|two|three)/page/?$
Backslash can generally escape the special meaning of a question mark, though it depends on flavor.
^/section-(one|two|three)/page/?($|\?)
This assumes that the $ above is more than just a formality, i.e. that it matches the end of the URL.
If you need to use capturing parentheses to actually store the query string, the engine should give you the longest possible match for .* so that there is of course:
^/section-(one|two|three)/page/?($|\?.*)

Regex for youtube URL

I am using the following regex for validating youtube video share url's.
var valid = /^(http\:\/\/)?(youtube\.com|youtu\.be)+$/;
alert(valid.test(url));
return false;
I want the regex to support the following URL formats:
http://youtu.be/cCnrX1w5luM
http://youtube/cCnrX1w5luM
www.youtube.com/cCnrX1w5luM
youtube/cCnrX1w5luM
youtu.be/cCnrX1w5luM
I tried different regex but I am not getting a suitable one for share links. Can anyone help me to solve this.
Here's a regex I use to match and capture the important bits of YouTube URLs with video codes:
^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]+)(\S+)?$
Works with the following URLs:
https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured
https://www.youtube.com/watch?v=DFYRQ_zQ-gk
http://www.youtube.com/watch?v=DFYRQ_zQ-gk
//www.youtube.com/watch?v=DFYRQ_zQ-gk
www.youtube.com/watch?v=DFYRQ_zQ-gk
https://youtube.com/watch?v=DFYRQ_zQ-gk
http://youtube.com/watch?v=DFYRQ_zQ-gk
//youtube.com/watch?v=DFYRQ_zQ-gk
youtube.com/watch?v=DFYRQ_zQ-gk
https://m.youtube.com/watch?v=DFYRQ_zQ-gk
http://m.youtube.com/watch?v=DFYRQ_zQ-gk
//m.youtube.com/watch?v=DFYRQ_zQ-gk
m.youtube.com/watch?v=DFYRQ_zQ-gk
https://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
http://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
//www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
https://www.youtube.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube.com/embed/DFYRQ_zQ-gk
http://www.youtube.com/embed/DFYRQ_zQ-gk
//www.youtube.com/embed/DFYRQ_zQ-gk
www.youtube.com/embed/DFYRQ_zQ-gk
https://youtube.com/embed/DFYRQ_zQ-gk
http://youtube.com/embed/DFYRQ_zQ-gk
//youtube.com/embed/DFYRQ_zQ-gk
youtube.com/embed/DFYRQ_zQ-gk
https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
//www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
https://youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://youtube-nocookie.com/embed/DFYRQ_zQ-gk
//youtube-nocookie.com/embed/DFYRQ_zQ-gk
youtube-nocookie.com/embed/DFYRQ_zQ-gk
https://youtu.be/DFYRQ_zQ-gk?t=120
https://youtu.be/DFYRQ_zQ-gk
http://youtu.be/DFYRQ_zQ-gk
//youtu.be/DFYRQ_zQ-gk
youtu.be/DFYRQ_zQ-gk
https://www.youtube.com/HamdiKickProduction?v=DFYRQ_zQ-gk
The captured groups are:
protocol
subdomain
domain
path
video code
query string
https://regex101.com/r/vHEc61/1
You're missing www in your regex
The second \. should optional if you want to match both youtu.be and youtube (but I didn't change this since just youtube isn't actually a valid domain - see note below)
+ in your regex allows for one or more of (youtube\.com|youtu\.be), not one or more wild-cards.
You need to use a . to indicate a wild-card, and + to indicate you want one or more of them.
Try:
^(https?\:\/\/)?(www\.youtube\.com|youtu\.be)\/.+$
Live demo.
If you want it to match URLs with or without the www., just make it optional:
^(https?\:\/\/)?((www\.)?youtube\.com|youtu\.be)\/.+$
Live demo.
Invalid alternatives:
If you want www.youtu.be/... to also match (at the time of writing, this doesn't appear to be a valid URL format), put the optional www. outside the brackets:
^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/.+$
youtube/cCnrX1w5luM (with or without http://) isn't a valid URL, but the question explicitly mentions that the regex should support that. To include this, replace youtu\.be with youtu\.?be in any regex above. Live demo.
I know I'm like 2 years late to the party, but I was needing to write something up anyway, and seems to fit every test case that I can throw at it. Should be able to reference the first match ($1) to get the ID. Matches the http, https, www and non-www, youtube.com, youtu.be, /watch? and /watch.php? on youtube.com (youtu.be does not use these), and it supports matching even when there are other variables in the URL string (?t= for time, ?list= for playlists, etc).
(?:https?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]+)
Format for YouTube videos has changed. This regex works for all cases:
^(http(s)??\:\/\/)?(www\.)?((youtube\.com\/watch\?v=)|(youtu.be\/))([a-zA-Z0-9\-_])+
Tests here.
Based on so many other regex; this is the best I have got:
((http(s)?:\/\/)?)(www\.)?((youtube\.com\/)|(youtu.be\/))[\S]+
Test:
http://regexr.com/3bga2
Try this:
((http://)?)(www\.)?((youtube\.com/)|(youtu\.be)|(youtube)).+
http://regexr.com?36o7a
I took one of the answers from here and added support for a few edge cases that I noticed in my dataset. This should work for pretty much any valid url.
^(?:https?:)?(?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]{7,15})(?:[\?&][a-zA-Z0-9\_-]+=[a-zA-Z0-9\_-]+)*(?:[&\/\#].*)?$
I tried this one and it works fine for me.
(?:http(?:s)?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user)\/))([^\?&\"'<> #]+)
You can check here https://regex101.com/r/Kvk0nB/1
https://regexr.com/62kgd
^((http|https)\:\/\/)?(www\.youtube\.com|youtu\.?be)\/((watch\?v=)?([a-zA-Z0-9]{11}))(&.*)*$
https://www.youtube.com/watch?v=YPz9zqakRbk
https://www.youtube.com/watch?v=YPz9zqakRbk&t=11
http://youtu.be/cCnrX1w5luM&y=12
http://youtu.be/cCnrX1w5luM
http://youtube/cCnrXswsluM
www.youtube.com/cCnrX1w5luM
youtube/cCnrX1w5luM
Check this pattern instead:
r'(?i)(http.//|https.//)*[A-Za-z0-9._%+-]+\.\w+'

Regex with URLs - syntax

We're using a proprietary tracking system that requires the use of regular expressions to load third party scripts on the URLs we specify.
I wanted to check the syntax of the regex we're using to see if it looks right.
To match the following URL
/products/18/indoor-posters
We are using this rule:
.*\/products\/18\/indoor-posters.*
Does this look right? Also, if there was a query parameter on the URL, would it still work? e.g.
/products/18/indoor-posters?someParam=someValue
There's another URL to match:
/products
The rule for this is:
.*\/products
Would this match correctly?
Well, "right" is a relative term. Usually, .* is not a good idea because it matches anything, even nothing. So while these regexes will all match your example strings, they'll also match much more. The question is: What are you using the regexes for?
If you only want to check whether those substrings are present anywhere in the string, then they are fine (but then you don't need regex anyway, just check for substrings).
If you want to somehow check whether it's a valid URL, then no, the regexes are not fine because they'd also match foo-bar!$%(§$§$/products/18/indoor-postersssssss)(/$%/§($/.
If you can be sure that you'll always get a correct URL as your input and just want to check whether they match you pattern, then I'd suggest
^.*\/products$
to match any URL that ends in /products, and
^.*\/products\/18\/indoor-posters(?:\?[\w-]+=[\w-]+)?$
to match a URL that ends in /products/18/indoor-posters with an optional ?name=value bit at the end, assuming only alphanumeric characters are legal for name and value.