Match a URL that doesn't contain asp, aspx, css, htm, html, jpg - regex

Q-1. Match a URL that doesn't contain asp, aspx, css, htm, html, jpg.
Q-2. Match a URL that doesn't end with asp, aspx, css, htm, html, jpg.

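You want to use the 'matches count' quantifier and make it match 0 times,
e.g. (match all characters, then a dot, then anything that isn't aspx or css):
^.*\.((aspx) | (css)){0}.*$
Edit: added ^ (start) and $ (end-of-line) anchors.
Caveat: {0} makes the group match zero times, so the group is never actually tested. The pattern accepts any string that contains a dot, page.aspx included, and the spaces around | would be matched literally anyway. Excluding words needs a negative lookahead instead.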

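Q-1. This is better done using a normal string search, but if you insist on a regex, match this against the whole string: (.(?!asp|aspx|css|htm|html|jpg))*
Q-2. This is better done using a normal string search, but if you insist on a regex: .*(?<!asp|css|htm|jpg)(?<!aspx|html)$ (the words are split across two lookbehinds because some engines, such as Python's re, require every alternative in a lookbehind to have the same length).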

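If your regular expression implementation allows lookaround assertions, try these. The first must be anchored with ^ and $, or otherwise applied to the whole string. The second needs an engine that supports variable-length lookbehind, since aspx? and html? do not have a fixed length: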
(?:(?!aspx?|css|html?|jpg).)*
.*$(?<!aspx?|css|html?|jpg)
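For comparison, here is a small Python sketch of both questions. It uses plain string methods, as recommended above, alongside the lookaround patterns. The assumptions are:
- the sample URLs are made up;
- the Q-1 pattern is the tempered-dot version above, anchored so it must hold over the whole string;
- Q-2 uses the two fixed-width lookbehinds, because Python's re module rejects a variable-width lookbehind such as (?<!aspx?|css|html?|jpg).

```python
import re

# Words to exclude, taken from the question.
WORDS = ("asp", "aspx", "css", "htm", "html", "jpg")

# Q-1: no word may appear anywhere. Before consuming each character, the
# negative lookahead checks that none of the words starts at that position.
NOT_CONTAINING = re.compile(r"^(?:(?!aspx?|css|html?|jpg).)*$")

# Q-2: the URL must not end with a word. Python's re only accepts lookbehinds
# whose alternatives all have the same length, so 3- and 4-letter words are split.
NOT_ENDING = re.compile(r".*(?<!asp|css|htm|jpg)(?<!aspx|html)$")


def q1_regex(url: str) -> bool:
    return NOT_CONTAINING.fullmatch(url) is not None


def q2_regex(url: str) -> bool:
    return NOT_ENDING.fullmatch(url) is not None


def q1_plain(url: str) -> bool:
    # A plain substring search does the same job without a regex.
    return not any(word in url for word in WORDS)


def q2_plain(url: str) -> bool:
    # str.endswith accepts a tuple of suffixes.
    return not url.endswith(WORDS)


for url in [
    "http://example.com/index.php",
    "http://example.com/page.aspx",
    "http://example.com/style.css?v=2",
]:
    print(url, q1_regex(url), q1_plain(url), q2_regex(url), q2_plain(url))
```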

URL regex validator HTML5

I am validating a URL on my form with a regex:
^(?:http(s)?://)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]#!\$&'\(\)\*\+,;=.]+$
It validates URLs such as these:
https://www.example.com
http://www.example.com
www.example.com
example.com
http://blog.example.com
http://www.example.com/product
http://www.example.com/products?id=1&page=2
http://www.example.com#up
http://255.255.255.255
255.255.255.255
However, it also validates URLs like
www.google
www.example
www.example.
www.google.
which are not acceptable URLs.
I am not very proficient with regex. Please help me figure out what needs to be changed.
When using a regex in an HTML5 pattern attribute, escape characters very carefully. Browsers that implement the ES6+ standard compile the pattern with the u flag. In that mode, some constructs that older engines tolerate throw an exception, such as a hyphen forming a range with \w or a backslash before a character with no special meaning. Keep escapes to a minimum; for example, there is no need to escape the dot in [\w\.-].
Now, to fix the issue, you may add a (?!www\.[^.]+\.?$) lookahead after ^. It fails any input that starts with www., followed by one or more characters other than . and then an optional . at the end of the string.
You may use
^(?!www\.[^.]+\.?$)(?:https?:\/\/)?[\w.-]+(?:\.[\w.-]+)+[\w._~:/?#[\\\]#!$&'()*+,;=.-]+$
Note that I escaped both \ and ] in your pattern, as I think you meant to match both (your original regex does not match \ with [\w\-\._~:/?#[\]#!\$&'\(\)\*\+,;=.]).
Note that an HTML5 pattern regex is anchored by default, so you need no ^ and $ at the start and end:
pattern="(?!www\.[^.]+\.?$)(?:https?:\/\/)?[\w.-]+(?:\.[\w.-]+)+[\w._~:/?#[\\\]#!$&'()*+,;=.-]+"
But you may still keep them if you want.
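To sanity-check the fix outside a browser, the answer's pattern (with ^ and $, copied verbatim) can be run against the question's examples in Python's re module. This sketch assumes Python handles this particular pattern the same way the browser does. The pattern attribute itself uses JavaScript regex semantics, so confirm the result in the browser as well.

```python
import re

# The answer's pattern with ^ and $, copied verbatim (Python's re accepts it as is).
URL_RE = re.compile(
    r"^(?!www\.[^.]+\.?$)"                   # fail on "www." + one label + optional trailing dot
    r"(?:https?:\/\/)?"                       # optional http:// or https://
    r"[\w.-]+(?:\.[\w.-]+)+"                  # host part with at least one dot
    r"[\w._~:/?#[\\\]#!$&'()*+,;=.-]+$"       # remaining URL characters
)


def is_valid(url: str) -> bool:
    return URL_RE.match(url) is not None


SHOULD_PASS = [
    "https://www.example.com",
    "www.example.com",
    "example.com",
    "http://www.example.com/products?id=1&page=2",
    "http://www.example.com#up",
    "255.255.255.255",
]
SHOULD_FAIL = ["www.google", "www.example", "www.example.", "www.google."]

for url in SHOULD_PASS + SHOULD_FAIL:
    print(f"{url:45} {is_valid(url)}")
```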

Regular expression to match only domain from URL

I'm struggling with forming a regex that would match:
Just domain in case of URL
Whole string in case of no URL
Acceptance test (the regex should match the domain of each URL and the whole of each other string):
http://mozart.co.uk
https://avocado.si/hmm
http://www.qwe123qwe.com
Starbucks
Benchmark 123
So far I've come up with this:
([^\/\/]+)(?:,|$)
It works fine, but not for URLs with a trailing slash. How can I modify the expression to also handle the full path (everything to the right of http(s)://)? Thank you.
If the string starts with http:// or https://, this regex captures everything from there up to the next slash. If it starts with neither, it matches the whole string. Close enough?
(?:^https?:\/\/([^\/]+)(?:[\/,]|$)|^(.*)$)
Note that most languages have built-in functions that parse URLs properly, and these are preferable.
Also note that the regex has two sets of capturing parentheses (group 1 for the domain, group 2 for a non-URL string), so depending on your language this may be significant.
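A Python sketch of both suggestions follows. The first function uses the answer's regex and reads whichever of the two groups matched. The second uses the built-in urllib.parse.urlsplit parser. The helper names are hypothetical. The parser version returns netloc, which would include a port if present, so that it mirrors what [^\/]+ captures.

```python
import re
from urllib.parse import urlsplit

# Regex from the answer: group 1 is the host of an http(s) URL,
# group 2 is the whole string when it is not such a URL.
DOMAIN_OR_WHOLE = re.compile(r"(?:^https?:\/\/([^\/]+)(?:[\/,]|$)|^(.*)$)")


def domain_or_whole_regex(text: str) -> str:
    m = DOMAIN_OR_WHOLE.match(text)
    if m is None:
        # Only possible when a line break occurs somewhere other than at the very end.
        return text
    return m.group(1) if m.group(1) is not None else m.group(2)


def domain_or_whole_parser(text: str) -> str:
    # Built-in parser: netloc is the authority part ("host[:port]") after "//".
    parts = urlsplit(text)
    if parts.scheme in ("http", "https") and parts.netloc:
        return parts.netloc
    return text


SAMPLES = [
    "http://mozart.co.uk",
    "https://avocado.si/hmm",
    "http://www.qwe123qwe.com",
    "Starbucks",
    "Benchmark 123",
]
for sample in SAMPLES:
    print(f"{sample!r} -> {domain_or_whole_regex(sample)!r} / {domain_or_whole_parser(sample)!r}")
```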
Maybe this: ^(http[s]?:\/\/)?(.*)$ (you can try it here: https://regex101.com/r/iZ2vL4/1)
This regex has several capturing groups; the domain you want will be in the 4th one.
/^((http[s]?|ftp):\/\/)?\/?([^\/\.]+\.)*?([^\/\.]+\.[^:\/\s\.]{1,3}(\.[^:\/\s\.]{1,2})?(:\d+)?)($|\/)([^#?\s]+)?(.*?)?(#[\w\-]+)?$/mg
To check your URLs, paste them into the "TEST STRING" box on the Regex101.com workbench.
I don't recall where I got this, so I don't know whom to credit, but it's pretty slick!
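To see which part of each URL ends up in group 4, the pattern can be tried in Python's re. This assumes Python behaves like the JavaScript engine for this pattern. The /mg flags are dropped because each test string is a single URL. The helper name is hypothetical.

```python
import re

# The answer's pattern without the JavaScript /mg flags; each test string is a single URL.
URL_PARTS = re.compile(
    r"^((http[s]?|ftp):\/\/)?\/?([^\/\.]+\.)*?"
    r"([^\/\.]+\.[^:\/\s\.]{1,3}(\.[^:\/\s\.]{1,2})?(:\d+)?)"
    r"($|\/)([^#?\s]+)?(.*?)?(#[\w\-]+)?$"
)


def fourth_group(url: str):
    """Return capturing group 4 (the domain), or None if the pattern does not match."""
    m = URL_PARTS.match(url)
    return m.group(4) if m else None


for url in ["http://mozart.co.uk", "https://avocado.si/hmm", "http://www.qwe123qwe.com", "Starbucks"]:
    print(f"{url!r} -> {fourth_group(url)!r}")
```

Two things stand out. For http://www.qwe123qwe.com, group 4 is qwe123qwe.com, because the www. label is consumed by group 3. Strings without a dot, such as Starbucks, do not match at all, so this pattern does not cover the "whole string" case from the question.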

RegEx for REST url substitutions

I have a URL like this:
http://www.url.me/en/cats/dogs/potatoes/tomatoes/
I need to replace the first two REST parameters to get a resulting URL like this:
http://www.url.me/FIRST/cats/dogs/potatoes/tomatoes/
I tried this regex \/([^/]+)\/ but it's not working as expected in CF:
<cfset ret.REDIRECT = reReplace(currentUrl, "\/([^/]+)\/", "FIRST", "all") />
What do you suggest, both for the regex and the cf code?
Thank you.
Firstly, you do not need to escape / in regex. (Sometimes you'll see it escaped, such as in JavaScript regex literals, but that is the JS side being escaped, not the regex.)
However, even with that change it won't do what you want: you'll be replacing every other /-delimited segment (starting with the hostname) instead of just the first one after the host part.
To do what you want, use something like this:
reReplace(CurrentUrl, "^(https?://[^/]+/)[^/]+/", "\1FIRST/")
The ^ anchors the replace to the start of the input.
The (..) part captures the protocol and hostname so they can be re-inserted with \1 in the replacement string.
The final [^/]+/ matches the first segment of the request URI, which is replaced by FIRST/ from the replacement string.
(You can omit the trailing / if it's not required, or use (?=/) to assert that it is there without needing to put it in the replace side.)
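Here is the same replacement translated to Python's re.sub. This sketch assumes Python's \1 backreference behaves like ColdFusion's here. It also shows what the question's original /([^/]+)/ pattern does when every match is replaced.

```python
import re

URL = "http://www.url.me/en/cats/dogs/potatoes/tomatoes/"

# Suggested replacement: group 1 keeps scheme, host and the following slash;
# the first path segment and its slash are replaced with FIRST/.
fixed = re.sub(r"^(https?://[^/]+/)[^/]+/", r"\1FIRST/", URL)

# Variant with a lookahead: the slash after the segment is asserted, not consumed.
fixed_lookahead = re.sub(r"^(https?://[^/]+/)[^/]+(?=/)", r"\1FIRST", URL)

# The question's pattern, replacing every match (like reReplace with "all").
original = re.sub(r"/([^/]+)/", "FIRST", URL)

print(fixed)
print(fixed_lookahead)
print(original)
```

The last line prints http:/FIRSTenFIRSTdogsFIRSTtomatoes/: the hostname and every other segment are replaced.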

MFC: How do I construct a good regular expression that validates URLs?

Here's the regular expression I use; I parse it using CAtlRegExp of MFC:
(((h|H?)(t|T?)(t|T?)(p|P?)(s|S?))://)?([a-zA-Z0-9]+[\.]+[a-zA-Z0-9]+[\.]+[a-zA-Z0-9])
It works fine except for one flaw: when a URL is preceded by other characters, it is still accepted as a URL.
Example inputs:
this is a link www.google.com (where I can just tokenize the spaces and validate each word)
is...www.google.com (this string still matches the RegEx above :( )
Please help...
Thanks...
Use the IgnoreCase flag instead of catering for each case.
Stick a ^ at the beginning if you want the start of the string to be the start of the URL.
You're missing a lot of characters from possible, valid URLs.
You need to tell the regex to match only at the start and end of the string. I'm not sure how you do that in VC++; in most regex flavors you enclose the pattern in ^ and $. The ^ means "the start of the string" and the $ means "the end of the string." Because the match must then run to the end of the string, the last character class needs a + as well:
^(((h|H?)(t|T?)(t|T?)(p|P?)(s|S?))\://)?([a-zA-Z0-9]+[\.]+[a-zA-Z0-9]+[\.]+[a-zA-Z0-9]+)$
The second is matching because the string still contains a valid URL.
How about using CUrl (that is, 'C-Url' in ATL, not curl as in libcurl), which can parse URLs with CUrl::CrackUrl? If that function returns FALSE, you can assume it's not a valid URL.
That said, decomposing a URL is complex enough to warrant a proper parser rather than a regex-based decomposition. See RFC 2396 and its successor, RFC 3986, for an overview of the complexities.
Start the regex with ^ and end it with $ so that it matches only if the entire string matches (if that's what you want). Note the + on the last character class, so the whole final label can match:
^(((h|H?)(t|T?)(t|T?)(p|P?)(s|S?))\://)?([a-zA-Z0-9]+[\.]+[a-zA-Z0-9]+[\.]+[a-zA-Z0-9]+)$
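Here is a Python sketch of the anchoring point. It assumes re.IGNORECASE in place of the (h|H?)(t|T?)... alternations, as one answer suggests, and a + on the last character class. The unanchored search still finds a match inside is...www.google.com, while the ^...$ version rejects it.

```python
import re

# Assumptions for this sketch: re.IGNORECASE replaces the (h|H?)(t|T?)... groups,
# and the last character class has a "+" so the final label can be longer than one character.
URL_BODY = r"(?:https?://)?[a-z0-9]+\.+[a-z0-9]+\.+[a-z0-9]+"

unanchored = re.compile(URL_BODY, re.IGNORECASE)
anchored = re.compile(r"^" + URL_BODY + r"$", re.IGNORECASE)

for text in ["www.google.com", "HTTP://www.google.com", "is...www.google.com"]:
    print(f"{text!r}: search={bool(unanchored.search(text))} anchored={bool(anchored.match(text))}")
```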
What about this one? (((f|ht)tp://)[-a-zA-Z0-9#:%_\+.~#?&//=]+)
This regular expression has been tested to work for the following:
http|https://host[:port]/[?][parameter=value]*
public static final String URL_PATTERN = "(https?|ftp)://(www\\.)?(((([a-zA-Z0-9.-]+\\.){1,}[a-zA-Z]{2,4}|localhost))|((\\d{1,3}\\.){3}(\\d{1,3})))(:(\\d+))?(/([a-zA-Z0-9-._~!$&'()*+,;=:#/]|%[0-9A-F]{2})*)?(\\?([a-zA-Z0-9-._~!$&'()*+,;=:/?#]|%[0-9A-F]{2})*)?(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?";
PS: It also validates localhost links.
(Thoroughly written by me :-))
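Here is a quick Python harness for URL_PATTERN. It assumes the Java string escapes are simply undone (\\ becomes \). It uses re.fullmatch to mirror Java's Matcher.matches(), which requires the whole input to match. The sample URLs are made up.

```python
import re

# URL_PATTERN from the answer with the Java string escapes undone (\\ becomes \).
URL_PATTERN = (
    r"(https?|ftp)://(www\.)?"
    r"(((([a-zA-Z0-9.-]+\.){1,}[a-zA-Z]{2,4}|localhost))|((\d{1,3}\.){3}(\d{1,3})))"
    r"(:(\d+))?"
    r"(/([a-zA-Z0-9-._~!$&'()*+,;=:#/]|%[0-9A-F]{2})*)?"
    r"(\?([a-zA-Z0-9-._~!$&'()*+,;=:/?#]|%[0-9A-F]{2})*)?"
    r"(#([a-zA-Z0-9._-]|%[0-9A-F]{2})*)?"
)
URL_RE = re.compile(URL_PATTERN)


def is_url(text: str) -> bool:
    # fullmatch mirrors Java's Matcher.matches(): the whole input must match.
    return URL_RE.fullmatch(text) is not None


for text in [
    "http://localhost:8080/path?x=1#frag",
    "https://www.example.com/products?id=1&page=2",
    "http://192.168.0.1",
    "ftp://example.com",
    "www.example.com",
    "http://example.museum",
]:
    print(f"{text!r}: {is_url(text)}")
```

Note the [a-zA-Z]{2,4} part: it rejects longer top-level domains such as .museum. A scheme is also required, so www.example.com alone is rejected.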

How do I decipher dynamic URL magic in Django

These are my current regexes:
url(r'^([a-zA-Z0-9/_-]+):p:(?P<sku>[a-zA-Z0-9_-]+)/$', 'product_display', name='product_display'),
url(r'^(?P<path>[a-zA-Z0-9/_-]+)$', 'collection_display', name='collection_display'),
My problem is this: I want to be able to match the product_display regex without using :p:. I can set it apart from the collection_display regex by putting .html at the end, but that doesn't fix the real problem. Without the ":p:", the collection part of the regex above would match the URI "some-collection/other/other/sku.html" all the way up to the ".html", disregarding the sku. How can I do this without using ":p:" to end the collection part of the regex? Anything will help.
Thanks
It looks like your sku can't contain slashes, so I would recommend using "/" as your delimiter. Then the ".html" trick can be used; it turns out that your collection_display regex doesn't match the dot, but to make absolutely sure, you can use a negative look-behind:
url(r'^([a-zA-Z0-9/_-]+)/(?P<sku>[a-zA-Z0-9_-]+)\.html$', 'product_display', name='product_display'),
url(r'^(?P<path>[a-zA-Z0-9/_-]+)(?<!\.html)$', 'collection_display', name='collection_display'),
Alternatively, always end your collection_display urls with a slash and product_display with ".html" (or vice versa).
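Here is a plain-Python simulation of how the two answer patterns resolve. It assumes, as Django does, that patterns are tried in order, the first match wins, and the leading slash has already been stripped from the path. This is not Django's resolver itself; the resolve helper is hypothetical.

```python
import re

# The answer's two patterns, tried in order: the first one that matches wins.
ROUTES = [
    ("product_display",
     re.compile(r'^([a-zA-Z0-9/_-]+)/(?P<sku>[a-zA-Z0-9_-]+)\.html$')),
    ("collection_display",
     re.compile(r'^(?P<path>[a-zA-Z0-9/_-]+)(?<!\.html)$')),
]


def resolve(path: str):
    """Hypothetical helper: return (view name, named groups) for the first match, else None."""
    for name, pattern in ROUTES:
        m = pattern.search(path)
        if m:
            return name, m.groupdict()
    return None


for path in ["some-collection/other/other/sku.html", "some-collection/other/other"]:
    print(path, "->", resolve(path))
```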