Regular expression to prevent adjacent repeating dashes - regex

In my ASP.NET application I am restricting the allowed URL formats with regular expressions. I need a regular expression that does not allow adjacent dashes in URLs:
01) allow URLs like
text1-text2.htm
text1-text2-textn.htm
02) prevent URLs like
text1--text2.htm
text1--text2-textn.htm

Try this regex:
/--/
If it matches, the URL contains two adjacent dashes.

url.Contains("--") will work for you, where the url variable is the url entered. Nice and concise, and you don't have to fuss with a RegEx.
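For comparison, here is the same negative check sketched in Python rather than C# (a choice made only for illustration; the name has_adjacent_dashes is hypothetical). It suggests that the plain substring test and the /--/ regex agree on the sample URLs:

```python
import re

def has_adjacent_dashes(url: str) -> bool:
    """Return True if the URL contains two or more dashes in a row."""
    # Plain substring test, the Python counterpart of url.Contains("--").
    return "--" in url

# The regex form of the same check; search() looks anywhere in the string.
ADJACENT_DASHES = re.compile(r"--")

for candidate in ["text1-text2.htm", "text1-text2-textn.htm",
                  "text1--text2.htm", "text1--text2-textn.htm"]:
    by_substring = has_adjacent_dashes(candidate)
    by_regex = ADJACENT_DASHES.search(candidate) is not None
    assert by_substring == by_regex  # both approaches give the same verdict
    print(candidate, "rejected" if by_substring else "allowed")
```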

The negative answer posted by Aziz is best, but just for completeness' sake here is a regex that matches the kinds of strings you wish to accept (as opposed to reject):
You want a string made up of zero or more of the following:
a non-dash character, or
a dash followed by a non-dash
A regex for this is
/^(?:[^-]|-(?!-))*$/
Now you can adjust the [^-] part to accept not just any character at all, but only those characters permitted in a URL (that is, if you wish to match all possible URLs except those with two consecutive dashes). To do this you will have to find the RFC that gives the URI syntax. That will be somewhat tedious, which is why the negative solution with /--/ combined with other checks is your best bet.
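A minimal sketch of the positive pattern above, tried out in Python (the helper name is_allowed is hypothetical). Note that a trailing single dash is still accepted, since the lookahead only forbids a dash that is followed by another dash:

```python
import re

# Zero or more of: a non-dash character, or a dash not followed by a dash.
NO_DOUBLE_DASH = re.compile(r"^(?:[^-]|-(?!-))*$")

def is_allowed(url: str) -> bool:
    """Return True if the string contains no two adjacent dashes."""
    return NO_DOUBLE_DASH.match(url) is not None

for candidate in ["text1-text2.htm", "text1-text2-textn.htm",
                  "text1--text2.htm", "text1--text2-textn.htm", "text1-"]:
    print(candidate, "allowed" if is_allowed(candidate) else "rejected")
```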

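This will match a filename made of word characters, followed by zero or more occurrences of a single dash plus more word characters, then a dot and an extension. The trailing $ stops it from accepting input that merely starts with a valid filename.
^\w+(-\w+)*\.\w+$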

It should be enough to search for the problem pattern, -{2,}, and then negate the result. I.e., as long as this regex (two or more dashes in a row) does not match, the URL is valid.
Or use a positive regex that matches only the URLs you do want: ^([A-Za-z0-9]+-?)+\.htm$
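The two approaches can be compared side by side. This is only a sketch in Python (the function names are hypothetical); it suggests that the positive pattern is stricter than the negated one, for example rejecting underscores, while still accepting a single dash right before the extension:

```python
import re

DOUBLE_DASH = re.compile(r"-{2,}")                  # the "problem" pattern
POSITIVE = re.compile(r"^([A-Za-z0-9]+-?)+\.htm$")  # the allow-list pattern

def valid_by_negation(url: str) -> bool:
    """Valid as long as two or more dashes in a row do NOT occur."""
    return DOUBLE_DASH.search(url) is None

def valid_by_positive(url: str) -> bool:
    """Valid only if the whole string fits the allow-list pattern."""
    return POSITIVE.match(url) is not None

for candidate in ["text1-text2.htm", "text1--text2.htm",
                  "my_file.htm", "text1-.htm"]:
    print(candidate, valid_by_negation(candidate), valid_by_positive(candidate))
```

The nested quantifiers in the positive pattern can backtrack heavily on long inputs that do not match, so the negated check is probably the safer default for untrusted input.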

Related

regular expression which can treat a string containing '#' as illegal input

I wrote a regular expression, (https?:\/\/)+([a-x]*)?.[a-z]*.(com|io|cn|net), that is meant to enforce these rules:
Must start with http or https
Must end with com, cn, io or net
Domain names can only consist of numbers, letters, and underscores
Subdomain can be empty
A correct input would be 'http://123.cn' or 'https://www.123.cn',
but the expression also accepts 'http://ww#.123.com' as correct.
What is wrong with my expression, and how can I reject input containing '#'?
If you use an online regex tester (like regex101.com), it will show you why this matches: the . is not escaped as \., so it matches any character, including #.
Try: ^(https?:\/\/)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$ and you may get what you're looking for.
Note that your original regex did not include digits or the underscore in the domain names.
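A quick way to check the suggested pattern is to run it against the inputs from the question. The sketch below uses Python, where / needs no escaping. The STRICT variant is not from the answer: it is a hypothetical tightening that swaps * for + so that a label cannot be empty.

```python
import re

# Pattern from the answer, without the \/ escapes that Python does not need.
ANSWER = re.compile(r"^(https?://)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$")

# Hypothetical stricter variant: + instead of *, so labels cannot be empty.
STRICT = re.compile(r"^(https?://)([a-z0-9_]+\.)?[a-z0-9_]+\.(com|io|cn|net)$")

for url in ["http://123.cn", "https://www.123.cn",
            "http://ww#.123.com", "http://.com"]:
    print(url, bool(ANSWER.match(url)), bool(STRICT.match(url)))
```

With the escaped dots, the input containing # is rejected by both patterns; only the stricter variant also rejects http://.com.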

How can I use a regular expression to match words of a certain length but not urls?

For text such as
Save Favorites & Share expressions with friends or the Community.
A full Reference & Help is available in the Library, or watch the video Tutorial.
expressions can start some lines though eventuallys
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
http://regexr.com/foo.html?q=bar
https://mediatemple.net
mediatemple.net
I want to select words that are 11 letters long.
I can use
/\b[a-zA-Z]{11}\b/g
(http://regexr.com/3digk)
but it also matches the urls
https://mediatemple.net
mediatemple.net
How can I avoid that? I use \b rather than a space to match at the start and end of lines
With a negative lookahead you can exclude words that are followed by .something. This excludes the domain names in the sample text while leaving words at the end of a sentence alone (i.e. where the dot is followed by a space or a newline).
/\b[a-zA-Z]{11}\b(?!\.[^\s]+)/g
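As a rough check, the lookahead version can be run over part of the sample text in Python (the variable names are hypothetical, and the g flag is replaced by findall):

```python
import re

# An 11-letter word that is NOT followed by a dot plus non-space characters.
WORD_NOT_BEFORE_DOT = re.compile(r"\b[a-zA-Z]{11}\b(?!\.[^\s]+)")

text = (
    "Save Favorites & Share expressions with friends or the Community.\n"
    "expressions can start some lines though eventuallys\n"
    "https://mediatemple.net\n"
    "mediatemple.net\n"
)

matches = WORD_NOT_BEFORE_DOT.findall(text)
print(matches)  # ['expressions', 'expressions', 'eventuallys']
```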
You can use a negative lookbehind to ensure that your match is not preceded by "//", as in "://".
Use (?<!\/\/), a negative lookbehind that asserts the preceding characters are not "//" (the slashes are escaped because / delimits the regex literal):
/(?<!\/\/)\b[a-zA-Z]{11}\b/g
If you want to be more specific and allow double slashes, e.g. "foo//elevenchars", you can use two negative lookbehinds, one for each protocol (many regex engines require a lookbehind to have a fixed length):
/(?<!http:\/\/)(?<!https:\/\/)\b[a-zA-Z]{11}\b/g
This matches the word in foo//elevenchars, but not the one in https://mediatemple.net. Neither lookbehind excludes the bare mediatemple.net, which has no protocol prefix.
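The difference between the two lookbehind variants can be sketched in Python, whose re module also requires fixed-width lookbehinds. The pattern names and the short test string are assumptions for illustration, and / needs no escaping here:

```python
import re

PLAIN = re.compile(r"\b[a-zA-Z]{11}\b")
# Reject any 11-letter word directly preceded by "//".
NOT_AFTER_SLASHES = re.compile(r"(?<!//)\b[a-zA-Z]{11}\b")
# Reject only words directly preceded by "http://" or "https://".
NOT_AFTER_SCHEME = re.compile(r"(?<!http://)(?<!https://)\b[a-zA-Z]{11}\b")

text = "expressions\nhttps://mediatemple.net\nfoo//elevenchars\n"

print(PLAIN.findall(text))              # all three 11-letter words
print(NOT_AFTER_SLASHES.findall(text))  # ['expressions']
print(NOT_AFTER_SCHEME.findall(text))   # ['expressions', 'elevenchars']
```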

Regex for URL matching

I want to match the two URLs below.
1. /,a=e[o],e[o]=function(){s=arguments},i.always(function(){e[o]=a,n[o]&&(n.jsonpcallback=r.jsonpcallback,fn.push(o)),s&&x.isfunction(a)&&a(s[0]),s=a=t}),
2. /,a[f]=function(){h=arguments},e.always(function(){a[f]=g,c[f]&&(c.jsonpcallback=d.jsonpcallback,ce.push(f)),h&&p.isfunction(g)&&g(h[0]),h=g=b}),
The regex I have for that is:
^[a-zA-Z0-9:\/\.,\[\]\=\(\) \{\}\=\&]{0,500}$
But the above regex also matches:
https://www.test.com/test/test/test.php
I want a regex in which all of the special characters []{}()&,. that appear in the two URLs above are compulsory: if any of them is missing, the regex should not match.
Short Answer
^(?=.*\[)(?=.*])(?=.*\{)(?=.*})(?=.*\()(?=.*\))(?=.*&)(?=.*,)(?=.*\.)[a-zA-Z0-9:\/\.,\[\]\=\(\) \{\}\=\&]{0,500}$
Longer Answer
You can use a positive lookahead to ensure the result contains a set of characters.
For example, add this to the start:
(?=.*\[)
And it will only match results that contain an opening square bracket [
You can do this for each of the special characters that you need to ensure are present.
For example, if you want to ensure it contains all of the characters []{}()&,. then you would add this at the start:
(?=.*\[)(?=.*\])(?=.*\{)(?=.*\})(?=.*\()(?=.*\))(?=.*\&)(?=.*\,)(?=.*\.)
Just be sure to escape the relevant characters as required by your programming language and regex flavor.
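One way to avoid hand-escaping each character is to build the lookaheads programmatically. The following is a sketch in Python under that assumption (REQUIRED, PATTERN and the variable names are hypothetical); it keeps the allowed character set and the 500-character limit from the question and lets re.escape handle the escaping:

```python
import re

REQUIRED = "[]{}()&,."  # every one of these must appear somewhere in the input

# One (?=.*X) lookahead per required character, with X escaped by re.escape.
lookaheads = "".join(f"(?=.*{re.escape(ch)})" for ch in REQUIRED)

# Same allowed characters and length limit as the regex in the question.
PATTERN = re.compile("^" + lookaheads + r"[a-zA-Z0-9:/.,\[\]=() {}&]{0,500}$")

first = (r"/,a=e[o],e[o]=function(){s=arguments},i.always(function(){e[o]=a,"
         r"n[o]&&(n.jsonpcallback=r.jsonpcallback,fn.push(o)),"
         r"s&&x.isfunction(a)&&a(s[0]),s=a=t}),")
plain_url = "https://www.test.com/test/test/test.php"

print(bool(PATTERN.match(first)))      # contains every required character
print(bool(PATTERN.match(plain_url)))  # lacks brackets, braces, parentheses, &
```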

Regex to check URL's

I need to test for URLs with the below patterns:
https://cloudhusethelp.zendesk.com
https://cloudhusethelp.zendesk.com/
https://cloudhusethelp.zendesk.com/en-us
https://cloudhusethelp.zendesk.com/da
https://cloudhusethelp.zendesk.com/fr
https://cloudhusethelp.zendesk.com/aa
The regex used is https\:\/\/cloudhusethelp\.zendesk\.com\/[A-z][A-z]
This matches URLs that end in two letters. The URL can end with any language code or with none at all.
Should I write multiple regular expressions to match the cases above, or can a single one do it?
Any help is appreciated.
You can definitely do it with a single expression:
https\:\/\/cloudhusethelp\.zendesk\.com(\/[A-Za-z]{2}(-[A-Za-z]{2})?)?
The part that differs from your expression is at the end:
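(\/[A-Za-z]{2}(-[A-Za-z]{2})?)?
It is a nested optional expression that matches nothing, a slash and a pair of letters, or a slash and a pair of letters followed by a dash and another pair of letters. Note that [A-Za-z] replaces [A-z], which would also match the punctuation characters that sit between Z and a in ASCII.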
The slash at the end is also optional, as the first example you provided doesn't have it. Making the language code optional inside the slash group covers both cases:
https\:\/\/cloudhusethelp\.zendesk\.com(\/([A-Za-z]{2}(-[A-Za-z]{2})?)?)?
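To check all six sample URLs at once, a small Python sketch can help. It assumes whole-string matching via fullmatch and uses a variant of the patterns above in which the trailing slash and the language code are each optional; the names LANG_URL and is_help_url are hypothetical:

```python
import re

LANG_URL = re.compile(
    r"https://cloudhusethelp\.zendesk\.com"
    r"(?:/(?:[A-Za-z]{2}(?:-[A-Za-z]{2})?)?)?"  # optional slash, optional code
)

def is_help_url(url: str) -> bool:
    """Return True if the whole URL is the help site, with or without a language."""
    return LANG_URL.fullmatch(url) is not None

base = "https://cloudhusethelp.zendesk.com"
for suffix in ["", "/", "/en-us", "/da", "/fr", "/aa", "/en--us", "/e["]:
    print(base + suffix, is_help_url(base + suffix))

# Why [A-z] is avoided: the range also covers punctuation such as "[".
print(bool(re.fullmatch(r"[A-z]", "[")))
```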

Help with capturing URL fragment in Django

I am working on a django project, and trying to match URLs in the following form:
/cards/series/SERIES NAME/
where I am trying to capture SERIES NAME
My base urls.py includes:
(r'^cards/', include('cards.urls')),
and then my cards/urls.py includes
(r'^series/(\w+)/$',
However, the regex is not matching (404). If I hard code the path like so:
(r'^series/foo/$',
Then I can get it to match on /cards/series/foo/
So, does anyone have an idea of what I am doing wrong, and why my regex isn't catching /cards/series/SERIES NAME/ ?
Update : The regex will match single words, but not multiple. So:
/cards/series/FOO/
matches, but:
/cards/series/FOO BAR/
or anything with a space does not match.
Update : Found the solution here:
how to unescape special characters in django urlpatterns
which is:
(r'^series/([\w ]+)/$',
You are not allowed to have spaces in URLs. They must be escaped.
http://en.wikipedia.org/wiki/Query_string#URL_encoding
So essentially, don't do this. If spaces are important they should wind up in your URL as things like /cards/series/SERIES%20NAME/. It is considered better practice to use hyphens these days, although underscores are okay too. Use of capital letters is legal, but will strike some people (like me) as poor form.
As for why the pattern doesn't work, Django URL patterns are regular expressions:
http://en.wikipedia.org/wiki/Regular_expression
You are capturing the pattern \w+. \w is a stand-in for alphanumeric characters plus "_", and the ensuing + says to match one or more of the preceding element. Whitespace is not an alphanumeric character, so if you want to match spaces as well you have to add the space to a set, as you have found. Brackets indicate a set of acceptable characters, so [\w ]+ means one or more characters that are alphanumeric, "_", or " ".
Again, don't do it.
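The behaviour of the two patterns can be reproduced without Django. In this sketch plain re stands in for the URL resolver, on the assumption that the resolver sees the already-decoded path; the constant names are hypothetical. It also shows how a space is percent-encoded in a real URL:

```python
import re
from urllib.parse import quote, unquote

WORD_ONLY = re.compile(r"^series/(\w+)/$")        # original pattern
WORD_OR_SPACE = re.compile(r"^series/([\w ]+)/$")  # pattern from the update

path = "series/FOO BAR/"  # decoded path with a literal space

print(WORD_ONLY.match("series/FOO/").group(1))  # single word is captured
print(WORD_ONLY.match(path))                    # None: \w does not match " "
print(WORD_OR_SPACE.match(path).group(1))       # space is now in the set

# In the URL itself the space has to be percent-encoded.
encoded = quote("/cards/series/FOO BAR/")
print(encoded)           # /cards/series/FOO%20BAR/
print(unquote(encoded))  # back to the decoded form
```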