Regular expression which can treat a string containing '#' as illegal input

I wrote a regular expression (https?:\/\/)+([a-x]*)?.[a-z]*.(com|io|cn|net) that can achieve:
Must start with http or https
Must end with com,cn,io or net
Domain names can only consist of numbers, letters, and underscores
Subdomain can be empty
Valid inputs include 'http://123.cn' and 'https://www.123.cn',
but the expression also accepts 'http://ww#.123.com' as valid.
What is wrong with my expression, and how can I reject input containing '#'?

If you use a RegEx tester online (like regex101.com) it will tell you that it's matching because the . is not escaped as \. so it will match the # character.
Try: ^(https?:\/\/)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$ and you may get what you're looking for.
Note your original RegEx did not include digits or the underscore in the domain names.
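
A minimal sketch of the difference, assuming Python's standard re module (the question does not name a language). It runs the corrected pattern from the answer against the three sample URLs; the name URL_RE is made up for illustration.

```python
import re

# Corrected pattern from the answer: dots are escaped, digits and "_" are allowed.
URL_RE = re.compile(r"^(https?:\/\/)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$")

for url in ("http://123.cn", "https://www.123.cn", "http://ww#.123.com"):
    print(url, "->", bool(URL_RE.match(url)))

# An unescaped "." matches any character, including "#"; "\." matches only a dot.
print(bool(re.fullmatch(r"ww.", "ww#")))   # True
print(bool(re.fullmatch(r"ww\.", "ww#")))  # False
```

The first two URLs should be accepted and the one containing '#' rejected, because '#' is in neither character class and can no longer be swallowed by a bare dot.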

Related

Elasticsearch - Match hex number of fixed number of digits

I have been trying to match using query_string and wildcard to exclude some values in my data.
I have values of the following type among others:
qa4689f54ad-XYXY
So the value starts with a ‘q’, then I have a hex number of 10-digits, followed by a hyphen and then the rest.
I tried the obvious q[a-fA-F0-9]{10}* expression (with the escape \) but it doesn’t match!
When I try the same regular expression on regex tester websites it matches perfectly.
I have gone through maybe 10 questions related to regex in Elasticsearch, but in vain.
Can someone please help? Thanks.
{10}* is not a valid construct in regular expressions.
You mean:
q[a-fA-F0-9]{10}.*
or (to make sure the hyphen is there):
q[a-fA-F0-9]{10}-.*
or (to make sure the match occurs at the start of the string)
^q[a-fA-F0-9]{10}-.*
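
As a rough check of the patterns above, here is a sketch using Python's re module. This is an assumption made for illustration only: Elasticsearch uses its own regular expression engine, so behaviour there may differ. The helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

value = "qa4689f54ad-XYXY"  # sample value from the question

# A "*" placed directly after the {10} quantifier is rejected by Python's re.
print(compiles(r"q[a-fA-F0-9]{10}*"))

# The suggested replacements: 10 hex digits, a hyphen, then anything.
print(bool(re.fullmatch(r"q[a-fA-F0-9]{10}.*", value)))
print(bool(re.fullmatch(r"q[a-fA-F0-9]{10}-.*", value)))
print(bool(re.match(r"^q[a-fA-F0-9]{10}-.*", value)))
```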

Regex for URL matching

I want to match below two URLs.
1. /,a=e[o],e[o]=function(){s=arguments},i.always(function(){e[o]=a,n[o]&&(n.jsonpcallback=r.jsonpcallback,fn.push(o)),s&&x.isfunction(a)&&a(s[0]),s=a=t}),
2. /,a[f]=function(){h=arguments},e.always(function(){a[f]=g,c[f]&&(c.jsonpcallback=d.jsonpcallback,ce.push(f)),h&&p.isfunction(g)&&g(h[0]),h=g=b}),
The regex I use for that is:
^[a-zA-Z0-9:\/\.,\[\]\=\(\) \{\}\=\&]{0,500}$
But the above regex also matches:
https://www.test.com/test/test/test.php
I want to write a regex in which all the special characters that appear in the two URLs above, namely []{}()&,. are compulsory. If any of these special characters is missing, the regex should not match.
Short Answer
^(?=.*\[)(?=.*])(?=.*\{)(?=.*})(?=.*\()(?=.*\))(?=.*&)(?=.*,)(?=.*\.)[a-zA-Z0-9:\/\.,\[\]\=\(\) \{\}\=\&]{0,500}$
Longer Answer
You can use a positive lookahead to ensure the result contains a set of characters.
For example, add this to the start:
(?=.*\[)
And it will only match results that contain an opening square bracket [
You can do this for each of the special characters that you need to ensure are present.
For example, if you want to ensure it contains all of the characters []{}()&,. then you would add this at the start:
(?=.*\[)(?=.*\])(?=.*\{)(?=.*\})(?=.*\()(?=.*\))(?=.*\&)(?=.*\,)(?=.*\.)
Just be sure to escape the relevant characters depending on your programming language and type of regex
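
The lookahead approach can be sketched in Python as follows. Python is an assumption here, and the names REQUIRED, ALLOWED, PATTERN and is_wanted are made up for illustration. The sketch builds one lookahead per required character and uses re.escape so the escaping does not have to be written by hand.

```python
import re

REQUIRED = "[]{}()&,."                    # every one of these must appear
ALLOWED = r"[a-zA-Z0-9:/.,\[\]=() {}&]"   # same character set as in the question

# One positive lookahead per required character, e.g. (?=.*\[)
lookaheads = "".join(f"(?=.*{re.escape(ch)})" for ch in REQUIRED)
PATTERN = re.compile(f"^{lookaheads}{ALLOWED}{{0,500}}$")

def is_wanted(text):
    """True only if text uses allowed characters and contains all required ones."""
    return PATTERN.match(text) is not None

js_like = ("/,a[f]=function(){h=arguments},e.always(function(){a[f]=g,c[f]&&"
           "(c.jsonpcallback=d.jsonpcallback,ce.push(f)),h&&p.isfunction(g)&&"
           "g(h[0]),h=g=b}),")
plain_url = "https://www.test.com/test/test/test.php"

print(is_wanted(js_like))    # contains every required character
print(is_wanted(plain_url))  # has no brackets, braces, parentheses, "&" or ","
```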

Regular expression to correct email address

I need help in writing one regular expression where I want to remove unwanted characters in the start and end of the email address. For example:
z>user1#hotmail.com<kt
z>user2#hotmail.pk<kt
z>puser3#yahoo.com<kt
z>npuser4#yaoo.uk<kt
After applying regular expression my emails should look like:
user1#hotmail.com
user2#hotmail.pk
puser3#yahoo.com
npuser4#yaoo.uk
The regular expression should not be applied if the email address is already correct.
You can try deleting matches of
^[^>]*>|<[^>]*$
Find ^[^>]*>([^<]*)<*.*$ and replace it with \1
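
Both suggestions can be tried with a short Python sketch (Python is an assumption; the question does not name a language, and the function names are hypothetical). The samples keep the '#' separator exactly as it appears in the question.

```python
import re

samples = [
    "z>user1#hotmail.com<kt",
    "z>npuser4#yaoo.uk<kt",
    "user1#hotmail.com",      # already clean, must stay unchanged
]

def strip_by_deleting(text):
    # Delete a leading "junk>" prefix and a trailing "<junk" suffix.
    return re.sub(r"^[^>]*>|<[^>]*$", "", text)

def strip_by_capturing(text):
    # Keep only the text between the first ">" and the following "<".
    return re.sub(r"^[^>]*>([^<]*)<*.*$", r"\1", text)

for s in samples:
    print(s, "->", strip_by_deleting(s), "|", strip_by_capturing(s))
```

Neither pattern matches a string without '>' or '<', so an address that is already clean passes through untouched.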
I think you might be missing the point of a regular expression slightly. A regular expression defines the 'shape' of a string and returns whether or not the string conforms to that shape. A simple expression for an email address might be something like:
[a-zA-Z0-9]*\.?[a-zA-Z0-9]+#[a-zA-Z0-9]*\.[a-z]+
But it is not simple to write one catch-all regular expression for an email address. Really, what you need to do to check it properly is:
Ensure there is one and only one '#'-sign.
Check that the part before the at sign conforms to a regular expression for this part:
Characters
Digits
Extended characters: .-'_ (that list may not be complete)
Check that the part after the #-sign conforms to the reg-ex for domain names:
Characters
Digits
Extended characters: . -
Must start with character or digit and must end with a proper domain name ending.
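
The checklist above could be sketched like this in Python. Several things here are assumptions rather than part of the answer: the separator defaults to '#' as in the question's samples, a "proper domain name ending" is approximated as a dot followed by two or more letters, and the names LOCAL_RE, DOMAIN_RE and looks_like_address are made up.

```python
import re

# Part before the separator: characters, digits and the extended set . - ' _
LOCAL_RE = re.compile(r"[A-Za-z0-9.'_-]+")
# Part after the separator: starts with a character or digit, may contain . and -,
# and ends with ".letters" (an assumed stand-in for a proper domain ending).
DOMAIN_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?\.[A-Za-z]{2,}")

def looks_like_address(text, sep="#"):
    # Step 1: exactly one separator.
    if text.count(sep) != 1:
        return False
    local, domain = text.split(sep)
    # Steps 2 and 3: each part must match its own expression in full.
    return bool(LOCAL_RE.fullmatch(local)) and bool(DOMAIN_RE.fullmatch(domain))

for candidate in ("user1#hotmail.com", "z>user1#hotmail.com<kt", "a#b#c.com"):
    print(candidate, "->", looks_like_address(candidate))
```

Splitting first keeps each expression small, which is easier to reason about than one catch-all pattern.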
Try using a capturing group on anything between the characters you don't want. For example,
/>([\w\d]+#[\w\d]+\.\w+)</
Basically, any part that the regexp inside () matches is saved in a capturing group. This one matches anything inside >here< that starts with a run of word characters or digits, has a #, then one or more word or digit characters, then a period, then some word characters. It should match simple addresses like the ones in the question, though not every valid email address.
If you need characters besides >< to be matched, make a character class. That's what those square bracketed bits are. If you replace > with [.,></?;:'"] it'll match any of those characters.
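
A small sketch of the capturing-group idea in Python (an assumed language; the names WRAPPED and extract are hypothetical). It uses a variant of the pattern in which the period is escaped, and it returns the input unchanged when nothing is wrapped in > and <.

```python
import re

# Group 1 captures the address found between ">" and "<".
WRAPPED = re.compile(r">([\w\d]+#[\w\d]+\.\w+)<")

def extract(text):
    match = WRAPPED.search(text)
    # Fall back to the original text when there is nothing to unwrap.
    return match.group(1) if match else text

print(extract("z>puser3#yahoo.com<kt"))
print(extract("puser3#yahoo.com"))
```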

Regular expression to prevent adjacent repeating dashes

In my ASP.NET application I am restricting allowed URL formats with regular expressions. I need to create a regular expression which will not allow adjacent dashes in URLs.
01) allow URLs like
text1-text2.htm
text1-text2-textn.htm
02) prevent URLS like
text1--text2.htm
text1--text2-textn.htm
Try this regex:
/--/
If you found a match then it means the URL had two dashes.
url.Contains("--") will work for you, where the url variable is the url entered. Nice and concise, and you don't have to fuss with a RegEx.
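
For comparison, the same two negative checks look like this in Python. Note that this is not the asker's ASP.NET code; Python is used only as an illustration and the function names are hypothetical.

```python
import re

def has_adjacent_dashes(url):
    # Plain substring test, the counterpart of url.Contains("--").
    return "--" in url

def has_adjacent_dashes_re(url):
    # Same check expressed with the /--/ regex.
    return re.search(r"--", url) is not None

for url in ("text1-text2.htm", "text1--text2.htm"):
    print(url, "->", has_adjacent_dashes(url), has_adjacent_dashes_re(url))
```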
The negative answer posted by Aziz is best, but just for completeness sake here is a regex that matches the kinds of strings you wish to accept (as opposed to reject):
You want a string made up of zero or more of the following:
a non-dash character, or
a dash followed by a non-dash
A regex for this is
/^(?:[^-]|-(?!-))*$/
Now you can adjust the [^-] part to accept not just any character at all, but only those characters permitted in a URL (that is, if you wish to match all possible urls except those with two consecutive dashes). To do this you will have to find the RFC that gives the URI syntax. Will be somewhat tedious, which is why the negative solution with /--/ combined with other checks is your best bet.
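
A quick way to convince yourself that the lookahead version behaves as described, sketched in Python (an assumed language; NO_DOUBLE_DASH is a made-up name):

```python
import re

# Each step consumes either a non-dash, or a dash that is not followed by another dash.
NO_DOUBLE_DASH = re.compile(r"^(?:[^-]|-(?!-))*$")

for url in ("text1-text2.htm", "text1-text2-textn.htm",
            "text1--text2.htm", "text1--text2-textn.htm"):
    print(url, "->", bool(NO_DOUBLE_DASH.match(url)))
```

The first two URLs should match and the last two should not. The pattern also accepts the empty string and a trailing single dash, so it only rules out adjacent dashes and nothing else.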
This will match a filename consisting of word characters followed by zero or more occurrences of a single dash and more word characters.
^\w+(-\w+)*\.\w+
It should be enough to search for the problem, -{2,}, and then negate the result, i.e. as long as this regex (two or more dashes in a row) does not match, the URL is valid.
Or positive regex matching only urls you do want: ^([A-Za-z0-9]+-?)+\.htm$
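
The two "accept" patterns from these answers can be compared side by side with a Python sketch (an assumed language; FILENAME_RE and HTM_RE are hypothetical names):

```python
import re

FILENAME_RE = re.compile(r"^\w+(-\w+)*\.\w+")      # any extension, no end anchor
HTM_RE = re.compile(r"^([A-Za-z0-9]+-?)+\.htm$")   # only .htm, anchored at both ends

for url in ("text1-text2.htm", "text1-text2-textn.htm",
            "text1--text2.htm", "text1--text2-textn.htm"):
    print(url, "->", bool(FILENAME_RE.match(url)), bool(HTM_RE.match(url)))
```

Both accept the single-dash names and reject the double-dash ones. They differ at the edges: the first has no $ anchor, so trailing text after the extension is not checked, and the second makes each dash optional after a run of letters or digits, so it also accepts a name such as text1-.htm.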

Help with capturing URL fragment in Django

I am working on a django project, and trying to match URLs in the following form:
/cards/series/SERIES NAME/
where I am trying to capture SERIES NAME
My base urls.py includes:
(r'^cards/', include('cards.urls')),
and then my cards/urls.py includes
(r'^series/(\w+)/$',
However, the regex is not matching (404). If I hard code the path like so:
(r'^series/foo/$',
Then I can get it to match on /cards/series/foo/
So, does anyone have an idea of what I am doing wrong, and why my regex isn't catching /cards/series/SERIES NAME/ ?
Update : The regex will match single words, but not multiple. So:
/cards/series/FOO/
matches, but:
/cards/series/FOO BAR/
or anything with a space does not match.
Update : Found the solution here:
how to unescape special characters in django urlpatterns
which is:
(r'^series/([\w ]+)/$',
You are not allowed to have spaces in URLs. They must be escaped.
http://en.wikipedia.org/wiki/Query_string#URL_encoding
So essentially, don't do this. If spaces are important they should wind up in your URL as things like /cards/series/SERIES%20NAME/. It is considered better practice to use hyphens these days, although underscores are okay too. Use of capital letters is legal, but will strike some people (like me) as poor form.
As for why the pattern doesn't work, Django URL patterns are regular expressions:
http://en.wikipedia.org/wiki/Regular_expression
You are capturing the pattern \w+. And \w is a stand-in for Alphanumeric characters plus "_", while the ensuing + says to match one or more of the preceding element. Whitespace is not an alphanumeric character, so if you wanted to match spaces as well you'd have to add it to a set as you have found. Brackets are used to indicate a set of acceptable characters, so [\w ]+ would mean Alphanumeric characters plus "_" plus " "
Again, don't do it.
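
The difference between \w+ and [\w ]+ can be checked outside Django with plain Python re. This is a sketch, not Django code: it assumes the included URLconf sees only the part of the path left after "cards/" has been stripped, and SLUG is a hypothetical hyphen-friendly pattern that follows the advice above rather than anything from the question.

```python
import re

WORD_ONLY = re.compile(r"^series/(\w+)/$")       # original pattern
WITH_SPACE = re.compile(r"^series/([\w ]+)/$")   # pattern from the update
SLUG = re.compile(r"^series/([\w-]+)/$")         # hypothetical: hyphens, no spaces

def captured(pattern, path):
    """Return the captured series name, or None if the path does not match."""
    match = pattern.match(path)
    return match.group(1) if match else None

print(captured(WORD_ONLY, "series/FOO/"))
print(captured(WORD_ONLY, "series/FOO BAR/"))    # a space is not a \w character
print(captured(WITH_SPACE, "series/FOO BAR/"))
print(captured(SLUG, "series/foo-bar/"))
```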