How can I write a regular expression that matchs case insensitive emails? - regex

I have following code to find user with an email.
User.findOne({
email: { $regex: new RegExp('^' + req.body.email.toLowerCase() + '$','i') }
})
It finds the user with a given email by lowercase letters and case insensitive search.
The problem is, we have some emails like john+doe#johndoe.com and this regular expression doesn't match those emails.
What should I add to regular expression to find that kind of emails?

The issue is that you're using the e-mail address, as req.body.email, unescaped in a regular expression.
As you noticed, characters that have a special meaning in regexes, like +, will cause problems. Even worse, when a user enters .* as their e-mail address, your query will match any user, which is a security concern.
What you want is to escape the e-mail address input so any special characters will be searched for as-is (have their "special meaning" stripped from them).
The easiest way is to use a module like regex-escape that will do that for you:
var escape = require('regex-escape');
...
User.findOne({
email: { $regex: new RegExp('^' + escape(req.body.email) + '$','i') }
})
Since the regex is already set to match case-insensitive, there's not need to lowercase the string.

Description
I use this expression, it's not perfect as there are some edge cases which will slip by but those are easy enough to test by simply sending the test email:
^[_a-z0-9-+]+(?:\.[_a-z0-9-+]+)*#[a-z0-9-]+(?:\.[a-z0-9-]+)*(?:\.[a-z]{2,4})$
By adding A-Z to each of the character classes I've made the same expression case insensitive.
^[_a-zA-Z0-9-+]+(?:\.[_a-zA-Z0-9-+]+)*#[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*(?:\.[a-zA-Z]{2,4})$
Example
Live Demo
https://regex101.com/r/uC5oG4/1
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
[_a-z0-9-+]+ any character of: '_', 'a' to 'z', '0' to
'9', '-', '+' (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[_a-z0-9-+]+ any character of: '_', 'a' to 'z', '0'
to '9', '-', '+' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
# '#'
----------------------------------------------------------------------
[a-z0-9-]+ any character of: 'a' to 'z', '0' to '9',
'-' (1 or more times (matching the most
amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[a-z0-9-]+ any character of: 'a' to 'z', '0' to
'9', '-' (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[a-z]{2,4} any character of: 'a' to 'z' (between 2
and 4 times (matching the most amount
possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------

Related

String must be alphanumeric and contain a certain substring

I'm working on adding a regex that determines whether a given input is valid. The input should be alpha numeric (underscores, dashes, periods also allowed) and between 1 and 60 characters. It should also contain a certain substring inside it (let's just say "foo.bar"). This is my attempt:
^.[a-zA-Z0-9_.-]{1,60}$
That does what I need, aside from the substring part. I'm not sure how to add the "the string must contain the substring foo.bar" requirement. FWIW I'm doing this in Ruby so I understand this means PCRE is being used.
As an example, this string should be valid:
aGreatStringWithfoo.barInIt1111
this shouldn't
aBadStringWithoutTheSubstringInIt
Use
^(?=.{1,60}$)[a-zA-Z0-9_.-]*foo\.bar[a-zA-Z0-9_.-]*$
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.{1,60} any character except \n (between 1 and
60 times (matching the most amount
possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9_.-]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '.', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
foo 'foo'
--------------------------------------------------------------------------------
\. '.'
--------------------------------------------------------------------------------
bar 'bar'
--------------------------------------------------------------------------------
[a-zA-Z0-9_.-]* any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '.', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string

Regex[Python] Extract from url path parameters

I have an URLs from the access log. Example:
/someService/US/getPersonFromAllAccessoriesByDescription/67814/alloy%20nudge%20w
/someService/NZ/asdNmasdf423-asd342e/getDealerFromSomethingSomething/FS443GH/front%20parking%20sen
I cannot make any assumption on the service name or the function name.
I'm trying to find a regex that can only match in the first log:
67814
alloy%20nudge%20w
and in the second:
asdNmasdf423-asd342e
FS443GH
front%20parking%20sen
with some heuristic, I tried to use [a-zA-Z0-9_%-]{15,}|[A-Z0-9]{5,} match only long strings but the function names(getPersonFromAllAccessoriesByDescription, getDealerFromSomethingSomething) also had been caught.
I was thinking about regex that can do the same as [a-zA-Z0-9_%-]{15,} but with condition that it must be at least one digit, so this way the function names will be skipped.
Thank you
Your heuristics is fine, use
\b(?=[a-zA-Z_%-]*[0-9])[a-zA-Z0-9_%-]{5,}
See proof.
Explanation
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[a-zA-Z_%-]* any character of: 'a' to 'z', 'A' to
'Z', '_', '%', '-' (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[0-9] any character of: '0' to '9'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[a-zA-Z0-9_%-]{5,} any character of: 'a' to 'z', 'A' to 'Z',
'0' to '9', '_', '%', '-' (at least 5
times (matching the most amount possible))

How to exclude images from regex email extraction

I am using some email extractor software to (surprise surprise) extract emails from websites. It uses the regex:
[A-Z0-9._%+-]+#[A-Z0-9.-]{3,65}\.[A-Z]{2,4}
But this churns out images as well as emails eg _212000482_1#80xauto.jpg
I can change this regex, but I cannot figure out how to exclude matches ending in .png, .jpg etc.
There is a lot of information on validating emails - and how hard this is - but all I want to do is exclude images from the result list.
Description
In your sample text the undesired substring resembles an email address, but conveniently ends in jpg. So with a negative lookahead we can just exclude the filename extensions.
(?!\S*\.(?:jpg|png|gif|bmp)(?:[\s\n\r]|$))[A-Z0-9._%+-]+#[A-Z0-9.-]{3,65}\.[A-Z]{2,4}
Example
Live Demo
https://regex101.com/r/mU7bO3/2
Sample text
droids#gmail.com _212000482_1#80xauto.jpg More.Droids#deathstar.com
Sample Matches
droids#gmail.com
More.Droids#deathstar.com
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
\S* non-whitespace (all but \n, \r, \t, \f,
and " ") (0 or more times (matching the
most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
jpg 'jpg'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
png 'png'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
gif 'gif'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
bmp 'bmp'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[\s\n\r] any character of: whitespace (\n, \r,
\t, \f, and " "), '\n' (newline), '\r'
(carriage return)
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
$ before an optional \n, and the end of
a "line"
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
[A-Z0-9._%+-]+ any character of: 'A' to 'Z', '0' to '9',
'.', '_', '%', '+', '-' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
# '#'
----------------------------------------------------------------------
[A-Z0-9.-]{3,65} any character of: 'A' to 'Z', '0' to '9',
'.', '-' (between 3 and 65 times (matching
the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
[A-Z]{2,4} any character of: 'A' to 'Z' (between 2
and 4 times (matching the most amount
possible))
----------------------------------------------------------------------

REGEX - How to match set of string with at least one special character anywhere?

I have a problem in matching password using following regex.
^[A-Za-z\d[\!\#\#\$\%\^\&\*\(\)\_\+]{1,}]{6,}$
In above expression I want user to enter at least one special anywhere with remaining characters should be alphanumeric. The password length can't be less than six.
But the above expression is allowing user to enter not any special character. Could anyone please tell me how can I restrict the user to enter at least one special character?
How about:
^(?=[\w!##$%^&*()+]{6,})(?:.*[!##$%^&*()+]+.*)$
explanation:
The regular expression:
(?-imsx:^(?=[\w!##0^&*()+]{6,})(?:.*[!##0^&*()+]+.*)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
[\w!##0^&*()+]{6,} any character of: word characters (a-z,
A-Z, 0-9, _), '!', '#', '#', '0', '^',
'&', '*', '(', ')', '+' (at least 6
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
[!##0^&*()+]+ any character of: '!', '#', '#', '0',
'^', '&', '*', '(', ')', '+' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
Instead of complicating your regex, how about iterating over the chars and counting the special ones
count = 0
for char in string:
if isspecial(char):
count = count+1
if count > 1:
reject()

Decoding regular expression in Perl

Can anyone decode what this regular expression means in Perl:
while (/([0-9a-zA-Z\-]+(?:'[a-zA-Z0-9\-]+)*)/g)
Here is a breakdown of the regex:
( # start a capturing group (1)
[0-9a-zA-Z-]+ # one or more digits or letters or hyphens
(?: # start a non-capturing group
' # a literal single quote character
[a-zA-Z0-9-]+ # one or more digits or letters or hyphens
)* # repeat non-capturing group zero or more times
) # end of capturing group 1
The regex is in the form /.../g and in a while loop, which means that the code inside of the while will be run for each non-overlapping match of the regex.
There's a tool for that: YAPE::Regex::Explain
The regular expression:
(?-imsx:([0-9a-zA-Z\-]+(?:'[a-zA-Z0-9\-]+)*))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[0-9a-zA-Z\-]+ any character of: '0' to '9', 'a' to
'z', 'A' to 'Z', '\-' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
----------------------------------------------------------------------
' '\''
----------------------------------------------------------------------
[a-zA-Z0-9\-]+ any character of: 'a' to 'z', 'A' to
'Z', '0' to '9', '\-' (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
F.J's answer is a perfect breakdown. But... he left out an important piece, which is the /g at the end. It tells the parser to continue where it left off from last time. So the while loop will continue to loop over the string repeatedly until it gets the the point where there are no other points that match.