Regex - Issues with using word boundaries to exclude words - regex

On my authentication website, I'm using a regex to enforce a password blacklist (examples of blacklisted passwords: 12345678, 123456789, baseball, football).
I would like to add a new regex rule (using word boundaries) that excludes these words (the blacklisted passwords). I have read some similar questions on StackOverflow and tried to declare it like this:
^(?!\b12345678\b|\b123456789\b|\bbaseball\b|\bfootball\b|\bsuperman\b).*$
This regex doesn't match the words above, which is correct. However, a string such as "baseball" with a letter, number or special character before or after it must match.
But "baseball!" doesn't match, contrary to "!baseball". Can you give me some advice on how to do it?

But "baseball!" doesn't match contrary to "!baseball"…
baseball! doesn't match because your pattern rejects any input that starts with baseball as a whole word: ^ is followed by a negative lookahead for \bbaseball\b, and the boundary between the final l and the ! satisfies the trailing \b.
!baseball, in contrast, matches because ! comes first, and the negative lookahead is only evaluated at the start of the string, not afterwards.
One could think of putting the .* in different places, but that leads nowhere.
Just include the anchors ^ $ in the lookahead:
(?!^(12345678|123456789|baseball|football|superman)$)^.*$
(in fact, we could even drop the initial ^).
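The behaviour of the anchored lookahead can be checked against the inputs discussed above. This is only a minimal Python sketch: the helper name is_allowed is made up for illustration, and the pattern is the one given in the answer.

```python
import re

# Pattern from the answer above: the lookahead rejects the input only when
# the whole string is exactly one of the blacklisted words.
ALLOWED = re.compile(r"(?!^(12345678|123456789|baseball|football|superman)$)^.*$")

def is_allowed(password: str) -> bool:
    """Return True when the password is not exactly a blacklisted word."""
    return ALLOWED.match(password) is not None

for candidate in ["baseball", "baseball!", "!baseball", "123456789"]:
    print(candidate, is_allowed(candidate))
```

With this pattern, "baseball" and "123456789" are rejected, while "baseball!" and "!baseball" are both accepted.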

Related

Regex Email validation with some special cases [duplicate]

I am trying to make a regex match which is discarding the lookahead completely.
\w+([-+.]\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*
This is the match and this is my regex101 test.
But when an email starts with - or _ or . it should not match it completely, not just remove the initial symbols. Any ideas are welcome, I've been searching for the past half an hour, but can't figure out how to drop the entire email when it starts with those symbols.
You can use the word boundary next to # together with a lookbehind that checks whether we are at the beginning of the string or right after a whitespace character, and then require that the first symbol is not one of the unwanted characters by using the negated class [^\s\-_.]:
(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
See demo
List of matches:
support#github.com
s.miller#mit.edu
j.hopking#york.ac.uk
steve.parker#soft.de
info#company-hotels.org
kiki#hotmail.co.uk
no-reply#github.com
s.peterson#mail.uu.net
info-bg#software-software.software.academy
Additional notes on usage and alternative notation
Note that it is best practice to use as few escaped chars as possible in the regex, so, the [^\s\-_.] can be written as [^\s_.-], with the hyphen at the end of the character class still denoting a literal hyphen, not a range. Also, if you plan to use the pattern in other regex engines, you might find difficulties with the alternation in the lookbehind, and then you can replace (?<=\s|^) with the equivalent (?<!\S). See this regex:
(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
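As a rough check of the (?<!\S) variant, the following Python sketch runs it over a short string. The pattern is copied as written above, including # as the separator between the local part and the domain; the sample string is hypothetical and mixes two addresses from the list of matches with three that start with an unwanted symbol.

```python
import re

# Pattern copied from the text above; (?<!\S) means "not preceded by a
# non-whitespace character", i.e. start of string or after whitespace.
EMAIL = re.compile(
    r"(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*"
)

# Hypothetical sample: the three middle entries start with -, _ or .
sample = (
    "support#github.com -bad#github.com _bad#mit.edu "
    ".bad#york.ac.uk no-reply#github.com"
)

# All groups in the pattern are non-capturing, so findall returns whole matches.
print(EMAIL.findall(sample))
```

The entries starting with -, _ or . are dropped entirely rather than matched from their second character, because the lookbehind fails right after the leading symbol.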
And last but not least, if you need to use it in JavaScript or another environment that does not support lookbehinds, replace (?<!\S)/(?<=\s|^) with a non-capturing group (?:\s|^), wrap the whole email pattern in a set of capturing parentheses and use the language's own means to grab the contents of Group 1:
(?:\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)
See the regex demo.
I use this for multiple email addresses, separated by ';':
([A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*
For a single mail:
[A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}

Regex - Match only if input does not contain all letters

I understand the regex used to match if it contains all letters is ^[a-zA-Z]+$ so I thought a negation of this regex would be the answer and tried ^(?!^[a-zA-Z])+$ but it doesn't seem to work.
To provide some context, I'm creating a basic form using the SurveyJS form creator, which accepts custom validation only via regex. A certain form input should allow users to enter anything and only throw an error if the user fills in nothing but letters.
You can use
^(?![a-zA-Z]+$).*
The negative lookahead ensures that the line does not consist solely of a-zA-Z characters up to the $ (end of line, which is why it is placed inside the lookahead rather than outside it), and the .* afterwards matches anything that passes the negative lookahead.
Demo: https://regex101.com/r/QuC2SQ/1
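For a quick local check without the demo link, here is a small Python sketch of the same pattern. The function name passes is made up for illustration; SurveyJS itself is not involved here.

```python
import re

# Pattern from the answer above: reject input made up of letters only.
NOT_ONLY_LETTERS = re.compile(r"^(?![a-zA-Z]+$).*")

def passes(value: str) -> bool:
    """Return True when the value is not composed solely of ASCII letters."""
    return NOT_ONLY_LETTERS.match(value) is not None

for value in ["abc", "abc1", "hello world", "12345"]:
    print(repr(value), passes(value))
```

Only "abc" fails validation here; any digit, space or other character anywhere in the input makes the lookahead give up and lets .* match.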

Name validation - Adding a check to this regex to stop entering just identical characters

I'm trying to add another feature to a regex which is trying to validate names (first or last).
At the moment it looks like this:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$)([a-z][a-z'-]{1,})$/i
https://regex101.com/r/pQ1tP2/1
The idea is to do the following
Don't allow just adding a title like Mr, Mrs etc
Ensure the first character is a letter
Ensure subsequent characters are either letters, hyphens or apostrophes
Minimum of two characters
I have managed to get this far (shockingly I find regex so confusing lol).
It matches things like O'Brian or Anne-Marie etc and is doing a pretty good job.
I've struggled with my next addition though! I'm trying to extend the regex so that it does not match the following:
Just entering the same character repeatedly, i.e. aaa, bbbbb etc.
Thanks :)
I'd add another alternative to the negative lookahead, matching against ^(.)\1*$, that is, any single character repeated until the end of the string.
Included as-is in your regex, that gives:
/^(?!^mr$|^mrs$|^ms$|^miss$|^dr$|^mr-mrs$|^(.)\1*$)([a-z][a-z'-]{1,})$/i
However, I would probably simplify your negative lookahead as follows:
/^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$/i
The modifications are as follows:
We're evaluating the lookahead at the start of the string, as indicated by the ^ preceding it, so there is no need to repeat the start-of-string anchor in each of its alternatives
Each alternative matches up to the end of the string, so we can put the alternatives in a group followed by a single end-of-string anchor
We have created a new group, which we have to take into account in our back-reference: to refer to the same group, it must now use \2 rather than \1. An alternative in certain regex flavours would have been to use a non-capturing group (?:...)
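The simplified lookahead can be tried out with a short Python sketch. The pattern below keeps mrs in the list of titles, as in the question's original regex, and uses re.IGNORECASE in place of the /i flag; the function name is_valid_name is hypothetical.

```python
import re

# Group 1 is the lookahead's alternation, group 2 is the single character
# that \2* repeats, and group 3 is the name itself.
NAME = re.compile(
    r"^(?!(mr|mrs|ms|miss|dr|mr-mrs|(.)\2*)$)([a-z][a-z'-]{1,})$",
    re.IGNORECASE,
)

def is_valid_name(name: str) -> bool:
    """Return True for names that are not a bare title or one repeated character."""
    return NAME.match(name) is not None

for name in ["O'Brian", "Anne-Marie", "Mr", "Mrs", "aaa", "bbbbb"]:
    print(name, is_valid_name(name))
```

O'Brian and Anne-Marie are accepted, while the bare titles and the repeated-character inputs are rejected.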

How can I use a regular expression to match words of a certain length but not urls?

For text such as
Save Favorites & Share expressions with friends or the Community.
A full Reference & Help is available in the Library, or watch the video Tutorial.
expressions can start some lines though eventuallys
abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
http://regexr.com/foo.html?q=bar
https://mediatemple.net
mediatemple.net
I want to select words that are 11 letters long.
I can use
/\b[a-zA-Z]{11}\b/g
(http://regexr.com/3digk)
but it also matches the urls
https://mediatemple.net
mediatemple.net
How can I avoid that? I use \b rather than a space to match at the start and end of lines
By using negative lookahead, you could exclude the words which have .something after them, this would exclude any URL and not touch the words in the end of the sentence (i.e. if a space is following the dot or the newline).
/\b[a-zA-Z]{11}\b(?!\.[^\s]+)/g
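To see the effect of the trailing lookahead, this Python sketch compares the original pattern with the one above on part of the question's sample text. The variable names are illustrative only, and findall stands in for the /g flag.

```python
import re

PLAIN = re.compile(r"\b[a-zA-Z]{11}\b")
# Same pattern, but the word must not be followed by a dot plus non-whitespace.
NOT_BEFORE_DOT = re.compile(r"\b[a-zA-Z]{11}\b(?!\.[^\s]+)")

text = (
    "Save Favorites & Share expressions with friends or the Community.\n"
    "http://regexr.com/foo.html?q=bar\n"
    "https://mediatemple.net\n"
    "mediatemple.net\n"
)

print(PLAIN.findall(text))           # includes both "mediatemple" hits
print(NOT_BEFORE_DOT.findall(text))  # only the ordinary word remains
```

Because the lookahead needs at least one non-whitespace character after the dot, an 11-letter word followed by a sentence-ending period would still match.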
You can use a negative lookbehind to ensure that your match is not preceded by "//", as it would be right after the "://" of a URL.
Use (?<!//), a negative lookbehind asserting that the two preceding characters are not "//":
/(?<!//)\b[a-zA-Z]{11}\b/g
See live demo.
If you want to be more specific and allow double slashes elsewhere, e.g. "foo//elevenchars", you can use two negative lookbehinds, one for each protocol (in many engines a lookbehind must have a fixed length, so http:// and https:// cannot share one):
/(?<!http://)(?<!https://)\b[a-zA-Z]{11}\b/g
See live demo, matching foo//elevenchars, but not the urls.
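The difference between the two lookbehind variants can be sketched in Python, whose re module accepts fixed-length lookbehinds and does not need the slashes escaped. The sample string is hypothetical and the variable names are for illustration only.

```python
import re

# Rejects any 11-letter word that directly follows "//".
ANY_DOUBLE_SLASH = re.compile(r"(?<!//)\b[a-zA-Z]{11}\b")
# Rejects only words that directly follow "http://" or "https://".
HTTP_ONLY = re.compile(r"(?<!http://)(?<!https://)\b[a-zA-Z]{11}\b")

text = "https://mediatemple.net http://mediatemple.net foo//elevenchars expressions"

print(ANY_DOUBLE_SLASH.findall(text))
print(HTTP_ONLY.findall(text))
```

Note that neither variant excludes a bare domain such as mediatemple.net, since nothing precedes it that the lookbehind could reject.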

Username cannot contain repeating underscore or period

I have always struggled with these darn things. I recall a lecturer telling us all once that if you have a problem which requires you use regular expressions to solve it, you in fact now have 2 problems.
Well, I certainly agree with this. Regex is something we don't use very often, but when we do it's like reading some alien language (well, for me anyway)... I think I will resolve to get the book and read further.
The challenge I have is this, I need to validate a username based on the following criteria:
1. can contain letters, upper and lower
2. can contain numbers
3. can contain periods (.) and underscores (_)
4. periods and underscores cannot be consecutive, i.e. __ and .. are not allowed but ._._ would be valid.
5. a maximum of 20 characters in total
So far I have the following : ^[a-zA-Z_.]{0,20}$ but of course it allows repeat underscores and periods.
Now, I am probably doing this all wrong by starting out with the set of valid characters and the max length. I have been trying (unsuccessfully) to create some look-around or look-behind or whatever to search for invalid repetitions of period (.) and underscore (_), but I'm not sure what approach or methodology to use to break this requirement down into a regex solution.
Can anyone assist with a recommendation / alternative approach or point me in the right direction?
This one is the one you need:
^(?:[a-zA-Z0-9]|([._])(?!\1)){5,20}$
You can have a demo of what it matches here.
"Either an alphanum char ([a-zA-Z0-9]), or (|) a dot or an underscore ([._]), but that isn't followed by itself ((?!\1)), and that from 5 to 20 times ({5,20})."
1. (?:X) is simply a non-capturing group, i.e. you can't refer to it afterwards using the \1, $1 or ?1 syntaxes.
2. (?!X) is called a negative lookahead, i.e. literally "which is not followed by X".
3. \1 refers to the first capturing group. Since the first group (?:...){5,20} has been set as non-capturing (see #1), the first capturing group is ([._]).
4. {X,Y} means from X to Y times; you may change it as you need.
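The pattern explained above can be exercised with a few inputs in Python. This is just a sketch: the function name is_valid is made up, and the pattern keeps the {5,20} bounds from the answer, so anything shorter than five characters is rejected.

```python
import re

# Each repetition consumes either one alphanumeric character, or one . or _
# that is not immediately followed by the same character.
USERNAME = re.compile(r"^(?:[a-zA-Z0-9]|([._])(?!\1)){5,20}$")

def is_valid(username: str) -> bool:
    return USERNAME.match(username) is not None

for name in ["user.name", "a._._b", "user..name", "user__name", "abc"]:
    print(name, is_valid(name))
```

Mixed sequences such as ._._ pass because the lookahead only compares a period or underscore with its own repetition.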
Don't try to shove this into a single regex. Your single regex works fine for all criteria except #4. To do #4, just do a regex that matches invalid usernames and reject the username if it matches. For example (in pseudocode):
if username.matches("^[a-zA-Z_.]{0,20}$") and !username.matches("__|\\.\\.") {
/* accept username */
}
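The two-step idea in the pseudocode might look like this in Python. One deliberate departure, flagged as an assumption: digits are added to the character class so that criterion #2 is covered, whereas the pseudocode reuses the question's class without digits. The function name is_valid_username is hypothetical.

```python
import re

# Step 1: only allowed characters, at most 20 of them.
# Assumption: 0-9 is added here; the pseudocode above omits digits.
ALLOWED_CHARS = re.compile(r"[a-zA-Z0-9_.]{0,20}")
# Step 2: a pattern that finds the invalid repetitions.
REPEATED = re.compile(r"__|\.\.")

def is_valid_username(username: str) -> bool:
    """Accept when the characters and length are fine and nothing is repeated."""
    return (
        ALLOWED_CHARS.fullmatch(username) is not None
        and REPEATED.search(username) is None
    )

for name in ["john_doe.99", "._._", "john__doe", "john..doe", "john doe"]:
    print(name, is_valid_username(name))
```

Using fullmatch replaces the ^ and $ anchors, and keeping the rejection rule in its own pattern makes it easy to add further forbidden sequences later.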
You can use two negative lookahead assertions for this:
^(?!.*__)(?!.*\.\.)[0-9a-zA-Z_.]{0,20}$
Explanation:
(?! # Assert that it's impossible to match the following regex here:
.* # Any number of characters
__ # followed by two underscores in a row
) # End of lookahead
Depending on your requirements and on your regex engine, you may replace [0-9A-Za-z_.] with [\w.].
@sp00n raised a good point: You can combine the lookahead assertions into one:
^(?!.*(?:__|\.\.))[0-9a-zA-Z_.]{0,20}$
which might be a bit more efficient, but is a little harder to read.
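A short Python sketch of the combined-lookahead version, using the pattern exactly as given above (the function name accepts is made up for illustration):

```python
import re

# The lookahead scans the whole string once for "__" or ".." before the
# character class checks the allowed characters and the length.
USERNAME = re.compile(r"^(?!.*(?:__|\.\.))[0-9a-zA-Z_.]{0,20}$")

def accepts(username: str) -> bool:
    return USERNAME.match(username) is not None

for name in ["john_doe.99", "._._", "john__doe", "john..doe"]:
    print(name, accepts(name))
```

Unlike the {5,20} pattern in the earlier answer, this one keeps the question's {0,20} bounds, so short names and even the empty string are accepted.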
Regarding your answer above:
I've tried to do what it says for the account, but it still says
The account name shall be a combination of letter, number or underscore
and when I try that, the app rejects the account.
So please write me a sample of correct registration data for the name I want to register, which is PACIFIC CONCORD INTERNATIONAL,
and put the signs and underscores in this name correctly so that the site accepts it.
Thank you