Regex mix of at least 2 characters (alphabetic, numeric, punctuation, special character) - regex

I am trying to create a regular expression for a password field that checks whether the input contains a mix of at least two character sets (alphabetic, numeric, punctuation, special character). In addition, the first and last characters cannot be numeric, and the password must be at least 8 characters long.
I have never dealt with conditional logic in regular expressions, which is probably why I'm having such a hard time. So far I have this, but it's not working as intended:
(?=.{8,})(\d.*[a-zA-Z])|(?=.{8,})([a-zA-Z].*\d)|(?=.{8,})(\W.*\d)|(?=.{8,})(\d.*\W)|(?=.{8,})(\W.*[a-zA-Z])|(?=.{8,})([a-zA-Z].*\W)|(?=.{8,})([a-z].*[A-Z])|(?=.{8,})([A-Z].*[a-z])

Personally, I wouldn't do this with a single regex. Why not run it through a set of simpler ones, just to save yourself the inevitable maintenance headaches down the line? Something like (in order, in pseudocode):
// First and last are non-numeric and length check
if (!regex_check(pass, /^[^0-9].{6}.*[^0-9]$/)) return false
regexes = {/[a-zA-Z]/, /[0-9]/, /\p{P}|\p{Sc}|\^/} // Different character categories
numCategories = 0
for r in regexes
if (regex_check(pass, r)) numCategories += 1
if numCategories >= 2 return true
return false
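For reference, the pseudocode above could be sketched in Python roughly as follows. This is only a sketch under a few assumptions: Python's built-in re module has no \p{P} or \p{Sc}, so string.punctuation stands in for the punctuation/special category, and the names CATEGORY_PATTERNS and is_valid_password are made up for the example.

```python
import re
import string

# One pattern per character category. string.punctuation is an assumed
# stand-in for the "punctuation / special character" category.
CATEGORY_PATTERNS = [
    re.compile(r"[A-Za-z]"),
    re.compile(r"[0-9]"),
    re.compile("[" + re.escape(string.punctuation) + "]"),
]


def is_valid_password(password):
    # First and last characters are non-numeric, total length is at least 8.
    if not re.fullmatch(r"[^0-9].{6,}[^0-9]", password):
        return False
    # Count how many categories occur at least once.
    matched = sum(1 for pattern in CATEGORY_PATTERNS if pattern.search(password))
    return matched >= 2
```

Each rule lives on its own line, so adding a category or changing the threshold does not require touching a long alternation.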

Related

How to remove/replace specials characters from a 'dynamic' regex/string on ruby?

I have had this code working for a few months. Let's say I have a table called Categories with a string column called name. I receive a string and want to know whether any category was mentioned (a mention occurs when the string contains the substring #name_of_a_category). My approach was something like this:
categories.select { |category_i| content_received.downcase.match(/##{category_i.downcase}/)}
That worked pretty well until today, when I suddenly started to receive an "unmatched close parenthesis" exception. I realized that category names can contain special characters, so I decided to ignore special characters and spaces altogether (I don't want to add restrictions for the user, and I don't want to deal with those cases either, so the policy is simply to ignore them).
So the question is: is there a clean way of removing these special characters (while keeping the #) and matching the string? I don't want to modify the data, just ignore the special characters while looking for mentions.
You can also use
prep_content_received = content_received.gsub(/[^\w\s]|_/,'')
p categories.select { |c|
prep_content_received.match?(/\b#{c.gsub(/[^\w\s]|_/, '').strip()}\b/i)
}
Details:
The prep_content_received = content_received.gsub(/[^\w\s]|_/,'') line creates a copy of content_received with all special characters and _ removed. Doing this once, outside the loop, reduces overhead when there are a lot of categories.
Then, you iterate over the categories list, and each time check if the prep_content_received matches \b (word boundary) + category with all special chars, _ and leading/trailing whitespace stripped from it + \b in a case insensitive way (see the /i flag, no need to .downcase).
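The same idea can be sketched in Python for comparison (the thread itself is about Ruby). The helper names normalize and mentioned_categories are invented for this example, and re.escape is only a safeguard here, since nothing special should survive the normalization step.

```python
import re


def normalize(text):
    # Drop everything except letters, digits and whitespace ("_" is dropped too).
    return re.sub(r"[^\w\s]|_", "", text)


def mentioned_categories(content, categories):
    prepared = normalize(content)  # done once, outside the loop
    found = []
    for name in categories:
        cleaned = normalize(name).strip()
        if not cleaned:
            continue  # nothing left to match on
        # Word boundaries on both sides, case-insensitive match.
        pattern = r"\b" + re.escape(cleaned) + r"\b"
        if re.search(pattern, prepared, flags=re.IGNORECASE):
            found.append(name)
    return found
```

Calling mentioned_categories('pepe is watching a #comedy :)', ['comedy :)', 'terror']) keeps only the first category.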
After looking around I found some answers on the platform, but nothing that met my specific requirements (maybe I missed something; if so, please let me know). This is how I fixed it for my case:
content_received = 'pepe is watching a #comedy :)'
categories = ['comedy :)', 'terror']
temp_content = content_received.downcase
categories.select { |category_i| temp_content.gsub(/[^\sa-zA-Z0-9]/, '#' => '#').match?(/##{category_i.downcase.
gsub(/[^\sa-zA-Z0-9]/, '')}/) }
For the sake of the example, I reduced the categories to a simple array of strings. The first gsub removes every character that is not a letter, a digit or whitespace, except #, which the hash argument maps back to itself. The second gsub is a simpler version of the first that removes # as well.

Regex: "password must have at least 3 of the 4 of the following"

I'm a Regex newbie, and so far have only used it for simple things, like "must be a number or letter". Now I have to do something a bit more complex.
I need to use it to validate a password, which must be 8-16 characters, free of control/non-printing/non-ASCII characters, and must have at least three of the following:
one capital letter
one lowercase letter
one number 0-9
one symbol character ($, %, &, etc.)
I'm thinking what I have to do is write something like "one capital letter, lowercase letter and number, OR one capital letter, lowercase letter and one symbol, OR one capital letter, one number or one symbol, OR...." to cover all possible "3 out of 4" combinations, but that seems excessive. Is there a simpler solution?
The correct way to do this is to check all five conditions separately. However, I assume there is a reason you want a regex, so here you go:
/^((?=.*[A-Z])(?=.*[a-z])(?=.*\d)|(?=.*[a-z])(?=.*\d)(?=.*[\$\%\&])|(?=.*[A-Z])(?=.*\d)(?=.*[\$\%\&])|(?=.*[A-Z])(?=.*[a-z])(?=.*[\$\%\&])).{8,16}$/
Explanation:
We want to match the whole thing, hence we surround it with ^$
.{n,m} matches between n and m characters (8 and 16 in our case).
The general way you can check if a string contains something, without actually matching it is by using positive lookahead (?=.*X), where X is the thing you want to check. For example, if you want to make sure the string contains a lowercase letter you can do (?=.*[a-z]).
If you want to check if a string contains X, Y and Z, but without actually matching them, you can use the previous recipe by appending the three lookaheads (?=.*X)(?=.*Y)(?=.*Z)
We use the above to match three of the four things mentioned. We go through all possible combinations with |(or) - cCD|cDS|CDS|CcS (c = lowercase letter, C = capital letter, D = digit, S = special)
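As a rough illustration of the recipe above, the alternation of lookahead groups can be generated rather than typed by hand. This is a Python sketch, not the answer's original code: the CLASSES table and the build_pattern helper are assumptions made for the example, and the symbol class is limited to $, % and & as in the regex above.

```python
import re
from itertools import combinations

# The four character classes from the explanation above.
CLASSES = {
    "upper": r"[A-Z]",
    "lower": r"[a-z]",
    "digit": r"\d",
    "symbol": r"[$%&]",
}


def build_pattern(required=3):
    # One group of lookaheads per combination of `required` classes, joined with |.
    alternatives = []
    for combo in combinations(CLASSES.values(), required):
        alternatives.append("".join(f"(?=.*{cls})" for cls in combo))
    return re.compile("^(?:" + "|".join(alternatives) + ").{8,16}$")


PATTERN = build_pattern()
```

With required=3 this yields the same four combinations as the hand-written regex, and PATTERN.match(password) returns a match object only when three of the four classes are present and the length is 8 to 16.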
The best way to do this is by checking each condition separately. Performance will suffer if you try to fit all the criteria into one expression (see the accepted answer). I also strongly recommend against limiting the password length to 16 characters; that is extremely insecure by modern standards. Try something more like 64 characters, or even better, 128, assuming your hashing architecture can handle the load.
You also didn't specify a language, but this is one way to do it in JavaScript:
var pws = [
"%5abCdefg",
"&5ab",
"%5abCdef",
"5Bcdwefg",
"BCADLKJSDSDFlk"
];
function pwCheck(pw) {
var criteria = 0;
if (pw.toUpperCase() != pw) {
// has lower case letters
criteria++;
}
if (pw.toLowerCase() != pw) {
// has upper case letters
criteria++;
}
if (/^[a-zA-Z0-9]*$/.test(pw) === false) {
// has special characters
criteria++;
}
if (/\d/.test(pw) === true) {
// has numbers
criteria++;
}
// returns true if 3 or more criteria were met and the length is appropriate
return (criteria >= 3 && pw.length >= 8 && pw.length <= 16);
}
pws.forEach(function(pw) {
console.log(pw + ": " + pwCheck(pw).toString());
});
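The JavaScript above does not cover the "free of control/non-printing/non-ASCII characters" requirement from the question. A Python sketch of the same separate-checks idea with that rule added might look like this. The ALLOWED set and the pw_check name are assumptions for the example, and whether a space should be allowed is a guess.

```python
import string

# Printable ASCII characters; allowing the space character is an assumption.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def pw_check(pw):
    # Length must be 8 to 16 characters.
    if not 8 <= len(pw) <= 16:
        return False
    # Reject control, non-printing and non-ASCII characters.
    if any(ch not in ALLOWED for ch in pw):
        return False
    criteria = [
        any(ch in string.ascii_uppercase for ch in pw),  # has upper case
        any(ch in string.ascii_lowercase for ch in pw),  # has lower case
        any(ch in string.digits for ch in pw),           # has a digit
        any(ch in string.punctuation for ch in pw),      # has a symbol
    ]
    return sum(criteria) >= 3
```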
Not sure if it's an iOS thing, but the regex using "\d" for digits [0-9] wasn't working as expected. An example string that had issues: "AAAAAA1$".
The fix below works fine in Objective-C and Swift 3:
^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])|(?=.*?[A-Z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])).{8,16}$

Find 3 or more repeating characters in a string

I'm trying to find any occurrences of a character repeating more than 2 times in a user entered string. I have this, but it doesn't go into the if statement.
password = asDFwe23df333
s = re.compile('((\w)\2{2,})')
m = s.search(password)
if m:
print ("Password cannot contain 3 or more of the same characters in a row\n")
sys.exit(0)
You need to prefix your regex with the letter 'r', like so:
s = re.compile(r'((\w)\2{2,})')
If you don't do that, you'll have to double up all your backslashes, since Python treats a backslash as an escape character in normal strings. Since that makes regexes even harder to read than they normally are, most regexes in Python include that prefix.
Also, in your included code your password isn't in quotes, but I'm assuming it has quotes in your code.
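A small Python sketch of the difference, using the password from the question. The non-raw pattern is written as "((\\w)\2{2,})" here, which is the same string value as the question's pattern but avoids an invalid-escape warning for \w; the variable names are just for illustration.

```python
import re

# In a normal string literal, "\2" is a single character: chr(2).
assert "\2" == "\x02"
# In a raw string it stays as two characters: a backslash and "2".
assert len(r"\2") == 2

password = "asDFwe23df333"

# Same value as the question's non-raw pattern: the backreference is lost.
broken = re.compile("((\\w)\2{2,})")
# Raw string: \2 reaches the regex engine as a backreference to group 2.
fixed = re.compile(r"((\w)\2{2,})")

broken_match = broken.search(password)
fixed_match = fixed.search(password)
```

Here broken_match is None, while fixed_match covers the run of three 3s at the end of the password.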
Can't you simply go through the whole string and keep a counter of how many times in a row the current character has appeared? Every time a character equals the previous one, increment the counter and stop once it reaches 3. If the character differs from the previous one, reset the counter to 1, since the new character is itself the first of a possible run.
EDIT:
Or, you can use:
s = 'aaabbb'
re.findall(r'((\w)\2{2,})', s)
And check if the list returned by the second line has any elements.
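The counter-based idea from this answer could look roughly like this in Python. The function name has_run and the limit parameter are made up for the sketch. Unlike the \w-based regex, it treats every character the same way, including punctuation and spaces.

```python
def has_run(text, limit=3):
    """Return True if any character appears `limit` or more times in a row."""
    run = 0
    previous = None
    for ch in text:
        # Extend the current run, or start a new run of length 1.
        run = run + 1 if ch == previous else 1
        if run >= limit:
            return True
        previous = ch
    return False
```

For the regex variant, `bool(re.findall(r'((\w)\2{2,})', s))` gives the equivalent yes/no answer for word characters.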

Eiffel regular expression validation

How do you create a regular expression for a certain string? And can you do it in the Assertion (precondition part of the code)?
I've been googling around but couldn't find anything convincing.
The question is like this:
Add a precondition to the DEPARTMENT (the class that we're working on) creation procedure that ensures that the phone number is valid. There are three possible valid phone number formats. A valid phone number consists of one of:
eight digits, the first of which is non-zero
a leading zero, a single non-zero digit area code, and then eight digits, the first of
which is non-zero
a leading ‘+’, followed by a two digit country code, then a single non-zero digit
area code, and then eight digits, the first of which is non-zero
Any embedded spaces are to be ignored when validating a phone number.
It is acceptable, but not required, to add a PHONE_NUMBER class to the system as part of
solving this problem.
There are several different questions to be answered:
How to check if a given string matches a specified regular expression in Eiffel? One can use a class RX_PCRE_MATCHER from the Gobo library. The feature compile allows setting the required regular expression and the feature recognizes allows testing if the string matches it.
How to write a regular expression for the given phone number specification? Something like "(|0[1-9]|\+[0-9]{2}[1-9])[1-9][0-9]{7}" should do, though I have not checked it. It's possible to take embedded whitespace into account in the regular expression itself, but it's much easier to get rid of it before passing the string to the regular expression matcher, by applying prune_all (' ') to the input string.
How to add a precondition to a creation procedure to verify that the argument satisfies it? Let's assume that from the previous items we constructed a function is_phone_number that takes a STRING and returns a BOOLEAN that indicates if the specified string represents a valid phone number. A straightforward solution would be to write
make (tel: STRING)
require
is_phone_number (tel)
...
and have a feature is_phone_number in the class DEPARTMENT itself. But this prevents us from checking if the specified string represents a phone number before calling this creation procedure. So it makes sense to move is_phone_number to the class PHONE_NUMBER_VALIDATOR that class DEPARTMENT will inherit. Similarly, if PHONE_NUMBER needs to validate the string against specified rules, it can inherit PHONE_NUMBER_VALIDATOR and reuse the feature is_phone_number.
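To sanity-check the suggested pattern outside Eiffel, here is a small Python sketch. It assumes the trailing part is [0-9]{7}, as the "eight digits, the first of which is non-zero" wording implies, and it strips spaces before matching, which corresponds to the prune_all (' ') step. The Python function is only named is_phone_number to mirror the feature discussed above.

```python
import re

# Optional prefix (none, 0 + area code, or + country code + area code),
# followed by eight digits whose first digit is non-zero.
PHONE_RE = re.compile(r"(|0[1-9]|\+[0-9]{2}[1-9])[1-9][0-9]{7}")


def is_phone_number(tel):
    digits = tel.replace(" ", "")  # embedded spaces are ignored
    return PHONE_RE.fullmatch(digits) is not None
```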
Halikal actually worked this one out, but didn't share it until now ...
This works in EiffelStudio 6.2 (note: this uses Gobo)
http://se.inf.ethz.ch/old/people/leitner/gobo_guidelines/naming_conventions.html
A valid phone number consists of one of:
eight digits, the first of which is non-zero
a leading zero, a single non-zero digit area code,
and then eight digits, the first of which is non-zero
a leading + followed by a two digit country code,
then a single non-zero digit area code, and then eight digits,
the first of which is non-zero
Any embedded spaces are to be ignored when validating a phone number.
require -- \040 is the octal ASCII code for a space
valid_phone:
match(phone, "^\040*[1-9]\040*([0-9]\040*){7}$") = TRUE or
match(phone, "^\040*0\040*([1-9]\040*){2}([0-9]\040*){7}$") = TRUE or
match(phone, "^\040*\+\040*([0-9]\040*){2}([1-9]\040*){2}([0-9]\040*){7}$") = TRUE
feature --Regular Expression check
match(text: STRING; pattern: STRING): BOOLEAN is
-- checks whether 'text' matches a regular expression 'pattern'
require
text /= Void
pattern /= Void
local
dfa: LX_DFA_REGULAR_EXPRESSION --There's the Trick!
do
create dfa.make
dfa.compile(pattern, True) --There's the Trick!
check -- regex must be compiled before we can use it
dfa.is_compiled;
end
Result := dfa.matches(text)
-- debug: make sure of which pattern
if dfa.matches (text) then
io.putstring(text + " matches " + pattern + "%N")
end
end
end

Regular Expression for username

I need help on regular expression on the condition (4) below:
1. Begin with a-z
2. End with a-z0-9
3. Allow 3 special characters like ._-
4. The characters in (3) must be followed by alphanumeric characters, and cannot be followed by any of the characters in (3) themselves.
Not sure how to do this. Any help is appreciated, with the sample and some explanations.
You can try this:
^(?=.{5,10}$)(?!.*[._-]{2})[a-z][a-z0-9._-]*[a-z0-9]$
This uses lookaheads to enforce that username must have between 5 and 10 characters (?=.{5,10}$), and that none of the 3 special characters appear twice in a row (?!.*[._-]{2}), but overall they can appear any number of times (Konrad interprets it differently, in that the 3 special characters can appear up to 3 times).
Here's a test harness in Java:
String[] test = {
"abc",
"abcde",
"acd_e",
"_abcd",
"abcd_",
"a__bc",
"a_.bc",
"a_b.c-d",
"a_b_c_d_e",
"this-is-too-long",
};
for (String s : test) {
System.out.format("%s %B %n", s,
s.matches("^(?=.{5,10}$)(?!.*[._-]{2})[a-z][a-z0-9._-]*[a-z0-9]$")
);
}
This prints:
abc FALSE
abcde TRUE
acd_e TRUE
_abcd FALSE
abcd_ FALSE
a__bc FALSE
a_.bc FALSE
a_b.c-d TRUE
a_b_c_d_e TRUE
this-is-too-long FALSE
See also
regular-expressions.info/Lookarounds, limiting repetitions, and anchors
So basically:
Start with [a-z].
Allow a first serving of [a-z0-9], several times. 1)
Allow
at most one of [._-], followed by
at least one of [a-z0-9]
three times or less.
End with [a-z0-9] (implied in the above).
Which yields:
^[a-z][a-z0-9]*([._-][a-z0-9]+){0,3}$
But beware that this may result in user names with only one character.
1) (posted by #codeka)
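A quick Python check of the caveat above, plus one possible way around it. The 5 to 10 length lookahead is borrowed from the earlier answer and is only an assumption about the intended limits; the names STRUCTURE and WITH_LENGTH are made up for the sketch.

```python
import re

# The pattern from this answer: at most three [._-] separators,
# each followed by at least one alphanumeric character.
STRUCTURE = re.compile(r"[a-z][a-z0-9]*([._-][a-z0-9]+){0,3}")

# Same structure with an assumed 5-10 character length check in front.
WITH_LENGTH = re.compile(r"(?=.{5,10}$)[a-z][a-z0-9]*([._-][a-z0-9]+){0,3}")

one_char_ok = STRUCTURE.fullmatch("a") is not None
one_char_ok_with_length = WITH_LENGTH.fullmatch("a") is not None
```

Here one_char_ok is True and one_char_ok_with_length is False, so the lookahead closes the one-character gap without changing the separator rule.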
try that:
^[a-zA-Z](([\._\-][a-zA-Z0-9])|[a-zA-Z0-9])*[a-z0-9]$
1) ^[a-zA-Z]: beginning
2) (([._-][a-zA-Z0-9])|[a-zA-Z0-9])*: any number of either alphanum, or special char followed by alphanum
3) [a-z0-9]$: ending
Well, because I feel like being ... me. One Regex does not need to rule them all -- and for one of the Nine, see nqueens. (However, in this case there are some nice answers already; I'm just pointing out a slightly different approach.)
function valid(n) {
return (n.length > 3
&& n.match(/^[a-z]/i)
&& n.match(/[a-z0-9]$/i)
&& !n.match(/[._-]{2}/));
}
Now imagine that you only allow one ., _ or - total (perhaps I misread the initial requirements shrug); that's easy to add (and visualize):
&& n.replace(/[^._-]/g, "").length <= 1
And before anyone says "that's less efficient", go profile it in the intended usage. Also note, I didn't give up using regular expressions entirely, they are a wonderful thing.